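1. By default all quantifiers (*, +, {m,n}) are greedy, which means they match as much text as they can. If you want to match several separate instances of the same pattern instead of one long match, add ? after the quantifier to make it lazy (non-greedy). A lazy .*? is normally placed between two delimiters, as in the itemize example in tip 2:
re.findall('.*?', page)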
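2. By default the dot (.) does not match a newline in Python's re module, so a pattern that has to span several lines finds nothing. Note that the pattern must be a raw string with the backslashes and braces escaped, otherwise '\b' becomes a backspace character and '\e' raises a "bad escape" error. You can either remove the newlines first:
re.findall(r'\\begin\{itemize\}.*?\\end\{itemize\}', page.replace('\n', ''))
or pass the re.DOTALL flag so that . matches newlines too:
re.findall(r'\\begin\{itemize\}.*?\\end\{itemize\}', page, flags=re.DOTALL)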
3. To get a number (integer or float) you can use ? again, but this time to make the decimal part optional:
re.findall(r'\d+(?:\.\d+)?', page)
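To see the difference between greedy and lazy matching from tip 1, here is a minimal sketch on a made-up string. The <b>...</b> tags are only an assumed example of delimiters, not taken from the original post.

```python
import re

# Hypothetical sample input with two bold spans.
page = "<b>first</b> and <b>second</b>"

# Greedy: .* runs on to the LAST </b>, so both spans come back as one match.
greedy = re.findall(r"<b>.*</b>", page)

# Lazy: .*? stops at the FIRST </b> it can, giving one match per span.
lazy = re.findall(r"<b>.*?</b>", page)

# With a capturing group, findall returns only the text between the tags.
inner = re.findall(r"<b>(.*?)</b>", page)

print(greedy)  # ['<b>first</b> and <b>second</b>']
print(lazy)    # ['<b>first</b>', '<b>second</b>']
print(inner)   # ['first', 'second']
```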
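A small check of tip 2, assuming a hypothetical LaTeX snippet stored in page. It shows that the lazy pattern finds nothing on its own when the list spans several lines, and that both workarounds return it.

```python
import re

# Hypothetical LaTeX snippet; the itemize list spans several lines.
page = r"""Intro text.
\begin{itemize}
\item one
\item two
\end{itemize}
Outro."""

# Raw string, with backslashes and braces escaped so they match literally.
pattern = r"\\begin\{itemize\}.*?\\end\{itemize\}"

# Without help, '.' stops at '\n', so nothing is found.
no_flag = re.findall(pattern, page)

# Option 1: strip the newlines before searching.
joined = re.findall(pattern, page.replace("\n", ""))

# Option 2: let '.' match newlines as well.
dotall = re.findall(pattern, page, flags=re.DOTALL)

print(no_flag)  # []
print(joined)
print(dotall)
```

Removing the newlines changes the text you get back (lines are glued together), while re.DOTALL returns the block exactly as it appears in page.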
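A sketch for tip 3 on an invented sample sentence. It also shows why a pattern such as \d+\.?\d+ is risky: it needs at least two digits, so a lone single-digit number is skipped.

```python
import re

# Hypothetical sample text mixing integers and a decimal.
text = "Take 7 apples, 3.14 pies and 42 pears."

# \d+\.?\d+ needs at least two digits, so the lone 7 is missed.
two_digit_min = re.findall(r"\d+\.?\d+", text)

# Making the whole fractional part optional catches every number.
numbers = re.findall(r"\d+(?:\.\d+)?", text)

# Convert the matched strings to floats for arithmetic.
values = [float(n) for n in numbers]

print(two_digit_min)  # ['3.14', '42']
print(numbers)        # ['7', '3.14', '42']
print(values)         # [7.0, 3.14, 42.0]
```

The (?:...) group is non-capturing on purpose: with a plain (...) group, re.findall would return only the group's contents (the decimal parts, or empty strings) instead of the whole number.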