Luca's pills for computer vision and machine learning experiments on linux: The 3 things you absolutely need to know about regexp

Monday, March 11, 2013

The 3 things you absolutely need to know about regexp

1. By default all quantifiers are greedy, which means they match as much text as they can. If you want to match each instance of a pattern separately, append ? to the quantifier to make it non-greedy:

re.findall('.*?', page)
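To make the difference concrete, here is a minimal sketch that compares the greedy and non-greedy forms. The sample string and the `<b>` delimiters are assumptions chosen for illustration; they are not from the original example.

```python
import re

# Hypothetical sample text, used only for illustration.
page = "<b>one</b> and <b>two</b>"

# Greedy: .* runs on to the last </b>, so both items collapse into one match.
greedy = re.findall(r"<b>(.*)</b>", page)

# Non-greedy: .*? stops at the first </b> it can, giving one match per item.
lazy = re.findall(r"<b>(.*?)</b>", page)

print(greedy)
print(lazy)
```

The greedy version returns a single string that swallows everything between the first opening tag and the last closing tag, while the non-greedy version returns the two items separately.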

2. By default . does not match a newline, so a pattern like .*? cannot span several lines. You can:

either remove the newlines with

re.findall(r'\\begin\{itemize\}.*?\\end\{itemize\}', page.replace('\n', ''))

or pass the re.DOTALL flag, which makes . match newlines as well:

re.findall(r'\\begin\{itemize\}.*?\\end\{itemize\}', page, re.DOTALL)
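The two options can be checked side by side with a small sketch. The LaTeX snippet below is made up for illustration. The pattern is written as a raw string with the backslashes and braces escaped, because otherwise `\b` would not be read as a literal backslash followed by `b`.

```python
import re

# Hypothetical LaTeX snippet, used only for illustration.
page = r"""\begin{itemize}
\item first
\end{itemize}
text in between
\begin{itemize}
\item second
\end{itemize}"""

# Raw string; \\ matches a literal backslash, \{ and \} match literal braces.
pattern = r"\\begin\{itemize\}.*?\\end\{itemize\}"

# Without any flag, . stops at the first newline, so nothing matches.
without_flag = re.findall(pattern, page)

# Option 1: strip the newlines before matching.
stripped = re.findall(pattern, page.replace("\n", ""))

# Option 2: keep the text intact and let . match newlines.
with_flag = re.findall(pattern, page, re.DOTALL)

print(len(without_flag), len(stripped), len(with_flag))
```

Both options find the two environments. The difference is that re.DOTALL keeps the original line breaks in the matched text, which is usually what you want if the result is processed further.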

3. To match a number (integer or float) you can use ? again, this time to make a character optional. Here the decimal point is optional, and \d* lets the fractional digits be absent:

re.findall(r'\d+\.?\d*', page)
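As a quick check of the optional decimal point, here is a small sketch on a made-up string (the sample text is an assumption). It also shows that a pattern ending in `\d+` rather than `\d*` needs at least two digits, so it misses single-digit integers.

```python
import re

# Hypothetical sample text, used only for illustration.
page = "width 7, ratio 3.14, count 42"

# Optional decimal point, optional fractional digits.
numbers = re.findall(r"\d+\.?\d*", page)

# Stricter variant: digits are required on both sides of the optional dot,
# so a lone "7" cannot match.
strict = re.findall(r"\d+\.?\d+", page)

# findall returns strings; convert them if numeric values are needed.
values = [float(n) for n in numbers]

print(numbers)
print(strict)
print(values)
```

Note that the looser pattern will also pick up a trailing dot when a number ends a sentence (for example "3."), so strip it afterwards if that matters for your text.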
