Python Fundamentals Tutorial: Regular Expressions

12. Regular Expressions (re)

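Python's regular expression support is very complete, but regular expressions are not a first-class language element as they are in Perl or JavaScript: there is no literal syntax such as /pattern/. Instead, regular expression handling lives in the standard-library re module, which has to be imported (import re) before use.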

At the simplest level, the re module provides module-level functions for searching text with a regular expression. In many cases, calling search() and checking whether it returned anything is enough: search() returns a Match object if the pattern occurs anywhere in the text, and None if it was not found.

Note

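It is customary to write regular expression patterns as raw strings (r'pattern'). In a raw string, backslashes are kept as-is, so escapes such as \s or \b reach the re module unchanged instead of being interpreted by Python's string parser (or having to be doubled, as in '\\s').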
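
A minimal sketch of why raw strings matter. In an ordinary string literal, Python turns \b into the backspace character before re ever sees it, so a word-boundary pattern silently stops matching. The variable names plain and raw are illustrative only.

```python
import re

text = 'All your base are belong to us.'

# In a normal literal, '\b' is the backspace character (chr(8)),
# so this pattern looks for a literal backspace followed by 'base'.
plain = '\bbase'
# In a raw literal the backslash survives, and re reads \b as a word boundary.
raw = r'\bbase'

print(len(plain), len(raw))      # 5 6
print(re.search(plain, text))    # None
print(re.search(raw, text))      # <re.Match object; span=(9, 13), match='base'>

# The raw string is equivalent to doubling the backslash by hand.
assert raw == '\\bbase'
```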

>>> import re
>>> text = 'All your base are belong to us.'

>>> re.search(r'o\s?u', text)
<re.Match object; span=(5, 7), match='ou'>
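
Because search() returns either a Match object or None, its result can be used directly as a condition. A small sketch of that idiom follows; the second pattern, r'cats?', is just an illustrative pattern that does not occur in the text.

```python
import re

text = 'All your base are belong to us.'

for pattern in (r'o\s?u', r'cats?'):
    match = re.search(pattern, text)
    # A Match object is truthy; None is falsy.
    if match:
        print(pattern, 'found', repr(match.group()), 'at index', match.start())
    else:
        print(pattern, 'not found')
```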

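Note that the match() function behaves differently: it only matches at the beginning of the text and does not search the rest of the string for the pattern. In the first call below, the text does not start with 'ou' or 'o u', so match() returns None and the interactive prompt prints nothing.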

>>> re.match(r'o\s?u', text)
>>> re.match('All', text)
<re.Match object; span=(0, 3), match='All'>
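
To see the difference side by side, the sketch below runs match() and search() with the same pattern. It also shows that, without the re.MULTILINE flag, anchoring a search() with ^ restricts it to the start of the string, just as match() does.

```python
import re

text = 'All your base are belong to us.'

# match() only tries position 0; search() scans the whole string.
print(re.match(r'base', text))    # None: the text does not start with 'base'
print(re.search(r'base', text))   # finds 'base' starting at index 9

# Without re.MULTILINE, ^ means "start of the string", so these agree.
print(re.search(r'^base', text))  # None, like re.match(r'base', text)
print(re.search(r'^All', text))   # matches, like re.match(r'All', text)
```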

Regular expressions can also split strings in more flexible ways than the str.split() method, which splits only on a single fixed separator string or on runs of whitespace.

>>> re.split(r'o\s?u', text)
['All y', 'r base are belong t', 's.']
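
A short sketch of where re.split() goes beyond str.split(): splitting on several different separators at once while absorbing the whitespace around them. The record string is made-up sample data.

```python
import re

record = 'alpha, beta;gamma ;  delta'

# str.split() accepts only one fixed separator string.
print(record.split(','))   # ['alpha', ' beta;gamma ;  delta']

# re.split() can split on either ',' or ';' and swallow surrounding spaces.
print(re.split(r'\s*[,;]\s*', record))   # ['alpha', 'beta', 'gamma', 'delta']

# maxsplit limits the number of splits, as it does for str.split().
print(re.split(r'\s*[,;]\s*', record, maxsplit=1))   # ['alpha', 'beta;gamma ;  delta']
```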

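The findall() and finditer() functions process every non-overlapping match in the text. findall() returns a list of the matched strings (or of the group contents, if the pattern defines groups), while finditer() returns an iterator that yields a Match object for each match.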

>>> re.findall(r'o\s?u', text)
['ou', 'o u']

>>> re.finditer(r'o\s?u', text)
<callable_iterator object at 0x100516610>
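
Printing the iterator itself is not very informative; the sketch below loops over it to show that each item is a Match object carrying the position as well as the text. It also illustrates the group behaviour of findall(), using an illustrative pattern with one group.

```python
import re

text = 'All your base are belong to us.'

# finditer() yields one Match object per non-overlapping match.
for m in re.finditer(r'o\s?u', text):
    print(m.span(), repr(m.group()))
# (5, 7) 'ou'
# (26, 29) 'o u'

# With a group in the pattern, findall() returns the group's text
# rather than the whole match.
print(re.findall(r'(\w+) are', text))   # ['base']
```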

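If you need to use the same pattern many times, compile it once with re.compile() and call the methods of the resulting pattern object instead of the module-level functions. This avoids looking up (or re-compiling) the pattern on every call. Because the module-level functions already cache recently used patterns, the speed-up is often modest, but compiling also keeps the pattern definition in one place.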
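
A minimal sketch of compiling once and reusing the pattern object over several lines of input. The name OU_PATTERN and the sample lines are illustrative, not part of the re module.

```python
import re

# Compile once, then reuse the pattern object for every line.
OU_PATTERN = re.compile(r'o\s?u')

lines = [
    'All your base are belong to us.',
    'You have no chance to survive.',
    'Nothing to see here.',
]

for line in lines:
    # Pattern objects offer search, match, split, findall and finditer,
    # taking the text directly since the pattern is already bound.
    print(OU_PATTERN.findall(line))
# ['ou', 'o u']
# ['ou']
# []
```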

Parentheses in a pattern define groups, whose text can be retrieved with the group() method of the returned Match object. Groups are numbered from 1, as in most other regex tools; group(0), or group() with no argument, returns the entire match.

>>> text = 'All your base are belong to us.'

>>> pattern = re.compile(r'you[r]?\s*(\S*)\s*are belong to us')

>>> match = pattern.search(text)

>>> match.group(1)
'base'
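
With several groups, the Match object can return them individually, all at once with groups(), or as a tuple of selected groups. The pattern below is an illustrative variation on the one above.

```python
import re

text = 'All your base are belong to us.'
pattern = re.compile(r'(\w+) (\w+) are belong to (\w+)')

match = pattern.search(text)
print(match.group(0))     # your base are belong to us  (the whole match)
print(match.group(1))     # your
print(match.groups())     # ('your', 'base', 'us')
print(match.group(2, 3))  # ('base', 'us')
```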

12.1. Lab

  1. Rename complexity-1-fileutil.py to fileutil.py
  2. Implement fileutil.py to pass the doctests
  3. Create a second file, grep.py, that accepts command-line arguments and calls the function in fileutil.py
  4. Accept the following command-line arguments:

    1. -v, --invert-match: select non-matching lines
    2. -E, --extended-regexp: PATTERN is an extended regular expression
    3. a list of files
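
One possible starting point for the command-line half of the lab, assuming argparse for option parsing. The helper filter_lines() and the functions build_parser() and main() are hypothetical: filter_lines() stands in for whatever function your fileutil.py exposes, and its name and signature are not taken from the lab's doctests. This sketch treats every pattern as a Python regular expression, whose syntax already resembles grep's extended syntax (+, ?, | and unescaped groups are special), so -E is accepted but has no effect here.

```python
import argparse
import re


def filter_lines(pattern, lines, invert=False):
    """Yield lines matching pattern, or the non-matching ones if invert is True.

    Hypothetical stand-in for the function in fileutil.py.
    """
    regex = re.compile(pattern)
    for line in lines:
        # Comparing with invert flips the test when -v is given.
        if bool(regex.search(line)) != invert:
            yield line


def build_parser():
    parser = argparse.ArgumentParser(description='Search files for a regular expression.')
    parser.add_argument('-v', '--invert-match', action='store_true',
                        help='select non-matching lines')
    # Accepted for compatibility; this sketch uses re syntax either way.
    parser.add_argument('-E', '--extended-regexp', action='store_true',
                        help='PATTERN is an extended regular expression')
    parser.add_argument('pattern', help='regular expression to search for')
    parser.add_argument('files', nargs='+', help='files to search')
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    for name in args.files:
        with open(name) as handle:
            for line in filter_lines(args.pattern, handle, args.invert_match):
                # Prefix each line with its file name when searching several files.
                prefix = f'{name}:' if len(args.files) > 1 else ''
                print(prefix + line.rstrip('\n'))
```

To make grep.py runnable, call main() under an if __name__ == '__main__': guard; then python grep.py -v 'o\s?u' notes.txt would print the lines of notes.txt that do not match.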