Python's regular expression support is very complete, but regular expressions are not a first-class language element as they are in Perl or JavaScript. Instead, regular expression handling is found in the re module.
At the simplest level, there are module-level functions in re that can be used to search for regular expressions. In many cases, calling the search() function and checking for the presence of a return value is enough: search() returns a Match object if the pattern is found anywhere in the text, and None if the pattern was not found.
Note: It is customary in Python regular expressions to pass the patterns as raw strings (string literals with an r prefix), so that backslashes in the pattern reach the regex engine unchanged.
    >>> import re
    >>> text = 'All your base are belong to us.'
    >>> re.search(r'o\s?u', text)
    <_sre.SRE_Match object at 0x10041f718>
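In a script, the return value of search() is usually tested directly, because a Match object is truthy and None is falsy. The following is a minimal sketch of that idiom; contains_pattern is a hypothetical helper name introduced only for illustration.

```python
import re

text = 'All your base are belong to us.'


def contains_pattern(pattern, string):
    """Return True if pattern occurs anywhere in string (hypothetical helper)."""
    return re.search(pattern, string) is not None


match = re.search(r'o\s?u', text)
if match:
    # group() with no arguments returns the matched text;
    # span() returns the (start, end) offsets of the match.
    print(match.group(), match.span())
else:
    print('no match')
```

Comparing against None explicitly, as the helper does, makes the intent clear when the result is stored rather than tested immediately.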
Take note of the match() function: it only matches at the beginning of the text and does not search the rest of the text for the pattern.
    >>> re.match(r'o\s?u', text)
    >>> re.match('All', text)
    <_sre.SRE_Match object at 0x10041f718>
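The difference between match() and search() can be seen side by side. This sketch also shows that anchoring a search() pattern with \A (start of string) behaves like match() for this input; the variable names are illustrative only.

```python
import re

text = 'All your base are belong to us.'

# match() only succeeds if the pattern matches at position 0.
at_start = re.match(r'o\s?u', text)      # None: the text starts with 'All'

# search() scans the whole text for the first occurrence.
anywhere = re.search(r'o\s?u', text)     # finds 'ou' inside 'your'

# Anchoring the pattern with \A restricts search() to the start as well.
anchored = re.search(r'\Ao\s?u', text)   # None, like match()

print(at_start, anywhere.group(), anchored)
```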
Regular expressions can also be used to split strings in more advanced ways than the str.split() method, because the separator can be a pattern rather than a fixed string.
    >>> re.split(r'o\s?u', text)
    ['All y', 'r base are belong t', 's.']
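A more typical use of re.split() is splitting on several different delimiters at once, which str.split() cannot do in one call. The sample record below is made up for illustration.

```python
import re

# Made-up sample data mixing commas, semicolons and runs of spaces.
record = 'alpha, beta;gamma  delta'

# str.split() accepts only a single fixed separator.
by_comma = record.split(', ')

# re.split() accepts a pattern: a comma or semicolon followed by
# optional whitespace, or a run of whitespace on its own.
fields = re.split(r'[,;]\s*|\s+', record)

print(by_comma)
print(fields)
```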
Using the findall() or finditer() functions, it is possible to process every match in the text. findall() returns a list of the matched strings (or of the captured groups, if the pattern defines any), while finditer() returns an iterator of Match objects.
    >>> re.findall(r'o\s?u', text)
    ['ou', 'o u']
    >>> re.finditer(r'o\s?u', text)
    <callable-iterator object at 0x100516610>
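Since finditer() yields Match objects, it is the better choice when the position of each match matters. The sketch below collects the offsets of every match, and also shows how findall() returns tuples once the pattern contains more than one group; the second pattern is an illustrative variation, not one from the text above.

```python
import re

text = 'All your base are belong to us.'

# Each item produced by finditer() is a Match object with position info.
spans = [(m.start(), m.end(), m.group())
         for m in re.finditer(r'o\s?u', text)]

# With groups in the pattern, findall() returns the groups, not the
# whole match: here, the word characters on either side of each 'o'.
neighbours = re.findall(r'(\w)o(\w)', text)

print(spans)
print(neighbours)
```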
If you need to use the same pattern multiple times, you can improve performance by compiling the regex with re.compile() and then using the methods of the resulting pattern object, rather than the module-level functions.
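As a rough sketch of that approach, the pattern below is compiled once and then reused for every line. The constant name and the sample lines are assumptions made for the example.

```python
import re

# Compiled once; the pattern object offers search(), match(), split(),
# findall() and finditer() methods mirroring the module-level functions.
OU_PATTERN = re.compile(r'o\s?u')

# Illustrative input lines.
lines = ['All your base', 'are belong', 'to us.']

# Keep only the lines in which the pattern occurs somewhere.
matching = [line for line in lines if OU_PATTERN.search(line)]

print(matching)
```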
If groups are defined in the pattern, they can be accessed using the group() method of the returned Match object. Note that groups are numbered from 1, as in most other regex utilities; group(0) is the entire match.
    >>> text = 'All your base are belong to us.'
    >>> pattern = re.compile(r'you[r]?\s*(\S*)\s*are belong to us')
    >>> match = pattern.search(text)
    >>> match.group(1)
    'base'
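To make the numbering concrete, here is a sketch using a variant of the pattern above with a second group added (the second group is an addition for this example only). It shows group(0), the numbered groups, and groups(), which returns all captured groups as a tuple.

```python
import re

text = 'All your base are belong to us.'

# Variant of the earlier pattern with a second capturing group.
pattern = re.compile(r'you[r]?\s*(\S*)\s*are (\S+) to us')
match = pattern.search(text)

whole = match.group(0)       # the entire matched text
first = match.group(1)       # first parenthesised group
second = match.group(2)      # second parenthesised group
all_groups = match.groups()  # tuple of all groups, starting from group 1

print(whole, first, second, all_groups)
```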
- Rename complexity-1-fileutil.py to fileutil.py
- Implement fileutil.py to pass the doctests
- Create a second file, grep.py, that accepts command-line arguments and calls the function in fileutil.py
Accept the following command-line args:
- -v, --invert-match: select non-matching lines
- -E, --extended-regexp: PATTERN is an extended regular expression
- A list of files
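One possible starting point for the argument handling, using the standard argparse module, is sketched below. It covers only the parsing step: the positional PATTERN argument, the build_parser name, and the sample argument list are assumptions, and the call into fileutil.py is left out because its function is not described here.

```python
import argparse


def build_parser():
    """Build a parser for the options listed above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description='Search files for a pattern.')
    parser.add_argument('-v', '--invert-match', action='store_true',
                        help='select non-matching lines')
    parser.add_argument('-E', '--extended-regexp', action='store_true',
                        help='PATTERN is an extended regular expression')
    # Assumed: the pattern comes first, followed by zero or more files.
    parser.add_argument('pattern', metavar='PATTERN')
    parser.add_argument('files', metavar='FILE', nargs='*')
    return parser


# Parse an explicit sample argument list instead of sys.argv.
args = build_parser().parse_args(['-v', r'o\s?u', 'a.txt', 'b.txt'])
print(args.invert_match, args.extended_regexp, args.pattern, args.files)
```

In grep.py itself, calling parse_args() with no arguments would read the options from the real command line.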