Python Fundamentals Tutorial: Advanced Iteration

14. Advanced Iteration

14.1. List Comprehensions

Python provides list comprehension syntax to simplify the task of generating new lists from existing lists (or from any other iterable data type).

In the earlier example of writing to a file, the \n character was stored with each color in the list so that the list could be passed directly to the writelines() method of the file object. Storing the data this way makes other processing awkward, so a more common approach is to keep the list without newlines and build a newline-terminated copy from it when needed.

Without list comprehensions, this operation would look something like the following:

>>> colors = ['red', 'yellow', 'blue']
>>> color_lines = []
>>> for color in colors:
...     color_lines.append('{0}\n'.format(color))
...
>>> color_lines
['red\n', 'yellow\n', 'blue\n']

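Many functional languages provide either a standalone map() function or a map() method on lists to perform this task. Python has a built-in map() function as well, although list comprehensions are usually preferred. In Python 2, map() returns a list, as shown below; in Python 3 it returns a lazy map object, so wrap the call in list() to get the same output.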

>>> colors = ['red', 'yellow', 'blue']
>>> color_lines = map(lambda c: '{0}\n'.format(c), colors)
>>> color_lines
['red\n', 'yellow\n', 'blue\n']

List comprehensions perform this task equally well, and they can also filter items and iterate over more than one sequence. To accomplish the same task with a list comprehension, use the following syntax:

>>> colors = ['red', 'yellow', 'blue']
>>> color_lines = ['{0}\n'.format(color) for color in colors]
>>> color_lines
['red\n', 'yellow\n', 'blue\n']
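The map() call and the comprehension above build the same list. A minimal sketch comparing them, assuming a hypothetical helper add_newline() in place of the lambda; wrapping map() in list() makes the result a list under both Python 2 and Python 3.

```python
colors = ['red', 'yellow', 'blue']


def add_newline(color):
    """Hypothetical helper: return color with a trailing newline."""
    return '{0}\n'.format(color)


# list() is needed on Python 3, where map() returns a lazy map object;
# on Python 2 it simply copies the list that map() already built.
via_map = list(map(add_newline, colors))

# The comprehension expresses the same transformation inline.
via_comprehension = [add_newline(color) for color in colors]

print(via_map == via_comprehension)  # True
```

Passing a named function to map() often reads better than a lambda; once the transformation needs a lambda, the comprehension is usually the clearer choice.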

A conditional filter can also be included in the creation of the new list, like so:

>>> colors = ['red', 'yellow', 'blue']
>>> color_lines = ['{0}\n'.format(color) for color in colors if 'l' in color]
>>> color_lines
['yellow\n', 'blue\n']
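The trailing if clause is a filter: items that fail the test are dropped. It is different from a conditional expression (x if test else y) placed in the output position, which keeps every item but changes its value. A short sketch contrasting the two, reusing the same colors list:

```python
colors = ['red', 'yellow', 'blue']

# Trailing "if" clause: a filter. Items that fail the test are dropped.
with_l = [color for color in colors if 'l' in color]

# Conditional expression in the output position: every item is kept,
# but the value produced depends on the test.
marked = [color.upper() if 'l' in color else color for color in colors]
```

Here with_l holds only 'yellow' and 'blue', while marked still has three items, with the matching ones upper-cased.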

A comprehension can also contain more than one for clause. The result then has one item for every combination of elements, with the last for clause varying fastest:

>>> colors = ['red', 'yellow', 'blue']
>>> clothes = ['hat', 'shirt', 'pants']

>>> colored_clothes = ['{0} {1}'.format(color, garment) for color in colors for garment in clothes]

>>> colored_clothes
['red hat', 'red shirt', 'red pants', 'yellow hat', 'yellow shirt', 'yellow pants', 'blue hat', 'blue shirt', 'blue pants']
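One way to read a comprehension with several for clauses is as ordinary nested loops written in the same order. A small sketch checking that equivalence for the colors and clothes example:

```python
colors = ['red', 'yellow', 'blue']
clothes = ['hat', 'shirt', 'pants']

# The comprehension from the example above...
from_comprehension = ['{0} {1}'.format(color, garment)
                      for color in colors for garment in clothes]

# ...is equivalent to these nested loops: the for clauses appear in the
# same order, so the last (innermost) clause varies fastest.
from_loops = []
for color in colors:
    for garment in clothes:
        from_loops.append('{0} {1}'.format(color, garment))

print(from_comprehension == from_loops)  # True
print(len(from_loops))  # 9: 3 colors x 3 garments
```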

14.2. Generator Expressions

Building a new list with a list comprehension is not always the best choice, particularly when the list is only an intermediate step or when its total contents are very large.

For such cases, a slightly modified syntax (parentheses in place of square brackets) creates a generator instead of a new list. The generator produces each item only when it is requested, typically by a loop that iterates over it, so the full set of results is never stored at once.

>>> colors = ['red', 'yellow', 'blue']
>>> color_lines = ('{0}\n'.format(color) for color in colors)
>>> color_lines
<generator object <genexpr> at 0x10041ac80>
>>> next(color_lines)
'red\n'
>>> next(color_lines)
'yellow\n'
>>> next(color_lines)
'blue\n'
>>> next(color_lines)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
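In practice, generator expressions are most often handed straight to a function that consumes an iterable, such as sum() or str.join(), so no intermediate list is built. A minimal sketch, which also shows that a generator is used up after a single pass:

```python
colors = ['red', 'yellow', 'blue']

# When a generator expression is the only argument to a call, its own
# parentheses can double as the call's parentheses.
total_letters = sum(len(color) for color in colors)  # 13
text = ''.join('{0}\n'.format(color) for color in colors)

# A generator can be consumed only once; a second pass finds it exhausted.
color_lines = ('{0}\n'.format(color) for color in colors)
first_pass = list(color_lines)   # ['red\n', 'yellow\n', 'blue\n']
second_pass = list(color_lines)  # []
```

If the values are needed more than once, either build a list or create a fresh generator for each pass.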

14.3. Generator Functions

The type of object created by the generator expression in the previous section is unsurprisingly called a generator. This is a term for a type of iterator that generates values on demand.

While the generator expression notation is very compact, there may be cases where there is more logic to be performed than can be effectively expressed with this notation. For such cases, a generator function can be used.

A generator function uses the yield statement in place of return, usually inside a loop. Because its body contains yield, calling the function does not run that body; instead, the call returns a generator object. Each time next() is called on the generator object, the body runs until it reaches the next yield, hands back the yielded value, and pauses there. When the body finishes, the generator raises a StopIteration exception to the caller.

>>> def one_color_per_line():
...     colors = ['red', 'yellow', 'blue']
...     for color in colors:
...         yield '{0}\n'.format(color)
...
>>> gen = one_color_per_line()
>>> gen
<generator object one_color_per_line at 0x10041acd0>
>>> next(gen)
'red\n'
>>> next(gen)
'yellow\n'
>>> next(gen)
'blue\n'
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
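To make the pause-and-resume behavior visible, the sketch below uses a hypothetical generator function, traced_colors(), that appends a note to a log list at each step. The log shows that calling the function runs nothing, and that each next() call runs the body only as far as the next yield.

```python
def traced_colors(log):
    """Hypothetical generator function that records each step in log."""
    log.append('started')
    for color in ['red', 'yellow', 'blue']:
        log.append('yielding ' + color)
        yield '{0}\n'.format(color)  # pause here until next() is called again
    log.append('finished')  # runs when next() is called after the last yield


log = []
gen = traced_colors(log)
after_call = list(log)    # []: calling the function ran none of its body

first = next(gen)         # 'red\n'
after_first = list(log)   # ['started', 'yielding red']

rest = list(gen)          # ['yellow\n', 'blue\n']; list() stops at StopIteration
after_all = list(log)     # [..., 'yielding blue', 'finished']
```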

Note that calling the same generator function again returns a new generator object (shown at a different address), because each generator maintains its own state.

>>> gen = one_color_per_line()
>>> gen
<generator object one_color_per_line at 0x10041ad20>
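Because each generator keeps its own position, two generators created from the same function advance independently. A short sketch, redefining one_color_per_line() so the block is self-contained:

```python
def one_color_per_line():
    colors = ['red', 'yellow', 'blue']
    for color in colors:
        yield '{0}\n'.format(color)


gen_a = one_color_per_line()
gen_b = one_color_per_line()

# Advancing one generator does not affect the other.
a1 = next(gen_a)  # 'red\n'
a2 = next(gen_a)  # 'yellow\n'
b1 = next(gen_b)  # 'red\n': gen_b still starts from the beginning
```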

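In practice, the calls to next() are more often left to a for ... in loop, which also stops cleanly when StopIteration is raised. (The trailing comma after print suppresses the newline print would otherwise add, since each line already ends in \n.)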

>>> for line in one_color_per_line():
...     print line,
...
red
yellow
blue
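Roughly speaking, the for loop calls next() for you and treats StopIteration as the signal to stop. A simplified sketch of that behavior written as an explicit while loop; it collects the lines in a list instead of printing them, a choice made here only so the result is easy to check:

```python
def one_color_per_line():
    colors = ['red', 'yellow', 'blue']
    for color in colors:
        yield '{0}\n'.format(color)


collected = []
iterator = iter(one_color_per_line())  # for ... in first calls iter() on its target
while True:
    try:
        line = next(iterator)
    except StopIteration:
        break  # the generator is exhausted, so the loop ends
    collected.append(line)

# collected == ['red\n', 'yellow\n', 'blue\n']
```

Calling iter() on a generator returns the generator itself, which is why a generator can be used directly in a for loop.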

14.4. Iteration Helpers: itertools

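Iteration is central to the way Python programs are written, and beyond the built-in syntax, the standard library's itertools module provides handy tools for common iteration tasks. In CPython these tools are implemented in C and produce their values lazily, so they also tend to be fast and memory-efficient.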

14.4.1. chain()

The chain() function accepts an arbitrary number of iterable objects as arguments and returns an iterator that produces the items of each iterable in turn: once the first is exhausted, it moves on to the next.

Without the chain() function, iterating over two lists would require creating a copy with the contents of both or adding the contents of one to the other.

>>> l1 = ['a', 'b', 'c']
>>> l2 = ['d', 'e', 'f']
>>> l1.extend(l2)
>>> l1
['a', 'b', 'c', 'd', 'e', 'f']

Note that extend() also modifies l1 in place. It is more efficient to use the chain() function, which copies nothing and leaves both lists unchanged; it allocates only a small amount of bookkeeping state in the iterator itself.

>>> import itertools
>>> l1 = ['a', 'b', 'c']
>>> l2 = ['d', 'e', 'f']

>>> chained = itertools.chain(l1, l2)
>>> chained
<itertools.chain object at 0x100431250>

>>> [l for l in chained]
['a', 'b', 'c', 'd', 'e', 'f']
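chain() is not limited to lists: any iterables can be mixed, including strings and generators. When the sequences are already gathered in one list, itertools.chain.from_iterable() takes that single iterable of iterables instead of separate arguments. A minimal sketch of both:

```python
import itertools

l1 = ['a', 'b', 'c']
l2 = ['d', 'e', 'f']

# chain() accepts any iterables, not only lists: here a list, a string
# (iterated character by character) and a generator expression.
mixed = list(itertools.chain(l1, 'xy', (n * 2 for n in range(3))))
# ['a', 'b', 'c', 'x', 'y', 0, 2, 4]

# Unlike l1.extend(l2), chaining leaves the original lists unchanged.
untouched = (l1 == ['a', 'b', 'c'])  # True

# chain.from_iterable() takes a single iterable of iterables, which is
# handy when the sequences are already collected in a list.
groups = [l1, l2, ['g']]
flattened = list(itertools.chain.from_iterable(groups))
# ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```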

14.4.2. izip()

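izip() is almost identical to the built-in zip() function: it pairs up the corresponding items of two (or more) iterables into tuples, stopping when the shortest input is exhausted. However, where zip() builds and returns a new list, izip() returns an iterator that produces the tuples one at a time. (This describes Python 2; in Python 3, zip() itself returns an iterator and itertools.izip() no longer exists.)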

>>> name = ['Jimmy', 'Robert', 'John Paul', 'John']
>>> instruments = ['Guitar', 'Vocals', 'Bass', 'Drums']

>>> zepp = zip(name, instruments)
>>> zepp
[('Jimmy', 'Guitar'), ('Robert', 'Vocals'), ('John Paul', 'Bass'), ('John', 'Drums')]

>>> zepp = itertools.izip(name, instruments)
>>> zepp
<itertools.izip object at 0x100430998>

>>> [musician for musician in zepp]
[('Jimmy', 'Guitar'), ('Robert', 'Vocals'), ('John Paul', 'Bass'), ('John', 'Drums')]
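A sketch that runs under both Python 2 and Python 3 by falling back to the built-in zip() when izip() is unavailable; lazy_zip is a local alias chosen for this example, not a library name. Because pairing stops at the shortest input, an endless counter such as itertools.count() can be used to number the musicians, and the pairs can be fed straight into dict() without building an intermediate list.

```python
import itertools

try:
    from itertools import izip as lazy_zip  # Python 2
except ImportError:
    lazy_zip = zip  # Python 3: the built-in zip() is already lazy

names = ['Jimmy', 'Robert', 'John Paul', 'John']
instruments = ['Guitar', 'Vocals', 'Bass', 'Drums']

# itertools.count(1) never ends; pairing stops when names runs out.
numbered = list(lazy_zip(itertools.count(1), names))
# [(1, 'Jimmy'), (2, 'Robert'), (3, 'John Paul'), (4, 'John')]

# dict() consumes the pairs one at a time.
roles = dict(lazy_zip(names, instruments))
# roles['Robert'] == 'Vocals'
```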

14.5. Lab

  1. Convert your grep function from the previous lab to use a generator function.
  2. Each call to next() on the iterator should return the next matching line.