Python Fundamentals Tutorial: Advanced Types: Containers

5. Advanced Types: Containers

One of the great advantages of Python as a programming language is the ease with which it allows you to manipulate containers. Containers (or collections) are an integral part of the language and, as you’ll see, built into the core of the language’s syntax. As a result, thinking in a Pythonic manner means thinking about containers.

5.1. Lists

The first container type that we will look at is the list. A list represents an ordered, mutable collection of objects. You can mix and match any type of object in a list, add to it and remove from it at will.

Creating Empty Lists. To create an empty list, you can use empty square brackets or use the list() function with no arguments.

>>> l = []
>>> l
[]
>>> l = list()
>>> l
[]

Initializing Lists. You can initialize a list with content of any sort using the same square bracket notation. The list() function also accepts a single iterable argument and returns a new list containing that iterable’s items (a shallow copy). A list is itself an iterable, as the following example shows, and we’ll see other iterables later.

>>> l = ['a', 'b', 'c']
>>> l
['a', 'b', 'c']
>>> l2 = list(l)
>>> l2
['a', 'b', 'c']
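To make "shallow copy" concrete, here is a small sketch (the names original and duplicate are only illustrative). list() builds a new outer list, but the objects inside it are shared with the original, so mutating a nested list is visible through both.

```python
original = [['x', 'y'], 'z']
duplicate = list(original)       # new outer list, same element objects

duplicate.append('extra')        # affects only the new outer list
duplicate[0].append('shared')    # mutates the inner list that both lists hold

# original  -> [['x', 'y', 'shared'], 'z']
# duplicate -> [['x', 'y', 'shared'], 'z', 'extra']
```

If you need fully independent nested objects, copy.deepcopy() from the standard library copies them recursively.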

A Python string is also a sequence of characters and can be treated as an iterable over those characters. Combined with the list() function, a new list of the characters can easily be generated.

>>> list('abcdef')
['a', 'b', 'c', 'd', 'e', 'f']

Adding. You can easily append to a list (add an item to the end) or insert an item at an arbitrary index.

>>> l = []
>>> l.append('b')
>>> l.append('c')
>>> l.insert(0, 'a')
>>> l
['a', 'b', 'c']
Note

While inserting at position 0 works, a list is not optimized for it: every existing element has to be shifted one place to the right, so the cost grows with the length of the list. If you need to do this often, use collections.deque, which supports fast insertion and removal at both ends (at the expense of some pointer overhead) and provides an appendleft() method.
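A minimal sketch of the collections.deque alternative mentioned in the note, using only the standard library (the variable name queue is just illustrative):

```python
from collections import deque

queue = deque(['b', 'c'])
queue.appendleft('a')      # efficient insertion at the left end
queue.append('d')          # appending at the right end works as for a list
first = queue.popleft()    # efficient removal from the left end

# first -> 'a'; list(queue) -> ['b', 'c', 'd']
```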

Iterating. Iterating over a list is very simple. All iterables in Python allow access to elements using the for ... in statement. In this structure, each element in the iterable is sequentially assigned to the "loop variable" for a single pass of the loop, during which the enclosed block is executed.

>>> for letter in l:
...     print letter,
...
a b c
Note

By default, the print statement ends its output with a newline character. A trailing comma in the print statement suppresses that newline, so successive items are separated by spaces instead, which is why the letters above appear on one line.

Iterating with while. It is also possible to use a while loop for this iteration. A while loop is most commonly used for an iteration of unknown length, either checking a condition on each pass or using a break statement to exit when a condition is met.

To keep the example simple, we use the list.pop() method to consume the list entries from the right. Note that this leaves the list empty.

>>> l = ['a', 'b', 'c']
>>> while len(l):
...     print l.pop(),
...
c b a

Iterating with an Index. In some instances, you will actually want to know the index of the item that you are accessing inside the for loop. You can handle this in a traditional form using the builtin len() and range() functions.

>>> l = ['a', 'b', 'c']
>>> len(l)
3
>>> range(3)
[0, 1, 2]

>>> for i in range(len(l)):
...     print i, l[i]
...
0 a
1 b
2 c

However, with a little more foundation, we will see a better way.

Access and Slicing. Accessing individual items in a list works much like accessing the elements of an array in many other languages; this is often called subscripting, or more precisely, using the subscript operator. A less common but very useful addition is negative indexing, where alist[-1] returns the last element of alist. Note that 0 refers to the first item in a list, while -1 refers to the last.

Slices are another extension of this subscripting syntax providing access to subsets of the list. The slice is marked with one or two colons (:) within the square bracket subscript.

In the single-colon form, the first argument is the start index (inclusive) and the second is the end index (exclusive). If the first is omitted (e.g. l[:2]), the slice starts at the beginning of the list. If the second is omitted (e.g. l[2:]), the slice extends through the end of the list, including the last item.

In the double-colon form, the first two arguments are unchanged and the third is the stride (step). For example, l[::2] takes every second item from a list.

>>> l = list('abcdefgh')
>>> l
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

>>> l[3]
'd'
>>> l[-3]
'f'

>>> l[1:4]
['b', 'c', 'd']

>>> l[1:-1]
['b', 'c', 'd', 'e', 'f', 'g']

>>> l[1:-1:2]
['b', 'd', 'f']
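A few more slicing cases, as a short sketch on the same eight-letter list, cover omitted bounds, a negative stride, and out-of-range bounds:

```python
l = list('abcdefgh')

head = l[:3]         # omitted start: from the beginning
tail = l[5:]         # omitted end: through the last item, inclusive
evens = l[::2]       # every second item, starting at index 0
backwards = l[::-1]  # a negative stride walks from right to left
clipped = l[6:100]   # out-of-range slice bounds are clipped, no IndexError
duplicate = l[:]     # a full slice is a quick shallow copy
```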

Presence and Finding. Checking for the presence of an object in a list is done with the in keyword. The index() method of a list returns the position of the first matching item.

>>> chars = list('abcdef')
>>> chars
['a', 'b', 'c', 'd', 'e', 'f']
>>> 'g' in chars
False
>>> 'c' in chars
True
>>> chars.index('c')
2
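Unlike in, index() raises a ValueError when the item is not present. The sketch below wraps it in a hypothetical helper, find(), that returns -1 instead; the helper name and the -1 convention are illustrative choices, not part of the list API.

```python
def find(items, target):
    """Hypothetical helper: return target's first index, or -1 if absent."""
    try:
        return items.index(target)   # position of the first match
    except ValueError:               # index() raises this when target is missing
        return -1

chars = list('abcabc')
# find(chars, 'b') -> 1   (the first occurrence wins)
# find(chars, 'g') -> -1
```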

5.2. Lab

  1. Create a new file classmates.py
  2. Use a list to store the first names of everyone in the class.
  3. Use a for loop to print Hello <name> to stdout for everyone.

5.3. Strings Revisited

Python’s string object not only acts as a sequence but has many useful methods for moving back and forth to other types of sequences. A very common use case in data processing is splitting strings into substrings based on a delimiter. This is done with the split() method, which returns a list of the components.

>>> s = 'abc.def'

>>> parts = s.split('.')
>>> parts
['abc', 'def']

The converse of split() is join(), which joins the items of a list into a single string, placing the string on which the method is called between each pair of items.

Many people find this backwards when they first see it, expecting join() to be a method of the list. It helps to remember that a string literal in Python is just another string object.

In the following example, the literal '/' is a string object, so its join() method can be called directly. There is no need to assign the separator to a variable first, because it is not used again after this single call.

>>> new_string = '/'.join(parts)
>>> new_string
'abc/def'
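A short sketch of a few more split() and join() behaviors; the sample strings are made up for illustration:

```python
record = 'Othello,Iago,Cassio'

fields = record.split(',')            # every comma is a delimiter
halves = record.split(',', 1)         # maxsplit=1: split at the first comma only
words = '  hello   world '.split()    # no argument: split on runs of whitespace
rebuilt = ','.join(fields)            # join() undoes split()

# join() accepts only strings, so convert other values first
numbers = ' '.join(str(n) for n in [1, 2, 3])
```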

Other standard sequence operations also apply.

>>> s = 'hello world'
>>> len(s)
11
>>> s[4]
'o'
>>> s[2:10:2]
'lowr'

A less sequence-oriented but still very common string operation is trimming whitespace with the strip() method and its relatives, lstrip() and rstrip().

>>> s = '   abc   '
>>> s.strip()
'abc'
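The relatives trim only one side, and all three accept an optional argument listing the characters to remove (a set of characters, not a prefix or suffix). A brief sketch:

```python
s = '   abc   '

left = s.lstrip()               # removes leading whitespace only
right = s.rstrip()              # removes trailing whitespace only
line = 'data\n'.rstrip('\n')    # a common way to drop a line ending
trimmed = '--abc--'.strip('-')  # the argument lists the characters to remove
```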

5.4. Tuples

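A tuple is like an immutable list. Tuples are generally slightly faster to create and take a little less memory than equivalent lists, which makes them useful for fixed collections of items. They also appear throughout the core of the language (for example, in comma-separated assignment and in the key-value pairs returned by dict.items()), so it is important to recognize them.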

Creating. Similar to lists, empty tuples can be created with an empty pair of parentheses, or with the tuple() builtin function.

>>> t = ()
>>> t
()
>>> tuple()
()

Tuples can also be initialized with values (and generally are, since they are immutable). There is one important distinction to make due to the use of parentheses, which is that a 1-tuple (tuple with one item) requires a trailing comma to indicate that the desired result is a tuple. Otherwise, the interpreter will see the parentheses as nothing more than a grouping operation.

>>> t = ('Othello')
>>> t
'Othello'
>>> t = ('Othello',)
>>> t
('Othello',)

The behavior for a 2-tuple and beyond is nothing new. Note that the parentheses are optional: it is the comma, not the parentheses, that creates the tuple. The fact that any comma-separated sequence of expressions becomes a tuple is both useful and important in Python programming.

>>> t = ('Othello', 'Iago')
>>> t
('Othello', 'Iago')
>>> t = 'Othello', 'Iago'
>>> t
('Othello', 'Iago')

Tuples can also be created by passing an iterable to the tuple() function.

>>> l = ['Othello', 'Iago']
>>> tuple(l)
('Othello', 'Iago')

Unpacking. A very common paradigm for accessing tuple content in Python is called unpacking. It can be used on lists as well, but since it requires knowledge of the size of the container, it is far more common with tuples.

When a tuple is assigned to a comma-separated list of variables whose count matches the number of items in the tuple, each variable receives the corresponding item, in order.

>>> t = ('Othello', 'Iago')
>>> hero, villain = t
>>> hero
'Othello'
>>> villain
'Iago'
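Unpacking pairs naturally with functions that return several values as a tuple, and it fails loudly when the counts do not match. In this sketch, min_max() is a hypothetical helper written for illustration:

```python
def min_max(values):
    """Hypothetical helper: return the smallest and largest items."""
    return min(values), max(values)   # the comma packs a 2-tuple

low, high = min_max([7, 2, 9, 4])     # unpack the returned tuple

mismatch = False
try:
    hero, villain = ('Othello', 'Iago', 'Cassio')   # 3 items, 2 names
except ValueError:
    mismatch = True                   # the counts must match exactly
```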

An interesting and valuable side-effect of the natural use of tuples is the ability to elegantly swap variables.

>>> t = ('Othello', 'Iago')
>>> t
('Othello', 'Iago')

>>> hero, villain = t

>>> hero
'Othello'
>>> villain
'Iago'

>>> hero, villain = villain, hero

>>> hero
'Iago'
>>> villain
'Othello'

Accessing and Slicing. Tuples can be accessed and sliced in the same manner as lists. Note that tuple slices are tuples themselves.

>>> t[0]
'Othello'
>>> t = ('Othello', 'Iago', 'Desdemona')
>>> t[0::2]
('Othello', 'Desdemona')

Iterating. Tuples are iterable, in exactly the same manner as lists.

>>> t = ('Othello', 'Iago')
>>> for character in t:
...     print character
...
Othello
Iago

Since a tuple is iterable, a mutable copy is easily created using the list() builtin.

>>> t = ('Othello', 'Iago')
>>> list(t)
['Othello', 'Iago']
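Because a tuple is immutable, attempts to change it in place fail with a TypeError. To get a "modified" tuple you either build a new one or work on a list copy and convert back. A minimal sketch, with illustrative names:

```python
point = (3, 4)

changed = True
try:
    point[0] = 10        # tuples do not support item assignment
except TypeError:
    changed = False      # the tuple is left untouched

moved = (10,) + point[1:]    # "modify" by building a new tuple instead

as_list = list(point)        # or work on a mutable list copy...
as_list[0] = 10
back = tuple(as_list)        # ...and convert back when done
```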

Indexed List Iteration Revisited. Now that you know how to unpack tuples, you can see a better way to iterate lists with an index. The builtin enumerate() function takes an iterable and returns an iterator of 2-tuples. Each 2-tuple contains an index and an item from the original iterable. These 2-tuples can be unpacked into separate loop variables as part of the for statement.

>>> l = ['a', 'b', 'c']
>>> for i, letter in enumerate(l):
...     print i, letter
...
0 a
1 b
2 c
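enumerate() also accepts an optional second argument giving the starting count, which is handy for human-friendly numbering. A short sketch:

```python
l = ['a', 'b', 'c']

numbered = list(enumerate(l))     # indices start at 0 by default

lines = []
for number, letter in enumerate(l, 1):   # second argument: starting count
    lines.append('%d. %s' % (number, letter))
```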

5.5. Lab

  1. Update classmates.py to use a tuple of 3-tuples of first, last, role.
  2. Print to screen using a for loop and unpacking.

5.6. Dictionaries

A dictionary is an implementation of a key-value mapping that might be called a "hashtable" or "associative array" in other languages. Dictionaries are used throughout the implementation of Python itself (for example, to hold module namespaces and object attributes), so they are both pervasive and highly optimized.

Warning

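Dictionary order is arbitrary and implementation-specific. It can differ across interpreters, versions, and architectures, and even between runs in the same environment. (This describes the Python 2 behavior shown in this tutorial; since Python 3.7, dictionaries are guaranteed to preserve insertion order.)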

Creating. Following the analogy of the other container types, dictionaries are created using braces, i.e. {}. There is also a dict() builtin function that accepts an arbitrary set of keyword arguments.

Note

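Unlike some similar languages, Python requires string keys in a dictionary literal to be quoted; a bare name would be evaluated as a variable. The keyword-argument form of dict() is the exception, since there the keys are written as argument names.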

>>> characters = {'hero': 'Othello', 'villain': 'Iago', 'friend': 'Cassio'}
>>> characters
{'villain': 'Iago', 'hero': 'Othello', 'friend': 'Cassio'}

>>> characters = dict(hero='Othello', villain='Iago', friend='Cassio')
>>> characters
{'villain': 'Iago', 'hero': 'Othello', 'friend': 'Cassio'}

Accessing. Dictionary values can be accessed using the subscript operator except you use the key instead of an index as the subscript argument. The presence of keys can also be tested with the in keyword.

>>> if 'villain' in characters:
...     print characters['villain']
...
Iago

Adding. A new entry can be created where there is no existing key using the same subscripting notation and assignment.

>>> characters['beauty'] = 'Desdemona'

Modifying. Existing entries are modified in exactly the same manner.

>>> characters['villain'] = 'Roderigo'

>>> characters
{'villain': 'Roderigo', 'hero': 'Othello', 'beauty': 'Desdemona', 'friend': 'Cassio'}

Failed Lookups. If you use the subscript operator and the key is not found, a KeyError will be raised. If this behavior is not desired, using the get() method of the dictionary will return a supplied default value when the key is not found. If no default is provided, None is returned when the key is not found. The get() method does not alter the contents of the dictionary itself.

>>> characters['horse']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'horse'
>>> characters.get('horse', 'Ed')
'Ed'
>>> characters
{'villain': 'Roderigo', 'hero': 'Othello', 'friend': 'Cassio', 'beauty': 'Desdemona'}
>>> characters.get('horse')
>>>
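A common use of get() with a default is counting occurrences without first checking whether a key exists. A short sketch with made-up input text:

```python
words = 'the cat and the hat'.split()

counts = {}
for word in words:
    # get() supplies 0 for a word not seen yet, so no KeyError is raised
    counts[word] = counts.get(word, 0) + 1
```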

The setdefault() method works like get(), except that when the key is not found it also stores the default value in the dictionary before returning it.

>>> characters
{'villain': 'Roderigo', 'hero': 'Othello', 'friend': 'Cassio', 'beauty': 'Desdemona'}
>>> characters.setdefault('horse', 'Ed')
'Ed'
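>>> characters
{'villain': 'Roderigo', 'horse': 'Ed', 'hero': 'Othello', 'friend': 'Cassio', 'beauty': 'Desdemona'}
>>> del characters['horse']

The del statement removes the example entry again, so it does not appear in the examples that follow.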
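Because setdefault() returns the stored value, it is often used to group items into lists in a single line. The (play, character) pairs below are hypothetical sample data:

```python
# Hypothetical sample data: (play, character) pairs
pairs = [('Othello', 'Iago'), ('Hamlet', 'Ophelia'), ('Othello', 'Cassio')]

by_play = {}
for play, character in pairs:
    # setdefault() stores an empty list the first time a play is seen and
    # returns the stored list either way, so append() always has a target
    by_play.setdefault(play, []).append(character)
```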

Iterating. Because a dictionary holds both keys and values, there are several ways to iterate over it. A simple for ... in statement iterates over the keys, and each value can then be looked up by its key.

>>> for role in characters:
...     print role, characters[role]
...
villain Roderigo
hero Othello
beauty Desdemona
friend Cassio

Alternatively, the items() method returns a list of (key, value) 2-tuples, which can be unpacked in the for loop.

>>> characters.items()
[('villain', 'Roderigo'), ('hero', 'Othello'), ('friend', 'Cassio'), ('beauty', 'Desdemona')]
>>> for role, name in characters.items():
...     print role, name
...
villain Roderigo
hero Othello
friend Cassio
beauty Desdemona
Note

The .items() method returns a newly allocated list of newly allocated tuples, which is discarded once the loop finishes. Whenever possible, prefer the .iteritems() method, which returns an iterator instead. The iterator holds a reference to the original dictionary and produces individual 2-tuples on demand. The downside is that keys must not be added to or removed from the dictionary during iteration; if the dictionary changes size, the next step of the iteration raises a RuntimeError. (In Python 3, items() returns a lightweight view object and iteritems() no longer exists.)

>>> characters.iteritems()
<dictionary-itemiterator object at 0x100473b50>
>>> for role, name in characters.iteritems():
...     print role, name
...
villain Roderigo
hero Othello
friend Cassio
beauty Desdemona
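The restriction on changing a dictionary while iterating over it can be seen directly. This sketch uses sample data and a hypothetical helper, remove_roles(): the first loop deletes a key while iterating the live dictionary and triggers a RuntimeError, while the helper shows the usual workaround of iterating over a snapshot list of the keys.

```python
def remove_roles(mapping, unwanted):
    """Hypothetical helper: delete every key in `unwanted` from `mapping`."""
    for role in list(mapping):   # snapshot of the keys, so deleting is safe
        if role in unwanted:
            del mapping[role]

characters = {'hero': 'Othello', 'villain': 'Roderigo', 'horse': 'Ed'}

raised = False
try:
    for role in characters:              # iterating the live dictionary...
        if role == 'horse':
            del characters[role]         # ...while changing its size
except RuntimeError:
    raised = True                        # "dictionary changed size during iteration"

remove_roles(characters, set(['villain']))
# raised -> True; characters -> {'hero': 'Othello'}
```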

5.7. Lab

  1. One more time on classmates.py
  2. Insert the tuples into a dictionary using firstname as the key
  3. Ask for a firstname on the command-line and print the data
  4. If it’s not there, prompt for last name and role
  5. Add it
  6. Print the new list

5.8. Sets

A set is a mutable, unordered collection of unique objects (the objects must be hashable). It is designed to reflect the properties and behavior of a true mathematical set. A frozenset has the same properties as a set, except that it is immutable.

Creating. A new set is created using the set() builtin. This function without any parameters will return a new, empty set. It will also accept a single argument of an iterable, in which case it will return a new set containing one element for each unique element in the iterable.

>>> s = set()
>>> s
set([])
>>> s = set(['Beta', 'Gamma', 'Alpha', 'Delta', 'Gamma', 'Beta'])
>>> s
set(['Alpha', 'Beta', 'Gamma', 'Delta'])

Accessing. Sets are not designed for indexed access, so subscript notation cannot be used. As with a list, the .pop() method can be used to consume elements, but the order in which they are removed is arbitrary.

>>> s
set(['Alpha', 'Beta', 'Gamma', 'Delta'])
>>> while len(s):
...     print s.pop(),
...
Alpha Beta Gamma Delta
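Beyond pop(), the everyday set operations are adding elements, removing them, and testing membership, which is hash-based and therefore fast on average. A short sketch:

```python
s = set(['Alpha', 'Beta'])

s.add('Gamma')          # add a new element
s.add('Alpha')          # adding an existing element has no effect
s.discard('Omega')      # discard() silently ignores missing elements
has_beta = 'Beta' in s  # hash-based membership test

missing = False
try:
    s.remove('Omega')   # remove() raises KeyError for missing elements
except KeyError:
    missing = True
```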

Set Operations. Not surprisingly, the real value of sets shows itself in set operations. Sets overload the |, &, -, and ^ operators to compute unions, intersections, differences, and symmetric differences, and each operation is also available as a named method.

>>> s1 = set(['Beta', 'Gamma', 'Alpha', 'Delta', 'Gamma', 'Beta'])
>>> s2 = set(['Beta', 'Alpha', 'Epsilon', 'Omega'])
>>> s1
set(['Alpha', 'Beta', 'Gamma', 'Delta'])
>>> s2
set(['Alpha', 'Beta', 'Omega', 'Epsilon'])

>>> s1.union(s2)
set(['Epsilon', 'Beta', 'Delta', 'Alpha', 'Omega', 'Gamma'])
>>> s1 | s2
set(['Epsilon', 'Beta', 'Delta', 'Alpha', 'Omega', 'Gamma'])

>>> s1.intersection(s2)
set(['Alpha', 'Beta'])
>>> s1 & s2
set(['Alpha', 'Beta'])

>>> s1.difference(s2)
set(['Gamma', 'Delta'])
>>> s1 - s2
set(['Gamma', 'Delta'])

>>> s1.symmetric_difference(s2)
set(['Epsilon', 'Delta', 'Omega', 'Gamma'])
>>> s1 ^ s2
set(['Epsilon', 'Delta', 'Omega', 'Gamma'])
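Sets also support comparison-style tests, and the frozenset mentioned earlier is useful precisely because it is immutable and therefore hashable. A brief sketch:

```python
s1 = set(['Alpha', 'Beta', 'Gamma', 'Delta'])
small = set(['Alpha', 'Beta'])

is_subset = small <= s1                      # same as small.issubset(s1)
is_superset = s1 >= small                    # same as s1.issuperset(small)
disjoint = small.isdisjoint(set(['Omega']))  # True when no elements are shared

frozen = frozenset(small)
# A frozenset cannot be changed (it has no add() or remove()), which makes it
# hashable: it can be an element of another set or a dictionary key.
groups = set([frozen])
```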

5.9. Collection Transitions

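As you saw previously, dict.items() returns a list of 2-tuples representing key-value pairs. Conversely, a list of 2-tuples can be passed to the dict() factory function to create a dictionary, using the first item of each tuple as the key and the second item as the value. zip() takes n sequences and returns one list of n-tuples, where the i-th tuple holds the i-th item of each sequence. The example below relies on keys() and values() returning items in corresponding order, which holds as long as the dictionary is not modified between the two calls.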

>>> roles = characters.keys()
>>> roles
['villain', 'hero', 'beauty', 'friend']
>>> names = characters.values()
>>> names
['Roderigo', 'Othello', 'Desdemona', 'Cassio']

>>> tuples = zip(roles, names)
>>> tuples
[('villain', 'Roderigo'), ('hero', 'Othello'), ('beauty', 'Desdemona'), ('friend', 'Cassio')]

>>> new_characters = dict(tuples)
>>> new_characters
{'villain': 'Roderigo', 'hero': 'Othello', 'friend': 'Cassio', 'beauty': 'Desdemona'}
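The same tools can invert a mapping, and zip() quietly truncates to its shortest input. A sketch, assuming the dictionary's values are unique and hashable:

```python
characters = {'hero': 'Othello', 'villain': 'Roderigo', 'friend': 'Cassio'}

# keys() and values() come back in matching order as long as the dictionary
# is not modified between the calls, so zipping them the other way round
# inverts the mapping (this assumes the values are unique and hashable)
by_name = dict(zip(characters.values(), characters.keys()))

# zip() stops at the end of the shortest input
pairs = list(zip(['a', 'b', 'c'], [1, 2]))
```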