Revision 3 as of 2005-01-14 11:30:02

Clear message

Why lists are not used as dictionary keys

The existence of lists and tuples and the impossibility to use lists as dictionary keys is a constant source of irritation for Python newcomers. To understand why tuples but not lists can be used as dictionary keys let us first take a closer look at dictionaries.

How dictionaries work

Unlike strings, lists and tuples a dictionary is not a sequence but a mapping. A dictionary d maps key values to data values:

data = d[key]

To accomplish this the dictionary has to search the key in a key set and to return the associated data value. If the dictionary would search the key set sequentially this could take some time because the search time is proportional to the size of the key set: O(n). Fortunately the dictionary does better: It computes the storage location from the key and jumps directly to this storage location. The search time is the same no matter how large the key set is: O(1).

This search method is called hashing and the function that computes the storage location is called a hash function. Different hashes must be a proof of inequality (otherwise equal items could be inserted into different storage locations) leading to the following strict condition for a hash function hf:

for all i1, i2: hf(i1) != hf(i2) => i1 != i2

With Python's functions hash() and cmp() this reads:

(*) for all i1, i2: hash(i1) != hash(i2) => cmp(i1, i2) != 0

Example of a hash function:

   1 def hashfunc(string):
   2     return ord(string[0])

This function maps strings to integers in the range [0:256] and could be used to find strings in an array of length 256. What happens if two strings start with the same character? They have to be stored at the same array location and we have a hash collision. If too many strings are stored in the array then hash collisions occur frequently and the strings stored at the same location have to be searched sequentially. Of course this is not desired and the hash function has to be carefully designed in order to avoid frequent hash collisions. This means that equal hashes should be be a good hint for equality. Thus a hash function should satisfy in most cases:

for all i1, i2: hash(i1) == hash(i2) => cmp(i1, i2) == 0

Fortunately this is the case for Python's dictionary hash function so that dictionary search is very fast.

Pseudo code of a dictionary lookup

To get a complete picture here is a sketch of a dictionary lookup (data = d[key] expressed as: data = lookup(d, key)):

   1 def lookup(d, key):
   2     '''dictionary lookup is done in three steps:
   3        1. A hash value of the key is computed using a hash function.
   4 
   5        2. The hash value addresses a location in d.data which is
   6           supposed to be an array of collision lists. A collision
   7           list contains (key,value) pairs and should have only one
   8           entry.
   9 
  10        3. The collision list addressed by the hash value is searched
  11           sequentially until a pair is found with pair[0] == key. The
  12           return value of the lookup is then pair[1]. Equality (==) is
  13           defined by the function cmp(). The return value for equality
  14           is 0, for inequality some other value.
  15     '''
  16     h = hash(key)                  # step 1
  17     cl = d.data[h]                 # step 2
  18     for pair in cl:                # step 3
  19         if cmp(key, pair[0]) == 0:
  20             return pair[1]
  21     else:
  22         raise KeyError, "Key %s not found." % key

Data types usable as dictionary keys

The lookup function above shows which data types can be used as dictionary keys:


A data type must be a valid input for hash() and cmp() to be usable as a dictionary key. hash() and cmp() must satisfy the condition (*) above for this data type.


Why tuples are OK and lists are not

Is a list a suitable candidate for a dictionary key? hash(list) = id(list) and cmp(list1, list2) = cmp(id(list1), id(list2)) would satisfy condition (*). But this definition completely ignores list contents, causing big surprises for programmers:

To avoid these surprises dictionary lookup would have to use list contents instead of list id. Consider the (hypothetical, not working) code:

>>> d = dict()
>>> li = [1,2,3]
>>> d[li] = '123'
>>> d[[1,2,3]]
'123'

Now assume li is changed (e.g. li[2] = 4). There are two options to handle this. Let the dictionary ignore changes

>>> d[li]
KeyError: [1,2,4] not found in dictionary. (even worse: a previously existing [1,2,4] map is returned).

or let the dictionary follow changes

>>> d[[1,2,3]]
MovedError: Please address future lookups to [1,2,4] :)

Both are pretty unsatisfactory. Conclusion:


Dictionary lookup with mutable types like lists is a source of unpleasant surprises for the programmer and therefore impossible in Python.


This is the point where tuples come in. They have nearly the same interface as lists but cannot be changed once they have been created thereby avoiding the problems discussed for lists. Thus tuples can be used as dictionary keys.

Instances of user defined classes

What about instances of user defined classes?

User defined classes can be anything the programmer likes but yet their instances are usable as dictionary keys because there is a default: hash(object) = id(object) and cmp(object1, object2) = cmp(id(object1), id(object2)). The same setting has been discussed above for lists and has been found unsatisfactory. What is different here?

  1. It is a default setting which can be adapted to each class by defining/overriding the methods __hash__ and __cmp__.

  2. For objects identity is more important than for lists.

Unable to edit the page? See the FrontPage for instructions.