Differences between revisions 29 and 40 (spanning 11 versions)
Revision 29 as of 2010-10-27 13:13:19
Size: 4686
Comment: Add worst-case complexity for set.intersection.
Revision 40 as of 2023-01-19 22:35:03
Size: 5621
Editor: AndrewBadr
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
This page documents the time-complexity (aka "Big O" or "Big Oh") of various operations in current CPython.  Other Python implementations (or older or still-under development versions of CPython) may have slightly different performance characteristics.  However, it is generally safe to assume that they are not slower by more than a factor of O(log n). This page documents the time-complexity (aka "Big O" or "Big Oh") of various operations in current CPython. Other Python implementations (or older or still-under development versions of CPython) may have slightly different performance characteristics. However, it is generally safe to assume that they are not slower by more than a factor of O(log n).
Line 3: Line 3:

Generally, 'n' is the number of elements currently in the container.  'k' is either the value of a parameter or the number of elements in the parameter.
Generally, 'n' is the number of elements currently in the container. 'k' is either the value of a parameter or the number of elements in the parameter.
Line 9: Line 8:
Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move).  If you need to add/remove at both ends, consider using a collections.deque instead.
||<tablewidth="" tablestyle="">'''Operation''' ||'''Average Case''' ||'''[[http://en.wikipedia.org/wiki/Amortized_analysis|Amortized Worst Case]]''' ||
Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move). If you need to add/remove at both ends, consider using a collections.deque instead.
||<tablewidth="">'''Operation''' ||'''Average Case''' ||'''[[http://en.wikipedia.org/wiki/Amortized_analysis|Amortized Worst Case]]''' ||
Line 13: Line 12:
||Pop last ||O(1) ||O(1) ||
||Pop intermediate[2] ||O(n) ||O(n) ||
Line 24: Line 25:
||x in s||O(n)||||
||min(s), max(s)||O(n)||||
||x in s ||O(n) || ||
||min(s), max(s) ||O(n) || ||
||Get Length ||O(1) ||O(1) ||

Line 29: Line 33:
A deque (double-ended queue) is represented internally as a doubly linked list.  (Well, a list of arrays rather than objects, for greater efficiency.)  Both ends are accessible, but even looking at the middle is slow, and adding to or removing from the middle is slower still.
||<tablewidth="" tablestyle="">'''Operation''' ||'''Average Case''' ||'''Amortized Worst Case''' ||
A deque (double-ended queue) is represented internally as a doubly linked list. (Well, a list of arrays rather than objects, for greater efficiency.) Both ends are accessible, but even looking at the middle is slow, and adding to or removing from the middle is slower still.
||<tablewidth="">'''Operation''' ||'''Average Case''' ||'''Amortized Worst Case''' ||
Line 40: Line 44:
||Get Length ||O(1) ||O(1) ||


Line 44: Line 52:
||'''Operation'''||'''Average case'''||'''Worst Case'''||
||x in s||O(1)||O(n)||
||Union s|t||[[http://wiki.python.org/moin/TimeComplexity_(SetCode)|O(len(s)+len(t))]]||||
||Intersection s&t||O(min(len(s), len(t)) ||O(len(s) * len(t))||
||Difference s-t||O(len(s))||||
||s.difference_update(t)||O(len(t))||||
||Symmetric Difference s^t||?||||
 * As seen in the [[http://svn.python.org/projects/python/trunk/Objects/setobject.c|source code]] the complexities for set difference s-t or s.difference(t) ({{{set_difference()}}}) and in-place set difference s.difference_update(t) ({{{set_difference_update_internal()}}}) are different! The first one is O(len(s)) (for every element in s add it to the new set, if not in t). The second one is O(len(t)) (for every element in t remove it from s). So care must be taken as to which is preferred, depending on which one is the longest set and whether a new set is needed.
 * To perform set operations like s-t, both s and t need to be sets. However you can do the method equivalents even if t is any iterable, for example s.difference(l), where l is a list.
||'''Operation''' ||'''Average case''' ||'''Worst Case''' ||''' notes ''' ||
||x in s ||O(1) ||O(n) || ||
||Union s|t ||[[TimeComplexity_(SetCode)|O(len(s)+len(t))]] || || ||
||Intersection s&t ||O(min(len(s), len(t))) ||O(len(s) * len(t)) ||replace "min" with "max" if t is not a set ||
||Multiple intersection s1&s2&..&sn || ||(n-1)*O(l) where l is max(len(s1),..,len(sn)) || ||
||Difference s-t ||O(len(s)) || || ||
||s.difference_update(t) ||O(len(t)) || || ||
||Symmetric Difference s^t ||O(len(s)) ||O(len(s) * len(t)) || ||
||s.symmetric_difference_update(t) ||O(len(t)) ||O(len(t) * len(s)) || ||
Line 55: Line 63:


 * As seen in the [[https://github.com/python/cpython/blob/master/Objects/setobject.c|source code]] the complexities for set difference s-t or s.difference(t) ({{{set_difference()}}}) and in-place set difference s.difference_update(t) ({{{set_difference_update_internal()}}}) are different! The first one is O(len(s)) (for every element in s add it to the new set, if not in t). The second one is O(len(t)) (for every element in t remove it from s). So care must be taken as to which is preferred, depending on which one is the longest set and whether a new set is needed.
 * To perform set operations like s-t, both s and t need to be sets. However you can do the method equivalents even if t is any iterable, for example s.difference(l), where l is a list.
Line 56: Line 69:
The Average Case times listed for dict objects assume that the hash function for the objects is sufficiently robust to make collisions uncommon.  The Average Case assumes the keys used in parameters are selected uniformly at random from the set of all keys. The Average Case times listed for dict objects assume that the hash function for the objects is sufficiently robust to make collisions uncommon. The Average Case assumes the keys used in parameters are selected uniformly at random from the set of all keys.
Line 59: Line 72:
||<tablewidth="" tablestyle="">'''Operation''' ||'''Average Case''' ||'''Amortized Worst Case''' ||
||Copy[2] ||O(n) ||O(n) ||
||<tablewidth="">'''Operation''' ||'''Average Case''' ||'''Amortized Worst Case''' ||
||k in d ||O(1) ||O(n) ||
||
Copy[3] ||O(n) ||O(n) ||
Line 64: Line 78:
||Iteration[2] ||O(n) ||O(n) || ||Iteration[3] ||O(n) ||O(n) ||

Line 68: Line 84:
[1] = These operations rely on the "Amortized" part of "Amortized Worst Case".  Individual actions may take surprisingly long, depending on the history of the container. [1] = These operations rely on the "Amortized" part of "Amortized Worst Case". Individual actions may take surprisingly long, depending on the history of the container.
Line 70: Line 86:
[2] = For these operations, the worst case ''n'' is the maximum size the container ever achieved, rather than just the current size.  For example, if N objects are added to a dictionary, then N-1 are deleted, the dictionary will still be sized for N objects (at least) until another insertion is made. [2] = Popping the intermediate element at index '''k''' from a list of size '''n''' shifts all elements ''after'' '''k''' by one slot to the left using memmove. '''n - k''' elements have to be moved, so the operation is '''O(n - k)'''. The best case is popping the second to last element, which necessitates one move, the worst case is popping the first element, which involves '''n - 1''' moves. The average case for an average value of '''k''' is popping the element the middle of the list, which takes '''O(n/2) = O(n)''' operations.

[3] =
For these operations, the worst case ''n'' is the maximum size the container ever achieved, rather than just the current size. For example, if N objects are added to a dictionary, then N-1 are deleted, the dictionary will still be sized for N objects (at least) until another insertion is made.

This page documents the time-complexity (aka "Big O" or "Big Oh") of various operations in current CPython. Other Python implementations (or older or still-under development versions of CPython) may have slightly different performance characteristics. However, it is generally safe to assume that they are not slower by more than a factor of O(log n).

Generally, 'n' is the number of elements currently in the container. 'k' is either the value of a parameter or the number of elements in the parameter.

list

The Average Case assumes parameters generated uniformly at random.

Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move). If you need to add/remove at both ends, consider using a collections.deque instead.

Operation

Average Case

Amortized Worst Case

Copy

O(n)

O(n)

Append[1]

O(1)

O(1)

Pop last

O(1)

O(1)

Pop intermediate[2]

O(n)

O(n)

Insert

O(n)

O(n)

Get Item

O(1)

O(1)

Set Item

O(1)

O(1)

Delete Item

O(n)

O(n)

Iteration

O(n)

O(n)

Get Slice

O(k)

O(k)

Del Slice

O(n)

O(n)

Set Slice

O(k+n)

O(k+n)

Extend[1]

O(k)

O(k)

Sort

O(n log n)

O(n log n)

Multiply

O(nk)

O(nk)

x in s

O(n)

min(s), max(s)

O(n)

Get Length

O(1)

O(1)

collections.deque

A deque (double-ended queue) is represented internally as a doubly linked list. (Well, a list of arrays rather than objects, for greater efficiency.) Both ends are accessible, but even looking at the middle is slow, and adding to or removing from the middle is slower still.

Operation

Average Case

Amortized Worst Case

Copy

O(n)

O(n)

append

O(1)

O(1)

appendleft

O(1)

O(1)

pop

O(1)

O(1)

popleft

O(1)

O(1)

extend

O(k)

O(k)

extendleft

O(k)

O(k)

rotate

O(k)

O(k)

remove

O(n)

O(n)

Get Length

O(1)

O(1)

set

See dict -- the implementation is intentionally very similar.

Operation

Average case

Worst Case

notes

x in s

O(1)

O(n)

Union s|t

O(len(s)+len(t))

Intersection s&t

O(min(len(s), len(t)))

O(len(s) * len(t))

replace "min" with "max" if t is not a set

Multiple intersection s1&s2&..&sn

(n-1)*O(l) where l is max(len(s1),..,len(sn))

Difference s-t

O(len(s))

s.difference_update(t)

O(len(t))

Symmetric Difference s^t

O(len(s))

O(len(s) * len(t))

s.symmetric_difference_update(t)

O(len(t))

O(len(t) * len(s))

  • As seen in the source code the complexities for set difference s-t or s.difference(t) (set_difference()) and in-place set difference s.difference_update(t) (set_difference_update_internal()) are different! The first one is O(len(s)) (for every element in s add it to the new set, if not in t). The second one is O(len(t)) (for every element in t remove it from s). So care must be taken as to which is preferred, depending on which one is the longest set and whether a new set is needed.

  • To perform set operations like s-t, both s and t need to be sets. However you can do the method equivalents even if t is any iterable, for example s.difference(l), where l is a list.

dict

The Average Case times listed for dict objects assume that the hash function for the objects is sufficiently robust to make collisions uncommon. The Average Case assumes the keys used in parameters are selected uniformly at random from the set of all keys.

Note that there is a fast-path for dicts that (in practice) only deal with str keys; this doesn't affect the algorithmic complexity, but it can significantly affect the constant factors: how quickly a typical program finishes.

Operation

Average Case

Amortized Worst Case

k in d

O(1)

O(n)

Copy[3]

O(n)

O(n)

Get Item

O(1)

O(n)

Set Item[1]

O(1)

O(n)

Delete Item

O(1)

O(n)

Iteration[3]

O(n)

O(n)

Notes

[1] = These operations rely on the "Amortized" part of "Amortized Worst Case". Individual actions may take surprisingly long, depending on the history of the container.

[2] = Popping the intermediate element at index k from a list of size n shifts all elements after k by one slot to the left using memmove. n - k elements have to be moved, so the operation is O(n - k). The best case is popping the second to last element, which necessitates one move, the worst case is popping the first element, which involves n - 1 moves. The average case for an average value of k is popping the element the middle of the list, which takes O(n/2) = O(n) operations.

[3] = For these operations, the worst case n is the maximum size the container ever achieved, rather than just the current size. For example, if N objects are added to a dictionary, then N-1 are deleted, the dictionary will still be sized for N objects (at least) until another insertion is made.

TimeComplexity (last edited 2023-01-19 22:35:03 by AndrewBadr)

Unable to edit the page? See the FrontPage for instructions.