Differences between revisions 1 and 5 (spanning 4 versions)
Revision 1 as of 2004-11-11 18:09:07
Size: 1059
Editor: 186
Comment: simple examples of generators
Revision 5 as of 2008-07-19 02:44:25
Size: 6344
Editor: pool-71-187-81-186
Comment: Discussion on performance improvement, including caveat.
= Improved Performance =

The performance improvement from the use of generators is the result of the lazy (on demand) generation of values, which translates to lower memory usage.

This can be illustrated by comparing the `range` and `xrange` built-ins of Python 2.x.

Both `range` and `xrange` represent a range of numbers and have the same function signature, but `range` builds and returns a `list`, while `xrange` returns an object that produces the numbers lazily, one at a time.

Suppose we need to compute the sum of the first ''n'' (say, 1,000,000) non-negative numbers.

{{{
#!python
# Note: Python 2.x only
# using a non-generator
sum_of_first_n = sum(range(1000000))

# using a generator
sum_of_first_n = sum(xrange(1000000))
}}}

Note that both lines are identical in form, but the one using `range` is much more expensive.

When we use `range` we build a 1,000,000 element list in memory and then find its sum. This is a waste, considering that we use these 1,000,000 elements just to compute the sum.

This waste becomes more pronounced as the number of elements (our ''n'') grows larger, as the elements themselves grow larger, or both.

On the other hand, when we use `xrange`, we do not incur the cost of building a 1,000,000 element list in memory. The generator created by xrange will generate each number, which `sum` will consume to accumulate the sum.
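The lazy behaviour of `xrange` can be reproduced with an ordinary generator function. The name `first_n` below is just for illustration (it is not part of any library), and the sketch works the same way in Python 2 and 3:

```python
# first_n is a hypothetical generator equivalent of xrange(n):
# it yields 0, 1, ..., n-1 one value at a time, never building a list.
def first_n(n):
    num = 0
    while num < n:
        yield num
        num += 1

# sum consumes the values as they are produced
sum_of_first_n = sum(first_n(1000000))
```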

In the case of `range`, using it as an iterable is the dominant use case, and this is reflected in Python 3.x, where the `range` built-in returns a lazy `range` object (much like Python 2's `xrange`) instead of a list.

It should, however, be noted that a generator provides a performance benefit only if we do not intend to use the generated values more than once.
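This is because a generator can be consumed only once; a quick Python 3 style check illustrates the point:

```python
# a generator expression over the squares of 0, 1, 2
squares = (x * x for x in range(3))

first_pass = list(squares)   # consumes the generator: [0, 1, 4]
second_pass = list(squares)  # the generator is now exhausted: []
```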

Consider the following example:

{{{
#!python
# Note: Python 2.x only
# (product is not a built-in; assume a helper that multiplies the values together)
s = sum(xrange(1000000))
p = product(xrange(1000000))
}}}

Imagine that producing each value is a very expensive process. In the above code, we perform that expensive process twice. In cases like this, building a list in memory might be worth it (see example below):

{{{
#!python
# Note: Python 2.x only
nums = list(xrange(1000000))
s = sum(nums)
p = product(nums)
}}}

However, a generator may still be the only option if storing the generated objects in memory is not practical; in that case it may be worth paying the price of duplicating the expensive computations.
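Since `product` is not a Python built-in, here is one way it could be defined (a sketch using `functools.reduce`, a detail the text leaves unspecified), together with the build-the-list-once pattern from above on a small input:

```python
from functools import reduce
import operator

def product(iterable):
    # multiply all values together; 1 is the identity for an empty iterable
    return reduce(operator.mul, iterable, 1)

nums = list(range(1, 6))   # build the list once...
s = sum(nums)              # ...then reuse it for the sum
p = product(nums)          # ...and again for the product
```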

= Examples =
For example, the RangeGenerator can be used to iterate over a large number of values without creating a massive list (as `range` would):

{{{
#!python
# the for loop will generate each i (i.e. 1, 2, 3, 4, 5, ...), add it to the total,
# and throw it away before the next i is generated.  This is opposed to iterating
# through range(...), which creates a potentially massive list and then iterates through it.
total = 0
for i in irange(1000000):
    total += i
}}}

Generators can be composed. Here we create a generator over the squares of consecutive integers:

{{{
#!python
# squares is a generator
squares = (i*i for i in irange(1000000))
# add the squares
total = 0
for i in squares:
    total += i
}}}

Here we compose the squares generator with the `takewhile` generator to sum only the squares less than 100:

{{{
#!python
from itertools import count, takewhile

# add squares less than 100
squares = (i*i for i in count())
bounded_squares = takewhile(lambda x: x < 100, squares)
total = 0
for i in bounded_squares:
    total += i
}}}
to be written: Generators made from classes?
to be written: Discussion of code simplification (similar to the discussion on performance improvement)

== Links ==

 * [http://www.python.org/peps/pep-0255.html PEP-255: Simple Iterators] -- the original
 * [http://www-106.ibm.com/developerworks/library/l-pycon.html?n-l-9271 Iterators and Simple Generators]
 * [http://www-106.ibm.com/developerworks/linux/library/l-cpyiter.html combinatorial functions in itertools]
 * [http://linuxgazette.net/100/pramode.html Python Generator Tricks] -- various infinite sequences, recursions, ...
 * [http://www-106.ibm.com/developerworks/linux/library/l-pythrd.html "weightless threads"] -- simulating threads using generators
 * [http://www-106.ibm.com/developerworks/library/x-tipgenr.html XML processing] -- yes, using generators
 * [http://c2.com/cgi/wiki?GeneratorsAreNotCoroutines C2:GeneratorsAreNotCoroutines] -- particulars on generators, coroutines, and continuations

See also: ["Iterator"]

= Discussion =

I once saw MikeOrr demonstrate Before and After examples. But, I forget how they worked.

Can someone demonstrate here?

He did something like: Show how a normal list operation could be written to use generators. Something like:

{{{
#!python
def double(L):
    return [x*2 for x in L]

eggs = double([1, 2, 3, 4, 5])
}}}

...he showed how that, or something like that, could be rewritten using iterators, generators.

It's been a while since I've seen it, I may be getting this all wrong.

-- LionKimbro [[DateTime(2005-04-02T19:12:19Z)]]

{{{
#!python
# explicitly write a generator function
def double(L):
    for x in L:
        yield x*2

# eggs will be a generator
eggs = double([1, 2, 3, 4, 5])

# the above is equivalent to this "generator expression"
eggs = (x*2 for x in [1, 2, 3, 4, 5])

# need to do this if you need a list
eggs = list(double([1, 2, 3, 4, 5]))

# the above is equivalent to (list comprehension)
eggs = [x*2 for x in [1, 2, 3, 4, 5]]
}}}

For the above example, a generator expression or list comprehension is sufficient, unless you need to apply the same operation in many places.

A generator function will also be cleaner and clearer when the generated expressions are more complex, involve multiple steps, or depend on additional temporary state.

Consider the following example:

{{{
#!python
def unique(iterable, key=lambda x: x):
    seen = set()
    for elem, ekey in ((e, key(e)) for e in iterable):
        if ekey not in seen:
            yield elem
            seen.add(ekey)
}}}

Here the collection of keys already seen, ''seen'', is temporary state that would only clutter the code at the point where this generator is used.

Even if we were to use this only once, it is worth writing it as a function for the sake of clarity (remember that Python allows nested functions).
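To show the `unique` generator in action, here is a small driving example (repeating the definition above so it runs on its own; the sample inputs are invented for illustration):

```python
def unique(iterable, key=lambda x: x):
    # yield each element whose key has not been seen before
    seen = set()
    for elem, ekey in ((e, key(e)) for e in iterable):
        if ekey not in seen:
            yield elem
            seen.add(ekey)

print(list(unique([1, 2, 1, 3, 2, 4])))
# -> [1, 2, 3, 4]
print(list(unique(["Apple", "apple", "Banana"], key=str.lower)))
# -> ['Apple', 'Banana']
```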

Generators are very useful and can help simplify your code and improve its performance.


Generators (last edited 2020-12-04 19:05:45 by ChrisM)
