

Generators are very useful and can help simplify your code and improve its performance.

= Improved Performance =

The performance improvement from the use of generators is the result of the lazy (on demand) generation of values, which translates to lower memory usage.

This can be illustrated by comparing the `range` and `xrange` built-ins of Python 2.x.

Both `range` and `xrange` represent a range of numbers and have the same function signature, but `range` returns a `list` while `xrange` returns an object that generates the numbers in the range on demand.

Say we had to compute the sum of the first ''n'' (say, 1,000,000) non-negative numbers.

{{{
#!python
# Note: Python 2.x only
# using a non-generator
sum_of_first_n = sum(range(1000000))

# using a generator
sum_of_first_n = sum(xrange(1000000))
}}}

Note that both lines are identical in form, but the one using `range` is far more expensive.

When we use `range`, we build a 1,000,000-element list in memory and then find its sum. This is a waste, considering that we use those 1,000,000 elements just to compute the sum.

This waste becomes more pronounced as the number of elements (our ''n'') becomes larger, the size of each element becomes larger, or both.

On the other hand, when we use `xrange`, we do not incur the cost of building a 1,000,000-element list in memory. The object created by `xrange` generates each number on demand, and `sum` consumes the numbers one at a time to accumulate the sum.

In the case of the `range` function, using it as an iterable is the dominant use case, and this is reflected in Python 3.x, where the `range` built-in produces its values lazily (like the old `xrange`) instead of returning a list.
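This laziness is easy to observe in Python 3 (a small illustration, not from the original page; `sys.getsizeof` reports only each object's own footprint):

```python
import sys

# In Python 3, range() is a lazy sequence: it stores only start/stop/step,
# while list() materializes every element in memory.
r = range(1000000)
lst = list(r)

print(sys.getsizeof(r) < sys.getsizeof(lst))  # True: the range object stays tiny
print(r[999999])                              # lazy, but still indexable: 999999
```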

It should, however, be noted that a generator will provide performance benefits only if you do not intend to use the generated values more than once.
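A related pitfall (a small sketch in Python 3 syntax, not from the original page): a generator is exhausted after a single pass, so a second pass silently produces nothing:

```python
# a generator expression; values are produced on demand, and only once
gen = (x*2 for x in [1, 2, 3])

first = list(gen)   # consumes the generator completely
second = list(gen)  # already exhausted: no values left

print(first)   # [2, 4, 6]
print(second)  # []
```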

Consider the following example:

{{{
#!python
# Note: Python 2.x only
# product() is not a built-in; assume it multiplies the values of an iterable
s = sum(xrange(1000000))
p = product(xrange(1000000))
}}}

Imagine that creating each integer is a very expensive process. In the above code, we perform that expensive generation process twice. In cases like this, building a list in memory might be worth it (see example below):

{{{
#!python
# Note: Python 2.x only
# generate the numbers once, store them, and reuse the stored list
nums = list(xrange(1000000))
s = sum(nums)
p = product(nums)
}}}

However, if storing the generated objects in memory is not practical, a generator may still be the only way, and it may then be worth paying the price of duplicating the expensive computations.
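One way around the dilemma, when it applies, is to fold both computations into a single pass, so each value is generated exactly once and never stored; `sum_and_product` below is a hypothetical helper written for this sketch (Python 3 syntax):

```python
def sum_and_product(iterable):
    # accumulate both results in one traversal of the iterable
    s, p = 0, 1
    for x in iterable:
        s += x
        p *= x
    return s, p

s, p = sum_and_product(range(1, 6))  # values 1..5
print(s, p)  # 15 120
```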

= Examples =

For example, the `irange` generator from RangeGenerator can be used to iterate over a large number of values without creating a massive list (as `range` would):

{{{
#!python
# The for loop will generate each i (i.e. 1, 2, 3, 4, 5, ...), add it to
# total, and throw it away before the next i is generated. This is opposed
# to iterating through range(...), which creates a potentially massive list
# and then iterates through it.
total = 0
for i in irange(1000000):
    total = total + i
}}}

Generators can be composed. Here we create a generator over the squares of consecutive integers.

{{{
#!python
# square is a generator
square = (i*i for i in irange(1000000))
# add the squares
total = 0
for i in square:
    total = total + i
}}}

Here, we compose a square generator with the `takewhile` generator from `itertools`, to sum the squares less than 100:

{{{
#!python
from itertools import count, takewhile

# add squares less than 100
square = (i*i for i in count())
bounded_squares = takewhile(lambda x: x < 100, square)
total = 0
for i in bounded_squares:
    total += i
}}}

to be written: Generators made from classes?

to be written: Discussion of code simplification (similar to the discussion on performance improvement)

See also: ["Iterator"]

= Discussion =

I once saw MikeOrr demonstrate Before and After examples, but I forget how they worked.

Can someone demonstrate here?

He did something like: Show how a normal list operation could be written to use generators. Something like:

{{{
#!python
def double(L):
    return [x*2 for x in L]

eggs = double([1, 2, 3, 4, 5])
}}}

...he showed how that, or something like that, could be rewritten using iterators, generators.

It's been a while since I've seen it, I may be getting this all wrong.

-- LionKimbro DateTime(2005-04-02T19:12:19Z)

{{{
#!python
# explicitly write a generator function
def double(L):
    for x in L:
        yield x*2

# eggs will be a generator
eggs = double([1, 2, 3, 4, 5])

# the above is equivalent to a generator expression
eggs = (x*2 for x in [1, 2, 3, 4, 5])

# do this if you need a list
eggs = list(double([1, 2, 3, 4, 5]))

# the above is equivalent to a list comprehension
eggs = [x*2 for x in [1, 2, 3, 4, 5]]
}}}

For the above example, a generator expression or list comprehension is sufficient unless you need to apply the same operation in many places.

Also, a generator function will be cleaner and clearer if the generated expressions are more complex, involve multiple steps, or depend on additional temporary state.

Consider the following example:

{{{
#!python
def unique(iterable, key=lambda x: x):
    seen = set()
    for elem, ekey in ((e, key(e)) for e in iterable):
        if ekey not in seen:
            yield elem
            seen.add(ekey)
}}}

Here, the temporary collection of seen keys, ''seen'', is storage that would just be clutter at every place this logic was used, were it not encapsulated in the generator function.

Even if we were to use this only once, it is worth writing a function (for the sake of clarity; remember that Python allows nested functions).
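As an illustration of how this reads at the call site (a usage sketch, not from the original page), `unique` keeps all the bookkeeping out of the caller's way:

```python
def unique(iterable, key=lambda x: x):
    # yield each element whose key has not been seen before
    seen = set()
    for elem, ekey in ((e, key(e)) for e in iterable):
        if ekey not in seen:
            yield elem
            seen.add(ekey)

# keep the first word of each length; 'seen' stays hidden inside unique()
words = ["a", "bb", "cc", "ddd", "e"]
print(list(unique(words, key=len)))  # ['a', 'bb', 'ddd']
```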

Generators (last edited 2020-12-04 19:05:45 by ChrisM)
