Python Performance Tips

This page is devoted to various tips and tricks that help improve the performance of your Python programs. Wherever the information comes from someone else, I've tried to identify the source.

Python has changed in some significant ways since I first wrote my "fast python" page in about 1996, which means that some of the orderings will have changed. I migrated it to the Python wiki in hopes others will help maintain it.

Note: You should always test these tips with your application and the version of Python you intend to use and not just blindly accept that one method is faster than another.

Also new since this was originally written are packages like [http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/ Pyrex], [http://psyco.sourceforge.net/ Psyco], [http://www.scipy.org/site_content/weave Weave] and [http://pyinline.sourceforge.net/ PyInline], which can dramatically improve your application's performance by making it easier to push performance-critical code into C or machine language.

Contents

  • [#datatype Choose the Right Data Structure (tbd)]
  • [#sorting Sorting]
  • [#stringcat String concatenation]
  • [#loops Loops]
  • [#dots Avoiding dots...]
  • [#local Local Variables]
  • [#initdict Initializing Dictionary Elements]
  • [#import Import Statement Overhead]
  • [#aggregate Data Aggregation]
  • [#periodic Doing Stuff Less Often]
  • [#notc Python is not C]
  • [#profiling Profiling Code]
    • [#profile Profile Module]
    • [#hotshot Hotshot Profiler (tbd)]
    • [#trace Trace Module]

Anchor(datatype)

Choose the Right Data Structure

TBD.

Anchor(sorting)

Sorting

From Guido van Rossum

Sorting lists of basic Python objects is generally pretty efficient. The sort method for lists takes an optional comparison function as an argument that can be used to change the sorting behavior. This is quite convenient, though it can really slow down your sorts.
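
To make the comparison-function idea concrete: in Python 2 you would write something like somelist.sort(cmpfunc). Modern Pythons dropped the comparison-function argument, but functools.cmp_to_key expresses the same idea, so here is a hedged sketch (the by_second function and the sample data are made up for illustration):

```python
import functools

# Hypothetical cmp-style comparison: order pairs by their second field.
# The contract is the old one: return negative, zero, or positive.
def by_second(a, b):
    return (a[1] > b[1]) - (a[1] < b[1])

pairs = [(1, 2), (2, -4), (3, 6)]
# In Python 2 this would have been pairs.sort(by_second); cmp_to_key
# adapts the same function to the modern key= interface.
pairs.sort(key=functools.cmp_to_key(by_second))
print(pairs)  # [(2, -4), (1, 2), (3, 6)]
```

Every comparison goes through a Python-level function call, which is exactly the overhead the Schwartzian Transform below avoids.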

An alternative way to speed up sorts is to construct a list of tuples whose first element is a sort key that will sort properly using the default comparison, and whose second element is the original list element. This is the so-called [http://www.google.com/search?q=Schwartzian+Transform&ie=UTF-8&oe=UTF-8 Schwartzian Transform].

Suppose, for example, you have a list of tuples that you want to sort by the n-th field of each tuple. The following function will do that.

def sortby(somelist, n):
    nlist = [(x[n], x) for x in somelist]
    nlist.sort()
    return [val for (key, val) in nlist]

Matching the behavior of the current list sort method (sorting in place) is easily achieved as well:

def sortby_inplace(somelist, n):
    somelist[:] = [(x[n], x) for x in somelist]
    somelist.sort()
    somelist[:] = [val for (key, val) in somelist]
    return

Here's an example use:

>>> somelist = [(1, 2, 'def'), (2, -4, 'ghi'), (3, 6, 'abc')]
>>> somelist.sort()
>>> somelist 
[(1, 2, 'def'), (2, -4, 'ghi'), (3, 6, 'abc')]
>>> nlist = sortby(somelist, 2)
>>> sortby_inplace(somelist, 2)
>>> nlist == somelist
True
>>> nlist = sortby(somelist, 1) 
>>> sortby_inplace(somelist, 1)
>>> nlist == somelist
True

Anchor(stringcat)

String Concatenation

Strings in Python are immutable. This fact frequently sneaks up and bites novice Python programmers on the rump. Immutability confers some advantages and disadvantages. In the plus column, strings can be used as keys in dictionaries and individual copies can be shared among multiple variable bindings. (Python automatically shares one- and two-character strings.) In the minus column, you can't say something like, "change all the 'a's to 'b's" in any given string. Instead, you have to create a new string with the desired properties. This continual copying can lead to significant inefficiencies in Python programs.
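
A quick illustration of that copy-on-change behavior (a sketch, not from the original page):

```python
s = "abracadabra"
# replace() cannot modify s in place; it builds and returns a new string
t = s.replace("a", "b")
print(s)  # abracadabra  -- the original is untouched
print(t)  # bbrbcbdbbrb
```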

Avoid this:

s = ""
for substring in list:
    s += substring

Use s = "".join(list) instead. The former is a very common and catastrophic mistake when building large strings. Similarly, if you are generating bits of a string sequentially instead of:

s = ""
for x in somelist:
    s += some_function(x)

use

slist = [some_function(elt) for elt in somelist]
s = "".join(slist)

Avoid:

out = "<html>" + head + prologue + query + tail + "</html>"

Instead, use

out = "<html>%s%s%s%s</html>" % (head, prologue, query, tail)

Even better, for readability (this has nothing to do with efficiency other than yours as a programmer), use dictionary substitution:

out = "<html>%(head)s%(prologue)s%(query)s%(tail)s</html>" % locals()

These last two are going to be much faster, especially when piled up over many CGI script executions, and easier to modify to boot. In addition, the slow way of doing things got slower in Python 2.0 with the addition of rich comparisons to the language. It now takes the Python virtual machine a lot longer to figure out how to concatenate two strings. (Don't forget that Python does all method lookup at runtime.)
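
You can see the concatenation difference yourself with the timeit module (discussed later on this page); a hedged sketch with made-up data:

```python
import timeit

setup = "parts = ['x'] * 1000"
# Total seconds for 100 runs of each style over the same 1000-element list.
concat = timeit.timeit("s = ''\nfor p in parts:\n    s += p",
                       setup=setup, number=100)
join = timeit.timeit("s = ''.join(parts)", setup=setup, number=100)
# Both build the identical string; join typically does far less copying.
print("+= : %.6f sec   join: %.6f sec" % (concat, join))
```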

Anchor(loops)

Loops

Python supports a couple of looping constructs. The for statement is most commonly used. It loops over the elements of a sequence, assigning each to the loop variable. If the body of your loop is simple, the interpreter overhead of the for loop itself can be a substantial amount of the overhead. This is where the [http://www.python.org/doc/lib/built-in-funcs.html map] function is handy. You can think of map as a for moved into C code. The only restriction is that the "loop body" of map must be a function call.

Here's a straightforward example. Instead of looping over a list of words and converting them to upper case:

newlist = []
for word in oldlist:
    newlist.append(word.upper())

you can use map to push the loop from the interpreter into compiled C code:

newlist = map(str.upper, oldlist)

List comprehensions were added to Python in version 2.0 as well. They provide a syntactically more compact way of writing the above for loop:

newlist = [s.upper() for s in oldlist]

Generator expressions were added to Python in version 2.4. They function more-or-less like list comprehensions or map but avoid the overhead of generating the entire list at once. Instead, they return a generator object which can be iterated over bit-by-bit:

newlist = (s.upper() for s in oldlist)

Which method is appropriate will depend on what version of Python you're using and the characteristics of the data you are manipulating.

Guido van Rossum wrote a much more detailed examination of [http://www.python.org/doc/essays/list2str.html loop optimization] that is definitely worth reading.

Anchor(dots)

Avoiding dots...

Suppose you can't use map or a list comprehension? You may be stuck with the for loop. The for loop example has another inefficiency. Both newlist.append and word.upper are function references that are reevaluated each time through the loop. The original loop can be replaced with:

upper = str.upper
newlist = []
append = newlist.append
for word in oldlist:
    append(upper(word))

This technique should be used with caution. It gets more difficult to maintain if the loop is large. Unless you are intimately familiar with that piece of code you will find yourself scanning up to check the definitions of append and upper.

Anchor(local)

Local Variables

The final speedup available to us for the non-map version of the for loop is to use local variables wherever possible. If the above loop is cast as a function, append and upper become local variables. Python accesses local variables much more efficiently than global variables.

def func():
    upper = str.upper
    newlist = []
    append = newlist.append
    for word in words:
        append(upper(word))
    return newlist

At the time I originally wrote this I was using a 100MHz Pentium running BSDI. I got the following times for converting the list of words in /usr/share/dict/words (38,470 words at that time) to upper case:

Version                         Time (seconds)
Basic loop                      3.47
Eliminate dots                  2.45
Local variable & no dots        1.79
Using map function              0.54

Anchor(initdict)

Initializing Dictionary Elements

Suppose you are building a dictionary of word frequencies and you've already broken your text up into a list of words. You might execute something like:

wdict = {}
has_key = wdict.has_key
for word in words:
    if not has_key(word): wdict[word] = 0
    wdict[word] = wdict[word] + 1

Except for the first time, each time a word is seen the if statement's test fails. If you are counting a large number of words, many will probably occur multiple times. In a situation where the initialization of a value is only going to occur once and the augmentation of that value will occur many times it is cheaper to use a try statement:

wdict = {}
for word in words:
    try:
        wdict[word] += 1
    except KeyError:
        wdict[word] = 1

It's important to catch the expected KeyError exception, and not have a default except clause to avoid trying to recover from an exception you really can't handle by the statement(s) in the try clause.

A third alternative became available with the release of Python 2.x. Dictionaries now have a get() method which will return a default value if the desired key isn't found in the dictionary. This simplifies the loop:

wdict = {}
for word in words:
    wdict[word] = wdict.get(word, 0) + 1

When I originally wrote this section, there were clear situations where one of the first two approaches was faster. It seems that all three approaches now exhibit similar performance (within about 10% of each other), more or less independent of the properties of the list of words.
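
Since this was written, Python 2.5 added collections.defaultdict, which moves the default-value logic into C; a sketch of the same word count (the sample sentence is made up):

```python
from collections import defaultdict

words = "the quick brown fox jumps over the lazy dog the end".split()
wdict = defaultdict(int)   # missing keys start at int(), i.e. 0
for word in words:
    wdict[word] += 1       # no if, try, or get() needed
print(wdict["the"])  # 3
```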

Anchor(import)

Import Statement Overhead

import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.

Consider the following two snippets of code (originally from Greg McFarlane, I believe - I found it unattributed in a comp.lang.python python-list@python.org posting and later attributed to him in another source):

def doit1():
    import string             ###### import statement inside function
    string.lower('Python')

for num in range(100000):
    doit1()

or:

import string             ###### import statement outside function
def doit2():
    string.lower('Python')

for num in range(100000):
    doit2()

doit2 will run much faster than doit1, even though the reference to the string module is global in doit2. Here's a Python interpreter session run using Python 2.3 and the new timeit module, which shows how much faster the second is than the first:

>>> def doit1():
...   import string
...   string.lower('Python')
... 
>>> import string
>>> def doit2():
...   string.lower('Python')
... 
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623

String methods were introduced to the language in Python 2.0. These provide a version that avoids the import completely and runs even faster:

def doit3():
    'Python'.lower()

for num in range(100000):
    doit3()

Here's the proof from timeit:

>>> def doit3():
...   'Python'.lower()
... 
>>> t = timeit.Timer(setup='from __main__ import doit3', stmt='doit3()')
>>> t.timeit()
2.5606080293655396

The above example is obviously a bit contrived, but the general principle holds.

Anchor(aggregate)

Data Aggregation

Function call overhead in Python is relatively high, especially compared with the execution speed of a builtin function. This strongly suggests that where appropriate, functions should handle data aggregates. Here's a contrived example written in Python.

import time
x = 0
def doit1(i):
    global x
    x = x + i

list = range(100000)
t = time.time()
for i in list:
    doit1(i)

print "%.3f" % (time.time()-t)

vs.

import time
x = 0
def doit2(list):
    global x
    for i in list:
        x = x + i

list = range(100000)
t = time.time()
doit2(list)
print "%.3f" % (time.time()-t)

Here's the proof in the pudding using an interactive session:

>>> t = time.time()
>>> doit2(list)
>>> print "%.3f" % (time.time()-t)
0.204
>>> t = time.time()
>>> for i in list:
...     doit1(i)
... 
>>> print "%.3f" % (time.time()-t)
0.758

Even written in Python, the second example runs about four times faster than the first. Had doit been written in C the difference would likely have been even greater (exchanging a Python for loop for a C for loop as well as removing most of the function calls).

Anchor(periodic)

Doing Stuff Less Often

The Python interpreter performs some periodic checks. In particular, it decides whether or not to let another thread run and whether or not to run a pending call (typically a call established by a signal handler). Most of the time there's nothing to do, so performing these checks each pass around the interpreter loop can slow things down. There is a function in the sys module, setcheckinterval, which you can call to tell the interpreter how often to perform these periodic checks. Prior to the release of Python 2.3 it defaulted to 10. In 2.3 this was raised to 100. If you aren't running with threads and you don't expect to be catching many signals, setting this to a larger value can improve the interpreter's performance, sometimes substantially.
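 
setcheckinterval is how this was tuned at the time; in modern CPython (3.2 and later) it was replaced by a time-based switch interval. A sketch of the newer knob, for readers on current Pythons:

```python
import sys

old = sys.getswitchinterval()   # default is 0.005 seconds
# Raising it means fewer forced thread switches per second -- the modern
# analogue of raising setcheckinterval. Only worthwhile if you aren't
# relying on responsive thread switching or signal handling.
sys.setswitchinterval(0.05)
print(sys.getswitchinterval())
sys.setswitchinterval(old)      # restore the default
```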

Anchor(notc)

Python is not C

It is also not Perl, Java, C++ or Haskell. Be careful when transferring your knowledge of how other languages perform to Python. A simple example serves to demonstrate:

% timeit.py -s 'x = 47' 'x * 2'
1000000 loops, best of 3: 0.574 usec per loop
% timeit.py -s 'x = 47' 'x << 1'
1000000 loops, best of 3: 0.524 usec per loop
% timeit.py -s 'x = 47' 'x + x'
1000000 loops, best of 3: 0.382 usec per loop
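
The same measurements can be made from inside the interpreter with the timeit module rather than the timeit.py script; a sketch:

```python
import timeit

for expr in ("x * 2", "x << 1", "x + x"):
    # Total seconds for 10**6 runs happens to equal usec per loop numerically.
    t = timeit.timeit(expr, setup="x = 47", number=1000000)
    print("%-8s %.3f usec per loop" % (expr, t))
```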

Now consider the similar C programs (only the add version is shown):

#include <stdio.h>

int
main (int argc, char **argv) {
        int i = 47;
        int loop;
        for (loop=0; loop<500000000; loop++)
                i + i;
}

and the execution times:

% for prog in mult add shift ; do
< for i in 1 2 3 ; do
<   echo -n "$prog: "
<   /usr/bin/time ./$prog
< done
< echo
< done
mult:         6.12 real         5.64 user         0.01 sys
mult:         6.08 real         5.50 user         0.04 sys
mult:         6.10 real         5.45 user         0.03 sys

add:          6.07 real         5.54 user         0.00 sys
add:          6.08 real         5.60 user         0.00 sys
add:          6.07 real         5.58 user         0.01 sys

shift:        6.09 real         5.55 user         0.01 sys
shift:        6.10 real         5.62 user         0.01 sys
shift:        6.06 real         5.50 user         0.01 sys

Note that there is a significant advantage in Python to adding a number to itself instead of multiplying it by two or shifting it left by one bit. In C on all modern computer architectures, each of the three arithmetic operations is translated into a single machine instruction that executes in one cycle, so it doesn't really matter which one you choose.

A common "test" new Python programmers often perform is to translate the common Perl idiom

while (<>) {
    print;
}

into Python code that looks something like

import fileinput

for line in fileinput.input():
    print line,

and use it to conclude that Python must be much slower than Perl. As others have pointed out numerous times, Python is slower than Perl for some things and faster for others. Relative performance also often depends on your experience with the two languages.

Anchor(profiling)

Profiling Code

The first step to speeding up your program is learning where the bottlenecks lie. It hardly makes sense to optimize code that is never executed or that already runs fast. I use two modules to help locate the hotspots in my code, profile and trace. In later examples I also use the timeit module, which is new in Python 2.3.

Anchor(profile)

Profile Module

The [http://www.python.org/doc/current/lib/module-profile.html profile module] is included as a standard module in the Python distribution. Using it to profile the execution of a set of functions is quite easy. Suppose your main function is called main, takes no arguments and you want to execute it under the control of the profile module. In its simplest form you just execute

import profile
profile.run('main()')

When main() returns, the profile module will print a table of function calls and execution times. The output can be tweaked using the Stats class included with the module. In Python 2.4 profile will allow the time consumed by Python builtins and functions in extension modules to be profiled as well.
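
Here is a sketch of tweaking the output with the Stats class. It uses cProfile (the C-implemented profiler that later Pythons ship alongside profile) and a made-up workload:

```python
import cProfile
import io
import pstats

def work():
    # Hypothetical workload standing in for your main()
    return sum(i * i for i in range(10000))

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

buf = io.StringIO()
stats = pstats.Stats(pr, stream=buf)
stats.sort_stats("cumulative").print_stats(5)   # top 5 by cumulative time
out = buf.getvalue()
print("function calls" in out)  # True
```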

Anchor(hotshot)

Hotshot Module

New in Python 2.2, the [http://www.python.org/doc/current/lib/module-hotshot.html hotshot package] is intended as a replacement for the profile module. The underlying module is written in C, so using hotshot should result in a much smaller performance hit, and thus a more accurate idea of how your application is performing. There is also a hotshotmain.py program in the distribution's Tools/scripts directory which makes it easy to run your program under hotshot control from the command line.

Anchor(trace)

Trace Module

The [http://www.python.org/doc/current/lib/module-trace.html trace module] is a spin-off of the profile module I wrote originally to perform some crude statement level test coverage. It's been heavily modified by several other people since I released my initial crude effort. As of Python 2.0 you should find trace.py in the Tools/scripts directory of the Python distribution. Starting with Python 2.3 it's in the standard library (the Lib directory). You can copy it to your local bin directory and set the execute permission, then execute it directly. It's easy to run from the command line to trace execution of whole scripts:

% trace.py -t spam.py eggs

There's no separate documentation, but you can execute "pydoc trace" to view the inline documentation.
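
Besides the command line, trace can also be driven from Python code through its Trace class; a minimal sketch with a made-up function:

```python
import trace

def spam(n):
    total = 0
    for i in range(n):
        total += i
    return total

# count=1 records how many times each line executed; trace=0 keeps it quiet
tracer = trace.Trace(count=1, trace=0)
tracer.runfunc(spam, 10)

results = tracer.results()
# results.counts maps (filename, lineno) -> execution count
print(len(results.counts) > 0)
```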

PythonSpeed/PerformanceTips (last edited 2023-03-30 15:21:14 by FrankHenigman)
