Differences between revisions 1 and 27 (spanning 26 versions)
Revision 1 as of 2004-12-24 08:07:52
Size: 3706
Editor: B62-49
Comment:
Revision 27 as of 2005-01-26 23:01:58
Size: 11537
Editor: dsl-61-29-106-50
Comment: Added an additional lambda section titled "Inappropriate Use of Lambda"
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
DubiousPython

Following a suggestion (not yet linkable on google groups) of FernadoPerez on comp.lang.python, I am inaugurating this wiki page. Fernado's suggestion was that there should somewhere be a collection of bad python practices together with an explanation of the badness and a preferred alternative. As a hobbyist uni-lingual programmer, I can say that I'd certainly find such a resources useful. I can kick it off, but I'm afraid that I likely have more to offer on the dubious than the preferred side of the ledger.

Anyway, I thought it was a good idea, and a wiki page seems the best way to distribute the lifting. I'm hopeful this will get it started, but don't have any investment in the form or content of what is here; refactor at will. -- BrianvandenBroek
This page follows a [http://groups.google.ca/groups?hl=en&lr=&selm=mailman.8378.1103832808.5135.python-list%40python.org&prev=/groups%3Fq%3Ddubious%2BFernando%26hl%3Den%26lr%3D%26group%3Dcomp.lang.python.*%26selm%3Dmailman.8378.1103832808.5135.python-list%2540python.org%26rnum%3D2 suggestion] of FernadoPerez on comp.lang.python. His suggestion was that there should somewhere be a collection of bad python practices together with an explanation of the badness and a preferred alternative. This seems a good job for a wiki-page, so ...

[[TableOfContents(2)]]
Line 14: Line 11:
{{{#!python
Line 15: Line 13:
>>> new_string = ""
>>> for s in string_list:
>>> new_string = "" 
>>> for s in string_list: 
Line 18: Line 16:

>>> print new_string
>>> print new_string
onebigstringinpieces
>>>}}}

=== The Problem ===
This is slow and resource heavy. Each time through the {{{for}}} loop, a new string is built and the old one is discarded. That might not matter so much for such a small case, but as the number of elements to be joined creeps up, so too does the inefficiency.

''The above code is perfectly fine. Its readible, and it works. Worrying about speed/performance when it is not an issue is one of the worst programming practices. "Premature Optimization is the Root of All Evil" -- see Wiki:PrematureOptimization''

''Training yourself to NEVER use some possibly-tempting idiom that is NEVER right is not premature: JUST SAY NO, learn to see this way to build up strings as ugly and always wrong, and live happily ever after!''


=== Preferred Alternatives ===
==== String Formatting ====
For short cases where the number of strings to be joined is known, you can use string formatting as follows:

{{{#!python
>>> print '%s%s%s%s%s' % tuple(string_list)
onebigstringinpieces}}}

This is much more efficient, but is also rather more limited in the range of circumstances to which it applies. It could be made more general by constructing the formatting string as a function of `len(string_list)`, but this would be a bit dubious, too. It's also less readible, less maintainible. A better alternative is found in the next section.


==== The join method of strings ====
The join method of the string type lets you perform the concatenation as follows:

{{{#!python
>>> print "".join(string_list)
onebigstringinpieces}}}

This is quite efficient and perfectly general as it applies to any arbitrary list of strings. (You don't need to know the list length in advance.)

The major thing to puzzle the newcomer here is why `"".join(some_list)` rather than `some_list.join()`. The way to think of this is that you are using the string "" to join the elements of some_list. Hence,

{{{#!python
>>> print 'JOINT'.join(string_list)
oneJOINTbigJOINTstringJOINTinJOINTpieces}}}

That said, some do consider this aspect of the join method of strings odd enough to count as a PythonWart.

If the `''.join(some_list)` syntax really bothers you, one option is to bind the method to a different name, e.g.

{{{#!python
>>> join = ''.join
>>> print join(string_list)
Line 22: Line 63:

=== The Problem ===
This is slow and resource heavy. Each time through the for loop, a new string is built and the old on is discarded. That might not matter so much for such a small case, but as the number of elements to be joined creeps up, so too does the inefficiency.

=== Preferred Alternatives ===
==== String Formatting ====
For short cases where the number of strings to be joined is know, you can use string formatting as follows:

>>> print '%s%s%s%s%s' %tuple(string_list)
onebigstringinpieces

This is much more efficient, but is also rather more limited in the range of circumstances to which it applies. It could be made more general by constructing the formatting string as a function of len(string_list), but this would be a bit dubious, too, when we have

==== The join method of strings ====
The join method of the string type lets you perform the concatenation as follows:

>>> print "".join(string_list)
onebigstringinpieces

This is quite efficient and perfectly general as it applies to any arbitrary list of strings. (You don't need to know the list length in advance.)

The major thing to puzzle the newcomer here is why "".join(some_list) rather than some_list.join(). The way to think of this is that you are using the string "" to join the elements of some_list. Hence,

>>> print 'JOINT'.join(string_list)
oneJOINTbigJOINTstringJOINTinJOINTpieces

That said, some do consider this aspect of the join method of strings odd enough to count as a PythonWart.

== Boolean Redundancy ==
(Please name me better!)
>>> underscore_join = '_'.join
>>> print underscore_join(string_list)
one_big_string_in_pieces}}}

''The audience should be an expert in the idioms of a language when considering readability.'' The
join is simple to this crowd.



== Overly Verbose Conditionals ==
Line 56: Line 76:
{{{#!python
Line 57: Line 78:
    # continue process

and
    # do something}}}

{{{#!python
Line 65: Line 85:
        return False

=== The Problem ===
There is a (very slight) speed of execution inefficiency in these examples. But much more important is the speed of entry and understanding inefficiency. All other things being equal, extra typing is evil. And, unless some gain in clarity is purchased by the extra characters, the more characters in the code, the longer that code will take to understand. (The programming time you save could well be your own!)

=== Preferred Alternatives ===
        return False}}}

{{{#!python
if len(somecontainer) > 0:
    # do something}}}


=== The Problem ===
There is a slight speed of execution inefficiency in these examples. The first example has the overhead of an extra method lookup (`bool.__eq__`) and an extra name lookup (`True`). The second example has the overhead of an extra branch statement. The third example has the overhead of two extra method lookups (`somecontainer.__len__` and `int.__cmp__`).

But much more important is the speed of entry and understanding inefficiency. All other things being equal, extra typing is evil. And, unless some substantial gain in clarity is purchased by the extra characters, the more characters in the code, the longer that code will take to understand. (The programming time you save could well be your own!)

=== Preferred Alternatives ===
{{{#!python
Line 72: Line 99:
    # continue process
    # do something}}}

{{{#!python
Line 75: Line 103:
    return count > 10

    
    return count > 10}}}

Most non-empty containers evaluate to True in a boolean context, so no test on len() is generally necessary:

{{{#!python
if somecontainer:
    # do something}}}

Some containers (e.g. numarray.array) do not evaluate this way. In these cases, the preferred idiom is:

{{{#!python
if len(somecontainer):
    # do something}}}


== Overuse of lambda ==
Lambda forms allow anonymous functions to be created and used as part of an expression. However, when a function is already named, wrapping this function in a lambda can decrease readability and affect program efficiency.

=== Dubious Way ===
{{{#!python
dict(a=lambda x: str(x),
     b=lambda x: some_dict.get(x))}}}

=== The Problem ===
Using a lambda when a function is already named incurs the extra overhead of one function call, which is generally undesirable as function calls are relatively expensive in Python. Even setting execution efficiency aside, the lambda-less versions are preferred because they are generally more concise and easier to read.

=== Preferred Alternatives ===
{{{#!python
dict(a=str,
     b=some_dict.get)}}}

== Inappropriate use of Lambda ==
There are situations where the usage of Lambda is completely inappropriate. The most inappropriate usage is where a lambda is used to create a named function. Especially when that named function uses recursion.

=== Dubious Way ===
{{{#!python
bstr = lambda n, l=16: n<0 and binarystr((2L<<l)+n) or n and bstr(n>>1).lstrip('0')+str(n&1) or '0'}}}

=== The Problem ===
This code was presented on comp.lang.python as a solution to "How do I turn a number into a string of 1's and 0's. The person who posted it copied it verbatim out of the python cookbook, and didn't realise there were significant problems with it.

The code is compressed to the level that it is practically impossible at a first glance to realise if there is anything wrong with it at all. Lets break it down to see the errors. Rewritten as a real function.

{{{#!python
def bstr(n, length=16):
    if n == 0:
        return '0'
    if n<0:
        return binarystr((long(2)<<length)+n)
    return bstr(n>>1).lstrip('0') + str(n&1)}}}

The code calls itself by the wrong name in the if n<0 branch! It calls 'binarystr' recursively instead of 'bstr'.

=== The Solutions ===
Don't give a lambda a name, even if it only has localised and limited usage. If it needs to be given a name, it has a right to be a normal function. Use 'def' and make a named function.

Don't write lambdas that use 'and' and 'or' with shortcut evaluation in order to return the correct value. The example presented is a good indication of why that can lead to subtly buggy code. Instead, use if statements, and your code will be more readable, and you will be able to spot bugs far more easily.

== Overuse of Regular Expressions ==
Regular Expressions provde a powerful tool for doing complicated string searches. However, for simple string searches, regular expressions are often overkill.

(Note that such overuse of regular expressions is often the result of converting Perl code to Python code.)

=== Dubious Way ===
{{{#!python
matcher = re.compile(r'defg')
if matcher.search(s):
    do_something()}}}

{{{#!python
matcher = re.compile(r'(\S+)')
words = matcher.findall(s)}}}

=== The Problem ===
For simple tasks, using a regular expression can add unnecessary overhead from compiling the regular expression and using the match object to search. When applicable, using string methods can often be faster and more concise.

=== Preferred Alternatives ===

{{{#!python
if 'defg' in s:
    do_something()}}}

Note that in pre-2.3 Pythons 'in' only worked with single character strings. In Python 2.3 and above, 'in' works with multi-character substrings as above.

{{{#!python
words = s.split()}}}


== Counting Items without Enumerate ==
It is often useful to keep track of the index of an item in an iterable, for example, for reporting the line number of a string in a file. As of Python 2.3 the preferred way to do this is using the builtin enumerate instead of a manually updated count variable. In versions of Python before 2.3,
this is actually not all that dubious... ;-)

=== Dubious Way ===
{{{#!python
count = 0
for item in iterable:
    try:
        do_something(item)
    except Exception:
        raise Exception('error on item %r' % count)
    count += 1}}}

=== The Problem ===
While timings are comparable, manual update is more verbose and has a greater risk of programming error if the programmer forgets to update the count variable.

=== Preferred Alternatives ===

UsingEnumerate:

{{{#!python
for count, item in enumerate(iterable):
    try:
        do_something(item)
    except Exception:
        raise Exception('error on item %r' % count)}}}


== Premature Optimization ==
''Note that this is at the bottom not because it is less significant than any of the other problems but because, unlike the above idioms, is not a problem specific to Python, but a general programming problem.''

Premature Optimization is spending effort on execution efficiency before determining which parts of the code are actually significant to the program efficiency.

=== Dubious Way ===
This occurs in a variety of contexts, all of which involve spending extra time making code run faster before first writing a simple, concise implementation that produces the correct results.

Example of over-optimizing attribute accesses:
{{{#!python
app = lst.append
for item in iterable:
    app(func(item))}}}

=== The Problem ===
While a correctly applied optimization can indeed speed up code, optimizing code that is only seldom used can waste significant development time, and can make code harder to read. Optimizations should only be sought when a programmer has isolated (using a profiler, etc.) a significant bottleneck in program efficiency. Write correct code first, then make it fast (if necessary).

=== Preferred Way ===
Don't optimize until necessary.

Example of non-optimized but more readable attribute access:
{{{#!python
for item in iterable:
    lst.append(func(item))}}}

''The above examles look equally readable to me. In fact, the binding of a function to an
object
{{{
app = lst.append
}}}

enhances readiblity and maintainability if used more than once:

{{{
app("a")
app("b")
}}}

is preferred over
{{{
lst.append("a")
lst.append("b")
}}}
since the binding of lst and append happens twice - copy and paste.''

This page follows a [http://groups.google.ca/groups?hl=en&lr=&selm=mailman.8378.1103832808.5135.python-list%40python.org&prev=/groups%3Fq%3Ddubious%2BFernando%26hl%3Den%26lr%3D%26group%3Dcomp.lang.python.*%26selm%3Dmailman.8378.1103832808.5135.python-list%2540python.org%26rnum%3D2 suggestion] of FernadoPerez on comp.lang.python. His suggestion was that there should somewhere be a collection of bad python practices together with an explanation of the badness and a preferred alternative. This seems a good job for a wiki-page, so ...

TableOfContents(2)

String Concatenation

String concatenation is building up a relatively lengthy string from a collection of strings.

Dubious Way

Newcomers to Python often try to build strings up like this:

   1 >>> string_list = ['one', 'big', 'string', 'in', 'pieces']
   2 >>> new_string = "" 
   3 >>> for s in string_list: 
   4         new_string = new_string + s
   5 >>> print new_string 
   6 onebigstringinpieces 
   7 >>>

The Problem

This is slow and resource heavy. Each time through the for loop, a new string is built and the old one is discarded. That might not matter so much for such a small case, but as the number of elements to be joined creeps up, so too does the inefficiency.

The above code is perfectly fine. Its readible, and it works. Worrying about speed/performance when it is not an issue is one of the worst programming practices. "Premature Optimization is the Root of All Evil" -- see PrematureOptimization

Training yourself to NEVER use some possibly-tempting idiom that is NEVER right is not premature: JUST SAY NO, learn to see this way to build up strings as ugly and always wrong, and live happily ever after!

Preferred Alternatives

String Formatting

For short cases where the number of strings to be joined is known, you can use string formatting as follows:

   1 >>> print '%s%s%s%s%s' % tuple(string_list)
   2 onebigstringinpieces

This is much more efficient, but is also rather more limited in the range of circumstances to which it applies. It could be made more general by constructing the formatting string as a function of len(string_list), but this would be a bit dubious, too. It's also less readible, less maintainible. A better alternative is found in the next section.

The join method of strings

The join method of the string type lets you perform the concatenation as follows:

   1 >>> print "".join(string_list)
   2 onebigstringinpieces

This is quite efficient and perfectly general as it applies to any arbitrary list of strings. (You don't need to know the list length in advance.)

The major thing to puzzle the newcomer here is why "".join(some_list) rather than some_list.join(). The way to think of this is that you are using the string "" to join the elements of some_list. Hence,

   1 >>> print 'JOINT'.join(string_list)
   2 oneJOINTbigJOINTstringJOINTinJOINTpieces

That said, some do consider this aspect of the join method of strings odd enough to count as a PythonWart.

If the ''.join(some_list) syntax really bothers you, one option is to bind the method to a different name, e.g.

   1 >>> join = ''.join
   2 >>> print join(string_list)
   3 onebigstringinpieces
   4 >>> 
   5 >>> underscore_join = '_'.join
   6 >>> print underscore_join(string_list)
   7 one_big_string_in_pieces

The audience should be an expert in the idioms of a language when considering readability. The join is simple to this crowd.

Overly Verbose Conditionals

Among the most common tasks in programming is to test if a condition obtains and act accordingly. It is common for newcomers to Python to adopt an all-together overly verbose idiom for this.

Dubious Way

   1 if (count > 10) == True:
   2     # do something

   1 def count_tester(count):
   2     if count > 10:
   3         return True
   4     else:
   5         return False

   1 if len(somecontainer) > 0:
   2     # do something

The Problem

There is a slight speed of execution inefficiency in these examples. The first example has the overhead of an extra method lookup (bool.__eq__) and an extra name lookup (True). The second example has the overhead of an extra branch statement. The third example has the overhead of two extra method lookups (somecontainer.__len__ and int.__cmp__).

But much more important is the speed of entry and understanding inefficiency. All other things being equal, extra typing is evil. And, unless some substantial gain in clarity is purchased by the extra characters, the more characters in the code, the longer that code will take to understand. (The programming time you save could well be your own!)

Preferred Alternatives

   1 if count > 10:
   2     # do something

   1 def count_tester(count):
   2     return count > 10

Most non-empty containers evaluate to True in a boolean context, so no test on len() is generally necessary:

   1 if somecontainer:
   2     # do something

Some containers (e.g. numarray.array) do not evaluate this way. In these cases, the preferred idiom is:

   1 if len(somecontainer):
   2     # do something

Overuse of lambda

Lambda forms allow anonymous functions to be created and used as part of an expression. However, when a function is already named, wrapping this function in a lambda can decrease readability and affect program efficiency.

Dubious Way

   1 dict(a=lambda x: str(x),
   2      b=lambda x: some_dict.get(x))

The Problem

Using a lambda when a function is already named incurs the extra overhead of one function call, which is generally undesirable as function calls are relatively expensive in Python. Even setting execution efficiency aside, the lambda-less versions are preferred because they are generally more concise and easier to read.

Preferred Alternatives

   1 dict(a=str,
   2      b=some_dict.get)

Inappropriate use of Lambda

There are situations where the usage of Lambda is completely inappropriate. The most inappropriate usage is where a lambda is used to create a named function. Especially when that named function uses recursion.

Dubious Way

   1 bstr = lambda n, l=16: n<0 and binarystr((2L<<l)+n) or n and bstr(n>>1).lstrip('0')+str(n&1) or '0'

The Problem

This code was presented on comp.lang.python as a solution to "How do I turn a number into a string of 1's and 0's. The person who posted it copied it verbatim out of the python cookbook, and didn't realise there were significant problems with it.

The code is compressed to the level that it is practically impossible at a first glance to realise if there is anything wrong with it at all. Lets break it down to see the errors. Rewritten as a real function.

   1 def bstr(n, length=16):
   2     if n == 0:
   3         return '0'
   4     if n<0:
   5         return binarystr((long(2)<<length)+n)
   6     return bstr(n>>1).lstrip('0') + str(n&1)

The code calls itself by the wrong name in the if n<0 branch! It calls 'binarystr' recursively instead of 'bstr'.

The Solutions

Don't give a lambda a name, even if it only has localised and limited usage. If it needs to be given a name, it has a right to be a normal function. Use 'def' and make a named function.

Don't write lambdas that use 'and' and 'or' with shortcut evaluation in order to return the correct value. The example presented is a good indication of why that can lead to subtly buggy code. Instead, use if statements, and your code will be more readable, and you will be able to spot bugs far more easily.

Overuse of Regular Expressions

Regular Expressions provde a powerful tool for doing complicated string searches. However, for simple string searches, regular expressions are often overkill.

(Note that such overuse of regular expressions is often the result of converting Perl code to Python code.)

Dubious Way

   1 matcher = re.compile(r'defg')
   2 if matcher.search(s):
   3     do_something()

   1 matcher = re.compile(r'(\S+)')
   2 words = matcher.findall(s)

The Problem

For simple tasks, using a regular expression can add unnecessary overhead from compiling the regular expression and using the match object to search. When applicable, using string methods can often be faster and more concise.

Preferred Alternatives

   1 if 'defg' in s:
   2     do_something()

Note that in pre-2.3 Pythons 'in' only worked with single character strings. In Python 2.3 and above, 'in' works with multi-character substrings as above.

   1 words = s.split()

Counting Items without Enumerate

It is often useful to keep track of the index of an item in an iterable, for example, for reporting the line number of a string in a file. As of Python 2.3 the preferred way to do this is using the builtin enumerate instead of a manually updated count variable. In versions of Python before 2.3, this is actually not all that dubious... ;-)

Dubious Way

   1 count = 0
   2 for item in iterable:
   3     try:
   4         do_something(item)
   5     except Exception:
   6         raise Exception('error on item %r' % count)
   7     count += 1

The Problem

While timings are comparable, manual update is more verbose and has a greater risk of programming error if the programmer forgets to update the count variable.

Preferred Alternatives

UsingEnumerate:

   1 for count, item in enumerate(iterable):
   2     try:
   3         do_something(item)
   4     except Exception:
   5         raise Exception('error on item %r' % count)

Premature Optimization

Note that this is at the bottom not because it is less significant than any of the other problems but because, unlike the above idioms, is not a problem specific to Python, but a general programming problem.

Premature Optimization is spending effort on execution efficiency before determining which parts of the code are actually significant to the program efficiency.

Dubious Way

This occurs in a variety of contexts, all of which involve spending extra time making code run faster before first writing a simple, concise implementation that produces the correct results.

Example of over-optimizing attribute accesses:

   1 app = lst.append
   2 for item in iterable:
   3     app(func(item))

The Problem

While a correctly applied optimization can indeed speed up code, optimizing code that is only seldom used can waste significant development time, and can make code harder to read. Optimizations should only be sought when a programmer has isolated (using a profiler, etc.) a significant bottleneck in program efficiency. Write correct code first, then make it fast (if necessary).

Preferred Way

Don't optimize until necessary.

Example of non-optimized but more readable attribute access:

   1 for item in iterable:
   2     lst.append(func(item))

The above examles look equally readable to me. In fact, the binding of a function to an object

app = lst.append

enhances readiblity and maintainability if used more than once:

app("a")
app("b")

is preferred over

lst.append("a")
lst.append("b")

since the binding of lst and append happens twice - copy and paste.

DubiousPython (last edited 2010-07-20 17:56:20 by 65-125-135-157)

Unable to edit the page? See the FrontPage for instructions.