Revision 12 as of 2009-11-04 13:10:09


Porting Python Code to 3.0

Rough outline of what needs to be covered

Martin's notes from psycopg2

  • bytes vs. strings: Design decisions need to be made about what exactly must be represented as bytes and what as strings. In many cases that was easy for psycopg, except for the question of how SQL queries are represented. It seems clear that they are plain text; however, the Postgres API requires them to be transmitted in the connection encoding. I still decided to represent them internally as Unicode, but converting them to the connection encoding as early as possible would probably have worked as well.
  • the buffer object is gone; I use memoryview in 3.x.
  • various tests in the code were of the form if version_major == 2 and version_minor > 4 (or, say, 5). This will break for 3.x; you have to write if (version_major == 2 and version_minor > 4) or version_major > 2
  • Python code 1: I used the 2to3 support in distutils
  • Python code 2: setup.py needs to run in both versions. I had to replace popen2 with subprocess if available. Also, map() now returns an iterator, which I explicitly convert into list, and so on.
  • Python code 3: the test suite doesn't get installed, and hence is not auto-converted by the distutils 2to3 support. I explicitly added a 2to3 conversion step to the test runner, which copies the py3 version of the tests into a separate directory.
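
The version test and the map() change from the notes above can be sketched as follows. This is only an illustration: the function names and the sample data are hypothetical and are not taken from psycopg2.

```python
import sys


def newer_than_2_4(version_major, version_minor):
    """Return True for 2.5 and later, including every 3.x release."""
    # The extra "or version_major > 2" is what keeps the test true on 3.x.
    return (version_major == 2 and version_minor > 4) or version_major > 2


def newer_than_2_4_tuple(version_info):
    """Same test written as a tuple comparison (an alternative, not from the notes)."""
    return tuple(version_info[:2]) > (2, 4)


# map() returns an iterator on 3.x; wrapping it in list() gives a real
# list on both 2.x and 3.x.
lengths = list(map(len, ["setup", "py"]))

# Typical use against the running interpreter.
running_on_supported_version = newer_than_2_4(*sys.version_info[:2])
```

Comparing tuples avoids having to remember the second clause, at the cost of differing from the form used in the original code.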

Manual changes (not done by 2to3)

  • os.path.walk => os.walk: see issue4601.
  • rfc822 => email
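
As a rough illustration of the first item: os.path.walk took a callback, while os.walk is a generator that yields one (dirpath, dirnames, filenames) tuple per directory, so the callback body usually moves into a for loop. The helper name list_files below is hypothetical.

```python
import os


def list_files(top):
    """Collect the paths of all files below top using os.walk."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Sorting dirnames in place makes the traversal order predictable.
        dirnames.sort()
        for name in sorted(filenames):
            found.append(os.path.join(dirpath, name))
    return found
```

os.walk is also available in later 2.x releases, so code rewritten this way should not need a separate 2.x branch.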

Tips & Tricks

  • As of Python 2.6 the 2.x line includes a "bytes = str" alias in the builtins. Along with the bytes literal syntax, this allows binary data stored in a str instance to be clearly flagged so that it will use the correct type when the code is run in 3.x (either directly or via the 2to3 conversion tool). The reason it is done this way rather than backporting the 3.x bytes type is that most 2.x APIs that expect immutable binary data expect it as an 8-bit str instance, so passing in a backported 3.x bytes type wouldn't have the desired effect. To support versions earlier than 2.6, it is possible to define the alias at the top of the module:
    try:
        bytes # Forward compatibility with Py3k
    except NameError:
        bytes = str
  • When using "from __future__ import unicode_literals" in a module, it may also be useful to insert "str = unicode" near the top of the module. This will ensure that "isinstance('', str)" remains true in that module:
    try:
        str = unicode
    except NameError:
        pass # Forward compatibility with Py3k
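
The aliases above make it possible to write type checks that behave the same on 2.x and 3.x. The following is a minimal sketch under stated assumptions: text_type and to_wire are hypothetical names, and "utf-8" merely stands in for whatever encoding the application actually uses.

```python
try:
    bytes  # built in on 2.6+ and 3.x
except NameError:
    bytes = str  # Python 2.5 and earlier

try:
    text_type = unicode  # Python 2.x
except NameError:
    text_type = str  # Python 3.x, where unicode no longer exists


def to_wire(value, encoding="utf-8"):
    """Return value as binary data, encoding it first if it is text."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    if isinstance(value, bytes):
        return value
    raise TypeError("expected text or binary data")
```

Using a separate name such as text_type instead of rebinding str is a design choice: it leaves the builtin untouched, at the cost of not matching the "str = unicode" idiom shown above.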
