Differences between revisions 5 and 49 (spanning 44 versions)
Revision 5 as of 2008-12-08 12:47:45
Size: 1981
Comment: Ops, fix link markup
Revision 49 as of 2011-03-23 06:23:21
Size: 10743
Editor: techtonik
Comment: add 2to3 links

Rough outline of what needs to be covered
=========================================

* Using 2to3. *(-> Benjamin)*
* Bringing code up to shape to work with 2.6 and 2to3->3.0.
* What about 2.3/2.4/2.5 compatibility?

There are three ways to support Python 3:

1) maintain a Python 2 base and use 2to3_ to generate Python 3 code
2) make code run unmodified in both Python 2 and Python 3
3) maintain separate releases for Python 2 and Python 3

Each approach has its strengths and weaknesses.


Approach 1: Maintain a Python 2 base and use 2to3 to generate Python 3 code
===========================================================================

Make sure the code runs in Python 2.6, then use 2to3_ to convert it.

This approach does *not* require dropping support for versions before 2.6.
Even testing with 2.6 is not strictly needed: one could just as well use this approach on
code that has never been tested with 2.6 and is only known to work on 2.5.

2to3_ is a Python program that reads Python 2.x source code and applies a series of fixers to transform it into valid Python 3.x code. The standard library contains a rich set of fixers that will handle almost all code. The library behind 2to3_, *lib2to3*, is flexible and generic, so it is possible to write your own fixers for 2to3_. *lib2to3* could also be adapted to custom applications in which Python code needs to be edited automatically. For more information about 2to3_, see: http://docs.python.org/library/2to3.html

Usage of 2to3_ can be integrated into the installation process: 2to3_ can be run as a build step in distutils. With distutils, the following fragments are needed in a setup.py::

  try:
      from distutils.command.build_py import build_py_2to3 as build_py
  except ImportError:
      # 2.x
      from distutils.command.build_py import build_py
  ...
  setup(...
    cmdclass = {'build_py':build_py}
  )

This will leave intact all source files, and convert them to Python 3 in the build area, if setup.py is run on Python 3. 'setup.py install' will then copy the 3.x version of the code into the target directory. If run on 2.x, nothing will change at all (as 2.x doesn't provide the build_py_2to3 class).

For distribute, this approach can be expressed as::

  setup(...
    use_2to3=True
  )

Manual changes (not done by 2to3_):

* os.path.walk => os.walk: see issue4601_.
* rfc822 => email

.. _issue4601: http://bugs.python.org/issue4601
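The first manual change above can be sketched as follows. ``os.path.walk`` took a callback that was invoked once per directory, while ``os.walk`` is a generator that yields ``(dirpath, dirnames, filenames)`` tuples, so the callback body usually moves into a ``for`` loop. The helper name ``list_files`` is hypothetical and only serves the illustration.

```python
import os


def list_files(top):
    """Return the sorted paths of all files below ``top``.

    Python 2 only:    os.path.walk(top, visit, arg) calls visit() per directory.
    Python 2.3+ / 3:  os.walk(top) yields (dirpath, dirnames, filenames).
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Because ``os.walk`` has been available since Python 2.3, code rewritten this way runs unchanged before and after conversion.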

Additional resources:

* http://diveintopython3.org/porting-code-to-python-3-with-2to3.html
* http://packages.python.org/distribute/python3.html
* http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide
* http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/
* http://techspot.zzzeek.org/2011/01/24/zzzeek-s-guide-to-python-3-porting
* http://dabeaz.blogspot.com/2011/01/porting-py65-and-my-superboard-to.html
* http://python3porting.com/strategies.html

Strings and Bytes
-----------------

A design decision needs to be made about what exactly must be represented as bytes and what as strings (Unicode data). In many cases, this is easy. However, data sent or received over pipes or sockets is usually binary, so explicit conversions may be necessary. For example, the Postgres API requires SQL queries to be transmitted in the connection encoding. It is probably easiest to convert the queries to the connection encoding as early as possible.

The standard IO streams (sys.stdin, sys.stdout, sys.stderr) are in text mode by default in Python 3. Calling `sys.stdout.write(some_binary_data)` will fail with a TypeError. To write binary data to standard out, you must do the following::

    sys.stdout.flush()
    sys.stdout.buffer.write(some_binary_data)

The flush() clears out any text (unicode) data that is stored in the TextIOWrapper, and the buffer attribute provides access to the lower-level binary stream. If you are only writing binary data, you can remove the text layer with `sys.stdout = sys.stdout.detach()` in Python 3.1; in Python 3.0, do `sys.stdout = sys.stdout.buffer`.
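The flush-then-write pattern can be wrapped in a small helper so that the same call works on both major versions. This is a minimal sketch: the name ``write_binary`` is hypothetical, and it assumes that a stream without a ``buffer`` attribute (such as a Python 2 file object) accepts byte strings directly.

```python
import sys


def write_binary(data, stream=None):
    """Write bytes to a text-mode stream such as sys.stdout."""
    if stream is None:
        stream = sys.stdout
    # Push out any text still held by the text layer so output order is kept.
    stream.flush()
    # Python 3 text streams expose the underlying binary stream as .buffer;
    # assumption: streams without it take byte strings as they are.
    target = getattr(stream, "buffer", stream)
    target.write(data)
    target.flush()
```

Calling ``write_binary(some_binary_data)`` then replaces the two-line idiom wherever binary output is mixed with text output.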


Approach 2: Make code run unmodified in both Python 2 and Python 3
==================================================================

Supporting code that runs in both Python 2.6 and Python 3 is not much more difficult than porting to Python 3 (see above for more details). However, supporting older versions--like 2.3, 2.4, and 2.5--is much more difficult. This requires avoiding any syntax that only works in one version or the other, so it often involves resorting to hacks. These hacks make the code more awkward than it would be in Python 3 or in Python 2, but many people find that this is better than supporting only one version or the other. When support for Python 2 is eventually dropped, these hacks can be removed.

Python 2.6 introduces forward-compatibility for many new Python 3 features. The Python 3 builtins are accessible by doing an import from `future_builtins`. To use the new print function, use `from __future__ import print_function`, and to make string literals be interpreted as unicode, use `from __future__ import unicode_literals`. These forward-compatibility features, and many others, are described in greater detail in `What's New in Python 2.6`_.

.. _What's New in Python 2.6: http://docs.python.org/whatsnew/2.6.html

Additional resources:

* http://pydev.blogspot.com/2008/11/making-code-work-in-python-2-and-3.html
* http://mail.mems-exchange.org/durusmail/qp/441/
* http://mikewatkins.ca/2008/11/29/python-2-and-3-metaclasses/

Six is a library which has helpers to simplify writing codebases that support both Python 2 and 3. See http://pypi.python.org/pypi/six.


Strings and Unicode
-------------------

As of Python 2.6 the 2.x line includes a "bytes = str" alias in the builtins. Along with the bytes literal syntax, this allows binary data stored in a str instance to be clearly flagged so that it will use the correct type when the code is run in 3.x (either directly or via the 2to3_ conversion tool). The reason it is done this way rather than backporting the 3.x bytes type is that most 2.x APIs that expect immutable binary data expect it as an 8-bit str instance - trying to pass in a backported 3.x bytes type wouldn't have the desired effect. To support versions earlier than 2.6, it is possible to define the alias at the top of the module::

   try:
     bytes # Forward compatibility with Py3k
   except NameError:
     bytes = str

When using "from __future__ import unicode_literals" in a module, it may also be useful to insert "str = unicode" near the top of the module. This will ensure that "isinstance('', str)" remains true in that module::

   try:
     str = unicode
   except NameError:
     pass # Forward compatibility with Py3k
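Building on the aliases above, explicit conversion helpers keep the bytes/text boundary in one place. The following is a hand-written sketch, not the implementation of any particular library; the names ``text_type``, ``binary_type``, ``to_bytes`` and ``to_text`` are assumptions chosen for the illustration, and UTF-8 is an assumed default encoding.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # this name only exists on Python 2
    binary_type = str


def to_bytes(value, encoding="utf-8"):
    """Encode text to bytes; pass bytes through unchanged."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def to_text(value, encoding="utf-8"):
    """Decode bytes to text; pass text through unchanged."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value
```

Calling ``to_bytes`` right before data leaves the program (sockets, pipes, binary files) and ``to_text`` right after it arrives keeps the rest of the code working with a single string type.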


Print Statement/Function
------------------------

Since `print` is a statement in Python 2 and a function in Python 3, it cannot be used in code that runs in both Python 2.5 and Python 3. Instead, use the `write` method. Replace this::

    print 'hello world'

with this::

    import sys
    sys.stdout.write('hello world\n')

or if you really like to be tricky, with this::

    print('hello world')

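This uses the print function in Python 3 but still works with the print statement in Python 2, where the parentheses merely group a single expression. Just don't use anything fancy, like passing several arguments (Python 2 would print a tuple) or writing to a file with `>>`.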
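When more than a single argument is needed, a small wrapper around ``write`` avoids both spellings of ``print``. The sketch below uses a hypothetical helper named ``echo``; it mimics only the ``sep``, ``end`` and ``file`` options of the Python 3 print function and takes them through ``**kwargs`` because keyword-only arguments are not available in Python 2.

```python
import sys


def echo(*args, **kwargs):
    """Minimal stand-in for print() that runs on Python 2 and 3."""
    sep = kwargs.get("sep", " ")
    end = kwargs.get("end", "\n")
    stream = kwargs.get("file")
    if stream is None:
        stream = sys.stdout
    # Convert every argument to a string, as print does, then write once.
    stream.write(sep.join([str(arg) for arg in args]) + end)
```

For example, ``echo('hello', 'world')`` writes ``hello world`` followed by a newline to standard output, and ``echo('oops', file=sys.stderr)`` replaces the ``>>`` form.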


Exceptions
----------

Python before 2.6 does not support the `except ... as` syntax, and Python 3 does not support the older `except IOError, e` form, so there is no single syntax for accessing the value of an exception that works everywhere. Compatible code should use the following idiom to save the value of an exception::

    import sys
    try:
        open('/path/to/some/file')
    except IOError:
        _, e, _ = sys.exc_info()
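The same idiom inside a function, as a minimal sketch; the helper name ``try_open`` and its return convention are hypothetical.

```python
import sys


def try_open(path):
    """Return (file, None) on success or (None, error) on failure."""
    try:
        return open(path), None
    except IOError:
        # Fetch the active exception from sys.exc_info() instead of
        # naming it in the except clause, which is spelled differently
        # before Python 2.6 and in Python 3.
        error = sys.exc_info()[1]
        return None, error
```

Indexing ``sys.exc_info()[1]`` is equivalent to the tuple unpacking shown above and avoids binding the unused type and traceback.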

Relative Imports
----------------

Python 3 makes a distinction between relative and absolute imports: a plain `import xyz` is always absolute, and relative imports must be written explicitly. In Python 2.5, use `from __future__ import absolute_import` to get the same behavior as Python 3. To support older versions as well, only use absolute imports. Replace an implicit relative import::

    from xyz import abc

with an absolute import::

    from mypackage.xyz import abc


Integer Division
----------------

Make sure to use `from __future__ import division` (introduced in Python 2.2) to get the non-truncating behavior, which is default in Python 3.
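A short sketch of the effect; the function names ``mean`` and ``full_pages`` are hypothetical. With the import in place, ``/`` is true division on every version, and ``//`` remains available wherever truncation is actually wanted.

```python
from __future__ import division  # must precede all other statements


def mean(values):
    """Arithmetic mean; true division even for a list of ints."""
    return sum(values) / len(values)


def full_pages(items, per_page):
    """Number of completely filled pages; // floors on every version."""
    return items // per_page
```

Without the import, ``mean([1, 2])`` would return ``1`` on Python 2 and ``1.5`` on Python 3; with it, both return ``1.5``.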



Approach 3: Maintain Separate Releases for Python 2 and Python 3
================================================================

Convert a Python 2 tree to Python 3 with the 2to3_ tool: http://docs.python.org/library/2to3.html

- bytes vs. strings 1. A design decision needs to be made about what exactly
  must be represented as bytes, and what as strings. In many cases, that
  was easy for psycopg, except for the question of how SQL queries are
  represented. It appears clear that they are plain text; however, the
  Postgres API requires them to be transmitted in the connection
  encoding. I still decided to represent them internally in Unicode,
  but converting them to the connection encoding as early as possible
  probably would have worked as well.
- Python code 1: I used the 2to3 support in distutils
  not auto-converted with 2to3_ support. I explicitly added a 2to3_

Differences in specific libraries
=================================

These are porting notes on differences in third party libraries when run on Python 2 and Python 3.

* `PyQt4 </PyQt4>`_
* `PortingDjangoToPy3k`_
* a bunch of very useful reference code: http://code.google.com/p/python-incompatibility/
* `PortingDjangoTo3k`_
* `Early2to3Migrations`_
* `Stephan Deibel's blog post on code usable from Python 2.0 through 3.0 <http://pythonology.blogspot.com/2009/02/making-code-run-on-python-20-through-30.html>`_
* `Python code that demonstrates differences between 2.5 and 3.0, and ways of making the same code run on 2.6 and 3.x <http://python-incompatibility.googlecode.com/>`_
* Georg Brandl's blog post collecting some information: http://pythonic.pocoo.org/2008/12/14/python-3-porting-resources
* A blog post about porting experience: `psycopg2 porting to Python 3: a report <http://initd.org/psycopg/articles/2011/01/24/psycopg2-porting-python-3-report/>`_
* Reference Card: Moving from Python 2 to Python 3: http://ptgmedia.pearsoncmg.com/imprint_downloads/informit/promotions/python/python2python3.pdf
* `Lennart Regebro's overview of Porting strategies <http://python3porting.com/strategies.html>`_

Porting Python Code to 3.0

Martin's notes from psycopg2

  • the buffer object is gone; I use memoryview in 3.x.
  • various tests were in the code of the form if version_major == 2 and version_minor > 4 (say, or 5). This will break for 3.x; you have to write if (version_major == 2 and version_minor > 4) or version_major > 2
  • Python code 2: setup.py needs to run in both versions. I had to replace popen2 with subprocess if available. Also, map() now returns an iterator, which I explicitly convert into list, and so on.
  • Python code 3: the test suite doesn't get installed, and hence not auto-converted with 2to3 support. I explicitly added a 2to3 conversion into the test runner, which copies the py3 version of the test into a separate directory.
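Two of the notes above, the version test and the ``map()`` change, can be sketched together. Note that this swaps in a different technique from the one in the notes: instead of the explicit boolean expression, it compares version tuples, which stays correct for any major version. The helper names ``newer_than`` and ``squares`` are hypothetical.

```python
import sys


def newer_than(major, minor, version=None):
    """True if ``version`` (default: the running interpreter) is newer
    than major.minor. Tuples compare element by element, so (3, 0) is
    greater than (2, 4) without a special case for the major version."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) > (major, minor)


def squares(numbers):
    # map() returns an iterator on Python 3, so convert it explicitly
    # whenever a real list is needed (indexing, len(), repeated use).
    return list(map(lambda n: n * n, numbers))
```

Here ``newer_than(2, 4)`` plays the role of ``(version_major == 2 and version_minor > 4) or version_major > 2``.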


PortingPythonToPy3k (last edited 2019-10-26 22:24:27 by FrancesHocutt)
