Writing code that runs under both Python 2 and 3

The intent of this page is to provide specific guidelines, in a quick reference format, for writing code that is compatible with both Python 2 and Python 3. The idea is that you can check this single page once you're familiar with the basic concepts and approaches but need a refresher on a specific coding technique.

Most entries here link to a more in-depth explanation of the basic recipe in case you need more than a simple refresher on the subject. At the bottom of this page, you will find various resources for digging deeper into supporting Python 3 from the pure-Python, C extension module, packaging, and other perspectives.

Before you start

Here are recommendations for you to follow before you start porting.

I cannot overemphasize the last point. Without a clear separation in your mind and data model between bytes and strings, your port will likely be much more painful than it needs to be. This is the biggest distinction between Python 2 and Python 3. Python 2 let you be sloppy: its 8-bit strings served as both binary data and ASCII text, with automatic (but error-prone) conversions between 8-bit strings and unicodes. In Python 3 there are only bytes and strings (i.e. unicodes), with no automatic conversion between the two. This is A Good Thing.
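As a rough illustration of that separation (a sketch, not part of the original recipes), the snippet below decodes bytes to text once at the input boundary and encodes again only on the way out. The sample data and variable names are assumptions chosen for the example.

```python
# Bytes as they might arrive from a file or socket: the UTF-8 encoding of
# "cafe" with an acute accent on the final letter.
raw = b'caf\xc3\xa9'

# Decode once, at the boundary; from here on the program works with text.
text = raw.decode('utf-8')

# The two types measure different things: 5 bytes versus 4 characters.
byte_count = len(raw)
char_count = len(text)

# Encode explicitly, again at the boundary, when the data leaves the program.
round_tripped = text.encode('utf-8')
```

The same code runs unchanged on Python 2 and Python 3 because it never relies on an implicit conversion between the two types.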

Pure Python Source

Basic compatibility

Put the following at the top of all your Python files:

from __future__ import absolute_import, division, print_function, unicode_literals

This turns on some important compatibility flags.
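The following sketch shows what three of those flags change in practice; it is an illustration rather than part of the original page, and the variable names are made up. The fourth flag, absolute_import, makes a plain import statement always refer to a top-level module rather than a sibling in the same package, which is hard to show in a single file. On Python 3 the line is accepted and has no effect.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

# division: "/" is true division on both versions; "//" floors.
half = 7 / 2
floored = 7 // 2

# unicode_literals: an unprefixed literal is text, a b'' literal is bytes.
text_literal = 'abc'
bytes_literal = b'abc'
literal_is_text = not isinstance(text_literal, bytes)

# print_function: print is an ordinary function and accepts keyword arguments.
print('a', 'b', sep='-')
```

Note that the __future__ import has to be the first statement in the module; only comments, blank lines, and the module docstring may precede it.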

In your code, make these changes:

Commentary: Some folks don't like to import unicode_literals because it has the potential to change your API (e.g. possibly returning unicodes where before it returned 8-bit strings in Python 2). One developer recommends making this change in your tests only, or only doing this if you have exceptionally good test coverage. YMMV.

built-ins

import sys
import mock  # third-party package on Python 2; unittest.mock ships with Python 3.3+

def mock_it(builtin_name):
    name = ('builtins.%s' if sys.version_info >= (3,) else '__builtin__.%s') % builtin_name
    return mock.patch(name)

[more info]
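One possible way to use such a helper is as a context manager, so the built-in is restored as soon as the block exits. This sketch assumes unittest.mock on Python 3 and the third-party mock backport on Python 2; the patched function (open) and the path are arbitrary choices for illustration.

```python
import sys

try:
    from unittest import mock  # standard library on Python 3.3+
except ImportError:
    import mock  # assumption: the third-party "mock" backport on Python 2


def mock_it(builtin_name):
    # The module holding the built-ins was renamed in Python 3.
    module = 'builtins' if sys.version_info >= (3,) else '__builtin__'
    return mock.patch('%s.%s' % (module, builtin_name))


# Replace open() for the duration of the block only.
with mock_it('open') as fake_open:
    fake_open.return_value = 'not a real file'
    result = open('/no/such/path')

# The mock records how it was called, even after the patch is undone.
patched_call_count = fake_open.call_count
```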

New Style Classes

In Python 2, you have classic classes and new style classes, whereas in Python 3 you have only the latter. One way of declaring a class to be new style in Python 2 is to inherit from the built-in object type. Such classes will also be new style in Python 3, but with the added cruft of an unnecessary base class. A better way is to add the following to the top of the modules containing your new style classes:

__metaclass__ = type

As above, this will have no effect in Python 3, but it's much easier to remove one line of cruft than a ton of useless subclasses. Of course, in Python 2, this will make all the classes in that module new style, but since you'll have to use new style classes in Python 3 anyway, and because new style classes are better, just go ahead and convert all your classes now.
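A small check of the effect, as a sketch: Widget is a hypothetical class with no explicit base, yet it comes out new style on both versions.

```python
__metaclass__ = type  # Python 2: classes below become new style; Python 3 ignores it


class Widget:  # hypothetical class; note the absence of an explicit base
    pass


# New-style classes are instances of type and inherit from object.
widget_is_new_style = isinstance(Widget, type) and issubclass(Widget, object)
```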

See below for how to define classes with different metaclasses than the default.

codecs

from codecs import getencoder
encoder = getencoder('rot-13')
rot13string = encoder(mystring)[0]

[more info]
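For a concrete run of the recipe above, the sketch below assumes mystring is a text string; the sample value is made up. The function returned by getencoder() yields a tuple, which is why the recipe indexes the result with [0].

```python
from codecs import getencoder

mystring = u'Hello, World!'  # sample input; an assumption for illustration

# getencoder() returns a function that yields (encoded_object, length_consumed).
encoder = getencoder('rot-13')
rot13string = encoder(mystring)[0]

# ROT13 is its own inverse, so encoding twice restores the original text.
restored = encoder(rot13string)[0]
```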

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='UTF-8', line_buffering=True)

dictionaries

doctests

from __future__ import absolute_import, print_function, unicode_literals
def setUp(testobj):
    testobj.globs['absolute_import'] = absolute_import
    testobj.globs['print_function'] = print_function
    testobj.globs['unicode_literals'] = unicode_literals

[more info]

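def print_bytes(obj):
    # On Python 3, repr(b'abc') is "b'abc'"; strip the b'' wrapper so the
    # printed output matches Python 2.
    if bytes is not str:
        obj = repr(obj)[2:-1]
    print(obj)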

gettext

import gettext
import sys

kwargs = {}
# Only Python 2's gettext.install() accepts the unicode flag.
if sys.version_info < (3,):
    kwargs['unicode'] = True
gettext.install(domain, LOCALEDIR, **kwargs)

iterators

metaclasses

# Define the Enum class using metaclass syntax compatible with both Python 2
# and Python 3.
Enum = EnumMetaclass(str('Enum'), (), {
    '__doc__': 'The public API Enum class.',
    })

Here EnumMetaclass is the metaclass (duh!) and Enum is the class you're creating which has the custom metaclass. You pass in the base classes (of which there are none, hence the empty tuple) and the dictionary of attributes for the class you're creating. The use of str() here is a bit odd, but it's because the enum module uses from __future__ import unicode_literals for Python 2, but in Python 2, class names must be str/bytes. We can't use the b'' prefix though because in Python 3, class names must be unicodes.
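To make the call pattern concrete, here is a self-contained sketch. The EnumMetaclass below is a hypothetical stand-in, not the real enum implementation, and the made_by attribute exists only to show that the metaclass ran.

```python
class EnumMetaclass(type):
    """Hypothetical stand-in for the real metaclass, for illustration only."""

    def __new__(mcs, name, bases, namespace):
        cls = super(EnumMetaclass, mcs).__new__(mcs, name, bases, namespace)
        cls.made_by = mcs.__name__  # illustrative marker attribute
        return cls


# Calling the metaclass with (name, bases, attributes) builds the class.
# This sidesteps the incompatible Python 2 and Python 3 metaclass syntaxes.
Enum = EnumMetaclass(str('Enum'), (), {
    '__doc__': 'The public API Enum class.',
    })
```

The result is an ordinary class whose type is EnumMetaclass, exactly as if it had been written with a class statement and the version-specific metaclass syntax.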

operators

from collections import Sequence  # on Python 3.3+ this lives in collections.abc
return isinstance(obj, Sequence)
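The abstract base classes moved to collections.abc in Python 3.3, and the old aliases in collections were removed in Python 3.10, so a guarded import is one way to keep this check working everywhere. The is_sequence() wrapper is a hypothetical name used only for this sketch.

```python
try:
    from collections.abc import Sequence  # Python 3.3 and later
except ImportError:
    from collections import Sequence  # Python 2


def is_sequence(obj):
    """Hypothetical helper: True if obj implements the Sequence ABC."""
    return isinstance(obj, Sequence)
```

Keep in mind that strings are sequences too, so is_sequence('abc') is true; callers that want to exclude strings need a separate check.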

raise

strings/bytes/unicodes

subprocess

zope.interfaces

Python extension modules

Compatibility macros

C types

There are lots of differences you need to be aware of when defining types in C extensions. A few important ones:

PyArg_Parse()

PyCObject

reprs

#define REPRV(obj) \
    (PyUnicode_Check(obj) ? (obj) : NULL), \
    (PyUnicode_Check(obj) ? NULL : PyBytes_AS_STRING(obj))

and use it like this:

return PyUnicode_FromFormat("...%V...", REPRV(parent_repr));

[more info]

Third party packages

Resources

PortingToPy3k/BilingualQuickRef (last edited 2015-04-17 23:27:44 by BarryWarsaw)
