Text handling in Python 3

Python 3 uses two very different types:

Choosing Between "bytes" and "str"

When choosing the type you want to use to work with text you have to ask yourself: do I manipulate characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:

Iterating over "bytes"

It's important to note that the bytes iterator generates integers and not characters:

>>> for item in b'abc':
...   print item
97
98
99

Comparing "bytes"

Comparing one bytes object to another works as expected:

>>> b'xyz' == b'xyz'
True
>>> b'xyz' == b'abc'
False

However, it is important to note that the bytes type is completely distinct from the str type in Python 3, and comparisons between them do not work:

>>> b'xyz' == 'xyz'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare bytes and str

This should make clearly evident some incomplete transitions. But it also means that you really can't mix then very well:

>>> L = ["1", b"1"]
>>> "1" in L
True
>>> "2" in L
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: can't compare str and bytes

As mentioned earlier, getting an item of a bytes returns an integer, not a bytes object:

>>> b'xyz'[0] == b'x'
False
>>> b'xyz'[0]
120

Hashing "bytes"

bytes is mutable, and as a result, it's not hashable. Among other things, this means that bytes objects can't be used as keys in dictionaries.

Hacks and workarounds for this include:

Other solutions include:

Historical information

For historical information that may be useful in porting or maintaining remaining Python 2 systems, please see previous page revisions.

BytesStr (last edited 2019-10-19 22:00:22 by FrancesHocutt)

Unable to edit the page? See the FrontPage for instructions.