Differences between revisions 2 and 13 (spanning 11 versions)
Revision 2 as of 2007-08-10 23:23:48
Size: 989
Editor: neu67-4-88-160-66-91
Comment:
Revision 13 as of 2019-10-19 22:00:22
Size: 2343
Comment: Remove Python 2-specific information, leaving a link to previous revision for accessibility
Line 1: Line 1:
== Python 2.x ==
= Text handling in Python 3 =
Line 3: Line 3:
Python 2.x has two types to store a string:
 * str: a byte string processed as a character string, which is a mistake
 * unicode: a character string (Unicode)
Python 3 uses two very different types:
 * {{{bytes}}}: intended to represent raw byte data. For more information on this type, please consult [[http://www.python.org/dev/peps/pep-0358/|PEP 358]].
 * {{{str}}}: a Unicode character string
Line 7: Line 7:
Both classes have the same methods and are very similar.
== Choosing Between "bytes" and "str" ==
Line 9: Line 9:
== Python 3000 ==
When choosing which type to use to work with text, ask yourself: am I manipulating characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:
Line 11: Line 11:
Python 3000 uses two very different types:
 * bytes: a byte string, which can be seen as a list of [0..255] integers
 * str: a character string (Unicode), exactly the same type as Python 2.x "unicode"
 * a network socket manipulates bytes
 * a text parser manipulates characters (uses lower, strip, etc. methods)
Line 15: Line 14:
== old str and new bytes ==
== Iterating over "bytes" ==
Line 17: Line 16:
Differences between Python 2.x "str" and Python 3000 "bytes":
 * str is immutable, bytes is mutable
 * bytes "lacks" many methods: strip, lstrip, rstrip, lower, upper, etc.
It's important to note that the {{{bytes}}} iterator generates integers and not characters:
Line 21: Line 18:
== choose between bytes and str ==
{{{
>>> for item in b'abc':
...     print(item)
97
98
99
}}}
Line 23: Line 26:
When you migrate from Python 2.x to Python 3000, you have to ask yourself: do I manipulate characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:
 * a network socket manipulates bytes
 * a text parser manipulates characters (uses lower, strip, etc. methods)
== Comparing "bytes" ==

Comparing one {{{bytes}}} object to another works as expected:

{{{
>>> b'xyz' == b'xyz'
True
>>> b'xyz' == b'abc'
False
}}}

However, the {{{bytes}}} type is completely distinct from the {{{str}}} type in Python 3. An equality comparison between them ''never'' matches, even when the contents look the same, and an ordering comparison raises {{{TypeError}}}:

{{{
>>> b'xyz' == 'xyz'
False
>>> b'xyz' < 'xyz'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'bytes' and 'str'
}}}

Because mismatched types never compare equal, you can't mix them very well in one container. A membership test silently skips elements of the other type, which can hide an incomplete transition:

{{{
>>> L = ["1", b"1"]
>>> "1" in L
True
>>> "2" in L
False
>>> "1" in [b"1"]
False
}}}


As mentioned earlier, indexing a {{{bytes}}} object returns an integer, not a {{{bytes}}} object of length one:

{{{
>>> b'xyz'[0] == b'x'
False
>>> b'xyz'[0]
120
}}}


=== Hashing "bytes" ===

{{{bytes}}} is immutable, and as a result, it is hashable. Among other things, this means that {{{bytes}}} objects can be used as keys in dictionaries and as members of sets.

The mutable counterpart, {{{bytearray}}}, is not hashable. Options for using its contents as a key include:
 * convert it to an immutable copy with {{{bytes(value)}}}
 * avoid hashing the value at all


= Historical information =

For historical information that may be useful in porting or maintaining remaining Python 2 systems, please see [[https://wiki.python.org/moin/BytesStr?action=recall&rev=12|previous page revisions]].

Text handling in Python 3

Python 3 uses two very different types:

  • bytes: intended to represent raw byte data. For more information on this type, please consult PEP 358.

  • str: a Unicode character string

Choosing Between "bytes" and "str"

When choosing which type to use to work with text, ask yourself: am I manipulating characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:

  • a network socket manipulates bytes
  • a text parser manipulates characters (uses lower, strip, etc. methods)
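
The distinction above can be sketched as follows. This is only an illustration: the variable names are made up for the example, and UTF-8 is an assumed encoding, not something the page prescribes. Bytes are decoded once at the I/O boundary, handled as str, and encoded again on the way out.

```python
# Bytes as they might arrive from a socket or a binary file (illustrative data).
raw = b"  Hello, World  "

# Decode at the boundary. UTF-8 is an assumption; use the encoding your
# protocol or file format actually specifies.
text = raw.decode("utf-8")

# Character-level work happens on str.
cleaned = text.strip().lower()

# Encode again before handing the data back to a byte-oriented API.
payload = cleaned.encode("utf-8")

print(cleaned)  # hello, world
print(payload)  # b'hello, world'
```

Keeping the conversion at the edges of a program means the rest of the code only ever sees one of the two types.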

Iterating over "bytes"

It's important to note that the bytes iterator generates integers and not characters:

>>> for item in b'abc':
...     print(item)
97
98
99
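
If characters or one-byte values are wanted instead of integers, each item has to be converted explicitly. A small sketch, assuming ASCII-range data as in the example above:

```python
data = b"abc"

# Iterating yields integers in range(256).
codes = list(data)

# chr() maps each integer to the str character with that code point.
chars = [chr(code) for code in data]

# bytes([code]) wraps each integer in a length-one bytes object.
singles = [bytes([code]) for code in data]

print(codes)    # [97, 98, 99]
print(chars)    # ['a', 'b', 'c']
print(singles)  # [b'a', b'b', b'c']
```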

Comparing "bytes"

Comparing one bytes object to another works as expected:

>>> b'xyz' == b'xyz'
True
>>> b'xyz' == b'abc'
False

However, the bytes type is completely distinct from the str type in Python 3. An equality comparison between them never matches, even when the contents look the same, and an ordering comparison raises TypeError:

>>> b'xyz' == 'xyz'
False
>>> b'xyz' < 'xyz'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'bytes' and 'str'

Because mismatched types never compare equal, you can't mix them very well in one container. A membership test silently skips elements of the other type, which can hide an incomplete transition:

>>> L = ["1", b"1"]
>>> "1" in L
True
>>> "2" in L
False
>>> "1" in [b"1"]
False
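
In current Python 3 releases, one way to compare a bytes value with a str value meaningfully is to convert one side first. The sketch below assumes the data is ASCII; the variable names are illustrative.

```python
text = "xyz"
raw = b"xyz"

# Values of different types are never equal, whatever they contain.
mismatched = (raw == text)

# Converting one side makes the comparison meaningful.
# ASCII is an assumption here; use the encoding your data really has.
as_text = (raw.decode("ascii") == text)
as_bytes = (raw == text.encode("ascii"))

print(mismatched, as_text, as_bytes)  # False True True
```

Running the interpreter with the -b option makes it warn about bytes and str comparisons, which can help locate mixed-type code.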

As mentioned earlier, indexing a bytes object returns an integer, not a bytes object of length one:

>>> b'xyz'[0] == b'x'
False
>>> b'xyz'[0]
120
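
When a length-one bytes object is what you need, slicing is one option, since a slice of bytes is always bytes. A minimal sketch:

```python
data = b"xyz"

first_int = data[0]      # indexing yields an int
first_bytes = data[0:1]  # slicing yields bytes

print(first_int)               # 120
print(first_bytes)             # b'x'
print(first_bytes == b"x")     # True
print(data.startswith(b"x"))   # True
```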

Hashing "bytes"

bytes is immutable, and as a result, it is hashable. Among other things, this means that bytes objects can be used as keys in dictionaries and as members of sets.

The mutable counterpart, bytearray, is not hashable. Options for using its contents as a key include:

  • convert it to an immutable copy with bytes(value)

  • avoid hashing the value at all
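
In current Python 3 releases, bytes objects are immutable and hashable, while bytearray is the mutable, unhashable type. The sketch below shows that behavior; the dictionary contents and names are illustrative only.

```python
# bytes objects work as dictionary keys.
counts = {b"GET": 3, b"POST": 1}
print(counts[b"GET"])  # 3

# bytearray is mutable, so hashing it fails with TypeError.
buf = bytearray(b"GET")
try:
    hash(buf)
    hashable = True
except TypeError:
    hashable = False
print(hashable)  # False

# An immutable bytes copy of the buffer can be used for the lookup.
print(counts[bytes(buf)])  # 3
```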

Historical information

For historical information that may be useful in porting or maintaining remaining Python 2 systems, please see previous page revisions.

BytesStr (last edited 2019-10-19 22:00:22 by FrancesHocutt)
