Differences between revisions 9 and 10
Revision 9 as of 2007-08-31 18:01:25
Comment: comparisons between bytes and str *really* dont work
Revision 10 as of 2007-08-31 21:17:51
Comment: Bytes can be stripped.

Strings in Python 2.x

Python 2.x has two types that can be used to store a string:

  • str: raw byte data; each element represents a single byte, which can range in value from 0-255. This is the default type for string literals, which is widely considered to be a mistake due to the encoding problems it raises (for more information, see Joel Spolsky's [http://www.joelonsoftware.com/articles/Unicode.html The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets]).

  • unicode: a string in which each element represents a Unicode character.

Both types provide largely the same methods and behave very similarly.

Strings in Python 3000

Python 3000 uses two very different types:

  • bytes: similar, but not identical, to Python 2.x's str type. It is intended to represent raw byte data. For more information on this type, please consult [http://www.python.org/dev/peps/pep-0358/ PEP 358].

  • str: a Unicode character string that is exactly the same type as Python 2.x's unicode type.

Differences Between Python 2.x's "str" and Python 3000's "bytes"

Differences between Python 2.x's str and Python 3000's bytes include:

  • str is immutable, whereas bytes is mutable.

  • bytes "lacks" many methods present in str: lower(), upper(), splitlines(), etc.

  • indexing an item of a bytes object yields an integer, not a bytes object, whereas indexing an item of a Python 2.x str yields another str instance.

Choosing Between "bytes" and "str" in Python 3000

When you migrate from Python 2.x to Python 3000, you have to ask yourself: do I manipulate characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:

  • a network socket manipulates bytes
  • a text parser manipulates characters (using methods such as lower() and strip())
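
As a rough sketch of that boundary, text can be encoded to bytes just before it goes to a socket and decoded right after it is read back. The helper names and the UTF-8 choice below are assumptions made for illustration, not something this page prescribes.

```python
# Hypothetical helpers: characters inside the program, bytes at the I/O boundary.
ENCODING = "utf-8"  # assumed encoding; use whatever your protocol specifies

def to_wire(text):
    """Encode a str (characters) into bytes suitable for a socket."""
    return text.encode(ENCODING)

def from_wire(data):
    """Decode bytes received from a socket back into a str."""
    return data.decode(ENCODING)

request = to_wire("GET / HTTP/1.0\r\n\r\n")
# Text methods such as lower() apply once the bytes have been decoded.
reply_line = from_wire(b"HTTP/1.0 200 OK").lower()
```

Keeping the conversion in two small functions makes it easier to see where bytes end and characters begin.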

Iterating over "bytes"

It's important to note that the bytes iterator generates integers and not characters:

>>> for item in b'abc':
...   print(item)
97
98
99
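
If characters are what you want, the integers can be converted back explicitly. A minimal sketch, assuming ASCII data (the variable names are illustrative):

```python
data = b'abc'

# Iterating yields integers, one per byte.
codes = [item for item in data]

# chr() maps each integer to the corresponding one-character str
# (safe here because the data is assumed to be ASCII).
chars = [chr(item) for item in data]

print(codes)
print(chars)
```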

Comparing "bytes"

Comparing one bytes object to another works as expected:

>>> b'xyz' == b'xyz'
True
>>> b'xyz' == b'abc'
False

However, it is important to note that the bytes type is completely distinct from the str type in Python 3000, and comparisons between them do not work:

>>> b'xyz' == 'xyz'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare bytes and str

This makes incomplete transitions clearly evident. But it also means that you really can't mix the two types very well:

>>> L = ["1", b"1"]
>>> "1" in L
True
>>> "2" in L
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare str and bytes
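
One way to sidestep the problem is to convert everything to a single type before searching. The sketch below decodes any bytes items to str first; the helper name as_text and the ASCII encoding are assumptions for illustration.

```python
def as_text(item, encoding="ascii"):
    """Return item as str, decoding it first if it is a bytes object."""
    if isinstance(item, bytes):
        return item.decode(encoding)
    return item

L = ["1", b"1"]
texts = [as_text(item) for item in L]  # every element is now a str

found_one = "1" in texts
found_two = "2" in texts  # only str-to-str comparisons happen here
```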

As mentioned earlier, indexing a bytes object returns an integer, not a bytes object:

>>> b'xyz'[0] == b'x'
False
>>> b'xyz'[0]
120

This behaviour differs from Python 2.x:

# In Python 2.x
>>> "xyz"[0]
'x'
>>> type("xyz"), type("xyz"[0])
(<type 'str'>, <type 'str'>)
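
Returning to bytes: to test a single byte, one option is to compare against an integer, and another is to take a one-element slice, which yields a bytes object rather than an integer. A short sketch (variable names are illustrative):

```python
data = b'xyz'

first_as_int = data[0]      # indexing: an integer
first_as_bytes = data[0:1]  # slicing: a bytes object of length 1

# Compare like with like: integer to integer, bytes to bytes.
matches_int = first_as_int == ord('x')
matches_bytes = first_as_bytes == b'x'
```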

Hashing "bytes"

bytes is mutable, and as a result, it's not hashable. Among other things, this means that bytes objects can't be used as keys in dictionaries.

Hacks and workarounds for this include:

  • use buffer(value)

Other solutions include:

  • create an immutable frozenbytes type

  • avoid using hash
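
The frozen-copy idea can be approximated without a new type by snapshotting the byte values into a tuple, which is immutable and hashable. This is only a sketch: frozen_key is a hypothetical helper, and bytearray is used here as a stand-in for a mutable byte sequence.

```python
def frozen_key(value):
    """Return an immutable, hashable snapshot of a sequence of byte values."""
    return tuple(value)

mutable = bytearray(b'abc')  # stand-in for a mutable bytes value (assumption)
table = {frozen_key(mutable): "first"}

mutable[0] = ord('z')  # later mutation does not disturb the stored key
still_there = frozen_key(b'abc') in table
```

The cost is an extra copy per lookup, so this suits small keys better than large buffers.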

BytesStr (last edited 2019-10-19 22:00:22 by FrancesHocutt)
