Differences between revisions 7 and 8
Revision 7 as of 2007-08-23 18:59:34
Size: 3217
Editor: 72
Comment: grammar fixes, formatting changes, clarifying examples, etc.
Revision 8 as of 2007-08-23 19:17:32
Size: 3228
Editor: 72
Comment: Removed references to the "str8" data type because it will be removed.
Deletions are prefixed with "-". Additions are prefixed with "+".
Line 1: Line 1:
- == Python 2.x ==
+ == Strings in Python 2.x ==
Line 9: Line 9:
- == Python 3000 ==
+ == Strings in Python 3000 ==
Line 22: Line 22:
- == Choosing Between "bytes" and "str" ==
+ == Choosing Between "bytes" and "str" in Python 3000 ==
Line 84: Line 84:
-  * use {{{str8(value)}}}

Strings in Python 2.x

Python 2.x has two types that can be used to store a string:

  • str: raw byte data; each element represents a single byte with a value from 0 to 255. This is the default type for string literals, a choice widely considered to be a mistake because of the encoding problems it causes (for more information, see Joel Spolsky's [http://www.joelonsoftware.com/articles/Unicode.html The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets]).

  • unicode: a string in which each element represents a Unicode character.

The two types are very similar and share nearly all of their methods.

Strings in Python 3000

Python 3000 uses two very different types:

  • bytes: similar, but not identical, to Python 2.x's str type. It is intended to represent raw byte data. For more information on this type, please consult [http://www.python.org/dev/peps/pep-0358/ PEP 358].

  • str: a Unicode character string; it is exactly the same type as Python 2.x's unicode type.

Differences Between Python 2.x's "str" and Python 3000's "bytes"

Differences between Python 2.x's str and Python 3000's bytes include:

  • str is immutable, whereas bytes is mutable.

  • bytes "lacks" many methods present in str: strip(), lstrip(), rstrip(), lower(), upper(), splitlines(), etc.

  • indexing an item of a bytes object yields an integer, not a bytes object, whereas indexing an item of a Python 2.x str yields another str instance.

Choosing Between "bytes" and "str" in Python 3000

When you migrate from Python 2.x to Python 3000, you have to ask yourself: do I manipulate characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:

  • a network socket manipulates bytes
  • a text parser manipulates characters (and uses methods such as lower() and strip())
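
The distinction can be made concrete by converting between the two types. The following is a minimal sketch, assuming a released Python 3 interpreter and ASCII as the encoding (the choice of encoding is an assumption for illustration; real data may need UTF-8 or another codec):

```python
# Text is made of characters; encoding turns it into bytes (integers 0 to 255).
text = "A"
data = text.encode("ascii")         # str -> bytes

first_byte = data[0]                # indexing bytes yields an integer
round_trip = data.decode("ascii")   # bytes -> str

print(first_byte)   # 65
print(round_trip)   # A
```

Code that talks to a socket would typically work with data, while code that parses text would work with text.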

Iterating over "bytes"

It's important to note that the bytes iterator generates integers and not characters:

>>> for item in b'abc':
...   print(item)
97
98
99
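
When characters are wanted instead, each integer can be converted explicitly. A small sketch, assuming a released Python 3 interpreter and ASCII-only data (for other data, decoding the whole bytes object with the right codec is the safer route):

```python
data = b'abc'

# chr() maps each integer produced by the iterator to a one-character str.
as_chars = [chr(item) for item in data]

print(as_chars)   # ['a', 'b', 'c']
```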

Comparing "bytes"

Comparing one bytes object to another works as expected:

>>> b'xyz' == b'xyz'
True
>>> b'xyz' == b'abc'
False

However, the bytes type is completely distinct from the str type in Python 3000, so a bytes object never compares equal to a str, even when both hold the same ASCII text:

>>> b'xyz' == 'xyz'
False
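
To compare the contents, one side has to be converted first so that both operands have the same type. A minimal sketch, assuming a released Python 3 interpreter and ASCII as the encoding (the encoding is an assumption; use whichever codec the data was actually written with):

```python
raw = b'xyz'
text = 'xyz'

mixed = (raw == text)                          # different types: False
decoded_equal = (raw.decode('ascii') == text)  # compare as str
encoded_equal = (raw == text.encode('ascii'))  # compare as bytes

print(mixed, decoded_equal, encoded_equal)   # False True True
```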

As mentioned earlier, indexing a bytes object returns an integer, not a bytes object:

>>> b'xyz'[0] == b'x'
False
>>> b'xyz'[0]
120
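
If a bytes object of length one is needed, slicing can be used instead of indexing, or the integer can be wrapped back into a bytes object. A short sketch, assuming a released Python 3 interpreter:

```python
data = b'xyz'

first_slice = data[0:1]        # a slice of bytes is still a bytes object
from_int = bytes([data[0]])    # build a bytes object from a list of integers

print(first_slice == b'x')   # True
print(from_int == b'x')      # True
```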

This behaviour differs from Python 2.x:

# In Python 2.x
>>> "xyz"[0]
'x'
>>> type("xyz"), type("xyz"[0])
(<type 'str'>, <type 'str'>)

Hashing "bytes"

bytes is mutable, and as a result, it's not hashable. Among other things, this means that bytes objects can't be used as keys in dictionaries.

Hacks and workarounds for this include:

  • use buffer(value)

Other solutions include:

  • create an immutable frozenbytes type

  • avoid using hash
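
The "immutable copy" idea can be sketched as follows. This assumes a released Python 3 interpreter, where the mutable byte sequence is spelled bytearray and bytes(...) yields an immutable, hashable copy. Those names come from released Python 3 and are an assumption relative to the draft behaviour described on this page:

```python
mutable = bytearray(b'key')

# A mutable byte sequence cannot be hashed.
try:
    hash(mutable)
    mutable_hashable = True
except TypeError:
    mutable_hashable = False

# An immutable copy can be hashed and used as a dictionary key.
frozen = bytes(mutable)
lookup = {frozen: 'value'}

print(mutable_hashable)   # False
print(lookup[b'key'])     # value
```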

BytesStr (last edited 2019-10-19 22:00:22 by FrancesHocutt)
