Differences between revisions 1 and 12 (spanning 11 versions)
Revision 1 as of 2007-08-10 23:15:06
Size: 877
Editor: neu67-4-88-160-66-91
Comment:
Revision 12 as of 2008-11-15 14:00:02
Size: 3600
Editor: localhost
Comment: converted to 1.6 markup
Line 1: Line 1:
Python 2.x has two types to store a string:
 * str: bytes string procced as character string which is a mistake
 * unicode: character string (unicode)
== Strings in Python 2.x ==
Line 5: Line 3:
Both classes has same methods and are very similar. Python 2.x has two types that can be used to store a string:
 * {{{str}}}: raw byte data; each element represents a single byte, which can range in value from 0-255. This is the default type for string literals, which is widely considered to be a mistake due to the encoding problems it raises (for more information, see Joel Spolsky's [[http://www.joelonsoftware.com/articles/Unicode.html|The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets]]).
 * {{{unicode}}}: a string in which each element represents a unicode character.
Line 7: Line 7:
Python 3000 use two very different types:
 * bytes: bytes string which can be see as a list of [0..255] integers
 * str: character string (unicode), exactly the same type than Python 2.x "unicode"
Both classes have the same methods and are very similar.
Line 11: Line 9:
Differences between Python 2.x "str" and Python 3000 "bytes":
 * str is immutable, bytes is mutable
 * bytes "lacks" many methods: strip, lstrip, rstrip, lower, upper, etc.
== Strings in Python 3000 ==
Line 15: Line 11:
When you migration from Python 2.x to Python 3000, you have to ask youself: do I manipulate characters or integers (bytes)? A is a character and 65 is an integer. Examples:
 * a network socket manipulate bytes
Python 3000 uses two very different types:
 * {{{bytes}}}: similar, but not identical, to Python 2.x's {{{str}}} type. It is intended to represent raw byte data. For more information on this type, please consult [[http://www.python.org/dev/peps/pep-0358/|PEP 358]].
 * {{{str}}}: a unicode character string which is exactly the same type as Python 2.x's {{{unicode}}} type.

== Differences Between Python 2.x's "str" and Python 3000's "bytes" ==

Differences between Python 2.x's {{{str}}} and Python 3000's {{{bytes}}} include:
 * {{{str}}} is immutable, whereas {{{bytes}}} is mutable.
 * {{{bytes}}} "lacks" many methods present in {{{str}}}: {{{lower()}}}, {{{upper()}}}, {{{splitlines()}}}, etc.
 * indexing an item of a {{{bytes}}} object yields an ''integer'', not a bytes object, whereas indexing an item of a Python 2.x {{{str}}} yields another {{{str}}} instance.

== Choosing Between "bytes" and "str" in Python 3000 ==

When you migrate from Python 2.x to Python 3000, you have to ask yourself: do I manipulate characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:

 * a network socket manipulates bytes
Line 18: Line 28:

== Iterating over "bytes" ==

It's important to note that the {{{bytes}}} iterator generates integers and not characters:

{{{
>>> for item in b'abc':
...   print(item)
97
98
99
}}}

== Comparing "bytes" ==

Comparing one {{{bytes}}} object to another works as expected:

{{{
>>> b'xyz' == b'xyz'
True
>>> b'xyz' == b'abc'
False
}}}

However, it is important to note that the {{{bytes}}} type is completely distinct from the {{{str}}} type in Python 3000, and comparisons between them do ''not'' work:

{{{
>>> b'xyz' == 'xyz'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare bytes and str
}}}

This should make incomplete transitions clearly evident. But it also means that you really can't mix them very well:

{{{
>>> L = ["1", b"1"]
>>> "1" in L
True
>>> "2" in L
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: can't compare str and bytes
}}}


As mentioned earlier, getting an item of a bytes returns an integer, not a bytes object:

{{{
>>> b'xyz'[0] == b'x'
False
>>> b'xyz'[0]
120
}}}

This behaviour is different than Python 2.x:

{{{
# In Python 2.x
>>> "xyz"[0]
'x'
>>> type("xyz"), type("xyz"[0])
(<type 'str'>, <type 'str'>)
}}}

=== Hashing "bytes" ===

{{{bytes}}} is mutable, and as a result, it's not hashable. Among other things, this means that {{{bytes}}} objects can't be used as keys in dictionaries.

Hacks and workarounds for this include:
 * use {{{buffer(value)}}}

Other solutions include:
 * create an immutable {{{frozenbytes}}} type
 * avoid using hash

Strings in Python 2.x

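Python 2.x has two types that can be used to store a string:

  • str: raw byte data; each element represents a single byte, which can range in value from 0 to 255. This is the default type for string literals, which is widely considered to be a mistake due to the encoding problems it raises (for more information, see Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets).

  • unicode: a string in which each element represents a unicode character.

Both classes have the same methods and are very similar.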

Strings in Python 3000

Python 3000 uses two very different types:

  • bytes: similar, but not identical, to Python 2.x's str type. It is intended to represent raw byte data. For more information on this type, please consult PEP 358.

  • str: a unicode character string which is exactly the same type as Python 2.x's unicode type.
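
The relationship between the two types can be sketched as follows. This is a minimal illustration, assuming a released Python 3 interpreter; the sample text and the choice of UTF-8 are arbitrary and not taken from the original page.

```python
# str holds Unicode characters; bytes holds their encoded form.
text = "caf\u00e9"               # four characters, the last one is e-acute
data = text.encode("utf-8")      # explicit conversion from str to bytes

print(len(text))                 # number of characters
print(len(data))                 # number of bytes (e-acute needs two in UTF-8)
print(data.decode("utf-8") == text)  # decoding restores the original text
```

The conversion is always explicit: the encoding has to be named (or defaulted) at the point where text becomes bytes and back.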

Differences Between Python 2.x's "str" and Python 3000's "bytes"

Differences between Python 2.x's str and Python 3000's bytes include:

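  • str is immutable, whereas bytes is mutable. (This reflects the PEP 358 draft; in released Python 3, bytes is immutable and bytearray is the mutable byte type.)

  • bytes "lacks" many methods present in str: lower(), upper(), splitlines(), etc. (Released Python 3 versions do provide ASCII-based versions of these methods on bytes.)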

  • indexing an item of a bytes object yields an integer, not a bytes object, whereas indexing an item of a Python 2.x str yields another str instance.
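
A minimal sketch of the mutability difference, assuming a released Python 3 interpreter (where, unlike the PEP 358 draft, item assignment works on bytearray and not on bytes). The variable names are illustrative.

```python
# In released Python 3, bytes rejects item assignment.
frozen = b"abc"
try:
    frozen[0] = 120              # raises TypeError on an immutable bytes object
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

# bytearray is the mutable byte sequence.
buf = bytearray(b"abc")
buf[0] = 120                     # 120 is the byte value of "x"

print(bytes_is_mutable)
print(buf)
```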

Choosing Between "bytes" and "str" in Python 3000

When you migrate from Python 2.x to Python 3000, you have to ask yourself: do I manipulate characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:

  • a network socket manipulates bytes
  • a text parser manipulates characters (use lower, strip, etc. methods)
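
The two cases often meet at a boundary: bytes arrive from the network and are decoded once, after which text methods apply. The sketch below assumes a released Python 3 interpreter; parse_header is a hypothetical helper and the header line is made-up sample data.

```python
def parse_header(raw_line, encoding="ascii"):
    """Hypothetical helper: turn a raw 'Name: value' line into two text fields."""
    text = raw_line.decode(encoding)          # bytes -> characters, done once
    name, _, value = text.partition(":")      # from here on, only str methods
    return name.strip().lower(), value.strip()


raw = b"Content-Type: text/plain\r\n"         # what a socket would hand over
name, value = parse_header(raw)
print(name, value)
```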

Iterating over "bytes"

It's important to note that the bytes iterator generates integers and not characters:

>>> for item in b'abc':
...   print(item)
97
98
99
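
When characters or one-byte bytes objects are wanted instead, each integer has to be converted explicitly. A minimal sketch, assuming a released Python 3 interpreter and ASCII data:

```python
data = b"abc"

values = [item for item in data]            # the integers the iterator yields
chars = [chr(item) for item in data]        # each integer as a one-character str
singles = [bytes([item]) for item in data]  # each integer as a length-1 bytes

print(values)
print(chars)
print(singles)
```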

Comparing "bytes"

Comparing one bytes object to another works as expected:

>>> b'xyz' == b'xyz'
True
>>> b'xyz' == b'abc'
False

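However, it is important to note that the bytes type is completely distinct from the str type in Python 3000, and comparisons between them do not work. In the pre-release build shown here the comparison raises TypeError; released Python 3 versions instead evaluate it to False, and emit a BytesWarning when the interpreter is started with the -b option: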

>>> b'xyz' == 'xyz'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare bytes and str
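
To compare the two meaningfully, one side has to be converted first. The sketch below assumes a released Python 3 interpreter, where the mixed comparison evaluates to False rather than raising as in the transcript above, and assumes the data is ASCII.

```python
raw = b"xyz"
text = "xyz"

direct = (raw == text)                      # mixed types never compare equal
as_text = (raw.decode("ascii") == text)     # compare as characters
as_bytes = (raw == text.encode("ascii"))    # or compare as bytes

print(direct, as_text, as_bytes)
```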

This should make incomplete transitions clearly evident. But it also means that you really can't mix them very well:

>>> L = ["1", b"1"]
>>> "1" in L
True
>>> "2" in L
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: can't compare str and bytes
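
One way to keep such a list usable is to normalise it to a single type before any membership test. A minimal sketch, assuming a released Python 3 interpreter and ASCII data; the list contents are illustrative.

```python
mixed = ["1", b"1", "2"]

# Decode every bytes item so that the list contains only str.
normalized = [
    item.decode("ascii") if isinstance(item, bytes) else item
    for item in mixed
]

print(normalized)
print("2" in normalized)
```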

As mentioned earlier, getting an item of a bytes object returns an integer, not a bytes object:

>>> b'xyz'[0] == b'x'
False
>>> b'xyz'[0]
120
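
A slice, unlike a single index, does return a bytes object, which gives a way to get a length-1 bytes value. A minimal sketch, assuming a released Python 3 interpreter:

```python
data = b"xyz"

first_as_int = data[0]        # indexing yields the integer byte value
first_as_bytes = data[0:1]    # slicing yields a bytes object of length 1

print(first_as_int)
print(first_as_bytes == b"x")
print(chr(first_as_int))      # convert the integer back to a character
```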

This behaviour differs from Python 2.x:

# In Python 2.x
>>> "xyz"[0]
'x'
>>> type("xyz"), type("xyz"[0])
(<type 'str'>, <type 'str'>)

Hashing "bytes"

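bytes is mutable, and as a result, it's not hashable. Among other things, this means that bytes objects can't be used as keys in dictionaries. (This describes the mutable bytes of the PEP 358 draft; in released Python 3, bytes is immutable and hashable, and the mutable bytearray is the unhashable type.)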

Hacks and workarounds for this include:

  • use buffer(value)

Other solutions include:

  • create an immutable frozenbytes type

  • avoid using hash
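
In released Python 3 the "immutable type" idea is what bytes itself became. A minimal sketch under that assumption (a released Python 3 interpreter, not the draft described above); the dictionary contents are illustrative.

```python
cache = {b"key": "value"}          # immutable bytes works as a dictionary key

buf = bytearray(b"key")            # the mutable type is not hashable
try:
    cache[buf] = "other"
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False

frozen_copy = bytes(buf)           # take an immutable snapshot to use as a key
found = cache[frozen_copy]

print(bytearray_hashable)
print(found)
```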

BytesStr (last edited 2019-10-19 22:00:22 by FrancesHocutt)
