Diff for "BytesStr" - Python Wiki

Differences between revisions 4 and 5

Python 2.x

Python 2.x has two types to store a string:

str: byte string processed as a character string (which is a mistake)
unicode: character string (unicode)

Both classes have the same methods and are very similar.

Python 3000

Python 3000 uses two very different types:

bytes: byte string, which can be seen as a list of integers in the range 0..255
str: character string (unicode), exactly the same type as Python 2.x "unicode"
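
The distinction can be sketched as follows. This is a minimal illustration, assuming the b'...' literal and str.encode() behave as they do in released Python 3; the variable names are made up for the example.

```python
# A bytes value behaves like a sequence of integers in range(256).
raw = b'abc'
as_ints = list(raw)

# A str value is a sequence of Unicode characters.
text = 'h\u00e9'                  # "h" followed by "e with acute accent"
encoded = text.encode('utf-8')    # the accented letter needs two bytes in UTF-8

print(as_ints)
print(len(text), len(encoded))
```

The character count and the byte count differ as soon as a non-ASCII character is involved, which is why the two types are kept apart.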

old str and new bytes

Differences between Python 2.x "str" and Python 3000 "bytes":

str is immutable, bytes is mutable
bytes "lacks" many methods: strip(), lstrip(), rstrip(), lower(), upper(), splitlines(), etc.
indexing a bytes object gives an integer, not a bytes object (b'xyz'[0] is the integer 120), whereas indexing an old str gives a str

choose between bytes and str

When you migrate from Python 2.x to Python 3000, you have to ask yourself: do I manipulate characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:

a network socket manipulates bytes
a text parser manipulates characters (using methods such as lower() and strip())
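
The two examples above often meet at a boundary: bytes arrive from the network and are decoded into characters before parsing. A possible sketch, where parse_status_line is a hypothetical helper and the ASCII encoding is an assumption made only for this example:

```python
def parse_status_line(raw):
    """Decode one line of bytes and split it into lowercase words.

    Hypothetical helper: the name and the ASCII encoding are
    assumptions made for this example only.
    """
    text = raw.decode('ascii')          # bytes -> characters
    return text.strip().lower().split()

# Bytes as they might arrive from a socket's recv() call.
received = b'HTTP/1.0 200 OK\r\n'
words = parse_status_line(received)
print(words)

# Going back to the wire: characters -> bytes.
reply = ' '.join(words).encode('ascii')
print(reply)
```

Decoding once at the edge keeps all text handling on str and all I/O on bytes.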

bytes and loops (for)

The following code displays 97, 98 and 99, since iterating over bytes yields integers, not characters:

for item in b'abc':
    print(item)
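
If characters are wanted instead, two options are sketched here, assuming the data is ASCII: map each integer through chr(), or decode the whole value before looping.

```python
data = b'abc'

# Iterating over bytes yields integers.
codes = [item for item in data]

# chr() maps each integer back to a one-character string.
chars = [chr(item) for item in data]

# Alternatively, decode first and iterate over the characters.
decoded_chars = list(data.decode('ascii'))

print(codes, chars, decoded_chars)
```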

compare bytes

>>> b'xyz' == b'xyz'    # case 1
True
>>> b'xyz' == 'xyz'     # case 2
False
>>> b'xyz'[0] == b'x'   # case 3
False
>>> b'xyz'[0]
120

Case 2 shows that bytes and str (unicode) never compare equal, since they are different types. Case 3 shows an important point: indexing a bytes object returns an integer (120), not a bytes object of length 1. This behaviour differs from Python 2.x:

# In Python 2.x
>>> "xyz"[0]
'x'
>>> type("xyz"), type("xyz"[0])
(<type 'str'>, <type 'str'>)
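
To get something closer to the Python 2.x result, one option is to slice instead of index. A small sketch, assuming slicing a bytes object returns bytes as it does in released Python 3:

```python
data = b'xyz'

first_as_int = data[0]        # indexing: an integer
first_as_bytes = data[0:1]    # slicing: a bytes object of length 1

print(first_as_int == ord('x'))
print(first_as_bytes == b'x')
```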

open issues

hash(bytes)

bytes is mutable and therefore not hashable. Hacks/workarounds:

use buffer(value)
use str8(value)
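
Neither buffer() nor str8() exists in released Python 3, so the sketch below shows the same idea with other means: take an immutable snapshot of the mutable value and use that as the key. It assumes bytearray as a stand-in for the mutable byte type described here (that is the name released Python 3 uses); the variable names are illustrative.

```python
# bytearray stands in for the mutable byte type described above
# (an assumption: that is the name used in released Python 3).
value = bytearray(b'key')

try:
    hash(value)
    hashable = True
except TypeError:
    hashable = False

# Workaround sketch: an immutable snapshot of the contents can be a dict key.
snapshot = tuple(value)
table = {snapshot: 'payload'}

print(hashable)
print(table[tuple(bytearray(b'key'))])
```

The snapshot does not follow later changes to value, so it has to be taken again after each mutation.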

-  Revision 4 as of 2007-08-11 00:16:39
  Size: 2108
  Editor: neu67-4-88-160-66-91
  Comment:
+  Revision 5 as of 2007-08-11 02:59:57
  Size: 2132
  Editor: neu67-4-88-160-66-91
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 19:
- * bytes "lacks" many methods: strip, lstrip, rstrip, lower, upper, etc.
+ * bytes "lacks" many methods: strip(), lstrip(), rstrip(), lower(), upper(), splitlines(), etc.
