Python 2.x
Python 2.x has two types to store a string:
- str: byte string processed as a character string (which is a mistake)
- unicode: character string (unicode)
Both classes have the same methods and are very similar.
Python 3000
Python 3000 uses two very different types:
- bytes: byte string, which can be seen as a list of integers in the range 0..255
- str: character string (unicode), exactly the same type as Python 2.x "unicode"
old str and new bytes
Differences between Python 2.x "str" and Python 3000 "bytes":
- str is immutable, bytes is mutable
- bytes "lacks" many methods: strip(), lstrip(), rstrip(), lower(), upper(), splitlines(), etc.
- getting an item of a bytes object gives an integer and not a bytes object (b'xyz'[0] is the integer 120), whereas getting an item of an old str gives another str
choose between bytes and str
When you migrate from Python 2.x to Python 3000, you have to ask yourself: do I manipulate characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:
- a network socket manipulates bytes
- a text parser manipulates characters (and uses methods such as lower() and strip())
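One way to apply this rule is to keep bytes at the input/output boundary and convert to str before doing any text processing. The following is a minimal sketch, not a definitive recipe: the helper names normalize_line and to_wire are hypothetical, and UTF-8 is only an assumed encoding.

```python
def normalize_line(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes (e.g. read from a socket), then process them as text.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    text = raw.decode(encoding)   # bytes -> characters
    return text.strip().lower()   # character methods belong to str


def to_wire(text: str, encoding: str = "utf-8") -> bytes:
    """Encode characters back to bytes before sending them out."""
    return text.encode(encoding)


print(normalize_line(b"  Hello World\r\n"))  # hello world
print(to_wire("A")[0])                       # 65
```

The character "A" and the integer 65 only meet at the encode/decode step; everything on the text side works with str.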
bytes and loops (for)
The following code will display 97, 98, 99, since the bytes iterator yields integers and not characters:
for item in b'abc':
    print(item)
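If characters or one-byte bytes objects are wanted instead of integers, the conversion has to be explicit. A short sketch of three ways to walk over the same data (the variable names are illustrative only):

```python
data = b'abc'

# Plain iteration yields integers.
as_ints = [item for item in data]

# chr() turns each integer into a one-character str
# (a sensible choice only when the data is known to be ASCII/Latin-1).
as_chars = [chr(item) for item in data]

# Slicing, unlike indexing, keeps the bytes type.
as_slices = [data[i:i + 1] for i in range(len(data))]

print(as_ints)    # [97, 98, 99]
print(as_chars)   # ['a', 'b', 'c']
print(as_slices)  # [b'a', b'b', b'c']
```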
compare bytes
>>> b'xyz' == b'xyz'   # case 1
True
>>> b'xyz' == 'xyz'    # case 2
False
>>> b'xyz'[0] == b'x'  # case 3
False
>>> b'xyz'[0]
120
Case 2 shows that bytes and unicode are never equal since they are different types. Case 3 shows an important point: getting an item of a bytes object returns an integer (120) and not a bytes object of length 1. This behaviour is different from Python 2.x:
# In Python 2.x
>>> "xyz"[0]
'x'
>>> type("xyz"), type("xyz"[0])
(<type 'str'>, <type 'str'>)
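To test the first byte in a way that matches the old Python 2.x habit, either compare a one-byte slice with a bytes literal or compare the integer item with ord(). A minimal sketch, where first_byte_is is a hypothetical helper name:

```python
def first_byte_is(data: bytes, expected: bytes) -> bool:
    """Compare the first byte of data with a one-byte bytes object.

    data[0:1] is a bytes object of length 1, so both sides have the same type.
    """
    return data[0:1] == expected


print(first_byte_is(b'xyz', b'x'))  # True
print(b'xyz'[0] == ord('x'))        # True: integer compared with integer
```

The slice form also works on empty input (b''[0:1] is b''), whereas b''[0] raises IndexError.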
open issues
hash(bytes)
bytes is mutable and so it's not hashable. Hacks/Workarounds:
- use buffer(value)
- use str8(value)
Other solutions:
- create frozenbytes type
- avoid using hash
Hash is used when bytes is a dictionary key.
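The "frozen" approach can be sketched as follows. This assumes a released Python 3.x interpreter, where the mutable byte sequence is spelled bytearray and bytes is an immutable, hashable type, so bytes plays the role of the frozen variant suggested above. The helper name as_dict_key is hypothetical.

```python
def as_dict_key(buffer: bytearray) -> bytes:
    """Return an immutable, hashable copy of a mutable byte buffer.

    Assumption: released Python 3.x, where bytes is immutable.
    """
    return bytes(buffer)


buffer = bytearray(b'key')

# A mutable byte sequence cannot be hashed.
try:
    hash(buffer)
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

# The immutable copy can be used as a dictionary key.
table = {as_dict_key(buffer): 'value'}

# Mutating the original buffer afterwards does not affect the stored key.
buffer[0] = ord('K')

print(mutable_is_hashable)  # False
print(table[b'key'])        # value
```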