Revision 5 as of 2007-08-11 02:59:57

Python 2.x

Python 2.x has two types to store a string:

  • str: byte string, processed as if it were a character string (which is a mistake)
  • unicode: character string (Unicode)

Both classes have the same methods and are very similar.

Python 3000

Python 3000 uses two very different types:

  • bytes: byte string, which can be seen as a list of integers in the range 0..255
  • str: character string (Unicode), exactly the same type as Python 2.x "unicode"

old str and new bytes

Differences between Python 2.x "str" and Python 3000 "bytes":

  • str is immutable, bytes is mutable
  • bytes "lacks" many methods: strip(), lstrip(), rstrip(), lower(), upper(), splitlines(), etc.
  • indexing a bytes object gives an integer, not a bytes object (b'xyz'[0] is the integer 120), whereas indexing an old str gives another str

choose between bytes and str

When you migrate from Python 2.x to Python 3000, you have to ask yourself: am I manipulating characters or bytes (integers)? "A" is a character and 65 is an integer. Examples:

  • a network socket manipulates bytes
  • a text parser manipulates characters (it uses methods such as lower() and strip())
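
A minimal sketch of that split, assuming the data arrives encoded as UTF-8 (the encoding and the names raw, text, cleaned and reply are illustrative, not taken from this page): keep bytes at the I/O boundary and decode to str before doing any text work.

```python
# Bytes as they might arrive from a socket (hypothetical payload).
raw = b"  Hello World\n"

# Decode once at the boundary; UTF-8 is an assumption here.
text = raw.decode("utf-8")

# Character-level work (strip, lower, ...) happens on str.
cleaned = text.strip().lower()

# Encode again before sending the result back out.
reply = cleaned.encode("utf-8")

print(cleaned, reply)
```

Decoding in one place makes it clear which variables hold characters and which hold bytes.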

bytes and loops (for)

The following code displays 97, 98 and 99, because iterating over a bytes object yields integers, not characters!

for item in b'abc':
    print(item)
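
If one-byte bytes objects are wanted instead of integers, slicing is one way to get them. A small sketch (the names data, as_ints and as_bytes are illustrative):

```python
data = b'abc'

# Iterating yields integers.
as_ints = [item for item in data]

# Slicing one position at a time yields bytes objects of length 1.
as_bytes = [data[i:i + 1] for i in range(len(data))]

print(as_ints)
print(as_bytes)
```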

compare bytes

>>> b'xyz' == b'xyz'    # case 1
True
>>> b'xyz' == 'xyz'     # case 2
False
>>> b'xyz'[0] == b'x'   # case 3
False
>>> b'xyz'[0]
120

Case 2 shows that bytes and unicode strings are never equal, since they are different types. Case 3 shows an important point: indexing a bytes object returns an integer (120), not a bytes object of length 1. This behaviour differs from Python 2.x:

# In Python 2.x
>>> "xyz"[0]
'x'
>>> type("xyz"), type("xyz"[0])
(<type 'str'>, <type 'str'>)
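
To make a comparison like case 3 succeed, compare like with like: either an integer with an integer, or a length-1 slice with a bytes object. A sketch, with illustrative variable names:

```python
data = b'xyz'

# Integer against integer: ord() gives the code of a one-character string.
int_match = data[0] == ord('x')

# bytes against bytes: a slice keeps the bytes type.
slice_match = data[0:1] == b'x'

# Mixed comparison (integer against bytes), as in case 3, is False.
mixed_match = data[0] == b'x'

print(int_match, slice_match, mixed_match)
```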

open issues

hash(bytes)

bytes is mutable, so it is not hashable. Hacks/workarounds:

  • use buffer(value)
  • use str8(value)

Other solutions:

  • create frozenbytes type
  • avoid using hash

The hash is needed when a bytes object is used as a dictionary key.
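
The problem can be sketched with the Python 3 versions that were eventually released, where the mutable byte type is named bytearray and bytes itself is immutable and hashable. That naming goes beyond what this page describes, so treat the mapping below as an assumption rather than the resolution of this open issue.

```python
# bytearray stands in for the mutable byte type described on this page
# (assumption: that is its name in released Python 3).
mutable = bytearray(b'key')

# A mutable object cannot be hashed.
try:
    hash(mutable)
    hashable = True
except TypeError:
    hashable = False

# An immutable copy can serve as a dictionary key.
frozen = bytes(mutable)
table = {frozen: 'value'}

print(hashable, table[b'key'])
```

This follows the "frozen" idea listed above: the key is an immutable snapshot, so later changes to the mutable object do not affect the dictionary.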

BytesStr (last edited 2019-10-19 22:00:22 by FrancesHocutt)
