Differences between revisions 7 and 8
Revision 7 as of 2006-03-01 20:52:28
Size: 1166
Editor: outgw
Comment: oops
Revision 8 as of 2006-03-24 09:09:20
Size: 1476
Editor: aaron
Comment: string of bytes; isinstance basestring
Deletions are prefixed with "-". Additions are prefixed with "+".
Line 1: Line 1:
- = The str type is not a string of characters =
+ == History of Python Strings ==
Line 3: Line 3:
- Python was started by Guido van Rossum in December of 1989, Unicode was  started in 1991. It is hard to expect that Python developers could introduce Unicode strings since early versions. Trying to "reinvent" Unicode was not an option either since Unicode is a really huge work. Python developers simply introduced strings as they existed in C and many other languages of that time. In the C language a string is a sequence of bytes, and so is Python `str` type.
+ [[Python]] was started by GuidoVanRossum in December of 1989, Unicode was started in 1991. It is hard to expect that Python developers could introduce [[Unicode]] strings since early versions. Trying to "reinvent" Unicode was not an option either since Unicode is a really huge work. Python developers simply introduced strings as they existed in C and many other languages of that time. In the C language a string is a sequence of bytes, and so is Python `str` type.
Line 5: Line 5:
- There is no consensus how to call these strings now, in the age of Unicode. Some people call them byte strings, some call them generic strings, and others call them 8-bit strings, but what is more confusing for a unicode newbie is that a lot of people simply call them strings most of the time. If you want to understand python unicode you have to understand the difference between byte strings and unicode strings. ["Python3.0"] will clear up this confusion by getting rid of byte strings and introducing the new type `bytes`. [[#b 1]]
+ == Python Strings Today ==
Line 7: Line 7:
- === References ===
+ There is no consensus how to call these strings now, in the age of Unicode. Some people call them byte strings, some call them generic strings, and others call them 8-bit strings, but what is more confusing for a unicode newbie is that a lot of people simply call them strings most of the time.
Line 9: Line 9:
-  [[Anchor(b)]][1] [http://python.org/peps/pep-0358.html PEP 358 -- The "bytes" Object]
+ If you want to understand python unicode you have to understand the difference between byte strings and unicode strings.

+ In ["Python2.4"], {{{str}}} is a string of bytes, and {{{unicode}}} is internally represented unicode. {{{basestring}}} is a parent class for {{{unicode}}} and {{{str}}}.

+ ["Python3.0"] will clear up this confusion by getting rid of byte strings and introducing the new type `bytes`. The strings you surround with quotation marks will all be unicode strings, automatically. [[#b 1]]

+ === See Also ===

+ CategoryUnicode

+  * [[Anchor(b)]][1] [http://python.org/peps/pep-0358.html PEP 358 -- The "bytes" Object]

History of Python Strings

Python was started by GuidoVanRossum in December 1989; Unicode 1.0 was not published until 1991. Python's developers could hardly have introduced Unicode strings in the earliest versions, and "reinventing" Unicode was not an option either, since Unicode is a huge undertaking. Instead, they simply introduced strings as they existed in C and many other languages of that time. In C a string is a sequence of bytes, and so is Python's str type.

Python Strings Today

There is no consensus on what to call these strings now, in the age of Unicode. Some people call them byte strings, some call them generic strings, and others call them 8-bit strings. What is more confusing for a Unicode newcomer is that many people simply call them strings most of the time.

To understand Unicode in Python, you have to understand the difference between byte strings and Unicode strings.

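In ["Python2.4"], str is a string of bytes, and unicode is a string of Unicode characters held in an internal representation. basestring is the common parent class of str and unicode, so isinstance(x, basestring) is true for both kinds of string.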

["Python3.0"] will clear up this confusion by doing away with the old byte-string str and introducing the new type bytes for byte data. The strings you surround with quotation marks will all be Unicode strings automatically. See PEP 358 -- The "bytes" Object (http://python.org/peps/pep-0358.html).
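
The split between text and bytes can be sketched as follows. This is only an illustration, assuming a Python 3 interpreter as it was eventually released; the variable names text and data are made up for the example.

```python
# Assumes Python 3: quoted literals are Unicode text, bytes is a separate type.
text = "na\u00efve"             # str: a sequence of Unicode code points
data = text.encode("utf-8")     # bytes: the UTF-8 encoding of that text

print(type(text).__name__, len(text))   # str 5
print(type(data).__name__, len(data))   # bytes 6 (the accented letter takes two bytes)

# Decoding with the same codec recovers the original text.
assert data.decode("utf-8") == text

# Text and bytes never compare equal, even for pure-ASCII content.
assert "abc" != b"abc"
```

The length differs because len counts code points for text and bytes for encoded data, which is the distinction the page is drawing.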

See Also

CategoryUnicode

StrIsNotAString (last edited 2008-11-15 13:59:46 by localhost)
