Differences between revisions 11 and 12
Revision 11 as of 2007-07-13 04:07:01
Size: 1480
Editor: cscfpc15
Comment:
Revision 12 as of 2008-11-15 13:59:46
Size: 1484
Editor: localhost
Comment: converted to 1.6 markup
Deletions are marked like this. Additions are marked like this.
Line 3: Line 3:
["Python"] was started by GuidoVanRossum in December of 1989, Unicode was started in 1991. It is hard to expect that Python developers could introduce ["Unicode"] strings since early versions. Trying to "reinvent" Unicode was not an option either since Unicode is a really huge work. Python developers simply introduced strings as they existed in C and many other languages of that time. In the C language a string is a sequence of bytes, and so is Python `str` type. [[Python]] was started by GuidoVanRossum in December of 1989, Unicode was started in 1991. It is hard to expect that Python developers could introduce [[Unicode]] strings since early versions. Trying to "reinvent" Unicode was not an option either since Unicode is a really huge work. Python developers simply introduced strings as they existed in C and many other languages of that time. In the C language a string is a sequence of bytes, and so is Python `str` type.
Line 11: Line 11:
In ["Python2.4"], {{{str}}} is a string of bytes, and {{{unicode}}} is internally represented unicode. {{{basestring}}} is a parent class for {{{unicode}}} and {{{str}}}. In [[Python2.4]], {{{str}}} is a string of bytes, and {{{unicode}}} is internally represented unicode. {{{basestring}}} is a parent class for {{{unicode}}} and {{{str}}}.
Line 13: Line 13:
["Python3.0"] will clear up this confusion by getting rid of byte strings and introducing the new type `bytes`. The strings you surround with quotation marks will all be unicode strings, automatically. [[#b 1]] [[Python3.0]] will clear up this confusion by getting rid of byte strings and introducing the new type `bytes`. The strings you surround with quotation marks will all be unicode strings, automatically. [[[#b|1]]]
Line 17: Line 17:
 * [[Anchor(b)]][1] [http://python.org/peps/pep-0358.html PEP 358 -- The "bytes" Object]  * <<Anchor(b)>>[1] [[http://python.org/peps/pep-0358.html|PEP 358 -- The "bytes" Object]]

History of Python Strings

Python was started by GuidoVanRossum in December of 1989, Unicode was started in 1991. It is hard to expect that Python developers could introduce Unicode strings since early versions. Trying to "reinvent" Unicode was not an option either since Unicode is a really huge work. Python developers simply introduced strings as they existed in C and many other languages of that time. In the C language a string is a sequence of bytes, and so is Python str type.

Python Strings Today

There is no consensus how to call these strings now, in the age of Unicode. Some people call them byte strings, some call them generic strings, and others call them 8-bit strings, but what is more confusing for a unicode newbie is that a lot of people simply call them strings most of the time.

If you want to understand python unicode you have to understand the difference between byte strings and unicode strings.

In Python2.4, str is a string of bytes, and unicode is internally represented unicode. basestring is a parent class for unicode and str.

Python3.0 will clear up this confusion by getting rid of byte strings and introducing the new type bytes. The strings you surround with quotation marks will all be unicode strings, automatically. 1]

See Also


CategoryUnicode

StrIsNotAString (last edited 2008-11-15 13:59:46 by localhost)

Unable to edit the page? See the FrontPage for instructions.