Differences between revisions 4 and 5
Revision 4 as of 2007-07-13 04:04:44
Size: 829
Editor: cscfpc15
Comment:
Revision 5 as of 2008-11-15 14:00:50
Size: 829
Editor: localhost
Comment: converted to 1.6 markup

Python users who are new to Unicode are sometimes drawn to the default encoding returned by sys.getdefaultencoding(). The first thing you should know about the default encoding is that you don't need to care about it. Its value should be 'ascii', and it is used when Python implicitly converts byte strings (see StrIsNotAString) to unicode strings, as in this example:

   a = "abc" + u"bcd"

When you concatenate the byte string "abc" with the unicode string u"bcd", Python first converts "abc" into u"abc" by calling "abc".decode(sys.getdefaultencoding()). If the byte string contains non-ASCII characters, .decode(sys.getdefaultencoding()) fails with UnicodeDecodeError, so byte strings used this way should not contain non-ASCII characters. In Python3.0 this implicit conversion is removed (mixing bytes and str raises TypeError); sys.getdefaultencoding() itself remains and returns 'utf-8'.
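
The conversion described above can also be written out explicitly. The following is a minimal sketch, written for Python 3, where the decode step has to be spelled out by hand. The helper name concat_like_python2 and its default_encoding parameter are hypothetical and only illustrate the behaviour; they are not part of Python.

```python
def concat_like_python2(byte_string, unicode_string, default_encoding="ascii"):
    """Hypothetical helper: mimic the implicit Python 2 conversion.

    The byte string is decoded with the default encoding first,
    and only then concatenated with the unicode string.
    """
    return byte_string.decode(default_encoding) + unicode_string


# Pure ASCII bytes decode without trouble.
print(concat_like_python2(b"abc", "bcd"))  # abcbcd

# Non-ASCII bytes (here the UTF-8 encoding of "café") cannot be
# decoded as ASCII, so the conversion fails before concatenation.
try:
    concat_like_python2(b"caf\xc3\xa9", "bcd")
except UnicodeDecodeError as exc:
    print("decode failed:", exc)
```

Decoding explicitly, with an encoding you know to be correct for the data, avoids relying on the default encoding at all.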


CategoryUnicode

DefaultEncoding (last edited 2008-11-15 14:00:50 by localhost)
