Differences between revisions 1 and 2
Revision 1 as of 2007-07-13 02:50:11
Size: 846
Editor: cscfpc15
Comment:
Revision 2 as of 2007-07-13 02:51:14
Size: 848
Editor: cscfpc15
Comment:

Paradoxically, a UnicodeDecodeError can be raised while _encoding_. The cause seems to be that the encoding-specific encode() functions normally expect a parameter of type unicode. It appears that on seeing a str parameter, an encode() function first "up-converts" it into unicode before applying its own encoding. The "up-conversion" has no knowledge of the str parameter's actual encoding, so it falls back to the default encoding, which is ascii. Any byte outside the ASCII range therefore triggers a decoding failure inside an encoder.

    >>> "a".encode("utf-8")
    'a'
    >>> u"\u0411".encode("utf-8")
    '\xd0\x91'
    >>> "\xd0\x91".encode("utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
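
One possible way around the failure is to decode the byte string explicitly, using the encoding it is actually in, before encoding it again. The sketch below is only an illustration: the helper names implicit_step and reencode are hypothetical and not part of Python, and the b"" literals are used so that the same code runs on Python 2.6+ and Python 3. It assumes the caller knows the source encoding.

```python
# Bytes holding the UTF-8 encoding of U+0411 (a str in Python 2, bytes in Python 3).
raw = b"\xd0\x91"


def implicit_step(data):
    """Hypothetical helper: spell out the ASCII up-conversion that
    Python 2 appears to perform implicitly inside str.encode()."""
    return data.decode("ascii")


def reencode(data, source_encoding, target_encoding):
    """Hypothetical helper: decode with the encoding the bytes are
    really in, then encode the resulting text to the target encoding."""
    return data.decode(source_encoding).encode(target_encoding)


# The explicit ASCII decode fails on the non-ASCII byte 0xd0,
# which is the same error seen in the traceback above.
try:
    implicit_step(raw)
    implicit_failed = False
except UnicodeDecodeError:
    implicit_failed = True

# Naming the source encoding explicitly avoids the failure.
result = reencode(raw, "utf-8", "utf-8")
```

Here implicit_failed ends up True, while result holds the same two bytes as raw, because the data was decoded as UTF-8 and not as ASCII.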


CategoryUnicode

UnicodeDecodeError (last edited 2008-11-15 13:59:56 by localhost)
