Paradoxically, a UnicodeDecodeError may happen when _encoding_. The cause is that the coding-specific encode() functions expect a parameter of type unicode. Given a str parameter, Python 2 first "up-converts" it into unicode and only then converts it to the target coding. That "up-conversion" makes no assumption about the str parameter's coding and falls back to the default ascii decoder, so any byte outside the ASCII range fails to decode. Hence a decoding failure inside an encoder.
>>> u"a".encode("utf-8")
'a'
>>> u"\u0411".encode("utf-8")
'\xd0\x91'
>>> "a".encode("utf-8")
'a'
>>> "\xd0\x91".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
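
One way to avoid the failure is to decode the byte string explicitly, with a coding you actually know, before encoding it. The following is a minimal sketch rather than a definitive recipe: implicit_encode() and to_utf8() are hypothetical helper names introduced here for illustration, and the choice of utf-8 as the default source coding is an assumption. implicit_encode() only mimics the hidden ascii "up-conversion" described above; it is not the interpreter's real implementation.

```python
def implicit_encode(raw, target="utf-8"):
    """Mimic the hidden step: decode the bytes as ASCII, then encode.

    Hypothetical helper, written only to make the implicit
    "up-conversion" visible. It is not the interpreter's own code.
    """
    return raw.decode("ascii").encode(target)


def to_utf8(value, source_encoding="utf-8"):
    """Encode a value to UTF-8, decoding byte strings explicitly first.

    source_encoding is an assumption supplied by the caller: it must
    match the coding the byte string was really produced with.
    """
    if isinstance(value, bytes):
        # Explicit decode replaces the implicit ASCII one.
        value = value.decode(source_encoding)
    return value.encode("utf-8")


if __name__ == "__main__":
    try:
        implicit_encode(b"\xd0\x91")
    except UnicodeDecodeError as exc:
        print("implicit ascii decode failed: %s" % exc)

    # The same bytes pass once the source coding is stated.
    print(repr(to_utf8(b"\xd0\x91", "utf-8")))
```

Stating the source coding at the call site keeps the decode step visible, so a wrong assumption surfaces as an ordinary decoding error at that point rather than as a surprising one inside encode().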