Python supports several Unicode encodings.

Two of the most common encodings are UTF-8 and UTF-16.

It is critical to note that a Unicode encoding is not a Python unicode string!

That is, there is a critical difference between a Python "byte string" (or "normal string" or "regular string") that stores UTF-8 or UTF-16 encoded Unicode, and a Python unicode string.
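A minimal sketch of that difference, assuming a sample word chosen only for illustration (the variable names are not from this page). The unicode string holds characters; the byte string holds one particular encoding of them:

```python
text = u"caf\u00e9"                 # a unicode string: 4 characters
utf8_bytes = text.encode("utf-8")   # a byte string holding the UTF-8 encoding

# The two objects are different types and different lengths.
print(type(text) is type(utf8_bytes))  # False
print(len(text))                       # 4 (characters)
print(len(utf8_bytes))                 # 5 (bytes; the accented e takes two)

# Decoding the bytes gives back an equal unicode string.
print(utf8_bytes.decode("utf-8") == text)  # True
```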

When you see a "u" in front of quotation marks, that means "this is a Python unicode string." You should not ask yourself: "How is it represented?" Don't even think about that. Just know: "This is pure, platonic, Unicode. Python understands the mystery of the encoding of the character."

How many bytes is "foo"? 3. How many bytes is u"foo"? You do not know, you do not wonder. Only the angels in heaven know how many bytes it takes to represent platonic characters.
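To make the point concrete, here is a small sketch showing that the byte count only exists once an encoding has been chosen. The "-le" codec variants are used here as an assumption of convenience, so that no byte order mark is added to the counts:

```python
text = u"foo"

# The unicode string itself only has a length in characters.
print(len(text))                      # 3

# The number of bytes depends entirely on the encoding you pick.
print(len(text.encode("utf-8")))      # 3
print(len(text.encode("utf-16-le")))  # 6
print(len(text.encode("utf-32-le")))  # 12
```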

But if you're writing to a file, then you need to turn that pure platonic Unicode character into something material and chunked into bytes. Now you encode it into bytes.

   pure_platonic_string = u"blah blah blah"  # This is a Unicode string
   byte_string = pure_platonic_string.encode("utf-8")  # Now we make it utf-8
   with open("output.txt", "wb") as f:  # "output.txt" is a placeholder file name
       f.write(byte_string)  # We write it to a file
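Reading the data back is the reverse step: the file gives you bytes, and you decode them to get a unicode string again. The following is only a sketch; the temporary file path and the sample text are assumptions made so the example can run on its own:

```python
import io
import os
import tempfile

# Hypothetical scratch location, used only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

original = u"blah blah blah \u2603"  # unicode string ending in a snowman

# Encode to bytes on the way out.
with io.open(path, "wb") as f:
    f.write(original.encode("utf-8"))

# Read raw bytes, then decode on the way in.
with io.open(path, "rb") as f:
    raw = f.read()
restored = raw.decode("utf-8")

print(restored == original)  # True
```

The decode call must name the same encoding that was used to write the file; decoding with a different one can raise an error or silently give the wrong characters.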

See Also

CategoryUnicode