Python supports several [[Unicode]] encodings. Two of the most common are:

 * utf-8
 * utf-16

It is critical to note that '''a unicode encoding is not Python unicode!''' That is, there is a critical difference between a Python "byte string" (or "normal string" or "regular string") that stores utf-8 / utf-16 encoded unicode, and a Python unicode string.

 * {{{u"foo"}}} -- this is a Python unicode string
 * {{{"foo"}}} -- this is a Python byte string -- it is 3 bytes long
 * {{{u"foo".encode('utf-8')}}} -- this ''starts'' as a Python unicode string, and is then encoded into utf-8; the result is stored in a normal Python byte string.

These examples use Python 2 syntax. In Python 3, a plain {{{"foo"}}} is already a unicode string, and byte strings are written {{{b"foo"}}}.

When you see a "u" in front of quotation marks, that means "this is a Python unicode string." You should not ask yourself: "How is it represented?" Don't even think about that. Just know: "This is pure, platonic Unicode. Python understands the mystery of the encoding of the character."

How many bytes is {{{"foo"}}}? '''3.'''

How many bytes is {{{u"foo"}}}? You do not know, and you do not wonder. Only the angels in heaven know how many bytes it takes to represent platonic characters.

But if you're writing to a file, then you need to turn those pure platonic Unicode characters into something material, chunked into bytes. That is the moment you encode.

{{{#!python
pure_platonic_string = u"blah blah blah"            # This is a Unicode string
byte_string = pure_platonic_string.encode("utf-8")  # Now we make it utf-8
# "output.txt" is a placeholder name; binary mode, because we are writing bytes
with open("output.txt", "wb") as f:
    f.write(byte_string)                            # We write it to a file
}}}

== See Also ==

CategoryUnicode

 * [[http://en.wikipedia.org/wiki/UTF-8|Wikipedia:UTF-8]] -- for general information
 * [[http://en.wikipedia.org/wiki/UTF-16|Wikipedia:UTF-16]] -- for general information
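
== Example: characters versus bytes ==

The distinction this page draws between a unicode string and its encoded bytes can be sketched with a string containing one non-ASCII character. This is only an illustration: the sample text and the variable names are assumptions chosen for the example, not something the page prescribes. The point it tries to show is that the number of characters stays fixed, while the number of bytes depends on which encoding you pick.

```python
# A sample unicode string: "cafe" with an acute accent on the final e.
# The text itself is an arbitrary choice for this illustration.
text = u"caf\u00e9"

# Encode the same characters two different ways.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte-order mark

char_count = len(text)        # counts characters, not bytes
utf8_len = len(utf8_bytes)    # counts bytes in the utf-8 form
utf16_len = len(utf16_bytes)  # counts bytes in the utf-16 form

# Decoding with the matching codec gets the original characters back.
round_trip = utf8_bytes.decode("utf-8")

print(char_count, utf8_len, utf16_len, round_trip == text)
```

Here the accented character takes two bytes in utf-8 and every character takes two bytes in utf-16, so the byte counts differ even though the unicode string is the same. Decoding with the wrong codec would not give the original text back, which is why you need to keep track of which encoding a byte string holds.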