Differences between revisions 1 and 2
Revision 1 as of 2005-06-10 22:09:46
Size: 2538
Editor: FredDrake
Comment:
Revision 2 as of 2005-06-10 22:15:12
Size: 2754
Editor: FredDrake
Comment:

Escaping XML

The Python standard library contains a couple of simple functions for escaping strings of text as XML character data. These routines are not very powerful, but they are sufficient for many applications. They should generally be applied to Unicode text that will later be encoded appropriately, or to already-encoded text using an ASCII-superset encoding; either works because the functions replace only a handful of ASCII characters and leave all other characters alone.

The xml.sax.saxutils module contains the functions escape() and quoteattr(). The escape() function is used to convert the <, &, and > characters to the corresponding entity references:

    >>> from xml.sax.saxutils import escape
    >>>
    >>> escape("< & >")
    '&lt; &amp; &gt;'
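
To illustrate how escape() fits with the encoding advice above, here is a minimal sketch, assuming Python 3 and the standard xml.etree.ElementTree parser (neither is assumed by the text above; the element name "note" is purely illustrative). It escapes Unicode text first, encodes afterwards, and checks that a parser recovers the original string:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "café < thé & crème > lait"

# Escape the Unicode text first; non-ASCII characters pass through unchanged.
document = "<note>" + escape(raw) + "</note>"

# Encode only after escaping. UTF-8 is an ASCII-superset encoding.
payload = document.encode("utf-8")

# A parser converts the entity references back to the original characters.
parsed = ET.fromstring(payload)
print(parsed.text == raw)
```

Escaping before encoding keeps the escaping step independent of the output encoding that is eventually chosen.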

This function does not generate either the &apos; or &quot; entity references; these are not needed in parsed character data in an XML document. They may be needed in character data in attribute values, however. For attribute values, the quoteattr() function provides a more useful service than escape(). quoteattr() determines whether single or double quotation marks are more appropriate for an attribute value and quotes the value accordingly. A value containing only double quotes is wrapped in single quotes; any other value is wrapped in double quotes, and if it contains both kinds of quotation marks, each double quote inside it is replaced with &quot;. The return value includes the surrounding quotation marks, so it can be placed directly after the = sign of an attribute:

    >>> from xml.sax.saxutils import quoteattr
    >>>
    >>> quoteattr("some value ' containing an apostrophe")
    '"some value \' containing an apostrophe"'
    >>> quoteattr('some value containing " a double-quote')
    '\'some value containing " a double-quote\''
    >>> quoteattr('value containing " a double-quote \' and an apostrophe')
    '"value containing &quot; a double-quote \' and an apostrophe"'
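
Because the return value already carries its own quotation marks, it can be interpolated straight into a tag. The following is a rough sketch, assuming Python 3 and xml.etree.ElementTree for the check; the element name "book" and attribute name "title" are hypothetical:

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

title = 'He said "it\'s <fine>" & left'

# quoteattr() supplies the surrounding quotes, so none are added here.
quoted = quoteattr(title)
document = "<book title=%s/>" % quoted

# Parsing the tag gives back the original attribute value.
parsed = ET.fromstring(document)
print(parsed.get("title") == title)
```

Adding quotation marks by hand around the result would produce a doubly quoted, malformed attribute.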

Unescaping XML

The xml.sax.saxutils module provides an unescape() function as well. This function converts the &amp;, &gt;, and &lt; entity references back to the corresponding characters:

    >>> from xml.sax.saxutils import unescape
    >>>
    >>> unescape("&lt; &amp; &gt;")
    '< & >'

Note that the predefined entities &apos; and &quot; are not supported by default. Like the escape() and quoteattr() functions, unescape() can be provided with an additional mapping of replacements that should be performed. This can be used to add support for the additional predefined entities:

    >>> unescape("&apos; &quot;", {"&apos;": "'", "&quot;": '"'})
    '\' "'
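
The additional mapping works in both directions. As a minimal sketch, assuming Python 3 (the variable names are illustrative only), escape() can be given a mapping from characters to entity references, and the inverted mapping can be handed to unescape():

```python
from xml.sax.saxutils import escape, unescape

# Extra replacements for escape(): characters mapped to entity references.
quotes = {'"': "&quot;", "'": "&apos;"}
escaped = escape('Tom & "Jerry"', quotes)

# Invert the mapping for unescape(): entity references mapped to characters.
entities = {reference: char for char, reference in quotes.items()}
restored = unescape(escaped, entities)

print(escaped)
print(restored)
```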

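The mapping is not limited to entity references: each key is replaced wherever it occurs in the text, so it can also be used to perform replacements for longer strings.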
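
A small sketch of a longer replacement, assuming Python 3; the &copyright; reference and its expansion are made up for illustration and are not predefined anywhere:

```python
from xml.sax.saxutils import unescape

# Hypothetical application-specific reference with a multi-word expansion.
replacements = {"&copyright;": "Copyright (c)"}

expanded = unescape("&copyright; 2005 &lt;team&gt;", replacements)
print(expanded)
```

This is plain substring replacement rather than XML parsing, so it is suited only to simple, known inputs.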

EscapingXml (last edited 2008-11-15 14:00:44 by localhost)