Differences between revisions 2 and 3
Revision 2 as of 2005-06-10 22:15:12
Size: 2754
Editor: FredDrake
Comment:
Revision 3 as of 2005-06-10 22:23:22
Size: 3309
Editor: FredDrake
Comment:

Escaping XML

The Python standard library contains a couple of simple functions for escaping strings of text as XML character data. These routines are not actually very powerful, but are sufficient for many applications. They should generally be applied to Unicode text that will later be encoded appropriately, or to already-encoded text using an ASCII-superset encoding, since most characters are left alone.

The xml.sax.saxutils module contains the functions escape() and quoteattr(). The escape() function is used to convert the <, &, and > characters to the corresponding entity references:

    >>> from xml.sax.saxutils import escape
    >>>
    >>> escape("< & >")
    '&lt; &amp; &gt;'
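As a rough sketch of the "escape first, encode later" approach described above, the following escapes Unicode text and then encodes it to bytes. The sample string and the choice of the `xmlcharrefreplace` error handler are illustrative assumptions, not something this page prescribes; that handler turns any character the target encoding cannot represent into a numeric character reference.

```python
from xml.sax.saxutils import escape

# Illustrative sample text containing markup characters and non-ASCII letters.
text = "caf\u00e9 < th\u00e9 & more"

# Step 1: escape the markup-significant characters; other characters are left alone.
escaped = escape(text)

# Step 2: encode for output. With an ASCII target, "xmlcharrefreplace" writes
# unencodable characters as numeric character references such as &#233;.
data = escaped.encode("ascii", "xmlcharrefreplace")

print(escaped)
print(data)
```

Encoding to UTF-8 instead would leave the accented characters as ordinary bytes, so the error handler only matters for narrower encodings.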

This function does not generate the &apos; or &quot; entity references, because they are not needed in parsed character data in an XML document. They may be needed in attribute values, however, and there the quoteattr() function is more useful than escape(). quoteattr() determines whether single or double quotation marks suit the value better and quotes it accordingly; if the value contains both kinds of quotation mark, it is wrapped in double quotes and each embedded double quote is replaced with &quot;. The return value includes the surrounding quotation marks, so it can be placed directly after the = of an attribute:

    >>> from xml.sax.saxutils import quoteattr
    >>>
    >>> quoteattr("some value ' containing an apostrophe")
    '"some value \' containing an apostrophe"'
    >>> quoteattr('some value containing " a double-quote')
    '\'some value containing " a double-quote\''
    >>> quoteattr('value containing " a double-quote \' and an apostrophe')
    '"value containing &quot; a double-quote \' and an apostrophe"'
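To show how the two functions divide the work, here is a minimal sketch that builds one element by hand. The helper name `make_link` and the sample values are hypothetical and exist only for illustration; the point is that quoteattr() supplies the quotation marks itself, so the template must not add its own.

```python
from xml.sax.saxutils import escape, quoteattr


def make_link(href, label):
    """Hypothetical helper: return an <a> element as a string.

    quoteattr() escapes the attribute value and adds the surrounding quotes;
    escape() handles the element's character data.
    """
    return "<a href=%s>%s</a>" % (quoteattr(href), escape(label))


print(make_link("search?q=a&b=c", "Tom & Jerry's"))
```

The apostrophe in the label is left alone because it sits in character data, not in an attribute value.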

Both of these functions can be provided with a mapping of additional replacements that should be made; the same mapping can generally be used for both. This can be used to add additional entities specific to the DTD of the document being generated, or to cause particular characters to be encoded as character references:

    >>> escape("abc", {"b": "&#98;"})
    'a&#98;c'
    >>> escape("My product, PyThingaMaJiggie, is really cool.",
    ...        {"PyThingaMaJiggie": "&productName;"})
    'My product, &productName;, is really cool.'
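The claim that one mapping can serve both functions can be sketched as follows. The mapping name `extra` and the choice of the copyright sign are assumptions made for the example. The additional replacements are applied after the built-in ones, so the ampersand in a replacement such as `&#169;` is not escaped a second time.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping: write the copyright sign as a character reference.
extra = {"\u00a9": "&#169;"}

# The same mapping is passed to both functions.
body = escape("\u00a9 2005 A & B", extra)
attr = quoteattr("\u00a9 2005", extra)

print(body)
print(attr)
```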

Unescaping XML

The xml.sax.saxutils module provides an unescape() function as well. This function converts the &amp;, &gt;, and &lt; entity references back to the corresponding characters:

    >>> from xml.sax.saxutils import unescape
    >>>
    >>> unescape("&lt; &amp; &gt;")
    '< & >'

Note that the predefined entities &apos; and &quot; are not supported by default. Like the escape() and quoteattr() functions, unescape() can be provided with an additional mapping of replacements that should be performed. This can be used to add support for the additional predefined entities:

    >>> unescape("&apos; &quot;", {"&apos;": "'", "&quot;": '"'})
    '\' "'
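A round trip is a convenient way to check that a pair of mappings is consistent. The sketch below assumes two hypothetical mappings, `QUOTES_OUT` and `QUOTES_IN`, that are inverses of each other, and confirms that unescaping the escaped text returns the original string.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical inverse mappings covering the two quote entities.
QUOTES_OUT = {"'": "&apos;", '"': "&quot;"}
QUOTES_IN = {"&apos;": "'", "&quot;": '"'}

original = 'He said "it\'s < 5 & > 3"'

# escape() handles &, < and > itself, then applies the extra mapping.
encoded = escape(original, QUOTES_OUT)

# unescape() reverses the process, given the inverse mapping.
decoded = unescape(encoded, QUOTES_IN)

print(encoded)
print(decoded == original)
```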

The same mechanism works for longer replacements, such as expanding a document-specific entity reference into its full text.
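As an illustration, this sketch reverses the product-name example from the escaping section. The entity name `&productName;` comes from that example; treating it as something unescape() should expand is an assumption made here, since a real XML parser would normally resolve such an entity from the DTD.

```python
from xml.sax.saxutils import unescape

# Map the document-specific entity reference back to its replacement text.
entities = {"&productName;": "PyThingaMaJiggie"}

expanded = unescape("My product, &productName;, is really cool.", entities)

print(expanded)
```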

See Also

EscapingXml (last edited 2008-11-15 14:00:44 by localhost)
