Revision 8 as of 2005-06-10 18:35:51

Escaping HTML

The cgi module that comes with Python has an escape function:

    import cgi

    s = cgi.escape("""& < >""")   # s = "&amp; &lt; &gt;"

However, by default it escapes only &, <, and >. Passing quote=True also escapes the double quote ("), but the single quote (') is never escaped.
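On current Python 3 releases, cgi.escape is no longer available (it was removed in Python 3.8), and html.escape in the standard library plays the same role. The sketch below assumes such an interpreter and only illustrates how the quote argument changes the result; the sample strings are made up for the example.

```python
import html

# With the default quote=True, both quote characters are escaped
# in addition to &, <, and >.
escaped_all = html.escape("""& < > " '""")

# With quote=False, only &, <, and > are touched.
escaped_basic = html.escape("""& < > " '""", quote=False)

print(escaped_all)    # &amp; &lt; &gt; &quot; &#x27;
print(escaped_basic)  # &amp; &lt; &gt; " '
```

Note that html.escape writes the single quote as the numeric reference &#x27; rather than as a named entity.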

Here's a small snippet that escapes single and double quotes as well:

    html_escape_table = {
        "&": "&amp;",
        '"': "&quot;",
        "'": "&apos;",   # defined in XML and HTML5, but not in HTML 4
        ">": "&gt;",
        "<": "&lt;",
    }

    def html_escape(text):
        """Replace &, <, >, and both quote characters in text with entities."""
        # Look each character up in the table; anything not listed passes through.
        return "".join(html_escape_table.get(c, c) for c in text)
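The same table-driven idea can also be expressed with str.translate, which does the per-character lookup inside the interpreter instead of in a Python-level loop. This is a sketch of an alternative and not the snippet above; it assumes Python 3, and the names HTML_ESCAPE_TRANSLATION and html_escape_translate are hypothetical.

```python
# Build a translation table once; str.maketrans accepts a dict that maps
# single characters to replacement strings.
HTML_ESCAPE_TRANSLATION = str.maketrans({
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
})

def html_escape_translate(text):
    """Return text with &, <, >, " and ' replaced by entities."""
    # Each character is looked up exactly once, so the "&" introduced by a
    # replacement is never escaped a second time.
    return text.translate(HTML_ESCAPE_TRANSLATION)

sample = '<a href="x">Tom & Jerry\'s</a>'
print(html_escape_translate(sample))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;
```

Either form processes the text in a single pass, which is why the order of the entries in the table does not matter.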

Discussion

LionKimbro: Is there anything in the standard library for going the other way? Is there something where you can give it "&amp;" and get back "&"? Perhaps in the XML libraries? I looked, but did not see anything. DOM, SAX- wouldn't be there. Not exactly XML-RPC either. Anyone know? (Answer needed for XML; HTML would be nice as well.) (2005-06-10T16:35:16Z)
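One possible answer to the question above, offered as a sketch and not as the only way: xml.sax.saxutils.unescape reverses &amp;, &lt;, and &gt; for XML text and accepts a dictionary of extra entities, while html.unescape (assuming Python 3.4 or later) resolves the full set of HTML named and numeric references. The variable names and sample strings are made up for illustration.

```python
import html
from xml.sax.saxutils import unescape

# XML: the three basic entities are handled out of the box.
xml_basic = unescape("&amp; &lt; &gt;")

# XML: other entities must be supplied explicitly as a mapping
# from entity text to replacement text.
xml_quotes = unescape(
    "&quot;quoted&quot; and &apos;single&apos;",
    {"&quot;": '"', "&apos;": "'"},
)

# HTML: named and numeric character references are all resolved.
html_text = html.unescape("&amp; &quot; &#39; &eacute;")

print(xml_basic)   # & < >
print(xml_quotes)  # "quoted" and 'single'
print(html_text)   # & " ' é
```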