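SAX (the Simple API for XML) is an event-driven XML parsing interface. The parser works through a document sequentially and reports what it finds to a handler object that you supply.
MiniDom sucks up an entire XML file, holds it in memory, and lets you work with it. SAX, on the other hand, emits events (start of an element, character data, end of an element) as it steps through the file.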
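To see the events themselves, here is a small sketch that logs every callback while parsing a short in-memory document. The class name `EventLogger` and the sample markup are made up for illustration. `xml.sax.parseString`, `startElement`, `characters` and `endElement` are standard parts of Python's `xml.sax` module.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records every SAX callback it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs is an Attributes object; copy it into a plain dict
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # A single run of text may arrive in several chunks;
        # whitespace-only chunks between tags are skipped here
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

doc = b'<svg width="10"><title>demo</title></svg>'
handler = EventLogger()
xml.sax.parseString(doc, handler)
for event in handler.events:
    print(event)
```

Nothing is kept in memory except what the handler chooses to store, which is why SAX suits large files.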
Example
```python
import xml.sax

class InkscapeSvgHandler(xml.sax.ContentHandler):
    # Called once for every opening tag the parser encounters
    def startElement(self, name, attrs):
        if name == "svg":
            # Print each attribute of the <svg> element
            for k, v in attrs.items():
                print(k + " " + v)

parser = xml.sax.make_parser()
parser.setContentHandler(InkscapeSvgHandler())
parser.parse("svg.xml")  # accepts a filename or an open file object
```
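For comparison with the MiniDom approach described above, here is roughly the same job done with `xml.dom.minidom`: the whole document is parsed into a tree first, and the tree is searched afterwards. `svg_attributes` is a hypothetical helper name, and it parses a string so the sketch runs without an `svg.xml` on disk; `minidom.parse("svg.xml")` would read the file instead.

```python
from xml.dom import minidom

def svg_attributes(source):
    """Return (name, value) pairs for every <svg> element in an XML string."""
    # minidom builds the entire document as an in-memory tree first
    doc = minidom.parseString(source)
    try:
        pairs = []
        for svg in doc.getElementsByTagName("svg"):
            # attributes is a NamedNodeMap; items() gives (name, value) pairs
            pairs.extend(svg.attributes.items())
        return pairs
    finally:
        doc.unlink()  # break reference cycles inside the DOM tree

for k, v in svg_attributes(b'<svg width="10" height="20"><g/></svg>'):
    print(k + " " + v)
```

The DOM version is shorter to write, but memory use grows with the size of the file, while the SAX handler above only sees one element at a time.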
Links
HtmlParser -- similar module, tailored to HTML interpretation
[http://docs.python.org/lib/module-xml.sax.html Python Library Reference, xml.sax] -- API documentation
[http://www.rexx.com/~dkuhlman/pyxmlfaq.html Python XML FAQ and How-to] -- describes SAX and MiniDom
[http://www.devchannel.org/webserviceschannel/04/05/25/1414203.shtml SAX processing in Python] -- helpful diagrams
[http://pyxml.sourceforge.net/topics/howto/section-SAX.html SAX: The Simple API for XML] -- wordy tutorial
[http://www-106.ibm.com/developerworks/linux/library/l-pxml.html Charming Python: Revisiting XML tools for Python] -- kind of old
[http://www.xml.com/pub/a/2003/03/12/py-xml.html Using SAX for Proper XML Output]