
Client-Side Web Programming

Libraries

utidylib and mxTidy -- Python interfaces to the HTML Tidy library for cleaning up HTML documents.
html5lib -- an HTML5-compliant library for parsing arbitrarily broken HTML into a range of tree formats, including minidom, ElementTree (including lxml) and BeautifulSoup.
BeautifulSoup -- a permissive HTML parser.
Don't use HTMLParser on HTML that might be invalid! That way lies pain. Either clean it up first (using tidy) or use a different parser.
urllib, urllib2, and httplib -- HTTP client modules in the standard library.
ClientCookie, ClientForm, and Mechanize are higher-level libraries for writing a web client.
mechanoid -- a fork of Mechanize.
libxml2dom can parse HTML by employing libxml2's liberal HTML parser.
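A minimal sketch of the standard-library route, assuming Python 3, where urllib2's functionality was folded into urllib.request (with urlencode moving to urllib.parse) and httplib was renamed http.client. The helper names build_form_request and fetch_text are hypothetical, and the user-agent string is a placeholder:

```python
# Python 3 sketch: urllib2 -> urllib.request, urllib.urlencode -> urllib.parse.
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(url, fields, user_agent="example-client/0.1"):
    """Build (but do not send) a POST request carrying URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")  # the request body must be bytes
    req = Request(url, data=body, headers={"User-Agent": user_agent})
    # urllib adds this Content-Type automatically when POST data is sent,
    # but setting it explicitly makes the intent obvious.
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def fetch_text(req, timeout=10):
    """Send a request and decode the body using the charset the server declares."""
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, `fetch_text(build_form_request("http://example.com/search", {"q": "python html"}))` would POST `q=python+html` and return the decoded response. The result is raw HTML; if it may be invalid, hand it to one of the permissive parsers above (html5lib, BeautifulSoup, or libxml2dom) rather than HTMLParser.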
