Differences between revisions 25 and 26

Client-Side Web Programming

Libraries

µTidylib and mxTidy -- Python interfaces to the HTML Tidy library, for cleaning up HTML documents.
html5lib -- an HTML5-compliant library for parsing arbitrarily broken HTML into a range of tree formats, including minidom, ElementTree (including lxml) and BeautifulSoup.
BeautifulSoup -- a permissive HTML parser.
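Don't use HTMLParser (Python 2.x) or html.parser (Python 3.x) on HTML that might be invalid! That way lies pain. Either clean it up first (using tidy), or use a more forgiving parser such as html5lib or BeautifulSoup.
urllib, urllib2, and httplib -- in the standard library (Python 2.x; in Python 3.x these became urllib.request, urllib.parse and http.client).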
ClientCookie, ClientForm, and Mechanize are higher-level libraries for writing a web client.
mechanoid -- a fork of mechanize.
libxml2dom -- can parse HTML using libxml2's liberal HTML parser.
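To make the standard-library entry a little more concrete, here is a minimal sketch using the Python 3.x module names (urllib.parse, urllib.request and http.cookiejar; in Python 2.x the rough equivalents are urllib, urllib2 and cookielib). It builds a GET request with an encoded query string and an opener that keeps cookies between requests, the kind of cookie handling ClientCookie offers. The function names, URL and User-Agent string are hypothetical, and nothing is sent over the network.

```python
# Sketch only: Python 3.x stdlib names; function names, URL and
# User-Agent below are hypothetical examples.
import http.cookiejar
import urllib.parse
import urllib.request


def build_search_request(base_url, params, user_agent="example-client/0.1"):
    """Return a GET Request for base_url with params encoded as a query string."""
    query = urllib.parse.urlencode(params)  # e.g. {"q": "a b"} -> "q=a+b"
    url = base_url + "?" + query
    # No data argument is passed, so the request method is GET.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def make_cookie_opener():
    """Return an opener that stores cookies in a jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


if __name__ == "__main__":
    req = build_search_request("https://example.com/search",
                               {"q": "python html", "page": 2})
    print(req.full_url)
    print(req.get_method())
    opener, jar = make_cookie_opener()
    # Actually fetching the page needs network access, so it is commented out:
    # with opener.open(req, timeout=10) as resp:
    #     charset = resp.headers.get_content_charset() or "utf-8"
    #     html = resp.read().decode(charset)
```

The fetched HTML can then be handed to one of the parsers listed above; if it might be invalid, a forgiving parser is the safer choice.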

Revision 25 as of 2014-04-17 01:04:20 (size: 1876; editor: DaleAthanasias; comment: µTidylib)
Revision 26 as of 2014-04-17 01:10:23 (size: 1967; editor: DaleAthanasias; comment: not sure if this is true for python 3.x?)