Client-Side Web Programming
Libraries
uTidylib and mxTidy -- Python interfaces to the HTML Tidy library, for cleaning up HTML documents.
html5lib -- an HTML5-compliant library for parsing arbitrarily broken HTML into a range of tree formats, including minidom, ElementTree (including lxml), and BeautifulSoup.
BeautifulSoup -- a permissive HTML parser.
Don't use HTMLParser on HTML that might be invalid! That way lies pain. Either clean it up (using tidy), or use a different parser.
ClientCookie, ClientForm, and Mechanize are higher-level libraries for writing a web client.
mechanoid -- a mechanize fork.
libxml2dom can parse HTML by employing libxml2's liberal HTML parser.
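For readers on a current Python, the cookie handling that ClientCookie provided has a close standard-library counterpart: ClientCookie's code was the basis for cookielib in Python 2, which became http.cookiejar in Python 3. The following is a minimal sketch, not the ClientCookie API itself; the helper name build_cookie_opener is made up for illustration, and no network request is made.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar): an opener that stores cookies in `jar`
    and sends them back on later requests made through the opener.

    The function name is hypothetical; only the stdlib calls are real.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


opener, jar = build_cookie_opener()

# No request has been made yet, so the jar starts out empty.
print(len(jar))
```

Calling opener.open() on a URL would then record any cookies the server sets in the jar and resend them on subsequent requests through the same opener. Libraries such as mechanize build form handling and navigation on top of this kind of stateful opener.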