Client-Side web programming
Libraries
[http://utidylib.berlios.de/ utidylib] and [http://www.egenix.com/files/python/mxTidy.html mxTidy] -- Python interfaces to the [http://tidy.sourceforge.net/ HTML Tidy] library, which cleans up HTML documents.
[http://www.crummy.com/software/BeautifulSoup/ BeautifulSoup] -- a permissive HTML parser.
Don't use [http://python.org/doc/current/lib/module-HTMLParser.html HTMLParser] on HTML that might be invalid! That way lies pain. Either clean the HTML up first (using tidy), or use a more permissive parser.
[http://python.org/doc/current/lib/module-urllib.html urllib], [http://python.org/doc/current/lib/module-urllib2.html urllib2], and [http://python.org/doc/current/lib/module-httplib.html httplib] in the standard library.
[http://wwwsearch.sourceforge.net/ClientCookie/ ClientCookie], [http://wwwsearch.sourceforge.net/ClientForm/ ClientForm], and [http://wwwsearch.sourceforge.net/mechanize/ mechanize] are higher-level libraries for writing a web client.
[http://www.python.org/pypi?:action=display&name=mechanoid&version=0.4.1 mechanoid] -- a mechanize fork.
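As a rough illustration of the standard-library route listed above, here is a minimal sketch of fetching a document by URL. It assumes Python 3, where the functionality of urllib2 lives in `urllib.request`. The function name `fetch_text`, the User-Agent string, and the default timeout and charset are made up for this example and are not part of any library.

```python
import urllib.request


def fetch_text(url, timeout=10.0, default_charset="utf-8"):
    """Fetch `url` and return its body decoded to text.

    `fetch_text` is a hypothetical helper for this page, not a library API.
    """
    # Some servers reject requests without a User-Agent; this value is arbitrary.
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-client/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, if any.
        charset = response.headers.get_content_charset() or default_charset
        # Replace undecodable bytes rather than failing on a bad page.
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_text("http://www.python.org/")` would return the page source as a string. For anything involving cookies, forms, or multi-step sessions, the higher-level libraries listed above are likely a better starting point than hand-rolled code like this.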
Resources
[http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52199 Grab a document from the web] -- from the Python Cookbook
[http://wwwsearch.sourceforge.net/bits/clientx.html Python web-client programming general FAQs].
[http://docs.python.org/lib/module-urllib.html urllib -- Open arbitrary resources by URL]
[http://docs.python.org/lib/module-urllib2.html urllib2 -- extensible library for opening URLs]