Differences between revisions 1 and 13 (spanning 12 versions)

Client-Side Web Programming

Libraries

[http://utidylib.berlios.de/ utidylib] and [http://www.egenix.com/files/python/mxTidy.html mxTidy] -- Python interfaces to the [http://tidy.sourceforge.net/ HTML Tidy] library for cleaning up HTML documents.
[http://www.crummy.com/software/BeautifulSoup/ BeautifulSoup] -- a permissive HTML parser.
Don't use [http://python.org/doc/current/lib/module-HTMLParser.html HTMLParser] on HTML that might be invalid! That way lies pain. Either clean it up (using tidy), or use a different parser.
[http://python.org/doc/current/lib/module-urllib.html urllib], [http://python.org/doc/current/lib/module-urllib2.html urllib2], and [http://python.org/doc/current/lib/module-httplib.html httplib] -- URL-fetching and HTTP client modules in the standard library.
[http://wwwsearch.sourceforge.net/ClientCookie/ ClientCookie], [http://wwwsearch.sourceforge.net/ClientForm/ ClientForm], and [http://wwwsearch.sourceforge.net/mechanize/ mechanize] are higher-level libraries for writing a web client.
[http://www.python.org/pypi?:action=display&name=mechanoid&version=0.4.1 mechanoid] -- a mechanize fork.
[http://www.python.org/pypi/libxml2dom libxml2dom] can parse HTML by employing libxml2's liberal HTML parser.
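
As a rough illustration of what the parsing libraries above are for, the sketch below pulls link targets out of a small, deliberately sloppy HTML string. It uses the standard library's html.parser module (the Python 3 name for the HTMLParser module linked above), which in current Python versions makes a best guess at much malformed markup. The LinkCollector class, the extract_links helper, and the sample markup are made up for this example. For messy real-world pages a permissive parser such as BeautifulSoup is likely still the safer choice.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, where value may be None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values of all <a> tags in html_text, in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Deliberately sloppy markup: unclosed <p> and <b>, no closing </body>.
SAMPLE = (
    '<body><p>See <a href="http://example.com/a">one'
    '<p><b>and <A HREF="/b">two</a>'
)

if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

The collector only reads start tags, so it does not care whether the tags are ever closed. Anything that depends on document structure (nesting, text inside a particular element) is where a tidy pass or a more forgiving parser earns its keep.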

Resources

[http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52199 Grab a document from the web] -- from the Python Cookbook
[http://wwwsearch.sourceforge.net/bits/clientx.html Python web-client programming general FAQs]
[http://docs.python.org/lib/module-urllib.html urllib -- Open arbitrary resources by URL]
[http://docs.python.org/lib/module-urllib2.html urllib2 -- extensible library for opening URLs]
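
For orientation, here is a minimal sketch of grabbing a document with the standard library alone. It assumes Python 3, where the functionality of urllib2 lives in urllib.request. The fetch helper, its default timeout, and the User-Agent string are assumptions made for this example and are not taken from any of the resources above. The demonstration uses a data: URL so it runs without network access; an http:// or https:// URL can be passed the same way.

```python
import urllib.request


def fetch(url, timeout=10.0, user_agent="example-client/0.1"):
    """Return (content_type, body_bytes) for url (hypothetical helper)."""
    # Build the request explicitly so that headers can be set.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # urlopen raises urllib.error.URLError (or its subclass HTTPError)
    # on failure; callers are expected to handle that themselves.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        content_type = response.headers.get("Content-Type", "")
    return content_type, body


if __name__ == "__main__":
    # A data: URL keeps the demo offline; replace it with a real address.
    content_type, body = fetch("data:text/plain,hello%20world")
    print(content_type, body)
```

The body comes back as bytes; decoding it to text needs a character set, which for real pages usually has to be read from the Content-Type header or from the document itself.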

- Revision 1 as of 2004-07-29 13:27:59
  Size: 437
  Editor: AndrewKuchling
  Comment: Create new page
+ Revision 13 as of 2006-11-15 13:19:37
  Size: 1673
  Editor: 85-18-14-8
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
+= Client-Side Web Programing =
-Line 2:
+Line 3:
-There is probably a huge amount of good stuff available from the people who are working actively with XML-RPC, Biztalk and other approaches to web services. More too from XML writers such as [http://uche.ogbuji.net/uche.ogbuji.net/ Uche Ogbuji], who has put much good stuff on IBM's developerworks site, among other things.
+== Libraries ==
-Line 4:
+Line 5:
-Sadly nobody has categorised or classified it in the Wiki, so at the moment we have to scratch around.
+ * [http://utidylib.berlios.de/ utidylib] and [http://www.egenix.com/files/python/mxTidy.html mxTidy] -- Python interfaces to [http://tidy.sourceforge.net/ html tidy] library to clean up HTML documents.
 * [http://www.crummy.com/software/BeautifulSoup/ BeautifulSoup] -- a permissive HTML parser.
 * Don't use [http://python.org/doc/current/lib/module-HTMLParser.html HTMLParser] on HTML that might be invalid!  That way lies pain.  Either clean it up (using tidy), or use a different parser.
 * [http://python.org/doc/current/lib/module-urllib.html urllib], [http://python.org/doc/current/lib/module-urllib2.html urllib2], and [http://python.org/doc/current/lib/module-httplib.html httplib] in the standard library.
 * [http://wwwsearch.sourceforge.net/ClientCookie/ ClientCookie], [http://wwwsearch.sourceforge.net/ClientForm/ ClientForm], and [http://wwwsearch.sourceforge.net/mechanize/ Mechanize] are higher-level libraries for writing a web client.
 * [http://www.python.org/pypi?:action=display&name=mechanoid&version=0.4.1 mechanoid] a mechanize fork.
 * [http://www.python.org/pypi/libxml2dom libxml2dom] can parse HTML by employing libxml2's liberal HTML parser.
-Line 6:
+Line 13:
+== Resources ==
-Line 7:
+Line 15:
+ * [http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52199 Grab a document from the web] - from the Python Cookbook
 * [http://wwwsearch.sourceforge.net/bits/clientx.html Python web-client programming general FAQs].
 * [http://docs.python.org/lib/module-urllib.html urllib -- Open arbitrary resources by URL]
 * [http://docs.python.org/lib/module-urllib2.html urllib2 -- extensible library for opening URLs]
