Differences between revisions 4 and 5
Revision 4 as of 2004-04-17 12:43:22
Size: 3631
Editor: xdsl-213-196-252-135
Comment: Corrected bozo info, removed bozo filtering from code
Revision 5 as of 2004-11-15 01:42:31
Size: 3810
Editor: user-10cm007
Comment: getting the author from a ModWiki feed w/ feedparser

Python RSS Code



Feed Parser

[http://diveintomark.org/projects/feed_parser/ Feed Parser] is an awesome RSS reader.

Download it, and then start a Python prompt in the same directory.

    import feedparser

    python_wiki_rss_url = "http://www.python.org/cgi-bin/moinmoin/" \
                          "RecentChanges?action=macro&" \
                          "macro=RecentChanges&do=rss_rc"

    feed = feedparser.parse( python_wiki_rss_url )

You now have the RSS feed data for the PythonInfo wiki!

Take a look at it; there's a lot of data there.

Of particular interest:

feed[ "bozo" ]

1 if the feed data isn't well-formed XML.

feed[ "url" ]

URL of the RSS feed

feed[ "version" ]

version of the RSS feed

feed[ "channel" ][ "title" ] 

"PythonInfo Wiki" - Title of the Feed.

feed[ "channel" ][ "description" ]

"RecentChanges at PythonInfo Wiki." - Description of the Feed

feed[ "channel" ][ "link" ]

Link to RecentChanges - Web page associated with the feed.

feed[ "channel" ][ "wiki_interwiki" ]

"PythonInfo" - For wikis, the wiki's preferred InterWiki moniker.

feed[ "items" ]

A gigantic list of all of the RecentChanges items.
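
Since a given feed may omit some of these keys, it can help to read them defensively. A minimal sketch, assuming the parsed result behaves like a dictionary; summarize_feed and the sample data are hypothetical, standing in for the result of feedparser.parse:

```python
def summarize_feed(feed):
    """Return a one-line summary of a parsed feed (dict-like)."""
    channel = feed.get("channel", {})
    title = channel.get("title", "(untitled)")
    item_count = len(feed.get("items", []))
    # "bozo" is 1 when the feed data is not well-formed XML.
    status = "not well-formed" if feed.get("bozo") else "well-formed"
    return "%s: %d items, %s" % (title, item_count, status)


# Hypothetical stand-in for the result of feedparser.parse(...)
sample_feed = {
    "bozo": 0,
    "channel": {"title": "PythonInfo Wiki"},
    "items": [{"title": "FrontPage"}, {"title": "RssLibraries"}],
}
print(summarize_feed(sample_feed))  # PythonInfo Wiki: 2 items, well-formed
```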

For each item in feed["items"], we have:

item[ "date" ]

"2004-02-13T22:28:23+08:00" - ISO 8601 date

item[ "date_parsed" ]

the same date, parsed into a standard Python 9-tuple time value

item[ "title" ]

title for item

item[ "summary" ]

change summary

item[ "link" ]

URL to the page

item[ "wiki_diff" ]

for wiki, a link to the diff for the page

item[ "wiki_history" ]

for wiki, a link to the page history
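
The per-item keys above can be combined into a short listing. A rough sketch, assuming each item is dict-like and that item["date"] is an ISO 8601 string as in the example; describe_item and the sample item (including its example.org link) are made up for illustration, and datetime.fromisoformat needs Python 3.7 or later:

```python
from datetime import datetime, timezone


def describe_item(item):
    """Format one item as 'UTC timestamp  title  <link>'."""
    # item["date"] is an ISO 8601 string such as "2004-02-13T22:28:23+08:00".
    when = datetime.fromisoformat(item["date"]).astimezone(timezone.utc)
    return "%s  %s  <%s>" % (
        when.strftime("%Y-%m-%d %H:%M"),
        item["title"],
        item["link"],
    )


# Hypothetical item, shaped like one element of feed["items"]
sample_item = {
    "date": "2004-02-13T22:28:23+08:00",
    "title": "RssLibraries",
    "link": "http://example.org/RssLibraries",
}
print(describe_item(sample_item))
```

Converting to UTC first makes timestamps from wikis in different time zones directly comparable.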

Aggregating Feeds with Feed Parser

If you're pulling down a lot of feeds, and aggregating them:

First, you probably want to use [http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/84317 Future threads] to pull down your feeds. That way, you can send out five requests at once and wait for them all to come back together, rather than sending one request, waiting for its response, sending the next, and so on.

    import feedparser
    from future import Future  # the Future recipe linked above

    hit_list = [ "http://...", "...", "..." ] # list of feeds to pull down

    # pull down all feeds
    future_calls = [Future(feedparser.parse, rss_url) for rss_url in hit_list]
    # block until they are all in
    feeds = [future_obj() for future_obj in future_calls]
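
On newer Pythons (3.2 and later), the same fan-out can be sketched with the standard library's concurrent.futures in place of the Future recipe. This is a swapped-in technique, not what the recipe does internally. fetch_all is a hypothetical helper, and fake_parse stands in for feedparser.parse so the sketch runs without network access:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=5):
    """Call fetch(url) for every URL concurrently; results keep input order."""
    if not urls:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map blocks until every call has returned.
        return list(pool.map(fetch, urls))


# Stand-in for feedparser.parse (an assumption for this sketch).
def fake_parse(url):
    return {"url": url, "items": []}


feeds = fetch_all(["http://a.example/rss", "http://b.example/rss"], fake_parse)
print([feed["url"] for feed in feeds])
```

In real use you would pass feedparser.parse as the fetch argument.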

Now that you have your feeds, extract all the entries.

    entries = []
    for feed in feeds:
        entries.extend( feed[ "items" ] )

...and sort them, by SortingListsOfDictionaries:

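    # decorate with the date; the index breaks ties so entries are never compared
    decorated = [(entry["date_parsed"], index, entry) for index, entry in enumerate(entries)]
    decorated.sort()
    decorated.reverse() # for most recent entries first
    sorted_entries = [entry for (date, index, entry) in decorated]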

Congratulations! You've aggregated a bunch of changes!
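
The decorate-sort-undecorate steps above can also be written with a key function. A sketch under the assumption that some entries may lack "date_parsed"; putting those last is a choice made here, not something the feeds require, and newest_first and the sample entries are hypothetical:

```python
def newest_first(entries):
    """Return entries sorted by "date_parsed", most recent first."""
    dated = [e for e in entries if e.get("date_parsed") is not None]
    undated = [e for e in entries if e.get("date_parsed") is None]
    # The key function compares only the dates, never the entries themselves.
    dated.sort(key=lambda e: tuple(e["date_parsed"]), reverse=True)
    return dated + undated


# Hypothetical entries; date_parsed is a 9-tuple time value.
sample_entries = [
    {"title": "old", "date_parsed": (2004, 2, 13, 14, 28, 23, 4, 44, 0)},
    {"title": "undated"},
    {"title": "new", "date_parsed": (2004, 11, 15, 1, 42, 31, 0, 320, 0)},
]
print([e["title"] for e in newest_first(sample_entries)])  # ['new', 'old', 'undated']
```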




Getting the "author"/"contributor" out of ModWiki with the feedparser module is a bit confusing at the moment. As of feedparser 3.3, it goes into the "rdf_value" attribute of the entry.
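
One way to cope is a small helper that looks in both places. This is only a sketch: get_contributor is hypothetical, it assumes entries are dict-like, and it assumes an "author" key is worth trying first, which may not hold for every feedparser version:

```python
def get_contributor(entry):
    """Return the entry's author, falling back to "rdf_value", else None."""
    # Assumption: prefer "author" when present; per the note above,
    # feedparser 3.3 puts the ModWiki contributor in "rdf_value".
    return entry.get("author") or entry.get("rdf_value") or None


# Hypothetical entries for illustration
print(get_contributor({"title": "FrontPage", "rdf_value": "SomeWikiName"}))
print(get_contributor({"title": "FrontPage"}))
```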

RssLibraries (last edited 2014-05-08 00:46:56 by DaleAthanasias)
