Python RSS Code
Articles:
[http://www-106.ibm.com/developerworks/webservices/library/ws-pyth11.html The Python Web services developer: RSS for Python]
Libraries:
[http://www.mnot.net/python/RSS.py RSS.py] - reads most RSS versions, produces RSS 1.0
[http://diveintomark.org/projects/feed_parser/ Feed Parser] - reads 9 RSS versions
SpycyRoll - an aggregator, and perhaps other things as well; I don't know much about it
Feed Parser
[http://diveintomark.org/projects/feed_parser/ Feed Parser] is an awesome RSS parser.
Download it, and then start a Python prompt in the same directory.
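A minimal session might look like this (the URL below is a guess at the PythonInfo wiki's RecentChanges RSS feed; substitute any feed URL you like):

   import feedparser

   # fetch and parse the feed in one call; the URL is illustrative, not guaranteed
   feed = feedparser.parse("http://www.python.org/cgi-bin/moinmoin/RecentChanges?action=rss_rc")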
You now have the RSS feed data for the PythonInfo wiki!
Take a look at it; there's a lot of data there.
Of particular interest:
feed[ "bozo" ] |
1 if the feed data isn't well-formed XML. |
feed[ "url" ] |
URL of the feed's RSS feed |
feed[ "version" ] |
version of the RSS feed |
feed[ "channel" ][ "title" ] |
"PythonInfo Wiki" - Title of the Feed. |
feed[ "channel" ][ "description" ] |
"RecentChanges at PythonInfo Wiki." - Description of the Feed |
feed[ "channel" ][ "link" ] |
Link to RecentChanges - Web page associated with the feed. |
feed[ "channel" ][ "wiki_interwiki" ] |
"Python``Info" - For wiki, the wiki's preferred InterWiki moniker. |
feed[ "items" ] |
A gigantic list of all of the RecentChanges items. |
For each item in feed["items"], we have:
item[ "date" ] |
"2004-02-13T22:28:23+08:00" - ISO 8601 (right#?) date |
item[ "date_parsed" ] |
(2004,02,13,14,28,23,4,44,0) |
item[ "title" ] |
title for item |
item[ "summary" ] |
change summary |
item[ "link" ] |
URL to the page |
item[ "wiki_diff" ] |
for wiki, a link to the diff for the page |
item[ "wiki_history" ] |
for wiki, a link to the page history |
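Putting the tables above together, here's a quick sketch (using only the keys documented above, and the feed object parsed earlier) that prints the feed's title and one line per change:

   # feed-level metadata
   print(feed["channel"]["title"])
   print(feed["channel"]["description"])

   # one line per RecentChanges item
   for item in feed["items"]:
       print("%s  %s  %s" % (item["date"], item["title"], item["link"]))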
Aggregating Feeds with Feed Parser
If you're pulling down a lot of feeds and aggregating them:
First, you probably want to use [http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/84317 Future threads] to pull down your feeds. That way you can send out, say, 5 requests immediately and wait for them all to come back together, rather than sending one request, waiting for it to return, sending the next, waiting again, and so on.
   import feedparser
   from future import Future   # the Future class from the ASPN recipe linked above

   hit_list = [ "http://...", "...", "..." ]   # list of feeds to pull down

   # kick off all of the fetches at once; each Future starts its call in a thread immediately
   future_calls = [Future(feedparser.parse, rss_url) for rss_url in hit_list]
   # block until they have all come back
   feeds = [future_obj() for future_obj in future_calls]
Now that you have your feeds, extract all the entries.
   entries = []
   for feed in feeds:
       entries.extend( feed["items"] )
...and sort them, decorate-sort-undecorate style (see SortingListsOfDictionaries):
   # decorate with the parsed date so tuples sort chronologically
   decorated = [(entry["date_parsed"], entry) for entry in entries]
   decorated.sort()
   decorated.reverse()   # for most recent entries first
   # "sorted_entries" rather than "sorted", to avoid shadowing the builtin
   sorted_entries = [entry for (date, entry) in decorated]
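On Python 2.4 and later, the decorate-sort-undecorate dance can be collapsed into a single call with a key function; a minimal sketch:

   # sort in place, directly on the parsed date tuple, most recent entries first
   entries.sort(key=lambda entry: entry["date_parsed"], reverse=True)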
Congratulations! You've aggregated a bunch of changes!
Contributors
Discussion
(none yet)