Discuss approaches to the Netflix Prize using Python: getting started with [[http://pyflix.python-hosting.com/|PyFlix]] for new people, algorithm and code performance, etc.

Some Netflix code in Python will be shown/run (KNN, NMF, ARTmap, SVD, etc).

I will be posting the code later this month on my blog: [[http://www.datawrangling.com|Data Wrangling]]

Some links for those just getting started:

 *[[http://www.netflixprize.com/teams|Register a Team]] in order to [[http://www.netflixprize.com/download|download the Netflix data]]
 *[[http://pyflix.python-hosting.com/|PyFlix]] library for efficiently handling the dataset.
 *[[http://www.grouplens.org/node/73|MovieLens dataset]] - smaller dataset to debug your code with...

Some approaches:

 *[[http://sifter.org/~simon/journal/20061211.html|Simon Funk approach]]
 *[[http://www.timelydevelopment.com/demos/NetflixPrize.aspx|Timely Development code for Simon Funk approach]]
 *[[http://www.netflixprize.com/community/viewtopic.php?pid=4712#p4712|Netflix forum KNN discussion]] - includes numpy, weave specifics
 *[[http://devlicio.us/blogs/billy_mccafferty/archive/2006/11/07/netflix-memoirs-using-the-pearson-correlation-coefficient.aspx|Basic KNN in SQL]]
 *[[http://mainline.brynmawr.edu/Courses/cs380/fall2006/TiVo.pdf|TiVo KNN paper]]
 *[[http://www.erikshelley.com/netflix/|Erik Shelley's approach]]
 *[[http://www.tillberg.us/netflixprizejumpstart|Dan Tillberg's page]]
 *[[http://www.logarithmic.net/pfh/blog/01176798503|Paul Harrison's approach]] - using numpy and weave
 *[[http://www.siam.org/meetings/sdm06/proceedings/059zhangs2.pdf|Dartmouth paper]] - using EM/NMF approach with MovieLens data
 *[[http://www.research.att.com/~volinsky/netflix/ProgressPrize2007BellKorSolution.pdf|BellKor paper]] - Progress prize winner
 *[[http://code.google.com/p/canopy-clustering/|Hadoop MapReduce code]] for working with the Netflix data

More here:

Performance pointers:

 *If you need to go parallel for Netflix, [[http://www.datawrangling.com/pycon-2008-elasticwulf-slides.html|ElasticWulf]] public Amazon EC2 images come with mpi4py, IPython1, pyflix, numpy, scipy, weave, pyrex, etc. already installed and configured. The [[http://code.google.com/p/elasticwulf/|python code]] for launching your own Beowulf cluster on EC2 using the images is on Google Code.

Parallel programming is useful for lots of ML algorithms. [[http://www.dehora.net/journal/2005/02/two_classic_hardbacks.html|How to Write Parallel Programs]] is a good book ([[http://www.amazon.com/How-Write-Parallel-Programs-Course/dp/026203171X/|Amazon]]). Consider Jython, since ML is often CPU-bound and Jython has no GIL.
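
As a rough illustration of the CPU-bound point, here is a minimal sketch that spreads a KNN-style similarity computation over worker processes. It swaps in CPython's standard multiprocessing module rather than Jython: each worker is a separate process with its own interpreter, so the GIL does not serialize them. The function names and the tiny ratings layout (movie id -> {user id: rating}) are made up for the example and are not PyFlix's API; a real run on the Netflix data would want numpy arrays rather than dicts.

```python
import math
from functools import partial
from multiprocessing import Pool


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the users who rated both movies.

    Each argument is a dict mapping user id -> rating (hypothetical layout).
    Returns 0.0 when fewer than two users overlap or a side has no variance.
    """
    common = set(ratings_a) & set(ratings_b)
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def pair_similarity(ratings, pair):
    """Worker function: similarity for one (movie_a, movie_b) pair."""
    movie_a, movie_b = pair
    return pair, pearson(ratings[movie_a], ratings[movie_b])


def serial_similarities(ratings, pairs):
    """Baseline: compute every pair in the calling process."""
    return dict(pair_similarity(ratings, pair) for pair in pairs)


def parallel_similarities(ratings, pairs, workers=2):
    """Same result as serial_similarities, spread over worker processes.

    The ratings dict is pickled and shipped to the workers, which is fine
    for a sketch but too costly for the full dataset.
    """
    with Pool(processes=workers) as pool:
        return dict(pool.map(partial(pair_similarity, ratings), pairs))
```

When running this as a script, call parallel_similarities from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. On a small dataset the serial version will likely be faster, since process startup and pickling dominate; the parallel version only pays off once the per-pair work is substantial.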

NetflixPrizeBOF (last edited 2008-11-15 13:59:37 by localhost)
