Discuss approaches to the Netflix Prize using Python: getting started with PyFlix for newcomers, algorithm and code performance, etc.
Some Netflix Prize code in Python will be shown and run (KNN, NMF, ARTMAP, SVD, etc.)
Some links for those just getting started:
[http://www.netflixprize.com/teams Register a Team] in order to [http://www.netflixprize.com/download download the Netflix data]
[http://pyflix.python-hosting.com/ PyFlix] library for efficiently handling the dataset.
[http://www.grouplens.org/node/73 MovieLens dataset] - a smaller dataset to debug your code with
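For debugging on a small dataset, something like the following may be enough to get ratings into memory. It is only a sketch: it assumes the tab-separated `user id, item id, rating, timestamp` layout used by the `u.data` file of the MovieLens 100K release (other releases use different separators), and `load_ratings` and the sample rows are made up for illustration rather than taken from PyFlix or the dataset.

```python
import io
from collections import defaultdict

# Invented sample rows in the assumed "user<TAB>item<TAB>rating<TAB>timestamp" layout.
SAMPLE = "1\t10\t4\t881250949\n1\t20\t2\t881250950\n2\t10\t5\t891717742\n"


def load_ratings(fileobj):
    """Return {user_id: {item_id: rating}} from a tab-separated ratings file."""
    by_user = defaultdict(dict)
    for line in fileobj:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split("\t")
        by_user[int(user)][int(item)] = int(rating)
    return dict(by_user)


# With a real download this would be load_ratings(open("u.data")) instead.
ratings = load_ratings(io.StringIO(SAMPLE))
```

A nested dict is slow for the full Netflix data but convenient for checking an algorithm on a few thousand ratings before moving to packed arrays.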
Some approaches:
[http://sifter.org/~simon/journal/20061211.html Simon Funk approach]
[http://www.timelydevelopment.com/demos/NetflixPrize.aspx Timely Development code for Simon Funk approach]
[http://www.netflixprize.com/community/viewtopic.php?pid=4712#p4712 Netflix forum KNN discussion] - includes numpy, weave specifics
[http://devlicio.us/blogs/billy_mccafferty/archive/2006/11/07/netflix-memoirs-using-the-pearson-correlation-coefficient.aspx Basic KNN in SQL]
[http://mainline.brynmawr.edu/Courses/cs380/fall2006/TiVo.pdf TiVo KNN paper]
[http://www.erikshelley.com/netflix/ Erik Shelley's approach]
[http://www.tillberg.us/netflixprizejumpstart Dan Tillberg's page]
[http://www.logarithmic.net/pfh/blog/01176798503 Paul Harrison's approach] - using numpy and weave
[http://www.siam.org/meetings/sdm06/proceedings/059zhangs2.pdf Dartmouth paper] - using an EM/NMF approach with MovieLens data
[http://www.research.att.com/~volinsky/netflix/ProgressPrize2007BellKorSolution.pdf BellKor paper] - Progress Prize winner
[http://code.google.com/p/canopy-clustering/ Hadoop MapReduce code] for working with the Netflix data
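As a rough illustration of the SVD-style idea behind several of the links above, here is a minimal numpy sketch of regularized matrix factorization trained by stochastic gradient descent. It is a simplification, not the code from any of the linked pages: it updates all factors at once (the Simon Funk write-up trains one feature at a time), uses only a global mean as the baseline, and the function names, hyperparameters, and toy ratings are assumptions chosen for the example.

```python
import numpy as np


def funk_style_sgd(ratings, n_users, n_items, n_factors=2,
                   lr=0.02, reg=0.02, n_epochs=500, seed=0):
    """Fit rating ~ mu + P[u] . Q[i] by SGD on (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item factors
    mu = float(np.mean([r for _, _, r in ratings]))       # global mean baseline
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + P[u] @ Q[i])
            pu = P[u].copy()  # keep the old user factors for the item update
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return mu, P, Q


def rmse(ratings, mu, P, Q):
    """Root mean squared error of the model on the given triples."""
    errs = [r - (mu + P[u] @ Q[i]) for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))


# Toy data: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
toy = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 1, 5), (1, 3, 1),
       (2, 2, 5), (2, 3, 4), (2, 0, 1), (3, 2, 4), (3, 3, 5), (3, 1, 1)]

before = rmse(toy, *funk_style_sgd(toy, 4, 4, n_epochs=0))  # untrained factors
mu, P, Q = funk_style_sgd(toy, 4, 4)
after = rmse(toy, mu, P, Q)
print(f"training RMSE: {before:.3f} -> {after:.3f}")
```

The per-rating Python loop is fine for a toy set like this one, but on the full Netflix data it is the part most worth moving into compiled code, for example with weave as discussed in some of the links.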
More here:
Performance pointers: