Topics: approaches to the Netflix Prize using Python, getting started with PyFlix for newcomers, algorithm and code performance, etc.
Some Netflix Prize code in Python will be shown and run (KNN, NMF, ARTMAP, SVD, etc.).
I will be posting the code later this month on my blog, Data Wrangling.
Some links for those just getting started:
PyFlix library for efficiently handling the dataset.
MovieLens dataset - a smaller dataset to debug your code with
Netflix forum KNN discussion - includes numpy, weave specifics
Paul Harrison's approach - using numpy and weave
Dartmouth paper - using an EM/NMF approach with MovieLens data
BellKor paper - Progress prize winner
Hadoop MapReduce code for working with the Netflix data
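For those debugging on a small dataset first, here is a minimal sketch of a user-based KNN predictor in plain Python. It is only an illustration of the general idea, not the code from the talk and not the PyFlix API: the toy `RATINGS` dictionary and the `cosine_similarity` and `predict` names are made up for this example, and a real run on the Netflix data would want numpy and a much more careful similarity measure.

```python
from math import sqrt

# Toy ratings: {user: {movie: rating}}. Made-up data for illustration only.
RATINGS = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "B": 4},
}


def cosine_similarity(user_x, user_y, ratings):
    """Cosine similarity between two users over the movies both have rated."""
    common = [m for m in ratings[user_x] if m in ratings[user_y]]
    if not common:
        return 0.0
    dot = sum(ratings[user_x][m] * ratings[user_y][m] for m in common)
    norm_x = sqrt(sum(ratings[user_x][m] ** 2 for m in common))
    norm_y = sqrt(sum(ratings[user_y][m] ** 2 for m in common))
    return dot / (norm_x * norm_y)


def predict(user, movie, ratings, k=2):
    """Similarity-weighted mean rating of `movie` among the k nearest users."""
    # Candidate neighbors: every other user who has rated the movie.
    candidates = [
        (cosine_similarity(user, other, ratings), other_ratings[movie])
        for other, other_ratings in ratings.items()
        if other != user and movie in other_ratings
    ]
    # Keep the k most similar users.
    neighbors = sorted(candidates, reverse=True)[:k]
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight == 0:
        return None  # no usable neighbors
    return sum(sim * rating for sim, rating in neighbors) / total_weight


if __name__ == "__main__":
    print(predict("u4", "C", RATINGS))
```

Here `u4` rates movies A and B much like `u1` and `u2` do, so the predicted rating for C lands between their ratings of C. Raw cosine similarity on unnormalized ratings is a crude choice; it is used here only to keep the sketch short.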
If you need to go parallel for Netflix, the ElasticWulf public Amazon EC2 images come with mpi4py, IPython1, pyflix, numpy, scipy, weave, pyrex, etc. already installed and configured. The Python code for launching your own Beowulf cluster on EC2 using these images is on Google Code.