I've been working on the Netflix prize contest off and on, basically the it's a data-mining challenge with a $1 million dollar prize. I'll be describing some of the approaches I took, hopefully someone will give me some feedback that will improve my score (0.899 as of July 20, 2007).
So far I've the following algorithms up and running: SVD (based on Timely Development's code, thank you!), KNN, clustering, MLE(or is it EM ? I'm not sure, anyway it didn't work), and a couple of amateur-ish schemes I dreamt up, which also didn't work out (by not working I mean did not beat Netflix's Cinematch's RMSE). Oh by the way, this is the first time I've ever implemented these algorithms.
While I certainly have enough programming experience for this contest, I'm not a mathematician or a statistician, and I have never dabbled in data mining before. So you won't find deep theoretical musings on topics like sampling techniques, SVD, restricted Boltzmann machine guns here. I'll concentrate on my implementation detail and try to keep my understanding and my terminology correct. All malaprops may or may not be intestinal.
More details to follow.
No comments:
Post a Comment