Thought this was cool: Reddit
I got this link from Igor Carron, the famous compressed sensing and matrix factorization blogger.
In one of the threads there was a discussion about recommender capabilities. Since we were looking at Arxaliv.org as a model (it is a Reddit clone), I went to the Reddit discussion about the development of that open source platform and found that Reddit is actually looking for a recommender system, and they have a nice dataset.
There are 23,091,688 votes from 43,976 users over 3,436,063 links in 11,675 reddits. (Interestingly these ~44k users represent almost 17% of our total votes). The dump is 2.2 GB uncompressed, 375 MB in bz2.
A reddit is a category. A link is a subject (in Arxaliv it would be a paper), so that matrix (43976 x 3436063) is pretty sparsely filled: 23,091,688 votes over roughly 1.5e11 possible entries gives a fill fraction of about 1.5e-4. Some SVD has been tried, but I am sure they haven't looked at low rank solvers. Since Reddit is such a massive platform, if your algorithm provides good results, it will get noticed well beyond your expectations.
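As a quick sanity check on how sparse the user-by-link vote matrix is, the fill fraction can be recomputed from the counts quoted above. This is only a back-of-the-envelope sketch: the variable names are mine, and it assumes each vote fills a distinct (user, link) cell.

```python
# Counts quoted in the dataset description above.
n_votes = 23_091_688
n_users = 43_976
n_links = 3_436_063

# Number of possible (user, link) entries in the vote matrix.
n_cells = n_users * n_links

# Fraction of entries that actually hold a vote
# (assumes one vote per (user, link) pair).
density = n_votes / n_cells

# Average number of votes cast per user in this dump.
votes_per_user = n_votes / n_users

print(f"possible entries: {n_cells:.3e}")
print(f"fill fraction:    {density:.2e}")
print(f"votes per user:   {votes_per_user:.0f}")
```

With so few entries filled, the matrix only fits in memory in a sparse format, which is also the setting where low rank solvers are meant to help.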
from Large Scale Machine Learning and Other Animals: http://bickson.blogspot.com/2012/09/reddit.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2FsYXZE+%28Large+Scale+Machine+Learning+and+Other+Animals%29