Thought this was cool: Steffen Rendle – libFM
News from the KDD CUP workshop. I was highly impressed by Steffen
Rendle, the author of the libFM
collaborative filtering library. Steffen won 2nd place in track 1
and 3rd place in track 2.
Unlike our team, which had around 15 people, and the Taiwanese team,
which had around 20 people,
Steffen worked alone and got a great rating in BOTH tracks.
What is nice about Steffen’s work is that he uses only a SINGLE
algorithm, not an ensemble of
methods as is typically deployed. The trick is that he does very smart
feature engineering to create a very good
feature matrix. Once he has a representative feature matrix, he runs
the libFM algorithm on it.
I asked Steffen to explain the essence of the method:
A is the input (design) matrix where each row is a case and each column a
(real-valued) predictor variable. I.e. the same way of feature engineering
as in other standard ML algorithms such as linear/polynomial regression.
Internally, the FM model works similarly to a polynomial regression, i.e.
it contains all pairwise interactions between any two variables in A. The
important difference to polynomial regression is that the model parameters
for variable interactions are factorized. This is the reason why FMs
perform well in problems such as recommender systems.
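To make the description above concrete, here is a minimal NumPy sketch of a second-order factorization machine prediction. It is not libFM's code: the function names, the toy design matrix and the random parameters are all made up for illustration. It assumes the usual form of the model, a global bias plus linear weights plus pairwise interactions, where the weight of the interaction between variables i and j is the dot product of two k-dimensional factor vectors rather than a free parameter.

```python
import numpy as np

def fm_predict(A, w0, w, V):
    """Score every row of the design matrix A with a second-order FM.

    A  : (n_cases, n_vars) design matrix, one row per case
    w0 : global bias (scalar)
    w  : (n_vars,) linear weights
    V  : (n_vars, k) factor matrix; the interaction weight of
         variables i and j is the dot product V[i] . V[j]
    """
    linear = A @ w
    # Pairwise term without a double loop, using the identity
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
    square_of_sum = (A @ V) ** 2
    sum_of_squares = (A ** 2) @ (V ** 2)
    pairwise = 0.5 * (square_of_sum - sum_of_squares).sum(axis=1)
    return w0 + linear + pairwise

def fm_predict_naive(A, w0, w, V):
    """Same model written as an explicit polynomial regression
    whose interaction weight matrix is V V^T (slow, for checking)."""
    n_vars = A.shape[1]
    W = V @ V.T
    out = w0 + A @ w
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            out = out + W[i, j] * A[:, i] * A[:, j]
    return out

# Hypothetical toy design matrix for a recommender: columns are
# [user0, user1, item0, item1, item2], one-hot encoded per case.
A_toy = np.array([
    [1.0, 0.0, 0.0, 1.0, 0.0],   # user0 rated item1
    [0.0, 1.0, 0.0, 0.0, 1.0],   # user1 rated item2
])
rng_toy = np.random.default_rng(42)
w_toy = rng_toy.normal(size=5)
V_toy = rng_toy.normal(scale=0.1, size=(5, 2))   # k = 2 factors
scores_toy = fm_predict(A_toy, 0.0, w_toy, V_toy)
```

With one-hot user and item columns, the only non-zero interaction in a row is between that user's and that item's factor vectors, which is why this setup behaves like matrix factorization. Adding more columns (time, context, and so on) is where the feature engineering comes in.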
Some of Steffen's notable recent work is a caching method for
ALS computation that, according to Steffen, significantly
speeds up ALS and makes it a lightweight algorithm like SGD.
The work is described in his recent SIGIR 2011 paper.
A second interesting piece of work is an online matrix factorization
computation, described in the paper: Steffen Rendle, Lars Schmidt-Thieme (2008):
Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender Systems, in Proceedings of the 2008 ACM
Conference on Recommender Systems (RecSys 2008), ACM.
When new users/items are added to the system, only an incremental
update is computed.
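To give a feel for what an incremental update can look like, here is a rough sketch under my own assumptions, not the procedure from the paper. It keeps the already trained item factors fixed and fits only the new user's factor vector with regularized least squares. The function name `fit_new_user`, the plain (non-kernel) factorization and the closed-form solve are all illustrative choices.

```python
import numpy as np

def fit_new_user(Q, item_ids, ratings, lam=0.1):
    """Fit a factor vector for one new user, leaving Q untouched.

    Q        : (n_items, k) item factors from the existing model
    item_ids : indices of the items the new user has rated
    ratings  : the new user's ratings for those items
    lam      : L2 regularization strength (assumed hyperparameter)
    """
    Q_rated = Q[np.asarray(item_ids)]        # (n_rated, k)
    r = np.asarray(ratings, dtype=float)
    k = Q.shape[1]
    # Ridge regression normal equations:
    # (Q_r^T Q_r + lam * I) p = Q_r^T r
    lhs = Q_rated.T @ Q_rated + lam * np.eye(k)
    rhs = Q_rated.T @ r
    return np.linalg.solve(lhs, rhs)

# Hypothetical usage: 4 items with k = 2 factors, and a new user
# who rated items 0 and 3.
Q_demo = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.5, 0.5],
                   [0.1, 1.0]])
p_new = fit_new_user(Q_demo, [0, 3], [4.0, 2.0], lam=0.1)
predicted_for_item_1 = Q_demo[1] @ p_new
```

The cost depends only on the number of ratings of the new user and on k, not on the size of the whole rating matrix, which is the appeal of updating incrementally instead of retraining.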
Finally, Steffen gave a very detailed tutorial at KDD about a whole
bunch of matrix factorization methods and the relations between them. I find the tutorial a good overview of the connections between
algorithms; however, it is intended for intermediate-level users who
have already mastered some of the algorithms in this domain.
As you may know, we have a very crude and preliminary libFM
implementation in GraphLab. Our implementation contains a
subset of the full libFM
functionality, with only three predictors: user, item and time. Users
are encouraged to check the original libFM library for a more complete implementation.
The libFM library has a proven track record in KDD
CUP and is a highly recommended collaborative filtering package.
from Large Scale Machine Learning and Other Animals: http://bickson.blogspot.com/2012/08/steffen-rendle-libfm.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2FsYXZE+%28Large+Scale+Machine+Learning+and+Other+Animals%29