# CWYAlpha


## Thought this was cool: Low Rank Approximation and Regression in Input Sparsity Time – implementation –

Ken Clarkson left a new comment on a previous post, “OSNAP: Faster numerical linear algebra algorithms …”:

Hi. While Huy and Jelani have sharper results, and cleaner proofs, it’s worth mentioning that David and I have sharpened our analysis as well, reducing the O(nnz(A) log n) dependence to O(nnz(A)) in that second set of results for us in their table, for regression and low-rank approximation. These improvements result from a more careful analysis in the heavy-coordinate, “perfect hashing” part of our proof; they are in the current version of our paper on arXiv.

My bad: indeed, David states so in the video; the log factor is gone. (Slides are also listed at the recent workshop on “Randomized Numerical Linear Algebra (RandNLA): Theory and Practice”.) Thanks, Ken!

Again, I hesitate to call this an implementation, as the hashing mechanism is so simple (a minimal sketch of it appears after the abstract below). The arXiv paper is: Low Rank Approximation and Regression in Input Sparsity Time by Ken Clarkson and David P. Woodruff

(Submitted on 26 Jul 2012 (v1), last revised 31 Oct 2012 (this version, v3))
We design a new distribution over $\mathrm{poly}(r \varepsilon^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\|SAx\|_2 = (1 \pm \varepsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$. Such a matrix $S$ is called a *subspace embedding*. Furthermore, $SA$ can be computed in $\mathrm{nnz}(A) + \mathrm{poly}(d \varepsilon^{-1})$ time, where $\mathrm{nnz}(A)$ is the number of non-zero entries of $A$. This improves over all previous subspace embeddings, which required at least $\Omega(nd \log d)$ time to achieve this property. We call our matrices $S$ *sparse embedding matrices*.
Using our sparse embedding matrices, we obtain the fastest known algorithms for $(1+\varepsilon)$-approximation for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and $\ell_p$-regression. The leading-order term in the time complexity of our algorithms is $O(\mathrm{nnz}(A))$ or $O(\mathrm{nnz}(A)\log n)$.
We optimize the low-order $\mathrm{poly}(d/\varepsilon)$ terms in our running times (or, for rank-$k$ approximation, the $n \cdot \mathrm{poly}(k/\varepsilon)$ term), and show various tradeoffs. For instance, we also use our methods to design new preconditioners that improve the dependence on $\varepsilon$ in least-squares regression to $\log(1/\varepsilon)$. Finally, we provide preliminary experimental results which suggest that our algorithms are competitive in practice.
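Since the hashing mechanism really is that simple, here is a minimal sketch of a sparse embedding matrix (a CountSketch-style construction) in Python with NumPy/SciPy. The dimensions, density, and sketch size are illustrative assumptions on my part, not the paper’s tuned parameters:

```python
# Minimal sketch of a Clarkson-Woodruff sparse embedding (CountSketch):
# each column of S has a single nonzero, a random +/-1 in a random row,
# so computing S @ A touches each nonzero of A once: O(nnz(A)) work.
import numpy as np
import scipy.sparse as sp

def sparse_embedding(m, n, rng):
    """Build an m x n sparse embedding S: one random +/-1 per column."""
    rows = rng.integers(0, m, size=n)        # hash each coordinate to a row
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign per coordinate
    return sp.csr_matrix((signs, (rows, np.arange(n))), shape=(m, n))

rng = np.random.default_rng(0)
n, d = 100_000, 20                           # tall-and-skinny, sparse A
A = sp.random(n, d, density=1e-3, random_state=0, format="csr")

# poly(d / eps) rows suffice; 500 is an arbitrary choice for d = 20 here.
S = sparse_embedding(500, n, rng)
SA = (S @ A).toarray()                       # small: 500 x 20

# Spot-check the subspace-embedding property on one direction x:
x = rng.standard_normal(d)
print(np.linalg.norm(A @ x), np.linalg.norm(SA @ x))  # should be close
```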
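Continuing the snippet above, the same sketch gives the sketch-and-solve recipe for overconstrained least squares: solve the small problem $\min_x \|S(Ax - b)\|_2$ in place of $\min_x \|Ax - b\|_2$. Again, a toy illustration under the assumptions above, not the paper’s optimized algorithm:

```python
# Sketch-and-solve least squares: a planted solution plus a little noise.
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# Solve the sketched 500 x 20 problem instead of the full 100000 x 20 one.
x_sketch, *_ = np.linalg.lstsq(SA, S @ b, rcond=None)

# Compare residuals against the exact solution of the full problem;
# the ratio should be near 1 (a (1 + eps)-approximation w.h.p.).
x_exact, *_ = np.linalg.lstsq(A.toarray(), b, rcond=None)
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))
```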
and here is the video:

Low Rank Approximation and Regression in Input Sparsity Time, David Woodruff
