Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method

187 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل David Steurer

تاريخ النشر 2014

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Boaz Barak - Jonathan A. Kelner - David Steurer

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We give a new approach to the dictionary learning (also known as sparse coding) problem of recovering an unknown $ntimes m$ matrix $A$ (for $m geq n$) from examples of the form [ y = Ax + e, ] where $x$ is a random vector in $mathbb R^m$ with at most $tau m$ nonzero coordinates, and $e$ is a random noise vector in $mathbb R^n$ with bounded magnitude. For the case $m=O(n)$, our algorithm recovers every column of $A$ within arbitrarily good constant accuracy in time $m^{O(log m/log(tau^{-1}))}$, in particular achieving polynomial time if $tau = m^{-delta}$ for any $delta>0$, and time $m^{O(log m)}$ if $tau$ is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector $x$ to be much sparser---at most $sqrt{n}$ nonzero coordinates---and there were intrinsic barriers preventing these algorithms from applying for denser $x$. We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor $T$, given access to a tensor $T$ that is $tau$-close to $T$ in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of $T$ and $T$ have similar structures. Our algorithm is based on a novel approach to using and analyzing the Sum of Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and it can be viewed as an indication of the utility of this very general and powerful tool for unsupervised learning problems.

قيم البحث

77 - Pravesh K. Kothari , David Steurer 2017

We develop efficient algorithms for estimating low-degree moments of unknown distributions in the presence of adversarial outliers. The guarantees of our algorithms improve in many cases significantly over the best previous ones, obtained in recent w orks of Diakonikolas et al, Lai et al, and Charikar et al. We also show that the guarantees of our algorithms match information-theoretic lower-bounds for the class of distributions we consider. These improved guarantees allow us to give improved algorithms for independent component analysis and learning mixtures of Gaussians in the presence of outliers. Our algorithms are based on a standard sum-of-squares relaxation of the following conceptually-simple optimization problem: Among all distributions whose moments are bounded in the same way as for the unknown distribution, find the one that is closest in statistical distance to the empirical distribution of the adversarially-corrupted sample.

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي

Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors

101 - Samuel B. Hopkins , Tselil Schramm , Jonathan Shi 2015

We consider two problems that arise in machine learning applications: the problem of recovering a planted sparse vector in a random linear subspace and the problem of decomposing a random low-rank overcomplete 3-tensor. For both problems, the best kn own guarantees are based on the sum-of-squares method. We develop new algorithms inspired by analyses of the sum-of-squares method. Our algorithms achieve the same or similar guarantees as sum-of-squares for these problems but the running time is significantly faster. For the planted sparse vector problem, we give an algorithm with running time nearly linear in the input size that approximately recovers a planted sparse vector with up to constant relative sparsity in a random subspace of $mathbb R^n$ of dimension up to $tilde Omega(sqrt n)$. These recovery guarantees match the best known ones of Barak, Kelner, and Steurer (STOC 2014) up to logarithmic factors. For tensor decomposition, we give an algorithm with running time close to linear in the input size (with exponent $approx 1.086$) that approximately recovers a component of a random 3-tensor over $mathbb R^n$ of rank up to $tilde Omega(n^{4/3})$. The best previous algorithm for this problem due to Ge and Ma (RANDOM 2015) works up to rank $tilde Omega(n^{3/2})$ but requires quasipolynomial time.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي التعلم الآلي

Noisy Tensor Completion via the Sum-of-Squares Hierarchy

308 - Boaz Barak , Ankur Moitra 2015

In the noisy tensor completion problem we observe $m$ entries (whose location is chosen uniformly at random) from an unknown $n_1 times n_2 times n_3$ tensor $T$. We assume that $T$ is entry-wise close to being rank $r$. Our goal is to fill in its mi ssing entries using as few observations as possible. Let $n = max(n_1, n_2, n_3)$. We show that if $m = n^{3/2} r$ then there is a polynomial time algorithm based on the sixth level of the sum-of-squares hierarchy for completing it. Our estimate agrees with almost all of $T$s entries almost exactly and works even when our observations are corrupted by noise. This is also the first algorithm for tensor completion that works in the overcomplete case when $r > n$, and in fact it works all the way up to $r = n^{3/2-epsilon}$. Our proofs are short and simple and are based on establishing a new connection between noisy tensor completion (through the language of Rademacher complexity) and the task of refuting random constant satisfaction problems. This connection seems to have gone unnoticed even in the context of matrix completion. Furthermore, we use this connection to show matching lower bounds. Our main technical result is in characterizing the Rademacher complexity of the sequence of norms that arise in the sum-of-squares relaxations to the tensor nuclear norm. These results point to an interesting new direction: Can we explore computational vs. sample complexity tradeoffs through the sum-of-squares hierarchy?

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي

Rounding Sum-of-Squares Relaxations

143 - Boaz Barak , Jonathan Kelner , David Steurer 2013

We present a general approach to rounding semidefinite programming relaxations obtained by the Sum-of-Squares method (Lasserre hierarchy). Our approach is based on using the connection between these relaxations and the Sum-of-Squares proof system to transform a *combining algorithm* -- an algorithm that maps a distribution over solutions into a (possibly weaker) solution -- into a *rounding algorithm* that maps a solution of the relaxation to a solution of the original problem. Using this approach, we obtain algorithms that yield improved results for natural variants of three well-known problems: 1) We give a quasipolynomial-time algorithm that approximates the maximum of a low degree multivariate polynomial with non-negative coefficients over the Euclidean unit sphere. Beyond being of interest in its own right, this is related to an open question in quantum information theory, and our techniques have already led to improved results in this area (Brand~{a}o and Harrow, STOC 13). 2) We give a polynomial-time algorithm that, given a d dimensional subspace of R^n that (almost) contains the characteristic function of a set of size n/k, finds a vector $v$ in the subspace satisfying $|v|_4^4 > c(k/d^{1/3}) |v|_2^2$, where $|v|_p = (E_i v_i^p)^{1/p}$. Aside from being a natural relaxation, this is also motivated by a connection to the Small Set Expansion problem shown by Barak et al. (STOC 2012) and our results yield a certain improvement for that problem. 3) We use this notion of L_4 vs. L_2 sparsity to obtain a polynomial-time algorithm with substantially improved guarantees for recovering a planted $mu$-sparse vector v in a random d-dimensional subspace of R^n. If v has mu n nonzero coordinates, we can recover it with high probability whenever $mu < O(min(1,n/d^2))$, improving for $d < n^{2/3}$ prior methods which intrinsically required $mu < O(1/sqrt(d))$.

بنى وهياكل البيانات والخوارزميات التعلم الآلي فيزياء الكم

Fast and robust tensor decomposition with applications to dictionary learning

68 - Tselil Schramm , David Steurer 2017

We develop fast spectral algorithms for tensor decomposition that match the robustness guarantees of the best known polynomial-time algorithms for this problem based on the sum-of-squares (SOS) semidefinite programming hierarchy. Our algorithms can decompose a 4-tensor with $n$-dimensional orthonormal components in the presence of error with constant spectral norm (when viewed as an $n^2$-by-$n^2$ matrix). The running time is $n^5$ which is close to linear in the input size $n^4$. We also obtain algorithms with similar running time to learn sparsely-used orthogonal dictionaries even when feature representations have constant relative sparsity and non-independent coordinates. The only previous polynomial-time algorithms to solve these problem are based on solving large semidefinite programs. In contrast, our algorithms are easy to implement directly and are based on spectral projections and tensor-mode rearrangements. Or work is inspired by recent of Hopkins, Schramm, Shi, and Steurer (STOC16) that shows how fast spectral algorithms can achieve the guarantees of SOS for average-case problems. In this work, we introduce general techniques to capture the guarantees of SOS for worst-case problems.

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي