Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Fast and robust tensor decomposition with applications to dictionary learning

69 0 0.0 ( 0 )

Download Cite

Added by Tselil Schramm

Publication date 2017

fields Informatics Engineering

and research's language is English

Authors Tselil Schramm - David Steurer

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We develop fast spectral algorithms for tensor decomposition that match the robustness guarantees of the best known polynomial-time algorithms for this problem based on the sum-of-squares (SOS) semidefinite programming hierarchy. Our algorithms can decompose a 4-tensor with $n$-dimensional orthonormal components in the presence of error with constant spectral norm (when viewed as an $n^2$-by-$n^2$ matrix). The running time is $n^5$ which is close to linear in the input size $n^4$. We also obtain algorithms with similar running time to learn sparsely-used orthogonal dictionaries even when feature representations have constant relative sparsity and non-independent coordinates. The only previous polynomial-time algorithms to solve these problem are based on solving large semidefinite programs. In contrast, our algorithms are easy to implement directly and are based on spectral projections and tensor-mode rearrangements. Or work is inspired by recent of Hopkins, Schramm, Shi, and Steurer (STOC16) that shows how fast spectral algorithms can achieve the guarantees of SOS for average-case problems. In this work, we introduce general techniques to capture the guarantees of SOS for worst-case problems.

rate research

Online matrix factorization for Markovian data and applications to Network Dictionary Learning

67 - Hanbaek Lyu , Deanna Needell , 2019

Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of dependent data streams remains largely unexplored. In this paper, we show that a non-convex generalization of the well-known OMF algorithm for i.i.d. stream of data in citep{mairal2010online} converges almost surely to the set of critical points of the expected loss function, even when the data matrices are functions of some underlying Markov chain satisfying a mild mixing condition. This allows one to extract features more efficiently from dependent data streams, as there is no need to subsample the data sequence to approximately satisfy the independence assumption. As the main application, by combining online non-negative matrix factorization and a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning, which extracts ``network dictionary patches from a given network in an online manner that encodes main features of the network. We demonstrate this technique and its application to network denoising problems on real-world network data.

Machine Learning Data Structures and Algorithms Optimization and Control

On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition

111 - Marco Mondelli , Andrea Montanari 2018

We establish connections between the problem of learning a two-layer neural network and tensor decomposition. We consider a model with feature vectors $boldsymbol x in mathbb R^d$, $r$ hidden units with weights ${boldsymbol w_i}_{1le i le r}$ and output $yin mathbb R$, i.e., $y=sum_{i=1}^r sigma( boldsymbol w_i^{mathsf T}boldsymbol x)$, with activation functions given by low-degree polynomials. In particular, if $sigma(x) = a_0+a_1x+a_3x^3$, we prove that no polynomial-time learning algorithm can outperform the trivial predictor that assigns to each example the response variable $mathbb E(y)$, when $d^{3/2}ll rll d^2$. Our conclusion holds for a `natural data distribution, namely standard Gaussian feature vectors $boldsymbol x$, and output distributed according to a two-layer neural network with random isotropic weights, and under a certain complexity-theoretic assumption on tensor decomposition. Roughly speaking, we assume that no polynomial-time algorithm can substantially outperform current methods for tensor decomposition based on the sum-of-squares hierarchy. We also prove generalizations of this statement for higher degree polynomial activations, and non-random weight vectors. Remarkably, several existing algorithms for learning two-layer networks with rigorous guarantees are based on tensor decomposition. Our results support the idea that this is indeed the core computational difficulty in learning such networks, under the stated generative model for the data. As a side result, we show that under this model learning the network requires accurate learning of its weights, a property that does not hold in a more general setting.

Machine Learning Data Structures and Algorithms Machine Learning

Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method

448 - Boaz Barak , Jonathan A. Kelner , David Steurer 2014

We give a new approach to the dictionary learning (also known as sparse coding) problem of recovering an unknown $ntimes m$ matrix $A$ (for $m geq n$) from examples of the form [ y = Ax + e, ] where $x$ is a random vector in $mathbb R^m$ with at most $tau m$ nonzero coordinates, and $e$ is a random noise vector in $mathbb R^n$ with bounded magnitude. For the case $m=O(n)$, our algorithm recovers every column of $A$ within arbitrarily good constant accuracy in time $m^{O(log m/log(tau^{-1}))}$, in particular achieving polynomial time if $tau = m^{-delta}$ for any $delta>0$, and time $m^{O(log m)}$ if $tau$ is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector $x$ to be much sparser---at most $sqrt{n}$ nonzero coordinates---and there were intrinsic barriers preventing these algorithms from applying for denser $x$. We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor $T$, given access to a tensor $T$ that is $tau$-close to $T$ in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of $T$ and $T$ have similar structures. Our algorithm is based on a novel approach to using and analyzing the Sum of Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and it can be viewed as an indication of the utility of this very general and powerful tool for unsupervised learning problems.

Data Structures and Algorithms Machine Learning Machine Learning

Fast Robust Tensor Principal Component Analysis via Fiber CUR Decomposition

131 - HanQin Cai , Zehan Chao , Longxiu Huang 2021

We study the problem of tensor robust principal component analysis (TRPCA), which aims to separate an underlying low-multilinear-rank tensor and a sparse outlier tensor from their sum. In this work, we propose a fast non-convex algorithm, coined Robust Tensor CUR (RTCUR), for large-scale TRPCA problems. RTCUR considers a framework of alternating projections and utilizes the recently developed tensor Fiber CUR decomposition to dramatically lower the computational complexity. The performance advantage of RTCUR is empirically verified against the state-of-the-arts on the synthetic datasets and is further demonstrated on the real-world application such as color video background subtraction.

Machine Learning Computer Vision and Pattern Recognition Image and Video Processing

Robust Coreset for Continuous-and-Bounded Learning (with Outliers)

144 - Zixiu Wang , Yiwen Guo , Hu Ding 2021

In this big data era, we often confront large-scale data in many machine learning tasks. A common approach for dealing with large-scale data is to build a small summary, {em e.g.,} coreset, that can efficiently represent the original input. However, real-world datasets usually contain outliers and most existing coreset construction methods are not resilient against outliers (in particular, the outliers can be located arbitrarily in the space by an adversarial attacker). In this paper, we propose a novel robust coreset method for the {em continuous-and-bounded learning} problem (with outliers) which includes a broad range of popular optimization objectives in machine learning, like logistic regression and $ k $-means clustering. Moreover, our robust coreset can be efficiently maintained in fully-dynamic environment. To the best of our knowledge, this is the first robust and fully-dynamic coreset construction method for these optimization problems. We also conduct the experiments to evaluate the effectiveness of our robust coreset in practice.

Machine Learning Data Structures and Algorithms Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Fast and robust tensor decomposition with applications to dictionary learning

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions