No Arabic abstract
In Dictionary Learning one tries to recover incoherent matrices $A^* in mathbb{R}^{n times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* in mathbb{R}^h$ with a small support of size $h^p$ for some $0 <p < 1$ while having access to observations $y in mathbb{R}^n$ where $y = A^*x^*$. In this work we undertake a rigorous analysis of whether gradient descent on the squared loss of an autoencoder can solve the dictionary learning problem. The Autoencoder architecture we consider is a $mathbb{R}^n rightarrow mathbb{R}^n$ mapping with a single ReLU activation layer of size $h$. Under very mild distributional assumptions on $x^*$, we prove that the norm of the expected gradient of the standard squared loss function is asymptotically (in sparse code dimension) negligible for all points in a small neighborhood of $A^*$. This is supported with experimental evidence using synthetic data. We also conduct experiments to suggest that $A^*$ is a local minimum. Along the way we prove that a layer of ReLU gates can be set up to automatically recover the support of the sparse codes. This property holds independent of the loss function. We believe that it could be of independent interest.
Recently, the study on learned iterative shrinkage thresholding algorithm (LISTA) has attracted increasing attentions. A large number of experiments as well as some theories have proved the high efficiency of LISTA for solving sparse coding problems. However, existing LISTA methods are all serial connection. To address this issue, we propose a novel extragradient based LISTA (ELISTA), which has a residual structure and theoretical guarantees. In particular, our algorithm can also provide the interpretability for Res-Net to a certain extent. From a theoretical perspective, we prove that our method attains linear convergence. In practice, extensive empirical results verify the advantages of our method.
It has recently been observed that certain extremely simple feature encoding techniques are able to achieve state of the art performance on several standard image classification benchmarks including deep belief networks, convolutional nets, factored RBMs, mcRBMs, convolutional RBMs, sparse autoencoders and several others. Moreover, these triangle or soft threshold encodings are ex- tremely efficient to compute. Several intuitive arguments have been put forward to explain this remarkable performance, yet no mathematical justification has been offered. The main result of this report is to show that these features are realized as an approximate solution to the a non-negative sparse coding problem. Using this connection we describe several variants of the soft threshold features and demonstrate their effectiveness on two image classification benchmark tasks.
We consider the dictionary learning problem, where the aim is to model the given data as a linear combination of a few columns of a matrix known as a dictionary, where the sparse weights forming the linear combination are known as coefficients. Since the dictionary and coefficients, parameterizing the linear model are unknown, the corresponding optimization is inherently non-convex. This was a major challenge until recently, when provable algorithms for dictionary learning were proposed. Yet, these provide guarantees only on the recovery of the dictionary, without explicit recovery guarantees on the coefficients. Moreover, any estimation error in the dictionary adversely impacts the ability to successfully localize and estimate the coefficients. This potentially limits the utility of existing provable dictionary learning methods in applications where coefficient recovery is of interest. To this end, we develop NOODL: a simple Neurally plausible alternating Optimization-based Online Dictionary Learning algorithm, which recovers both the dictionary and coefficients exactly at a geometric rate, when initialized appropriately. Our algorithm, NOODL, is also scalable and amenable for large scale distributed implementations in neural architectures, by which we mean that it only involves simple linear and non-linear operations. Finally, we corroborate these theoretical results via experimental evaluation of the proposed algorithm with the current state-of-the-art techniques. Keywords: dictionary learning, provable dictionary learning, online dictionary learning, non-convex, sparse coding, support recovery, iterative hard thresholding, matrix factorization, neural architectures, neural networks, noodl, sparse representations, sparse signal processing.
Tensor completion refers to the task of estimating the missing data from an incomplete measurement or observation, which is a core problem frequently arising from the areas of big data analysis, computer vision, and network engineering. Due to the multidimensional nature of high-order tensors, the matrix approaches, e.g., matrix factorization and direct matricization of tensors, are often not ideal for tensor completion and recovery. In this paper, we introduce a unified low-rank and sparse enhanced Tucker decomposition model for tensor completion. Our model possesses a sparse regularization term to promote a sparse core tensor of the Tucker decomposition, which is beneficial for tensor data compression. Moreover, we enforce low-rank regularization terms on factor matrices of the Tucker decomposition for inducing the low-rankness of the tensor with a cheap computational cost. Numerically, we propose a customized ADMM with enough easy subproblems to solve the underlying model. It is remarkable that our model is able to deal with different types of real-world data sets, since it exploits the potential periodicity and inherent correlation properties appeared in tensors. A series of computational experiments on real-world data sets, including internet traffic data sets, color images, and face recognition, demonstrate that our model performs better than many existing state-of-the-art matricization and tensorization approaches in terms of achieving higher recovery accuracy.
The compression of deep neural networks (DNNs) to reduce inference cost becomes increasingly important to meet realistic deployment requirements of various applications. There have been a significant amount of work regarding network compression, while most of them are heuristic rule-based or typically not friendly to be incorporated into varying scenarios. On the other hand, sparse optimization yielding sparse solutions naturally fits the compression requirement, but due to the limited study of sparse optimization in stochastic learning, its extension and application onto model compression is rarely well explored. In this work, we propose a model compression framework based on the recent progress on sparse stochastic optimization. Compared to existing model compression techniques, our method is effective and requires fewer extra engineering efforts to incorporate with varying applications, and has been numerically demonstrated on benchmark compression tasks. Particularly, we achieve up to 7.2 and 2.9 times FLOPs reduction with the same level of evaluation accuracy on VGG16 for CIFAR10 and ResNet50 for ImageNet compared to the baseline heavy models, respectively.