No Arabic abstract
The recently proposed Multi-Layer Convolutional Sparse Coding (ML-CSC) model, consisting of a cascade of convolutional sparse layers, provides a new interpretation of Convolutional Neural Networks (CNNs). Under this framework, the computation of the forward pass in a CNN is equivalent to a pursuit algorithm aiming to estimate the nested sparse representation vectors -- or feature maps -- from a given input signal. Despite having served as a pivotal connection between CNNs and sparse modeling, a deeper understanding of the ML-CSC is still lacking: there are no pursuit algorithms that can serve this model exactly, nor are there conditions to guarantee a non-empty model. While one can easily obtain signals that approximately satisfy the ML-CSC constraints, it remains unclear how to simply sample from the model and, more importantly, how one can train the convolutional filters from real data. In this work, we propose a sound pursuit algorithm for the ML-CSC model by adopting a projection approach. We provide new and improved bounds on the stability of the solution of such pursuit and we analyze different practical alternatives to implement this in practice. We show that the training of the filters is essential to allow for non-trivial signals in the model, and we derive an online algorithm to learn the dictionaries from real data, effectively resulting in cascaded sparse convolutional layers. Last, but not least, we demonstrate the applicability of the ML-CSC model for several applications in an unsupervised setting, providing competitive results. Our work represents a bridge between matrix factorization, sparse dictionary learning and sparse auto-encoders, and we analyze these connections in detail.
We address the multi-focus image fusion problem, where multiple images captured with different focal settings are to be fused into an all-in-focus image of higher quality. Algorithms for this problem necessarily admit the source image characteristics along with focused and blurred features. However, most sparsity-based approaches use a single dictionary in focused feature space to describe multi-focus images, and ignore the representations in blurred feature space. We propose a multi-focus image fusion approach based on sparse representation using a coupled dictionary. It exploits the observations that the patches from a given training set can be sparsely represented by a couple of overcomplete dictionaries related to the focused and blurred categories of images and that a sparse approximation based on such coupled dictionary leads to a more flexible and therefore better fusion strategy than the one based on just selecting the sparsest representation in the original image estimate. In addition, to improve the fusion performance, we employ a coupled dictionary learning approach that enforces pairwise correlation between atoms of dictionaries learned to represent the focused and blurred feature spaces. We also discuss the advantages of the fusion approach based on coupled dictionary learning, and present efficient algorithms for fusion based on coupled dictionary learning. Extensive experimental comparisons with state-of-the-art multi-focus image fusion algorithms validate the effectiveness of the proposed approach.
The Residual Quantization (RQ) framework is revisited where the quantization distortion is being successively reduced in multi-layers. Inspired by the reverse-water-filling paradigm in rate-distortion theory, an efficient regularization on the variances of the codewords is introduced which allows to extend the RQ for very large numbers of layers and also for high dimensional data, without getting over-trained. The proposed Regularized Residual Quantization (RRQ) results in multi-layer dictionaries which are additionally sparse, thanks to the soft-thresholding nature of the regularization when applied to variance-decaying data which can arise from de-correlating transformations applied to correlated data. Furthermore, we also propose a general-purpose pre-processing for natural images which makes them suitable for such quantization. The RRQ framework is first tested on synthetic variance-decaying data to show its efficiency in quantization of high-dimensional data. Next, we use the RRQ in super-resolution of a database of facial images where it is shown that low-resolution facial images from the test set quantized with codebooks trained on high-resolution images from the training set show relevant high-frequency content when reconstructed with those codebooks.
Convolutional dictionary learning (CDL), the problem of estimating shift-invariant templates from data, is typically conducted in the absence of a prior/structure on the templates. In data-scarce or low signal-to-noise ratio (SNR) regimes, which have received little attention from the community, learned templates overfit the data and lack smoothness, which can affect the predictive performance of downstream tasks. To address this limitation, we propose GPCDL, a convolutional dictionary learning framework that enforces priors on templates using Gaussian Processes (GPs). With the focus on smoothness, we show theoretically that imposing a GP prior is equivalent to Wiener filtering the learned templates, thereby suppressing high-frequency components and promoting smoothness. We show that the algorithm is a simple extension of the classical iteratively reweighted least squares, which allows the flexibility to experiment with different smoothness assumptions. Through simulation, we show that GPCDL learns smooth dictionaries with better accuracy than the unregularized alternative across a range of SNRs. Through an application to neural spiking data from rats, we show that learning templates by GPCDL results in a more accurate and visually-interpretable smooth dictionary, leading to superior predictive performance compared to non-regularized CDL, as well as parametric alternatives.
Parsimonious representations are ubiquitous in modeling and processing information. Motivated by the recent Multi-Layer Convolutional Sparse Coding (ML-CSC) model, we herein generalize the traditional Basis Pursuit problem to a multi-layer setting, introducing similar sparse enforcing penalties at different representation layers in a symbiotic relation between synthesis and analysis sparse priors. We explore different iterative methods to solve this new problem in practice, and we propose a new Multi-Layer Iterative Soft Thresholding Algorithm (ML-ISTA), as well as a fast version (ML-FISTA). We show that these nested first order algorithms converge, in the sense that the function value of near-fixed points can get arbitrarily close to the solution of the original problem. We further show how these algorithms effectively implement particular recurrent convolutional neural networks (CNNs) that generalize feed-forward ones without introducing any parameters. We present and analyze different architectures resulting unfolding the iterations of the proposed pursuit algorithms, including a new Learned ML-ISTA, providing a principled way to construct deep recurrent CNNs. Unlike other similar constructions, these architectures unfold a global pursuit holistically for the entire network. We demonstrate the emerging constructions in a supervised learning setting, consistently improving the performance of classical CNNs while maintaining the number of parameters constant.
In this paper, we propose a novel information theoretic framework for dictionary learning (DL) and sparse coding (SC) on a statistical manifold (the manifold of probability distributions). Unlike the traditional DL and SC framework, our new formulation does not explicitly incorporate any sparsity inducing norm in the cost function being optimized but yet yields sparse codes. Our algorithm approximates the data points on the statistical manifold (which are probability distributions) by the weighted Kullback-Leibeler center/mean (KL-center) of the dictionary atoms. The KL-center is defined as the minimizer of the maximum KL-divergence between itself and members of the set whose center is being sought. Further, we prove that the weighted KL-center is a sparse combination of the dictionary atoms. This result also holds for the case when the KL-divergence is replaced by the well known Hellinger distance. From an applications perspective, we present an extension of the aforementioned framework to the manifold of symmetric positive definite matrices (which can be identified with the manifold of zero mean gaussian distributions), $mathcal{P}_n$. We present experiments involving a variety of dictionary-based reconstruction and classification problems in Computer Vision. Performance of the proposed algorithm is demonstrated by comparing it to several state-of-the-art methods in terms of reconstruction and classification accuracy as well as sparsity of the chosen representation.