The recently proposed multi-layer sparse model has revealed insightful connections between sparse representations and convolutional neural networks (CNNs). In its original conception, this model was restricted to a cascade of convolutional synthesis representations. In this paper, we start by addressing a more general model, revealing interesting ties to fully connected networks. We then show that this multi-layer construction admits a new interpretation as a unique symbiosis between synthesis and analysis models: while the deepest layer indeed provides a synthesis representation, the mid-layer decompositions provide an analysis counterpart. This new perspective exposes the suboptimality of previously proposed pursuit approaches, as they do not fully leverage all the information comprised in the model constraints. Armed with this understanding, we address fundamental theoretical questions, revisiting previous analysis and expanding it. Motivated by the limitations of previous algorithms, we then propose an integrated, holistic alternative that estimates all representations in the model simultaneously, and we analyze these different schemes under stochastic noise assumptions. Inspired by the synthesis-analysis duality, we further present a Holistic Pursuit algorithm, which alternates between synthesis and analysis sparse coding steps, eventually solving for the entire model as a whole, with provably improved performance. Finally, we present numerical results that demonstrate the practical advantages of our approach.
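To make the synthesis-analysis duality concrete, here is a minimal toy sketch (all dimensions, and the sparse-column construction of D2, are our own assumptions, not the paper's): it builds a two-layer model and verifies that the zero entries of the mid-layer representation turn rows of D2 into analysis-type constraints on the deepest code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer (fully connected) sparse model, all sizes hypothetical:
# x = D1 @ gamma1 and gamma1 = D2 @ gamma2, with both codes sparse.
n, m1, m2, k2 = 64, 128, 256, 4

D1 = rng.standard_normal((n, m1)) / np.sqrt(n)

# Sparse columns in D2 keep gamma1 = D2 @ gamma2 sparse, as the model demands.
D2 = np.zeros((m1, m2))
for j in range(m2):
    rows = rng.choice(m1, size=3, replace=False)
    D2[rows, j] = rng.standard_normal(3)

gamma2 = np.zeros(m2)
support = rng.choice(m2, size=k2, replace=False)
gamma2[support] = rng.standard_normal(k2)

gamma1 = D2 @ gamma2          # mid-layer representation (synthesis view)
x = D1 @ gamma1               # observed signal

# Analysis view: on the zero set of gamma1, the corresponding rows of D2
# annihilate the deeper code, i.e. D2[zero_set] @ gamma2 = 0.
zero_set = np.flatnonzero(np.abs(gamma1) < 1e-12)
print(len(zero_set), np.linalg.norm(D2[zero_set] @ gamma2))  # many zeros, ~0
```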
State-of-the-art methods for convolutional sparse coding (CSC) usually employ Fourier-domain solvers to speed up the convolution operations. However, this approach is not without shortcomings: Fourier-domain representations implicitly assume circular boundary conditions and make it hard to fully exploit the sparsity of the problem as well as the small spatial support of the filters. In this work, we propose a novel stochastic spatial-domain solver that introduces a randomized subsampling strategy while learning the sparse codes. We then combine the proposed strategy with online learning, scaling the CSC model up to very large sample sizes. In both cases, we show experimentally that the proposed subsampling strategy, with a reasonable choice of the subsampling rate, outperforms state-of-the-art frequency-domain solvers in execution time without losing learning quality. Finally, we evaluate the effectiveness of the over-complete dictionary learned from large-scale datasets, which yields an improved sparse representation of natural images owing to the richer set of learned image features.
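As a rough illustration of the randomized subsampling idea (not the authors' implementation; the problem sizes, step size, and subsampling rate below are placeholders), here is a subsampled ISTA-style sparse coding loop for a toy 1-D spatial-domain CSC problem.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D convolutional sparse coding setup (illustrative sizes).
N, K, F = 512, 8, 11          # signal length, number of filters, filter size
x = rng.standard_normal(N)
d = rng.standard_normal((K, F))
d /= np.linalg.norm(d, axis=1, keepdims=True)
z = np.zeros((K, N))          # sparse codes, one map per filter
lam, step, rate = 0.1, 0.05, 0.5

def reconstruct(d, z):
    return sum(np.convolve(z[k], d[k], mode="same") for k in range(K))

for it in range(100):
    # Randomized subsampling: only a random fraction of residual entries
    # drives this iteration's gradient; dividing by the rate keeps the
    # gradient estimate unbiased in expectation.
    mask = rng.random(N) < rate
    r = np.where(mask, reconstruct(d, z) - x, 0.0) / rate
    for k in range(K):
        # Correlation with d[k] approximates the adjoint of the convolution.
        g = np.convolve(r, d[k][::-1], mode="same")
        w = z[k] - step * g
        z[k] = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft threshold
```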
Signal models based on sparse representations have received considerable attention in recent years. Meanwhile, deep models consisting of a cascade of functional layers, commonly known as deep neural networks, have been highly successful for the task of object classification and have recently been introduced to image reconstruction. In this work, we develop a new image reconstruction approach based on a novel multi-layer model learned in an unsupervised manner by combining both sparse representations and deep models. The proposed framework extends the classical sparsifying transform model for images to a Multi-lAyer Residual Sparsifying transform (MARS) model, wherein the transform-domain data are jointly sparsified over layers. We investigate the application of MARS models learned from limited regular-dose images to low-dose CT reconstruction using Penalized Weighted Least Squares (PWLS) optimization. We propose new formulations for multi-layer transform learning and image reconstruction, and derive an efficient block coordinate descent algorithm to learn the transforms across layers in an unsupervised manner from limited regular-dose images. The learned model is then incorporated into the low-dose image reconstruction phase. Low-dose CT experiments with both the XCAT phantom and Mayo Clinic data show that the MARS model outperforms conventional methods such as FBP and PWLS with an edge-preserving (EP) regularizer in terms of two numerical metrics (RMSE and SSIM) and noise suppression. Compared with the single-layer learned transform (ST) model, the MARS model better preserves subtle details.
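The layered residual structure can be sketched in a few lines. In the following toy encoding, random orthogonal transforms and per-column hard thresholding stand in for the learned transforms and the paper's sparsification step; each layer sparsifies the transform-domain residual left by the previous one.

```python
import numpy as np

rng = np.random.default_rng(2)

def hard_threshold(A, s):
    """Keep the s largest-magnitude entries in each column, zero the rest."""
    out = np.zeros_like(A)
    idx = np.argsort(-np.abs(A), axis=0)[:s]
    np.put_along_axis(out, idx, np.take_along_axis(A, idx, axis=0), axis=0)
    return out

# Two-layer MARS-style encoding of vectorized image patches X (p x n);
# the transforms here are random orthogonal placeholders, not learned ones.
p, n, s = 64, 1000, 10
X = rng.standard_normal((p, n))
W1 = np.linalg.qr(rng.standard_normal((p, p)))[0]
W2 = np.linalg.qr(rng.standard_normal((p, p)))[0]

Z1 = hard_threshold(W1 @ X, s)   # sparse codes of the layer-1 transform data
R2 = W1 @ X - Z1                 # layer-1 transform-domain residual
Z2 = hard_threshold(W2 @ R2, s)  # layer 2 sparsifies that residual
```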
We propose an intra-frame predictive strategy for the compression of 3D point cloud attributes. Our approach is integrated with the region adaptive graph Fourier transform (RAGFT), a multi-resolution transform formed by a composition of localized block transforms, which produces a set of low-pass (approximation) and high-pass (detail) coefficients at multiple resolutions. Since the transform operations are spatially localized, RAGFT coefficients at a given resolution may still be correlated. To exploit this phenomenon, we propose an intra-prediction strategy in which decoded approximation coefficients are used to predict uncoded detail coefficients. The prediction residuals are then quantized and entropy coded. For the 8i dataset, we obtain gains of up to 0.5 dB compared to intra-predicted point cloud compression based on the region adaptive Haar transform (RAHT).
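The pipeline described here (predict details from decoded approximations, then quantize and entropy code the residual) can be sketched as follows; the linear predictor P, all sizes, and the quantization step are hypothetical placeholders, not the RAGFT-derived predictor.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for one resolution level of the transform (illustrative sizes).
n_lo, n_hi, Q = 100, 300, 16.0
approx_decoded = rng.standard_normal(n_lo)      # available at the decoder
detail = rng.standard_normal(n_hi)              # detail coefficients to code
P = 0.05 * rng.standard_normal((n_hi, n_lo))    # hypothetical linear predictor

pred = P @ approx_decoded
residual = detail - pred                        # intra-prediction residual

q = np.round(residual / Q).astype(int)          # uniform quantization
# q would now be entropy coded; the decoder reconstructs as follows.
detail_rec = pred + q * Q
assert np.max(np.abs(detail - detail_rec)) <= Q / 2 + 1e-9
```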
Topological data analysis (TDA) has emerged as one of the most promising techniques to reconstruct the unknown shapes of high-dimensional spaces from observed data samples. TDA thus yields key shape descriptors in the form of persistent topological features that can be used for any supervised or unsupervised learning task, including multi-way classification. Sparse sampling, on the other hand, provides a highly efficient technique to reconstruct signals in the spatio-temporal domain from just a few carefully chosen samples. Here, we present a new method, referred to as the Sparse-TDA algorithm, that combines favorable aspects of the two techniques. This combination is realized by selecting an optimal set of sparse pixel samples from the persistent features generated by a vector-based TDA algorithm. These sparse samples are selected from a low-rank matrix representation of the persistent features using QR factorization with column pivoting. We show that the Sparse-TDA method demonstrates promising performance on three benchmark problems related to human posture recognition and image texture classification.
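The pixel-selection step reads like QR-based sensor placement; a minimal sketch on synthetic low-rank features (the dimensions and rank below are assumptions) might look like this.

```python
import numpy as np
from scipy.linalg import qr, svd

rng = np.random.default_rng(4)

# Columns of F are vectorized persistent-feature images, one per training
# sample; sizes and rank k are illustrative assumptions.
n_pixels, n_samples, k = 400, 120, 15
F = rng.standard_normal((n_pixels, k)) @ rng.standard_normal((k, n_samples))

# Low-rank basis of the feature matrix via a truncated SVD.
U = svd(F, full_matrices=False)[0][:, :k]

# QR with column pivoting on U.T greedily ranks pixels by how much new
# information they contribute; the first k pivots are the selected samples.
_, _, piv = qr(U.T, pivoting=True)
selected_pixels = piv[:k]
print(selected_pixels)
```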
Sparse coding is a class of unsupervised methods for learning a sparse representation of the input data as the product of a dictionary and a sparse code. This learning framework has led to state-of-the-art results in various image and video processing tasks. However, classical methods learn the dictionary and the sparse code through alternating optimization, usually without theoretical guarantees of either optimality or convergence due to the non-convexity of the problem. Recent works on sparse coding with a complete dictionary provide strong theoretical guarantees thanks to developments in non-convex optimization. However, the initial non-convex approaches learn the dictionary sequentially, in an atom-by-atom manner, which leads to long execution times. More recent works seek to learn the entire dictionary at once, which substantially reduces the execution time, but their recovery performance degrades with a finite number of data samples. In this paper, we propose an efficient sparse coding scheme with a two-stage optimization. The proposed scheme leverages the global and local Riemannian geometry of the two-stage optimization problem and admits a fast implementation with strong dictionary recovery performance from a finite number of samples, without atom-by-atom calculation. We further prove that, with high probability, the proposed scheme can exactly recover any atom in the target dictionary from a finite number of samples when applied to recover a single atom. An application to wireless sensor data compression is also proposed. Experiments on both synthetic and real-world data verify the efficiency and effectiveness of the proposed scheme.
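For intuition on sphere-constrained one-atom recovery, here is a sketch of l4-norm maximization over the unit sphere, a known non-convex route to complete dictionary learning; this is explicitly not the paper's two-stage Riemannian scheme, and the sizes and sparsity level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic complete (orthogonal) dictionary and Bernoulli-Gaussian sparse
# codes: Y = D @ X with X sparse.
n, p, theta = 30, 5000, 0.1
D = np.linalg.qr(rng.standard_normal((n, n)))[0]
X = rng.standard_normal((n, p)) * (rng.random((n, p)) < theta)
Y = D @ X

# Fixed-point ascent on f(q) = ||Y.T @ q||_4^4 over the sphere; local
# maximizers align with dictionary atoms in this orthogonal toy setting.
q = rng.standard_normal(n)
q /= np.linalg.norm(q)
for _ in range(50):
    q = Y @ (Y.T @ q) ** 3     # ascent step along the gradient direction
    q /= np.linalg.norm(q)     # project back onto the unit sphere

print(np.max(np.abs(D.T @ q)))  # close to 1: q matches one atom up to sign
```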