We study, both in theory and in practice, the use of momentum in classic iterative hard thresholding (IHT) methods. By simply modifying plain IHT, we investigate its convergence behavior on convex optimization criteria with non-convex constraints, under standard assumptions. In diverse scenarios, we observe that acceleration in IHT leads to significant improvements compared to state-of-the-art projected gradient descent and Frank-Wolfe variants. As a byproduct of our inspection, we study the impact of selecting the momentum parameter: similar to convex settings, two modes of behavior are observed, rippling and linear, depending on the level of momentum.
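As a rough illustration of what "modifying plain IHT" with momentum can look like, the sketch below applies a gradient step at an extrapolated point and then hard-thresholds. It assumes a least-squares objective $f(x) = \frac{1}{2}\|Ax - y\|^2$, a fixed step size, and a fixed momentum parameter; the function names and the constant-momentum rule are illustrative assumptions, not the scheme analyzed in the abstract.

```python
def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def gradient(A, y, x):
    """Gradient of 0.5 * ||A x - y||^2, that is, A^T (A x - y)."""
    residual = [sum(a * xi for a, xi in zip(row, x)) - yi
                for row, yi in zip(A, y)]
    return [sum(A[r][j] * residual[r] for r in range(len(A)))
            for j in range(len(x))]


def accelerated_iht(A, y, k, step, momentum, iters=100):
    """IHT with a constant momentum term (assumed, hypothetical variant)."""
    n = len(A[0])
    x_prev = [0.0] * n   # previous sparse iterate
    u = [0.0] * n        # extrapolated (momentum) point
    for _ in range(iters):
        g = gradient(A, y, u)
        # Gradient step at the extrapolated point, then projection
        # onto the set of k-sparse vectors.
        x = hard_threshold([ui - step * gi for ui, gi in zip(u, g)], k)
        # Momentum: extrapolate along the direction of the last move.
        u = [xi + momentum * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_prev = x
    return x_prev
```

Setting `momentum=0.0` reduces the loop to plain IHT, which makes it easy to compare the two behaviors on the same problem.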
Iterative hard thresholding (IHT) is a projected gradient descent algorithm, known to achieve state-of-the-art performance for a wide range of structured estimation problems, such as sparse inference. In this work, we consider IHT as a solution to the problem of learning sparse discrete distributions. We study the hardness of using IHT on the space of measures. As a practical alternative, we propose a greedy approximate projection that captures appropriate notions of sparsity in distributions while satisfying the simplex constraint, and we investigate the convergence behavior of the resulting procedure in various settings. Our results show, both in theory and in practice, that IHT can achieve state-of-the-art results for learning sparse distributions.
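To make the combination of sparsity and the simplex constraint concrete, the following minimal sketch keeps the k largest entries of a vector and then projects them onto the probability simplex with the standard sort-based method. It is a simple stand-in chosen for illustration, not the greedy approximate projection proposed in the abstract; the function names are hypothetical.

```python
def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        candidate = (cumulative - 1.0) / j
        # The indices satisfying this test form a prefix, so the last
        # accepted candidate is the shift used by the projection.
        if uj - candidate > 0:
            theta = candidate
    return [max(vi - theta, 0.0) for vi in v]


def sparse_simplex_projection(v, k):
    """Keep the k largest entries of v (k >= 1), then project them
    onto the simplex; all other entries are set to zero."""
    support = sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k]
    projected = project_simplex([v[i] for i in support])
    out = [0.0] * len(v)
    for i, p in zip(support, projected):
        out[i] = p
    return out
```

The output is a valid discrete distribution with at most k nonzero entries, so it can serve as the projection step inside an IHT-style loop.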
Compressed sensing (CS), or sparse signal reconstruction (SSR), is a signal processing technique that exploits the fact that acquired data can have a sparse representation in some basis. One popular technique for reconstructing or approximating the unknown sparse signal is iterative hard thresholding (IHT), which, however, performs very poorly under non-Gaussian noise or in the presence of outliers (gross errors). In this paper, we propose a robust IHT method based on ideas from $M$-estimation that estimates the sparse signal and the scale of the error distribution simultaneously. The method has a negligible performance loss compared to IHT under Gaussian noise, but superior performance under heavy-tailed non-Gaussian noise.
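One way to see how $M$-estimation ideas can enter IHT is to pass the residuals through a bounded score function before the gradient step, so that gross errors have limited influence. The sketch below uses Huber's psi function with a fixed, user-supplied scale `sigma`; this is a simplifying assumption, since the method in the abstract estimates the scale jointly with the signal. The tuning constant 1.345 and all function names are illustrative.

```python
def huber_psi(r, c=1.345):
    """Huber's psi function: identity on [-c, c], clipped outside."""
    return max(-c, min(c, r))


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def robust_iht(A, y, k, sigma, step=1.0, c=1.345, iters=50):
    """IHT with Huber-clipped residuals and a fixed scale (assumed)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [yi - sum(a * xi for a, xi in zip(row, x))
                    for row, yi in zip(A, y)]
        # Winsorize residuals: large errors are clipped to +/- c * sigma.
        pseudo = [sigma * huber_psi(r / sigma, c) for r in residual]
        update = [x[j] + step * sum(A[i][j] * pseudo[i] for i in range(m))
                  for j in range(n)]
        x = hard_threshold(update, k)
    return x
```

When every scaled residual lies inside `[-c, c]`, the clipping is inactive and the update coincides with a plain IHT step.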
We investigate a class of constrained sparse regression problems with a cardinality penalty, where the feasible set is defined by box constraints and the loss function is convex but not necessarily smooth. First, we put forward a smoothing fast iterative hard thresholding (SFIHT) algorithm for solving such optimization problems, which combines smoothing approximations, extrapolation techniques, and iterative hard thresholding methods. The extrapolation coefficients can be chosen to satisfy $\sup_k \beta_k = 1$ in the proposed algorithm. We discuss the convergence behavior of the algorithm with different extrapolation coefficients, and give sufficient conditions to ensure that any accumulation point of the iterates is a local minimizer of the original cardinality-penalized problem. In particular, for a class of fixed extrapolation coefficients, we discuss several different update rules for the smoothing parameter and obtain a convergence rate of $O(\ln k / k)$ on the loss and objective function values. Second, we consider the case in which the loss function is Lipschitz continuously differentiable, and develop a fast iterative hard thresholding (FIHT) algorithm to solve it. We prove that the iterates of FIHT converge to a local minimizer of the problem that satisfies a desirable lower bound property. Moreover, we show that the convergence rate of the loss and objective function values is $o(k^{-2})$. Finally, some numerical examples are presented to illustrate the theoretical results.
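For intuition on how hard thresholding interacts with a box constraint and a cardinality penalty, the sketch below solves the separable subproblem of minimizing $\frac{1}{2t}(x_i - v_i)^2 + \lambda\,[x_i \neq 0]$ over $x_i \in [lo, hi]$ for each coordinate. It assumes the box contains zero and compares the cost of the clipped value with the cost of zero. The function name, the parameters `lam`, `lo`, `hi`, `step`, and the rule of breaking ties toward zero are illustrative assumptions, not the SFIHT or FIHT updates themselves.

```python
def box_hard_threshold(v, lam, lo, hi, step=1.0):
    """Coordinate-wise minimizer of (x - v_i)^2 / (2 * step) + lam * [x != 0]
    over x in [lo, hi], assuming lo <= 0 <= hi."""
    if not lo <= 0.0 <= hi:
        raise ValueError("this sketch assumes the box contains zero")
    out = []
    for vi in v:
        clipped = max(lo, min(hi, vi))  # best nonzero candidate in the box
        cost_nonzero = (clipped - vi) ** 2 / (2.0 * step) + lam
        cost_zero = vi ** 2 / (2.0 * step)
        # Keep the clipped value only if it beats zero after paying
        # the cardinality penalty; ties go to zero.
        out.append(clipped if cost_nonzero < cost_zero else 0.0)
    return out
```

With an unbounded box this reduces to the familiar rule of keeping an entry only when $|v_i| > \sqrt{2 \lambda t}$.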
Recovery of low-rank matrices from a small number of linear measurements is now well known to be possible under various model assumptions on the measurements. Such results demonstrate robustness and are backed by provable theoretical guarantees. However, extensions to tensor recovery have only recently begun to be studied and developed, despite an abundance of practical tensor applications. Recently, a tensor variant of the iterative hard thresholding method was proposed, and theoretical results were obtained that guarantee exact recovery of tensors with low Tucker rank. In this paper, we utilize the same tensor version of the Restricted Isometry Property (RIP) to extend these results to tensors with low CANDECOMP/PARAFAC (CP) rank. In doing so, we leverage recent results on efficient approximations of CP decompositions that remove the need for challenging assumptions in prior works. We complement our theoretical findings with empirical results that showcase the potential of the approach.
Low-rank tensor recovery problems have been widely studied in many applications of signal processing and machine learning. Tucker decomposition is one of the most popular decompositions in the tensor framework. In recent years, researchers have developed many state-of-the-art algorithms to address the problem of low-Tucker-rank tensor recovery. Motivated by the favorable properties of stochastic algorithms, such as stochastic gradient descent and stochastic iterative hard thresholding, we extend the well-known stochastic iterative hard thresholding algorithm to the tensor framework in order to recover a low-Tucker-rank tensor from its linear measurements. We also develop a linear convergence analysis for the proposed method and conduct a series of experiments with both synthetic and real data to illustrate its performance.