No Arabic abstract
Iterative hard thresholding (IHT) is a projected gradient descent algorithm, known to achieve state of the art performance for a wide range of structured estimation problems, such as sparse inference. In this work, we consider IHT as a solution to the problem of learning sparse discrete distributions. We study the hardness of using IHT on the space of measures. As a practical alternative, we propose a greedy approximate projection which simultaneously captures appropriate notions of sparsity in distributions, while satisfying the simplex constraint, and investigate the convergence behavior of the resulting procedure in various settings. Our results show, both in theory and practice, that IHT can achieve state of the art results for learning sparse distributions.
The sparse inverse covariance estimation problem is commonly solved using an $ell_{1}$-regularized Gaussian maximum likelihood estimator known as graphical lasso, but its computational cost becomes prohibitive for large data sets. A recent line of results showed--under mild assumptions--that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result, and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $epsilon$-accurate solution in $O(nlog(1/epsilon))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.
We study --both in theory and practice-- the use of momentum motions in classic iterative hard thresholding (IHT) methods. By simply modifying plain IHT, we investigate its convergence behavior on convex optimization criteria with non-convex constraints, under standard assumptions. In diverse scenaria, we observe that acceleration in IHT leads to significant improvements, compared to state of the art projected gradient descent and Frank-Wolfe variants. As a byproduct of our inspection, we study the impact of selecting the momentum parameter: similar to convex settings, two modes of behavior are observed --rippling and linear-- depending on the level of momentum.
Sparse Canonical Correlation Analysis (CCA) has received considerable attention in high-dimensional data analysis to study the relationship between two sets of random variables. However, there has been remarkably little theoretical statistical foundation on sparse CCA in high-dimensional settings despite active methodological and applied research activities. In this paper, we introduce an elementary sufficient and necessary characterization such that the solution of CCA is indeed sparse, propose a computationally efficient procedure, called CAPIT, to estimate the canonical directions, and show that the procedure is rate-optimal under various assumptions on nuisance parameters. The procedure is applied to a breast cancer dataset from The Cancer Genome Atlas project. We identify methylation probes that are associated with genes, which have been previously characterized as prognosis signatures of the metastasis of breast cancer.
Compressed sensing (CS) or sparse signal reconstruction (SSR) is a signal processing technique that exploits the fact that acquired data can have a sparse representation in some basis. One popular technique to reconstruct or approximate the unknown sparse signal is the iterative hard thresholding (IHT) which however performs very poorly under non-Gaussian noise conditions or in the face of outliers (gross errors). In this paper, we propose a robust IHT method based on ideas from $M$-estimation that estimates the sparse signal and the scale of the error distribution simultaneously. The method has a negligible performance loss compared to IHT under Gaussian noise, but superior performance under heavy-tailed non-Gaussian noise conditions.
We propose SPARFA-Trace, a new machine learning-based framework for time-varying learning and content analytics for education applications. We develop a novel message passing-based, blind, approximate Kalman filter for sparse factor analysis (SPARFA), that jointly (i) traces learner concept knowledge over time, (ii) analyzes learner concept knowledge state transitions (induced by interacting with learning resources, such as textbook sections, lecture videos, etc, or the forgetting effect), and (iii) estimates the content organization and intrinsic difficulty of the assessment questions. These quantities are estimated solely from binary-valued (correct/incorrect) graded learner response data and a summary of the specific actions each learner performs (e.g., answering a question or studying a learning resource) at each time instance. Experimental results on two online course datasets demonstrate that SPARFA-Trace is capable of tracing each learners concept knowledge evolution over time, as well as analyzing the quality and content organization of learning resources, the question-concept associations, and the question intrinsic difficulties. Moreover, we show that SPARFA-Trace achieves comparable or better performance in predicting unobserved learner responses than existing collaborative filtering and knowledge tracing approaches for personalized education.