
A Randomized Rounding Algorithm for Sparse PCA

Published by Kimon Fountoulakis
Publication date 2015
Research field Informatics Engineering
Research language English





We present and analyze a simple, two-step algorithm to approximate the optimal solution of the sparse PCA problem. Our approach first solves an L1-penalized version of the NP-hard sparse PCA optimization problem and then uses a randomized rounding strategy to sparsify the resulting dense solution. Our main theoretical result guarantees an additive error approximation and provides a tradeoff between sparsity and accuracy. Our experimental evaluation indicates that our approach is competitive in practice, even compared to state-of-the-art toolboxes such as Spasm.
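
The second step described above, sparsifying a dense vector by randomized rounding, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not state the sampling rule, so the code assumes one common unbiased scheme (keep entry i with probability p_i = min(1, s|x_i| / ||x||_1) and rescale it by 1/p_i). The function name `randomized_round`, the sparsity parameter `s`, and the use of a plain leading eigenvector as a stand-in for the L1-penalized solution are all assumptions made for the example.

```python
import numpy as np


def randomized_round(x, s, rng):
    """Sparsify x by independent sampling (assumed scheme, see lead-in).

    Entry i is kept with probability p_i = min(1, s * |x_i| / ||x||_1)
    and rescaled by 1 / p_i, so the output equals x in expectation and
    has at most s nonzero entries in expectation.
    """
    x = np.asarray(x, dtype=float)
    l1 = np.abs(x).sum()
    if l1 == 0.0:
        return x.copy()  # nothing to sample from
    p = np.minimum(1.0, s * np.abs(x) / l1)
    keep = rng.random(x.shape) < p  # entries with p_i = 0 are never kept
    out = np.zeros_like(x)
    out[keep] = x[keep] / p[keep]
    return out


# Toy usage. The dense vector here is just the leading eigenvector of a
# sample covariance matrix, a stand-in (assumption) for the solution of
# the L1-penalized problem, which this sketch does not solve.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
A = X.T @ X / X.shape[0]
_, V = np.linalg.eigh(A)  # eigenvalues in ascending order
x_dense = V[:, -1]
x_sparse = randomized_round(x_dense, s=10, rng=rng)
print("nonzeros:", np.count_nonzero(x_sparse))
print("captured variance:", x_sparse @ A @ x_sparse)
```

Larger values of `s` keep more entries and stay closer to the dense vector, which is one way to picture the tradeoff between sparsity and accuracy that the abstract mentions.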




Read also

Jin Cao, Dewei Zhong 2020
Finding the common subsequences of $L$ strings has many applications in bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) for $L$ strings is NP-hard, e.g., the computational complexity is exponential in $L$. In this paper, we develop a randomized algorithm, referred to as *Random-MCS*, for finding a random instance of a Maximal Common Subsequence (MCS) of multiple strings. A common subsequence is *maximal* if inserting any character into the subsequence no longer yields a common subsequence. A special case of MCS is LCS, where the length is the longest. We show that the complexity of our algorithm is linear in $L$, and that it is therefore suitable for large $L$. Furthermore, we study the occurrence probability for a single instance of MCS and demonstrate via both theoretical and experimental studies that the longest subsequence from multiple runs of *Random-MCS* often yields a solution to LCS.
We present a general approach to rounding semidefinite programming relaxations obtained by the Sum-of-Squares method (Lasserre hierarchy). Our approach is based on using the connection between these relaxations and the Sum-of-Squares proof system to transform a *combining algorithm* -- an algorithm that maps a distribution over solutions into a (possibly weaker) solution -- into a *rounding algorithm* that maps a solution of the relaxation to a solution of the original problem. Using this approach, we obtain algorithms that yield improved results for natural variants of three well-known problems: 1) We give a quasipolynomial-time algorithm that approximates the maximum of a low degree multivariate polynomial with non-negative coefficients over the Euclidean unit sphere. Beyond being of interest in its own right, this is related to an open question in quantum information theory, and our techniques have already led to improved results in this area (Brandão and Harrow, STOC '13). 2) We give a polynomial-time algorithm that, given a $d$-dimensional subspace of $R^n$ that (almost) contains the characteristic function of a set of size $n/k$, finds a vector $v$ in the subspace satisfying $\|v\|_4^4 > c(k/d^{1/3}) \|v\|_2^2$, where $\|v\|_p = (E_i v_i^p)^{1/p}$. Aside from being a natural relaxation, this is also motivated by a connection to the Small Set Expansion problem shown by Barak et al. (STOC 2012) and our results yield a certain improvement for that problem. 3) We use this notion of $L_4$ vs. $L_2$ sparsity to obtain a polynomial-time algorithm with substantially improved guarantees for recovering a planted $\mu$-sparse vector $v$ in a random $d$-dimensional subspace of $R^n$. If $v$ has $\mu n$ nonzero coordinates, we can recover it with high probability whenever $\mu < O(\min(1, n/d^2))$, improving for $d < n^{2/3}$ prior methods which intrinsically required $\mu < O(1/\sqrt{d})$.
Lei Wang, Xin Liu, Yin Zhang 2021
As a prominent variant of principal component analysis (PCA), sparse PCA attempts to find sparse loading vectors when conducting dimension reduction. This paper aims to calculate sparse PCA through solving an optimization problem pursuing orthogonality and sparsity simultaneously. We propose a splitting and alternating approach, leading to an efficient distributed algorithm, called DAL1, for solving this nonconvex and nonsmooth optimization problem. Convergence of DAL1 to stationary points has been rigorously established. Computational experiments demonstrate that, due to its fast convergence in terms of iteration count, DAL1 requires far fewer rounds of communications to reach the prescribed accuracy than those required by existing peer methods. Unlike existing algorithms, there is a relatively small possibility of data leakage for DAL1.
In this paper we propose a new algorithm for streaming principal component analysis. With limited memory, small devices cannot store all the samples in the high-dimensional regime. Streaming principal component analysis aims to find the $k$-dimensional subspace which can explain the most variation of the $d$-dimensional data points that come into memory sequentially. In order to deal with large $d$ and large $N$ (number of samples), most streaming PCA algorithms update the current model using only the incoming sample and then dump the information right away to save memory. However, the information contained in previously streamed data could be useful. Motivated by this idea, we develop a new streaming PCA algorithm called History PCA that achieves this goal. By using $O(Bd)$ memory with $B \approx 10$ being the block size, our algorithm converges much faster than existing streaming PCA algorithms. By changing the number of inner iterations, the memory usage can be further reduced to $O(d)$ while maintaining a comparable convergence speed. We provide theoretical guarantees for the convergence of our algorithm along with the rate of convergence. We also demonstrate on synthetic and real world data sets that our algorithm compares favorably with other state-of-the-art streaming PCA methods in terms of the convergence speed and performance.
Sparse principal component analysis (PCA) and sparse canonical correlation analysis (CCA) are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Since non-smoothness and nonconvexity bring numerical difficulties, most algorithms suggested in the literature either solve some relaxations or are heuristic and lack convergence guarantees. In this paper, we propose a new alternating manifold proximal gradient method to solve these two high-dimensional problems and provide a unified convergence analysis. Numerical experiment results are reported to demonstrate the advantages of our algorithm.
