Proximal SCOPE for Distributed Sparse Learning: Better Data Partition Implies Faster Convergence Rate

114 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhao Shen-Yi

تاريخ النشر 2018

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Shen-Yi Zhao - Gong-Duo Zhang - Ming-Wei Li

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Distributed sparse learning with a cluster of multiple machines has attracted much attention in machine learning, especially for large-scale applications with high-dimensional data. One popular way to implement sparse learning is to use $L_1$ regularization. In this paper, we propose a novel method, called proximal mbox{SCOPE}~(mbox{pSCOPE}), for distributed sparse learning with $L_1$ regularization. pSCOPE is based on a underline{c}ooperative underline{a}utonomous underline{l}ocal underline{l}earning~(mbox{CALL}) framework. In the mbox{CALL} framework of mbox{pSCOPE}, we find that the data partition affects the convergence of the learning procedure, and subsequently we define a metric to measure the goodness of a data partition. Based on the defined metric, we theoretically prove that pSCOPE is convergent with a linear convergence rate if the data partition is good enough. We also prove that better data partition implies faster convergence rate. Furthermore, pSCOPE is also communication efficient. Experimental results on real data sets show that pSCOPE can outperform other state-of-the-art distributed methods for sparse learning.

قيم البحث

123 - Shixiang Chen , Shiqian Ma , Lingzhou Xue 2019

Sparse principal component analysis (PCA) and sparse canonical correlation analysis (CCA) are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimiz ation problem with nonsmooth objective and nonconvex constraints. Since non-smoothness and nonconvexity bring numerical difficulties, most algorithms suggested in the literature either solve some relaxations or are heuristic and lack convergence guarantees. In this paper, we propose a new alternating manifold proximal gradient method to solve these two high-dimensional problems and provide a unified convergence analysis. Numerical experiment results are reported to demonstrate the advantages of our algorithm.

التعلم الالي التعلم الآلي التحسين والتحكم

A Manifold Proximal Linear Method for Sparse Spectral Clustering with Application to Single-Cell RNA Sequencing Data Analysis

104 - Zhongruo Wang , Bingyuan Liu , Shixiang Chen 2020

Spectral clustering is one of the fundamental unsupervised learning methods widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity to the spectral clustering and it improves the interpretability of the model. This paper consi ders a widely adopted model for SSC, which can be formulated as an optimization problem over the Stiefel manifold with nonsmooth and nonconvex objective. Such an optimization problem is very challenging to solve. Existing methods usually solve its convex relaxation or need to smooth its nonsmooth part using certain smoothing techniques. In this paper, we propose a manifold proximal linear method (ManPL) that solves the original SSC formulation. We also extend the algorithm to solve the multiple-kernel SSC problems, for which an alternating ManPL algorithm is proposed. Convergence and iteration complexity results of the proposed methods are established. We demonstrate the advantage of our proposed methods over existing methods via the single-cell RNA sequencing data analysis.

التعلم الالي التعلم الآلي التحسين والتحكم

Convergence of Contrastive Divergence with Annealed Learning Rate in Exponential Family

102 - Bai Jiang , Tung-yu Wu , Wing H. Wong 2016

In our recent paper, we showed that in exponential family, contrastive divergence (CD) with fixed learning rate will give asymptotically consistent estimates cite{wu2016convergence}. In this paper, we establish consistency and convergence rate of CD with annealed learning rate $eta_t$. Specifically, suppose CD-$m$ generates the sequence of parameters ${theta_t}_{t ge 0}$ using an i.i.d. data sample $mathbf{X}_1^n sim p_{theta^*}$ of size $n$, then $delta_n(mathbf{X}_1^n) = limsup_{t to infty} Vert sum_{s=t_0}^t eta_s theta_s / sum_{s=t_0}^t eta_s - theta^* Vert$ converges in probability to 0 at a rate of $1/sqrt[3]{n}$. The number ($m$) of MCMC transitions in CD only affects the coefficient factor of convergence rate. Our proof is not a simple extension of the one in cite{wu2016convergence}. which depends critically on the fact that ${theta_t}_{t ge 0}$ is a homogeneous Markov chain conditional on the observed sample $mathbf{X}_1^n$. Under annealed learning rate, the homogeneous Markov property is not available and we have to develop an alternative approach based on super-martingales. Experiment results of CD on a fully-visible $2times 2$ Boltzmann Machine are provided to demonstrate our theoretical results.

التعلم الالي التعلم الآلي

Posterior Concentration for Sparse Deep Learning

324 - Nicholas Polson , Veronika Rockova 2018

Spike-and-Slab Deep Learning (SS-DL) is a fully Bayesian alternative to Dropout for improving generalizability of deep ReLU networks. This new type of regularization enables provable recovery of smooth input-output maps with unknown levels of smoothn ess. Indeed, we show that the posterior distribution concentrates at the near minimax rate for $alpha$-Holder smooth maps, performing as well as if we knew the smoothness level $alpha$ ahead of time. Our result sheds light on architecture design for deep neural networks, namely the choice of depth, width and sparsity level. These network attributes typically depend on unknown smoothness in order to be optimal. We obviate this constraint with the fully Bayes construction. As an aside, we show that SS-DL does not overfit in the sense that the posterior concentrates on smaller networks with fewer (up to the optimal number of) nodes and links. Our results provide new theoretical justifications for deep ReLU networks from a Bayesian point of view.

التعلم الالي التعلم الآلي

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

83 - Jason Ge , Xingguo Li , Haoming Jiang 2020

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and sca led sparse linear regression) combined with efficient active set selection strategies. Besides, the library allows users to choose different sparsity-inducing regularizers, including the convex $ell_1$, nonconvex MCP and SCAD regularizers. The library is coded in C++ and has user-friendly R and Python wrappers. Numerical experiments demonstrate that picasso can scale up to large problems efficiently.

التعلم الالي التعلم الآلي التحسين والتحكم