Successive Ray Refinement and Its Application to Coordinate Descent for LASSO

104 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jun Liu

تاريخ النشر 2015

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jun Liu - Zheng Zhao - Ruiwen Zhang

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Coordinate descent is one of the most popular approaches for solving Lasso and its extensions due to its simplicity and efficiency. When applying coordinate descent to solving Lasso, we update one coordinate at a time while fixing the remaining coordinates. Such an update, which is usually easy to compute, greedily decreases the objective function value. In this paper, we aim to improve its computational efficiency by reducing the number of coordinate descent iterations. To this end, we propose a novel technique called Successive Ray Refinement (SRR). SRR makes use of the following ray continuation property on the successive iterations: for a particular coordinate, the value obtained in the next iteration almost always lies on a ray that starts at its previous iteration and passes through the current iteration. Motivated by this ray-continuation property, we propose that coordinate descent be performed not directly on the previous iteration but on a refined search point that has the following properties: on one hand, it lies on a ray that starts at a history solution and passes through the previous iteration, and on the other hand, it achieves the minimum objective function value among all the points on the ray. We propose two schemes for defining the search point and show that the refined search point can be efficiently obtained. Empirical results for real and synthetic data sets show that the proposed SRR can significantly reduce the number of coordinate descent iterations, especially for small Lasso regularization parameters.

قيم البحث

413 - Binbin Lin , Qingyang Li , Qian Sun 2014

textit{Drosophila melanogaster} has been established as a model organism for investigating the fundamental principles of developmental gene interactions. The gene expression patterns of textit{Drosophila melanogaster} can be documented as digital ima ges, which are annotated with anatomical ontology terms to facilitate pattern discovery and comparison. The automated annotation of gene expression pattern images has received increasing attention due to the recent expansion of the image database. The effectiveness of gene expression pattern annotation relies on the quality of feature representation. Previous studies have demonstrated that sparse coding is effective for extracting features from gene expression images. However, solving sparse coding remains a computationally challenging problem, especially when dealing with large-scale data sets and learning large size dictionaries. In this paper, we propose a novel algorithm to solve the sparse coding problem, called Stochastic Coordinate Coding (SCC). The proposed algorithm alternatively updates the sparse codes via just a few steps of coordinate descent and updates the dictionary via second order stochastic gradient descent. The computational cost is further reduced by focusing on the non-zero components of the sparse codes and the corresponding columns of the dictionary only in the updating procedure. Thus, the proposed algorithm significantly improves the efficiency and the scalability, making sparse coding applicable for large-scale data sets and large dictionary sizes. Our experiments on Drosophila gene expression data sets demonstrate the efficiency and the effectiveness of the proposed algorithm.

التعلم الآلي الهندسة الحاسوبية، المالية،العلوم

Rate-Distortion Theoretic Model Compression: Successive Refinement for Pruning

167 - Berivan Isik , Albert No , Tsachy Weissman 2021

We study the neural network (NN) compression problem, viewing the tension between the compression ratio and NN performance through the lens of rate-distortion theory. We choose a distortion metric that reflects the effect of NN compression on the mod el output and then derive the tradeoff between rate (compression ratio) and distortion. In addition to characterizing theoretical limits of NN compression, this formulation shows that emph{pruning}, implicitly or explicitly, must be a part of a good compression algorithm. This observation bridges a gap between parts of the literature pertaining to NN and data compression, respectively, providing insight into the empirical success of pruning for NN compression. Finally, we propose a novel pruning strategy derived from our information-theoretic formulation and show that it outperforms the relevant baselines on CIFAR-10 and ImageNet datasets.

التعلم الآلي نظرية المعلومات معالجة الإشارات

Parallel Coordinate Descent for L1-Regularized Loss Minimization

418 - Joseph K. Bradley , Aapo Kyrola , Danny Bickson 2011

We propose Shotgun, a parallel coordinate descent algorithm for minimizing L1-regularized losses. Though coordinate descent seems inherently sequential, we prove convergence bounds for Shotgun which predict linear speedups, up to a problem-dependent limit. We present a comprehensive empirical study of Shotgun for Lasso and sparse logistic regression. Our theoretical predictions on the potential for parallelism closely match behavior on real data. Shotgun outperforms other published solvers on a range of large problems, proving to be one of the most scalable algorithms for L1.

التعلم الآلي نظرية المعلومات نظرية المعلومات

Faster Coordinate Descent via Adaptive Importance Sampling

157 - Dmytro Perekrestenko , Volkan Cevher , Martin Jaggi 2017

Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean th at our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. We theoretically characterize the performance of our selection rules and demonstrate improvements over the state-of-the-art, and extend our theory and algorithms to general convex objectives. Numerical evidence with hinge-loss support vector machines and Lasso confirm that the practice follows the theory.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التحسين والتحكم

Holonomic Gradient Descent and its Application to Fisher-Bingham Integral

460 - Tomonari Sei , Nobuki Takayama , Akimichi Takemura 2010

We give a new algorithm to find local maximum and minimum of a holonomic function and apply it for the Fisher-Bingham integral on the sphere $S^n$, which is used in the directional statistics. The method utilizes the theory and algorithms of holonomic systems.

الحساب الرمزي التحسين والتحكم نظرية الإحصاء