Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression

129 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Deeksha Adil

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Deeksha Adil - Richard Peng - Sushant Sachdeva

بنى وهياكل البيانات والخوارزميات التعلم الآلي التحليل العددي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Linear regression in $ell_p$-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving $ell_p$-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy to implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for p > 3, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for p > 3. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p in [2,infty).$ Our algorithm is simple to implement and is guaranteed to find a $(1+varepsilon)$-approximate solution in $O(p^{3.5} m^{frac{p-2}{2(p-1)}} log frac{m}{varepsilon}) le O_p(sqrt{m} log frac{m}{varepsilon} )$ iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10--50x, and is the fastest among available implementations in the high-accuracy regime.

قيم البحث

114 - Deeksha Adil , Rasmus Kyng , Richard Peng 2019

We give improved algorithms for the $ell_{p}$-regression problem, $min_{x} |x|_{p}$ such that $A x=b,$ for all $p in (1,2) cup (2,infty).$ Our algorithms obtain a high accuracy solution in $tilde{O}_{p}(m^{frac{|p-2|}{2p + |p-2|}}) le tilde{O}_{p}(m^ {frac{1}{3}})$ iterations, where each iteration requires solving an $m times m$ linear system, $m$ being the dimension of the ambient space. By maintaining an approximate inverse of the linear systems that we solve in each iteration, we give algorithms for solving $ell_{p}$-regression to $1 / text{poly}(n)$ accuracy that run in time $tilde{O}_p(m^{max{omega, 7/3}}),$ where $omega$ is the matrix multiplication constant. For the current best value of $omega > 2.37$, we can thus solve $ell_{p}$ regression as fast as $ell_{2}$ regression, for all constant $p$ bounded away from $1.$ Our algorithms can be combined with fast graph Laplacian linear equation solvers to give minimum $ell_{p}$-norm flow / voltage solutions to $1 / text{poly}(n)$ accuracy on an undirected graph with $m$ edges in $tilde{O}_{p}(m^{1 + frac{|p-2|}{2p + |p-2|}}) le tilde{O}_{p}(m^{frac{4}{3}})$ time. For sparse graphs and for matrices with similar dimensions, our iteration counts and running times improve on the $p$-norm regression algorithm by [Bubeck-Cohen-Lee-Li STOC`18] and general-purpose convex optimization algorithms. At the core of our algorithms is an iterative refinement scheme for $ell_{p}$-norms, using the smoothed $ell_{p}$-norms introduced in the work of Bubeck et al. Given an initial solution, we construct a problem that seeks to minimize a quadratically-smoothed $ell_{p}$ norm over a subspace, such that a crude solution to this problem allows us to improve the initial solution by a constant factor, leading to algorithms with fast convergence.

بنى وهياكل البيانات والخوارزميات التحليل العددي التحسين والتحكم

List-Decodable Linear Regression

77 - Sushrut Karmalkar , Adam R. Klivans , Pravesh K. Kothari 2019

We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $alpha < 1$, our algorithm takes as input a sample ${(x_i,y_i)}_{i leq n}$ of $n$ linear equations where $alpha n$ of the equations satisfy $y_i = langle x_i,ell^*rangle +zeta$ for some small noise $zeta$ and $(1-alpha)n$ of the equations are {em arbitrarily} chosen. It outputs a list $L$ of size $O(1/alpha)$ - a fixed constant - that contains an $ell$ that is close to $ell^*$. Our algorithm succeeds whenever the inliers are chosen from a emph{certifiably} anti-concentrated distribution $D$. In particular, this gives a $(d/alpha)^{O(1/alpha^8)}$ time algorithm to find a $O(1/alpha)$ size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in emph{regular} directions, we give an algorithm that achieves similar guarantee under the promise that $ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm is based on a new framework for list-decodable learning that strengthens the `identifiability to algorithms paradigm based on the sum-of-squares method. In an independent and concurrent work, Raghavendra and Yau also used the Sum-of-Squares method to give a similar result for list-decodable regression.

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي

$ell_2$-norm Flow Diffusion in Near-Linear Time

84 - Li Chen , Richard Peng , 2021

Diffusion is a fundamental graph procedure and has been a basic building block in a wide range of theoretical and empirical applications such as graph partitioning and semi-supervised learning on graphs. In this paper, we study computationally effici ent diffusion primitives beyond random walk. We design an $widetilde{O}(m)$-time randomized algorithm for the $ell_2$-norm flow diffusion problem, a recently proposed diffusion model based on network flow with demonstrated graph clustering related applications both in theory and in practice. Examples include finding locally-biased low conductance cuts. Using a known connection between the optimal dual solution of the flow diffusion problem and the local cut structure, our algorithm gives an alternative approach for finding such cuts in nearly linear time. From a technical point of view, our algorithm contributes a novel way of dealing with inequality constraints in graph optimization problems. It adapts the high-level algorithmic framework of nearly linear time Laplacian system solvers, but requires several new tools: vertex elimination under constraints, a new family of graph ultra-sparsifiers, and accelerated proximal gradient methods with inexact proximal mapping computation.

بنى وهياكل البيانات والخوارزميات التعلم الآلي

Provably Convergent Learned Inexact Descent Algorithm for Low-Dose CT Reconstruction

120 - Qingchao Zhang , Mehrdad Alvandipour , Wenjun Xia 2021

We propose a provably convergent method, called Efficient Learned Descent Algorithm (ELDA), for low-dose CT (LDCT) reconstruction. ELDA is a highly interpretable neural network architecture with learned parameters and meanwhile retains convergence gu arantee as classical optimization algorithms. To improve reconstruction quality, the proposed ELDA also employs a new non-local feature mapping and an associated regularizer. We compare ELDA with several state-of-the-art deep image methods, such as RED-CNN and Learned Primal-Dual, on a set of LDCT reconstruction problems. Numerical experiments demonstrate improvement of reconstruction quality using ELDA with merely 19 layers, suggesting the promising performance of ELDA in solution accuracy and parameter efficiency.

معالجة الصور والفيديو الرؤية الحاسوبية وتمييز الأنماط

Simple and Fast Algorithm for Binary Integer and Online Linear Programming

129 - Xiaocheng Li , Chunlin Sun , Yinyu Ye 2020

In this paper, we develop a simple and fast online algorithm for solving a class of binary integer linear programs (LPs) arisen in general resource allocation problem. The algorithm requires only one single pass through the input data and is free of doing any matrix inversion. It can be viewed as both an approximate algorithm for solving binary integer LPs and a fast algorithm for solving online LP problems. The algorithm is inspired by an equivalent form of the dual problem of the relaxed LP and it essentially performs (one-pass) projected stochastic subgradient descent in the dual space. We analyze the algorithm in two different models, stochastic input and random permutation, with minimal technical assumptions on the input data. The algorithm achieves $Oleft(m sqrt{n}right)$ expected regret under the stochastic input model and $Oleft((m+log n)sqrt{n}right)$ expected regret under the random permutation model, and it achieves $O(m sqrt{n})$ expected constraint violation under both models, where $n$ is the number of decision variables and $m$ is the number of constraints. The algorithm enjoys the same performance guarantee when generalized to a multi-dimensional LP setting which covers a wider range of applications. In addition, we employ the notion of permutational Rademacher complexity and derive regret bounds for two earlier online LP algorithms for comparison. Both algorithms improve the regret bound with a factor of $sqrt{m}$ by paying more computational cost. Furthermore, we demonstrate how to convert the possibly infeasible solution to a feasible one through a randomized procedure. Numerical experiments illustrate the general applicability and effectiveness of the algorithms.

بنى وهياكل البيانات والخوارزميات التحسين والتحكم