
A Homotopic Method to Solve the Lasso Problems with an Improved Upper Bound of Convergence Rate

Posted by Yujie Zhao
Publication date: 2020
Language: English

In optimization, it is known that when the objective function is strictly convex and well-conditioned, gradient-based approaches can be extremely effective, e.g., achieving exponential convergence rates. On the other hand, existing Lasso-type estimators in general cannot achieve this optimal rate due to the undesirable behavior of the absolute value function at the origin. A homotopic method uses a sequence of surrogate functions to approximate the $\ell_1$ penalty that appears in Lasso-type estimators. The surrogate functions converge to the $\ell_1$ penalty of the Lasso estimator, and at the same time each surrogate function is strictly convex, which enables a provably faster numerical convergence rate. In this paper, we demonstrate that by carefully defining the surrogate functions, one can prove a faster numerical convergence rate than any existing method for computing Lasso-type estimators. Specifically, state-of-the-art algorithms can only guarantee $O(1/\epsilon)$ or $O(1/\sqrt{\epsilon})$ convergence rates, while we prove an $O([\log(1/\epsilon)]^2)$ rate for the newly proposed algorithm. Our numerical simulations show that the new algorithm also performs better empirically.
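
To make the idea concrete, here is a minimal sketch of a generic homotopy scheme for the Lasso, not the paper's exact surrogate or algorithm: the absolute value is replaced by the strictly convex pseudo-Huber smoothing $\sqrt{\beta^2+\mu^2}-\mu$, each smoothed problem is solved by plain gradient descent with a warm start, and the smoothing parameter $\mu$ is shrunk geometrically so the surrogates approach the $\ell_1$ penalty. All function names and parameter choices below are illustrative assumptions.

```python
import numpy as np

def homotopy_lasso(X, y, lam, mu0=1.0, shrink=0.5, n_stages=10, inner_iters=500):
    """Gradient descent on a sequence of pseudo-Huber-smoothed Lasso objectives (illustrative sketch)."""
    n, p = X.shape
    L_ls = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the least-squares gradient
    beta = np.zeros(p)
    mu = mu0
    for _ in range(n_stages):
        step = 1.0 / (L_ls + lam / mu)          # smoothed penalty has curvature at most lam / mu
        for _ in range(inner_iters):
            grad = X.T @ (X @ beta - y) / n + lam * beta / np.sqrt(beta ** 2 + mu ** 2)
            beta -= step * grad
        mu *= shrink                            # tighten the surrogate toward the true l1 penalty
    return beta

# toy usage on synthetic sparse data
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:5] = 3.0
y = X @ beta_true + 0.1 * rng.standard_normal(200)
print(np.round(homotopy_lasso(X, y, lam=0.1)[:8], 2))
```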

Read also

Consider finite sequences $X_{[1,n]}=X_1\dots X_n$ and $Y_{[1,n]}=Y_1\dots Y_n$ of length $n$, consisting of i.i.d. samples of random letters from a finite alphabet, and let $S$ and $T$ be chosen i.i.d. randomly from the unit ball in the space of symmetric scoring functions over this alphabet augmented by a gap symbol. We prove a probabilistic upper bound of linear order in $n^{0.75}$ for the deviation of the score relative to $T$ of optimal alignments with gaps of $X_{[1,n]}$ and $Y_{[1,n]}$ relative to $S$. It remains an open problem to prove a lower bound. Our result contributes to the understanding of the microstructure of optimal alignments relative to one given scoring function, extending a theory begun by the first two authors.
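
As a concrete illustration of the object studied above, the sketch below computes the optimal score of a gapped alignment of two sequences under a symmetric scoring function over the alphabet augmented with a gap symbol, using standard dynamic programming; the toy scoring function and the helper name are assumptions, not taken from the paper.

```python
import numpy as np

def optimal_alignment_score(x, y, S, gap="-"):
    """Optimal score of a gapped alignment of x and y under the scoring function S (illustrative sketch)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), -np.inf)
    D[0, 0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                D[i, j] = max(D[i, j], D[i - 1, j] + S[(x[i - 1], gap)])           # letter of x against a gap
            if j > 0:
                D[i, j] = max(D[i, j], D[i, j - 1] + S[(gap, y[j - 1])])           # letter of y against a gap
            if i > 0 and j > 0:
                D[i, j] = max(D[i, j], D[i - 1, j - 1] + S[(x[i - 1], y[j - 1])])  # letters aligned to each other
    return D[n, m]

# toy symmetric scoring function on the alphabet {a, b} plus the gap symbol
S = {(u, v): (1.0 if (u == v and u != "-") else -0.5) for u in "ab-" for v in "ab-"}
print(optimal_alignment_score("abba", "aba", S))
```
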
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training error and reducing the test error quickly.
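
The memory-of-gradients idea described above can be sketched as follows: keep a table of the last gradient seen for each example and step along the running average of the table. The problem ($\ell_2$-regularized least squares), step size, and function name are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def sag_least_squares(X, y, lam=0.1, n_iters=5000, seed=0):
    """Gradient-memory iteration for f(w) = (1/n) * sum_i [ (x_i^T w - y_i)^2 / 2 + lam * ||w||^2 / 2 ]."""
    n, p = X.shape
    step = 1.0 / (np.max(np.sum(X ** 2, axis=1)) + lam)    # 1 / L for the worst-conditioned example
    rng = np.random.default_rng(seed)
    w = np.zeros(p)
    grads = np.zeros((n, p))     # memory of the last gradient computed for each example
    g_avg = np.zeros(p)          # running average of the stored gradients
    for _ in range(n_iters):
        i = rng.integers(n)
        g_new = (X[i] @ w - y[i]) * X[i] + lam * w         # fresh gradient of the i-th term
        g_avg += (g_new - grads[i]) / n                    # update the average in O(p)
        grads[i] = g_new
        w -= step * g_avg
    return w
```
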
In this paper, we present a numerical method, based on iterative Bregman projections, to solve the optimal transport problem with Coulomb cost. This is related to the strong interaction limit of Density Functional Theory. The first idea is to introduce an entropic regularization of the Kantorovich formulation of the Optimal Transport problem. The regularized problem then corresponds to the projection of a vector on the intersection of the constraints with respect to the Kullback-Leibler distance. Iterative Bregman projections on each marginal constraint are explicit, which enables us to approximate the optimal transport plan. We validate the numerical method against analytical test cases.
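
For intuition, here is a minimal two-marginal sketch of entropic regularization solved by iterative Bregman (Sinkhorn-type) projections onto the marginal constraints; the paper itself treats a multi-marginal problem with Coulomb cost, so this simplified example, its cost, and its parameters are illustrative assumptions only.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.05, n_iters=500):
    """Entropically regularized transport plan between marginals mu and nu for cost C (illustrative sketch)."""
    K = np.exp(-C / eps)                 # Gibbs kernel induced by the entropic regularization
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iters):
        u = mu / (K @ v)                 # Bregman projection onto the first marginal constraint
        v = nu / (K.T @ u)               # Bregman projection onto the second marginal constraint
    return u[:, None] * K * v[None, :]   # approximate optimal transport plan

# toy usage: two uniform marginals on a grid with squared-distance cost
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
mu = np.full(50, 1 / 50)
nu = np.full(50, 1 / 50)
plan = sinkhorn(mu, nu, C)
print(plan.sum(axis=1)[:3])              # rows sum (approximately) to mu
```
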
Remy Priem, 2020
Bayesian optimization methods have been successfully applied to black-box optimization problems that are expensive to evaluate. In this paper, we adapt the so-called super efficient global optimization algorithm to solve mixed constrained problems more accurately. The proposed approach handles constraints by means of an upper trust bound, which encourages exploration of the feasible domain by combining the mean prediction and the associated uncertainty function given by the Gaussian processes. On top of that, a refinement procedure, based on a learning-rate criterion, is introduced to enhance the exploitation and exploration trade-off. We show the good potential of the approach on a set of numerical experiments. Finally, we present an application to conceptual aircraft configuration, on which we show the superiority of the proposed approach compared to a set of state-of-the-art black-box optimization solvers. Keywords: Global Optimization, Mixed Constrained Optimization, Black-box Optimization, Bayesian Optimization, Gaussian Process.
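
A hedged sketch of the constraint-handling idea described above: a candidate is treated as potentially feasible when an optimistic bound (mean minus a multiple of the standard deviation) on the GP constraint model is non-positive, and an expected-improvement acquisition is restricted to that trusted region. The acquisition, kernel, kappa, and function names are assumptions, not the exact SEGO variant proposed in the paper.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def propose_next(X_obs, f_obs, g_obs, candidates, kappa=2.0):
    """Pick the next point: expected improvement restricted to the optimistically feasible region."""
    f_obs, g_obs = np.asarray(f_obs), np.asarray(g_obs)
    gp_f = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X_obs, f_obs)
    gp_g = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X_obs, g_obs)
    mu_f, sd_f = gp_f.predict(candidates, return_std=True)
    mu_g, sd_g = gp_g.predict(candidates, return_std=True)
    trusted = mu_g - kappa * sd_g <= 0.0                    # optimistic ("upper trust bound") feasibility test
    best = f_obs[g_obs <= 0.0].min() if np.any(g_obs <= 0.0) else f_obs.min()
    z = (best - mu_f) / np.maximum(sd_f, 1e-12)
    ei = (best - mu_f) * norm.cdf(z) + sd_f * norm.pdf(z)   # expected improvement over the best feasible value
    ei[~trusted] = 0.0                                      # restrict the search to the trusted region
    return candidates[int(np.argmax(ei))]
```
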
In our recent paper, we showed that in the exponential family, contrastive divergence (CD) with a fixed learning rate gives asymptotically consistent estimates \cite{wu2016convergence}. In this paper, we establish the consistency and convergence rate of CD with an annealed learning rate $\eta_t$. Specifically, suppose CD-$m$ generates the sequence of parameters $\{\theta_t\}_{t \ge 0}$ using an i.i.d. data sample $\mathbf{X}_1^n \sim p_{\theta^*}$ of size $n$; then $\delta_n(\mathbf{X}_1^n) = \limsup_{t \to \infty} \Vert \sum_{s=t_0}^t \eta_s \theta_s / \sum_{s=t_0}^t \eta_s - \theta^* \Vert$ converges in probability to 0 at a rate of $1/\sqrt[3]{n}$. The number $m$ of MCMC transitions in CD only affects the coefficient of the convergence rate. Our proof is not a simple extension of the one in \cite{wu2016convergence}, which depends critically on the fact that $\{\theta_t\}_{t \ge 0}$ is a homogeneous Markov chain conditional on the observed sample $\mathbf{X}_1^n$. Under an annealed learning rate, the homogeneous Markov property is not available, and we have to develop an alternative approach based on super-martingales. Experimental results of CD on a fully-visible $2\times 2$ Boltzmann Machine are provided to demonstrate our theoretical results.
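
To illustrate the setting, the sketch below runs CD-$m$ with an annealed learning rate on a small fully-visible Boltzmann machine over $\{-1,+1\}^d$ and returns the $\eta$-weighted running average of the iterates that appears in the statement above; the specific schedule ($\eta_t = \eta_0 / t^{0.6}$), the sampler, and the function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_sweep(x, W, rng):
    """One full sweep of single-site Gibbs updates for p(x) proportional to exp(0.5 * x^T W x)."""
    for i in range(len(x)):
        p_plus = sigmoid(2.0 * (W[i] @ x - W[i, i] * x[i]))
        x[i] = 1.0 if rng.random() < p_plus else -1.0
    return x

def cd_annealed(data, m=1, T=20000, eta0=0.1, seed=0):
    """CD-m with annealed learning rate; returns the eta-weighted running average of the iterates."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    W = np.zeros((d, d))
    W_avg, eta_sum = np.zeros((d, d)), 0.0
    for t in range(1, T + 1):
        x0 = data[rng.integers(len(data))].astype(float)
        x = x0.copy()
        for _ in range(m):                            # m Gibbs transitions started at the data point (CD-m)
            x = gibbs_sweep(x, W, rng)
        grad = np.outer(x0, x0) - np.outer(x, x)      # positive minus negative phase statistics
        np.fill_diagonal(grad, 0.0)                   # keep the diagonal of W at zero
        eta = eta0 / t ** 0.6                         # annealed learning rate eta_t
        W += eta * grad
        W_avg += eta * W                              # eta-weighted running average of the iterates
        eta_sum += eta
    return W_avg / eta_sum
```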
