Global Non-convex Optimization with Discretized Diffusions

171 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Murat A. Erdogdu

تاريخ النشر 2018

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Murat A. Erdogdu - Lester Mackey - Ohad Shamir

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. We show that this property holds for any suitably smooth diffusion and that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions suitable for globally optimizing convex and non-convex functions not covered by the existing Langevin theory. Our non-asymptotic analysis delivers computable optimization and integration error bounds based on easily accessed properties of the objective and chosen diffusion. Central to our approach are new explicit Stein factor bounds on the solutions of Poisson equations. We complement these results with improved optimization guarantees for targets other than the standard Gibbs measure.

قيم البحث

409 - Shen-Yi Zhao , Gong-Duo Zhang , Wu-Jun Li 2016

Stochastic gradient descent~(SGD) and its variants have attracted much attention in machine learning due to their efficiency and effectiveness for optimization. To handle large-scale problems, researchers have recently proposed several lock-free stra tegy based parallel SGD~(LF-PSGD) methods for multi-core systems. However, existing works have only proved the convergence of these LF-PSGD methods for convex problems. To the best of our knowledge, no work has proved the convergence of the LF-PSGD methods for non-convex problems. In this paper, we provide the theoretical proof about the convergence of two representative LF-PSGD methods, Hogwild! and AsySVRG, for non-convex problems. Empirical results also show that both Hogwild! and AsySVRG are convergent on non-convex problems, which successfully verifies our theoretical results.

التعلم الالي التعلم الآلي

Spectral Non-Convex Optimization for Dimension Reduction with Hilbert-Schmidt Independence Criterion

321 - Chieh Wu , Jared Miller , Yale Chang 2019

The Hilbert Schmidt Independence Criterion (HSIC) is a kernel dependence measure that has applications in various aspects of machine learning. Conveniently, the objectives of different dimensionality reduction applications using HSIC often reduce to the same optimization problem. However, the nonconvexity of the objective function arising from non-linear kernels poses a serious challenge to optimization efficiency and limits the potential of HSIC-based formulations. As a result, only linear kernels have been computationally tractable in practice. This paper proposes a spectral-based optimization algorithm that extends beyond the linear kernel. The algorithm identifies a family of suitable kernels and provides the first and second-order local guarantees when a fixed point is reached. Furthermore, we propose a principled initialization strategy, thereby removing the need to repeat the algorithm at random initialization points. Compared to state-of-the-art optimization algorithms, our empirical results on real data show a run-time improvement by as much as a factor of $10^5$ while consistently achieving lower cost and classification/clustering errors. The implementation source code is publicly available on https://github.com/endsley.

التعلم الالي التعلم الآلي تطبيقات الإحصاء

Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization

73 - Xuefeng Gao , Mert Gurbuzbalaban , Lingjiong Zhu 2018

Langevin dynamics (LD) has been proven to be a powerful technique for optimizing a non-convex objective as an efficient algorithm to find local minima while eventually visiting a global minimum on longer time-scales. LD is based on the first-order La ngevin diffusion which is reversible in time. We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dynamics (ULD) and the Langevin dynamics with a non-symmetric drift (NLD). Adopting the techniques of Tzen, Liang and Raginsky (2018) for LD to non-reversible diffusions, we show that for a given local minimum that is within an arbitrary distance from the initialization, with high probability, either the ULD trajectory ends up somewhere outside a small neighborhood of this local minimum within a recurrence time which depends on the smallest eigenvalue of the Hessian at the local minimum or they enter this neighborhood by the recurrence time and stay there for a potentially exponentially long escape time. The ULD algorithms improve upon the recurrence time obtained for LD in Tzen, Liang and Raginsky (2018) with respect to the dependency on the smallest eigenvalue of the Hessian at the local minimum. Similar result and improvement are obtained for the NLD algorithm. We also show that non-reversible variants can exit the basin of attraction of a local minimum faster in discrete time when the objective has two local minima separated by a saddle point and quantify the amount of improvement. Our analysis suggests that non-reversible Langevin algorithms are more efficient to locate a local minimum as well as exploring the state space. Our analysis is based on the quadratic approximation of the objective around a local minimum. As a by-product of our analysis, we obtain optimal mixing rates for quadratic objectives in the 2-Wasserstein distance for two non-reversible Langevin algorithms we consider.

التحسين والتحكم التعلم الآلي التحليل العددي

Combinatorial Bayesian Optimization with Random Mapping Functions to Convex Polytope

111 - Jungtaek Kim , Minsu Cho , Seungjin Choi 2020

Bayesian optimization is a popular method for solving the problem of global optimization of an expensive-to-evaluate black-box function. It relies on a probabilistic surrogate model of the objective function, upon which an acquisition function is bui lt to determine where next to evaluate the objective function. In general, Bayesian optimization with Gaussian process regression operates on a continuous space. When input variables are categorical or discrete, an extra care is needed. A common approach is to use one-hot encoded or Boolean representation for categorical variables which might yield a {em combinatorial explosion} problem. In this paper we present a method for Bayesian optimization in a combinatorial space, which can operate well in a large combinatorial space. The main idea is to use a random mapping which embeds the combinatorial space into a convex polytope in a continuous space, on which all essential process is performed to determine a solution to the black-box optimization in the combinatorial space. We describe our {em combinatorial Bayesian optimization} algorithm and present its regret analysis. Numerical experiments demonstrate that our method outperforms existing methods.

التعلم الالي التعلم الآلي

Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability

192 - Yikai Yan , Chaoyue Niu , Yucheng Ding 2020

Federated learning is a new distributed machine learning framework, where a bunch of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue when deploying federate d learning in mobile environments: intermittent client availability, where the set of eligible clients may change during the training process. Such intermittent client availability would seriously deteriorate the performance of the classical Federated Averaging algorithm (FedAvg for short). Thus, we propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even when the clients are not available, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains the convergence rate of $O(E^{1/2}/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement FedLaAvg along with several baselines and evaluate them over the benchmarking MNIST and Sentiment140 datasets. The evaluation results demonstrate that FedLaAvg achieves more stable training than FedAvg in both convex and non-convex settings and indeed reaches a sublinear speedup.

التعلم الالي النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي