Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes

Added by Junchi Li

Publication date: 2018

Fields: Mathematical Statistics, Informatics Engineering

Language: English

Authors: Chris Junchi Li, Zhaoran Wang, Han Liu

Machine Learning

Abstract

Solving statistical learning problems often involves nonconvex optimization. Despite the empirical success of nonconvex statistical optimization methods, their global dynamics, especially convergence to the desirable local minima, remain less well understood in theory. In this paper, we propose a new analytic paradigm based on diffusion processes to characterize the global dynamics of nonconvex statistical optimization. As a concrete example, we study stochastic gradient descent (SGD) for the tensor decomposition formulation of independent component analysis. In particular, we cast different phases of SGD into diffusion processes, i.e., solutions to stochastic differential equations. Initialized from an unstable equilibrium, the global dynamics of SGD pass through three consecutive phases: (i) an unstable Ornstein-Uhlenbeck process slowly departing from the initialization, (ii) the solution to an ordinary differential equation, which quickly evolves towards the desirable local minimum, and (iii) a stable Ornstein-Uhlenbeck process oscillating around the desirable local minimum. Our proof techniques, which are of independent interest, build on Stroock and Varadhan's weak convergence of Markov chains to diffusion processes.
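The three phases can be reproduced qualitatively with a minimal Python sketch of projected SGD. It rests on simplifying assumptions of our own, not the paper's exact setup: the mixing matrix is the identity (any orthogonal mixing reduces to this case by a rotation) and the sources are i.i.d. Rademacher, so that on the unit sphere E[(u^T x)^4] = 3 - 2 * sum_i u_i^4. The coordinate vectors +/- e_i are then the desirable local minima, and u = (1, ..., 1)/sqrt(d) is an unstable equilibrium. The names `normalize` and `online_ica_sgd`, the step size `eta`, and the dimension `d` are illustrative choices.

```python
import math
import random


def normalize(v):
    """Rescale v to unit Euclidean norm (projection onto the sphere)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def online_ica_sgd(d=4, eta=0.01, n_steps=5000, seed=0):
    """Projected SGD for minimizing E[(u^T x)^4] / 4 over the unit sphere.

    Illustrative assumptions: identity mixing and i.i.d. Rademacher sources,
    so the local minima are the coordinate vectors +/- e_i.
    """
    rng = random.Random(seed)
    # Start exactly at the unstable equilibrium (1, ..., 1) / sqrt(d).
    u = normalize([1.0] * d)
    trace = []  # largest |u_i| every 100 steps, to expose the three phases
    for step in range(n_steps):
        x = [rng.choice((-1.0, 1.0)) for _ in range(d)]  # one streaming sample
        proj = sum(ui * xi for ui, xi in zip(u, x))
        # The stochastic gradient of (u^T x)^4 / 4 is (u^T x)^3 * x.
        u = normalize([ui - eta * proj ** 3 * xi for ui, xi in zip(u, x)])
        if step % 100 == 0:
            trace.append(max(abs(ui) for ui in u))
    return u, trace


u, trace = online_ica_sgd()
print([round(t, 2) for t in trace[:10]])
print(max(abs(ui) for ui in u))
```

The entries of `trace` typically stay near 1/sqrt(d) = 0.5 for a while, then rise quickly toward 1 and finally fluctuate slightly below it, mirroring phases (i) to (iii); the exact values depend on the seed.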

Diffusion Approximations for Online Principal Component Estimation and Global Convergence

Chris Junchi Li, Mengdi Wang, Han Liu (2018)

In this paper, we propose to adopt diffusion approximation tools to study the dynamics of Oja's iteration, an online stochastic gradient descent method for principal component analysis. Oja's iteration maintains a running estimate of the true principal component from streaming data and has low time and space complexity. We show that Oja's iteration for the top eigenvector generates a continuous-state discrete-time Markov chain over the unit sphere. We characterize Oja's iteration in three phases using diffusion approximation and weak convergence tools. Our three-phase analysis further provides a finite-sample error bound for the running estimate, which matches the minimax information lower bound for principal component analysis under the additional assumption of bounded samples.
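Oja's iteration itself is short enough to sketch directly. The version below is the normalized update w <- (w + eta * x x^T w) / ||w + eta * x x^T w||, run on a synthetic Gaussian stream; the covariance diag(4, 1, 1, 1, 1), the step size, and the function names are assumptions made for illustration.

```python
import math
import random


def normalize(v):
    """Rescale v to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def oja_top_eigenvector(eigvals=(4.0, 1.0, 1.0, 1.0, 1.0), eta=0.01,
                        n_steps=5000, seed=0):
    """Normalized Oja iteration w <- (w + eta * x x^T w) / ||.|| on a synthetic stream.

    The stream x ~ N(0, diag(eigvals)) is an illustrative assumption; its top
    eigenvector is e_1, so w should align with +/- e_1.
    """
    rng = random.Random(seed)
    d = len(eigvals)
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])  # random start on the sphere
    for _ in range(n_steps):
        x = [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in eigvals]
        xw = sum(xi * wi for xi, wi in zip(x, w))
        # O(d) time and memory per sample: no covariance matrix is ever formed.
        w = normalize([wi + eta * xw * xi for wi, xi in zip(w, x)])
    return w


w = oja_top_eigenvector()
print(abs(w[0]))  # alignment with the true top eigenvector e_1
```

Each update reads one sample and keeps only the d-dimensional vector `w`, which is where the low time and space cost comes from.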

Machine Learning

An Optimistic Acceleration of AMSGrad for Nonconvex Optimization

Jun-Kun Wang, Xiaoyun Li, Belhal Karimi (2019)

We propose a new variant of AMSGrad, a popular adaptive gradient-based optimization algorithm widely used for training deep neural networks. Our algorithm adds prior knowledge about the sequence of consecutive mini-batch gradients and leverages the underlying structure that makes the gradients sequentially predictable. By exploiting this predictability and ideas from optimistic online learning, the proposed algorithm can accelerate convergence and increase sample efficiency. After establishing a tighter upper bound on the regret under some convexity conditions, we offer a complementary view of our algorithm that generalizes to the offline and stochastic versions of nonconvex optimization. In the nonconvex case, we establish a non-asymptotic convergence bound that is independent of the initialization. We illustrate the practical speedup on several deep learning models via numerical experiments.
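For reference, the base AMSGrad update that the proposed algorithm modifies can be sketched as follows. This is plain AMSGrad in its commonly stated form (exponential moving averages of the gradient and of the squared gradient, a running maximum of the latter, and no bias correction); it does not implement the optimistic gradient-prediction step described above. The function name, hyperparameters, and toy quadratic objective are illustrative assumptions.

```python
import math


def amsgrad(grad, x0, lr=0.1, beta1=0.9, beta2=0.99, eps=1e-8, n_steps=2000):
    """Plain AMSGrad (no bias correction, no optimistic step) on a scalar parameter."""
    x, m, v, v_hat = x0, 0.0, 0.0, 0.0
    for _ in range(n_steps):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g      # EMA of gradients
        v = beta2 * v + (1.0 - beta2) * g * g  # EMA of squared gradients
        v_hat = max(v_hat, v)                  # AMSGrad: keep the running maximum
        x -= lr * m / (math.sqrt(v_hat) + eps)
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is 2 (x - 3).
print(amsgrad(lambda x: 2.0 * (x - 3.0), x0=0.0))
```

Keeping `v_hat` non-decreasing is what distinguishes AMSGrad from Adam: the effective per-coordinate step size `lr / sqrt(v_hat)` can only shrink over time.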

Machine Learning

Misspecified Nonconvex Statistical Optimization for Phase Retrieval

Zhuoran Yang, Lin F. Yang, Ethan X. Fang (2017)

Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying true statistical models. To address this issue, we take a first step towards taming model misspecification by studying the high-dimensional sparse phase retrieval problem with misspecified link functions. In particular, we propose a simple variant of the thresholded Wirtinger flow algorithm that, given a proper initialization, linearly converges to an estimator with optimal statistical accuracy for a broad family of unknown link functions. We further provide extensive numerical experiments to support our theoretical findings.
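The basic mechanics of a thresholded gradient scheme for sparse phase retrieval can be illustrated with the sketch below. It is a simplification, not the paper's algorithm: it assumes the correctly specified quadratic link y_i = (a_i^T x)^2 (so it does not address misspecification, which is the paper's focus), real Gaussian measurements, a known sparsity level enforced by hard thresholding, and a warm start near the truth standing in for the "proper initialization". All names and constants are hypothetical.

```python
import math
import random


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s])
    return [vi if i in keep else 0.0 for i, vi in enumerate(v)]


def sparse_phase_retrieval(signal=(1.0, -0.8), d=20, n=200, step=0.1,
                           n_iters=100, seed=0):
    """Hard-thresholded gradient descent for y_i = (a_i^T x)^2 with an s-sparse x.

    Illustrative assumptions: correctly specified quadratic link, Gaussian
    design, known sparsity s = len(signal), and a warm start near the truth.
    """
    rng = random.Random(seed)
    s = len(signal)
    x_true = list(signal) + [0.0] * (d - s)
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [sum(a * b for a, b in zip(row, x_true)) ** 2 for row in A]
    # Stand-in for a "proper initialization": a small perturbation of the truth.
    x = [xt + 0.05 * rng.gauss(0.0, 1.0) for xt in x_true]
    for _ in range(n_iters):
        # Gradient of (1 / 4n) * sum_i ((a_i^T x)^2 - y_i)^2.
        grad = [0.0] * d
        for row, yi in zip(A, y):
            z = sum(a * b for a, b in zip(row, x))
            coef = (z * z - yi) * z / n
            for j in range(d):
                grad[j] += coef * row[j]
        x = hard_threshold([xj - step * gj for xj, gj in zip(x, grad)], s)
    return x, x_true


x_hat, x_true = sparse_phase_retrieval()
# The signal is only identifiable up to a global sign.
err = min(math.dist(x_hat, x_true), math.dist(x_hat, [-c for c in x_true]))
print(err)
```

The error is measured up to a global sign, because x and -x produce identical measurements.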

Machine Learning; Optimization and Control

Bayesian Coresets: Revisiting the Nonconvex Optimization Perspective

Jacky Y. Zhang, Rajiv Khanna, Anastasios Kyrillidis (2020)

Bayesian coresets have emerged as a promising approach for implementing scalable Bayesian inference. The Bayesian coreset problem involves selecting a (weighted) subset of the data samples such that posterior inference using the selected subset closely approximates posterior inference using the full dataset. This manuscript revisits Bayesian coresets through the lens of sparsity-constrained optimization. Leveraging recent advances in accelerated optimization methods, we propose and analyze a novel algorithm for coreset selection. We provide explicit convergence rate guarantees and present an empirical evaluation on a variety of benchmark datasets to highlight our proposed algorithm's superior speed and accuracy compared to the state of the art.
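The sparsity-constrained view can be illustrated with plain projected iterative hard thresholding (IHT). The sketch assumes a Hilbert-coreset-style least-squares objective, in which a target vector (standing in for the full-data log-likelihood) is approximated by a nonnegative combination of at most k per-point vectors. It is not the accelerated algorithm the paper proposes, and the toy data and function names are hypothetical.

```python
import random


def project_sparse_nonneg(w, k):
    """Euclidean projection onto {w >= 0, ||w||_0 <= k}: clamp, then keep the k largest."""
    w = [max(0.0, wi) for wi in w]
    keep = set(sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:k])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def iht_coreset(cols, target, k, step=1.0, n_iters=200):
    """Projected IHT for min_w 0.5 * ||sum_n w_n cols[n] - target||^2 s.t. w >= 0, ||w||_0 <= k."""
    m, n_pts = len(target), len(cols)
    w = [0.0] * n_pts
    for _ in range(n_iters):
        approx = [sum(w[n] * cols[n][i] for n in range(n_pts)) for i in range(m)]
        resid = [a - t for a, t in zip(approx, target)]
        grad = [sum(c * r for c, r in zip(col, resid)) for col in cols]
        w = project_sparse_nonneg([wi - step * gi for wi, gi in zip(w, grad)], k)
    return w


# Hypothetical toy instance: 8 nearly orthonormal per-point vectors and a target
# that is exactly a nonnegative combination of 3 of them.
rng = random.Random(0)
cols = [[(1.0 if i == n else 0.0) + 0.02 * rng.gauss(0.0, 1.0) for i in range(8)]
        for n in range(8)]
w_true = [3.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0]
target = [sum(w_true[n] * cols[n][i] for n in range(8)) for i in range(8)]
print(iht_coreset(cols, target, k=3))
```

The projection onto the sparse nonnegative set is computed exactly by clamping negatives to zero and keeping the k largest entries, which keeps each iteration cheap even though the constraint set is nonconvex.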

Machine Learning; Computation

Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA

Sebastien Lachapelle, Pau Rodriguez Lopez, Remi Le Priol (2021)

It can be argued that finding an interpretable low-dimensional representation of a potentially high-dimensional phenomenon is central to the scientific enterprise. Independent component analysis (ICA) refers to an ensemble of methods that formalize this goal and provide estimation procedures for practical application. This work proposes mechanism sparsity regularization as a new principle to achieve nonlinear ICA when latent factors depend sparsely on observed auxiliary variables and/or past latent factors. We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse and if some graphical criterion is satisfied by the data-generating process. As a special case, our framework shows how one can leverage unknown-target interventions on the latent factors to disentangle them, thus drawing further connections between ICA and causality. We validate our theoretical results with toy experiments.

Machine Learning