From Averaging to Acceleration, There is Only a Step-size

Added by Nicolas Flammarion

Publication date 2015

fields Mathematical Statistics

Research language English

Authors Nicolas Flammarion - Francis Bach (LIENS

Machine Learning Optimization and Control

Abstract

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate O(1/n^2), where n is the number of iterations. We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. We also consider the situation where noisy gradients are available, where we extend our general convergence result, which suggests an alternative algorithm (i.e., with different step sizes) that exhibits the good aspects of both averaging and acceleration.
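To make the link between a second-order difference equation and its eigenvalues concrete, here is a minimal sketch. It assumes a one-dimensional quadratic f(x) = h x^2 / 2 and the textbook constant-parameter heavy-ball recursion x_{n+1} = x_n - gamma h x_n + delta (x_n - x_{n-1}); this is an illustration only, not the parametrization used in the paper. The names `gamma`, `delta`, `h` and all function names are hypothetical. The roots of r^2 - (1 + delta - gamma h) r + delta = 0 decide the behavior: a complex pair gives oscillation, and a largest modulus below 1 gives a stable recursion.

```python
import cmath


def heavy_ball_roots(h, gamma, delta):
    """Roots of r^2 - (1 + delta - gamma*h) r + delta = 0.

    This is the characteristic polynomial of the (assumed) recursion
    x_{n+1} = (1 + delta - gamma*h) x_n - delta x_{n-1}.
    """
    a = 1.0 + delta - gamma * h
    disc = cmath.sqrt(a * a - 4.0 * delta)
    return (a + disc) / 2.0, (a - disc) / 2.0


def classify(h, gamma, delta):
    """Return (largest root modulus, True if the roots are a complex pair)."""
    r1, r2 = heavy_ball_roots(h, gamma, delta)
    radius = max(abs(r1), abs(r2))
    oscillatory = abs(r1.imag) > 1e-12
    return radius, oscillatory


def iterate(h, gamma, delta, x0, n):
    """Run n steps of the recursion, starting from x_0 = x_{-1} = x0."""
    prev, cur = x0, x0
    for _ in range(n):
        prev, cur = cur, cur - gamma * h * cur + delta * (cur - prev)
    return cur


if __name__ == "__main__":
    for params in [(1.0, 1.0, 0.5), (1.0, 0.1, 0.0), (1.0, 5.0, 0.5)]:
        radius, oscillatory = classify(*params)
        print(params, "radius=%.4f" % radius, "oscillatory=%s" % oscillatory)
```

In this sketch, a momentum term (`delta > 0`) with a moderate step size yields complex roots of modulus sqrt(delta), so the iterates oscillate while contracting, whereas plain gradient descent (`delta = 0`) has real roots and decays monotonically.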

Is There an Analog of Nesterov Acceleration for MCMC?

91 - Yi-An Ma , Niladri Chatterji , Xiang Cheng 2019

We formulate gradient-based Markov chain Monte Carlo (MCMC) sampling as optimization on the space of probability measures, with Kullback-Leibler (KL) divergence as the objective functional. We show that an underdamped form of the Langevin algorithm performs accelerated gradient descent in this metric. To characterize the convergence of the algorithm, we construct a Lyapunov functional and exploit hypocoercivity of the underdamped Langevin algorithm. As an application, we show that accelerated rates can be obtained for a class of nonconvex functions with the Langevin algorithm.

Machine Learning Machine Learning Numerical Analysis

Gradient flow encoding with distance optimization adaptive step size

78 - Kyriakos Flouris , Anna Volokitin , Gustav Bredell 2021

The autoencoder model uses an encoder to map data samples to a lower dimensional latent space and then a decoder to map the latent space representations back to the data space. Implicitly, it relies on the encoder to approximate the inverse of the decoder network, so that samples can be mapped to and back from the latent space faithfully. This approximation may lead to sub-optimal latent space representations. In this work, we investigate a decoder-only method that uses gradient flow to encode data samples in the latent space. The gradient flow is defined based on a given decoder and aims to find the optimal latent space representation for any given sample through optimisation, eliminating the need for an approximate inversion through an encoder. Implementing gradient flow through ordinary differential equations (ODE), we leverage the adjoint method to train a given decoder. We further show empirically that the costly integrals in the adjoint method may not be entirely necessary. Additionally, we propose a second-order ODE variant of the method, which approximates Nesterov's accelerated gradient descent, with faster convergence per iteration. Commonly used ODE solvers can be quite sensitive to the integration step-size depending on the stiffness of the ODE. To overcome the sensitivity for gradient flow encoding, we use an adaptive solver that prioritises minimising loss at each integration step. We assess the proposed method in comparison to the autoencoding model. In our experiments, GFE showed a much higher data-efficiency than the autoencoding model, which can be crucial for data scarce applications.

Machine Learning Machine Learning Applications

Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing

288 - Sanghamitra Dutta , Dennis Wei , Hazar Yueksel 2019

A trade-off between accuracy and fairness is almost taken as a given in the existing literature on fairness in machine learning. Yet, it is not preordained that accuracy should decrease with increased fairness. Novel to this work, we examine fair classification through the lens of mismatched hypothesis testing: trying to find a classifier that distinguishes between two ideal distributions when given two mismatched distributions that are biased. Using Chernoff information, a tool in information theory, we theoretically demonstrate that, contrary to popular belief, there always exist ideal distributions such that optimal fairness and accuracy (with respect to the ideal distributions) are achieved simultaneously: there is no trade-off. Moreover, the same classifier yields the lack of a trade-off with respect to ideal distributions while yielding a trade-off when accuracy is measured with respect to the given (possibly biased) dataset. To complement our main result, we formulate an optimization to find ideal distributions and derive fundamental limits to explain why a trade-off exists on the given biased dataset. We also derive conditions under which active data collection can alleviate the fairness-accuracy trade-off in the real world. Our results lead us to contend that it is problematic to measure accuracy with respect to data that reflects bias, and instead, we should be considering accuracy with respect to ideal, unbiased data.

Machine Learning Computers and Society Information Theory

Is there an upper bound on the size of a black-hole?

56 - Swastik Bhattacharya , S. Shankaranarayanan 2018

According to the third law of Thermodynamics, it takes an infinite number of steps for any object, including black-holes, to reach zero temperature. For any physical system, the process of cooling to absolute zero corresponds to erasing information or generating pure states. In contrast with ordinary matter, the black-hole temperature can be lowered only by adding matter-energy into it. However, it is impossible to remove the statistical fluctuations of the infalling matter-energy. The fluctuations lead to the fact that black-holes have a finite lower temperature and, hence, an upper bound on the horizon radius. We make an estimate of the upper bound for the horizon radius, which is curiously comparable to the Hubble horizon. We compare this bound with known results and discuss its implications.

General Relativity and Quantum Cosmology High Energy Astrophysical Phenomena High Energy Physics - Theory

Interpolation can hurt robust generalization even when there is no noise

80 - Konstantin Donhauser , Alexandru Țifrea , Michael Aerni 2021

Numerous recent works show that overparameterization implicitly reduces variance for min-norm interpolators and max-margin classifiers. These findings suggest that ridge regularization has vanishing benefits in high dimensions. We challenge this narrative by showing that, even in the absence of noise, avoiding interpolation through ridge regularization can significantly improve generalization. We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.

Machine Learning Machine Learning
