Do you want to publish a course? Click here

On the Effectiveness of Richardson Extrapolation in Machine Learning

187   0   0.0 ( 0 )
 Added by Francis Bach
 Publication date 2020
and research's language is English
 Authors Francis Bach




Ask ChatGPT about the research

Richardson extrapolation is a classical technique from numerical analysis that can improve the approximation error of an estimation method by combining linearly several estimates obtained from different values of one of its hyperparameters, without the need to know in details the inner structure of the original estimation method. The main goal of this paper is to study when Richardson extrapolation can be used within machine learning, beyond the existing applications to step-size adaptations in stochastic gradient descent. We identify two situations where Richardson interpolation can be useful: (1) when the hyperparameter is the number of iterations of an existing iterative optimization algorithm, with applications to averaged gradient descent and Frank-Wolfe algorithms (where we obtain asymptotically rates of $O(1/k^2)$ on polytopes, where $k$ is the number of iterations), and (2) when it is a regularization parameter, with applications to Nesterov smoothing techniques for minimizing non-smooth functions (where we obtain asymptotically rates close to $O(1/k^2)$ for non-smooth functions), and ridge regression. In all these cases, we show that extrapolation techniques come with no significant loss in performance, but with sometimes strong gains, and we provide theoretical justifications based on asymptotic developments for such gains, as well as empirical illustrations on classical problems from machine learning.



rate research

Read More

We introduce ADAHESSIAN, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN. Second order algorithms are among the most powerful optimization algorithms with superior convergence properties as compared to first order methods such as SGD and Adam. The main disadvantage of traditional second order methods is their heavier per iteration computation and poor accuracy as compared to first order methods. To address these, we incorporate several novel approaches in ADAHESSIAN, including: (i) a fast Hutchinson based method to approximate the curvature matrix with low computational overhead; (ii) a root-mean-square exponential moving average to smooth out variations of the Hessian diagonal across different iterations; and (iii) a block diagonal averaging to reduce the variance of Hessian diagonal elements. We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods, including variants of Adam. In particular, we perform extensive tests on CV, NLP, and recommendation system tasks and find that ADAHESSIAN: (i) achieves 1.80%/1.45% higher accuracy on ResNets20/32 on Cifar10, and 5.55% higher accuracy on ImageNet as compared to Adam; (ii) outperforms AdamW for transformers by 0.13/0.33 BLEU score on IWSLT14/WMT14 and 2.7/1.0 PPL on PTB/Wikitext-103; (iii) outperforms AdamW for SqueezeBert by 0.41 points on GLUE; and (iv) achieves 0.032% better score than Adagrad for DLRM on the Criteo Ad Kaggle dataset. Importantly, we show that the cost per iteration of ADAHESSIAN is comparable to first order methods, and that it exhibits robustness towards its hyperparameters.
Filtering is a data assimilation technique that performs the sequential inference of dynamical systems states from noisy observations. Herein, we propose a machine learning-based ensemble conditional mean filter (ML-EnCMF) for tracking possibly high-dimensional non-Gaussian state models with nonlinear dynamics based on sparse observations. The proposed filtering method is developed based on the conditional expectation and numerically implemented using machine learning (ML) techniques combined with the ensemble method. The contribution of this work is twofold. First, we demonstrate that the ensembles assimilated using the ensemble conditional mean filter (EnCMF) provide an unbiased estimator of the Bayesian posterior mean, and their variance matches the expected conditional variance. Second, we implement the EnCMF using artificial neural networks, which have a significant advantage in representing nonlinear functions over high-dimensional domains such as the conditional mean. Finally, we demonstrate the effectiveness of the ML-EnCMF for tracking the states of Lorenz-63 and Lorenz-96 systems under the chaotic regime. Numerical results show that the ML-EnCMF outperforms the ensemble Kalman filter.
We develop the Littlewood-Richardson homotopy algorithm, which uses numerical continuation to compute solutions to Schubert problems on Grassmannians and is based on the geometric Littlewood-Richardson rule. One key ingredient of this algorithm is our new optimal formulation of Schubert problems in local Stiefel coordinates as systems of equations. Our implementation can solve problem instances with tens of thousands of solutions.
102 - Matthew Thorpe , Bao Wang 2021
Graph Laplacian (GL)-based semi-supervised learning is one of the most used approaches for classifying nodes in a graph. Understanding and certifying the adversarial robustness of machine learning (ML) algorithms has attracted large amounts of attention from different research communities due to its crucial importance in many security-critical applied domains. There is great interest in the theoretical certification of adversarial robustness for popular ML algorithms. In this paper, we provide the first adversarial robust certification for the GL classifier. More precisely we quantitatively bound the difference in the classification accuracy of the GL classifier before and after an adversarial attack. Numerically, we validate our theoretical certification results and show that leveraging existing adversarial defenses for the $k$-nearest neighbor classifier can remarkably improve the robustness of the GL classifier.
In this work, we propose a multi-agent actor-critic reinforcement learning (RL) algorithm to accelerate the multi-level Monte Carlo Markov Chain (MCMC) sampling algorithms. The policies (actors) of the agents are used to generate the proposal in the MCMC steps; and the critic, which is centralized, is in charge of estimating the long term reward. We verify our proposed algorithm by solving an inverse problem with multiple scales. There are several difficulties in the implementation of this problem by using traditional MCMC sampling. Firstly, the computation of the posterior distribution involves evaluating the forward solver, which is very time consuming for a problem with heterogeneous. We hence propose to use the multi-level algorithm. More precisely, we use the generalized multiscale finite element method (GMsFEM) as the forward solver in evaluating a posterior distribution in the multi-level rejection procedure. Secondly, it is hard to find a function which can generate samplings which are meaningful. To solve this issue, we learn an RL policy as the proposal generator. Our experiments show that the proposed method significantly improves the sampling process

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا