Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Lipschitz and Comparator-Norm Adaptivity in Online Learning

167 0 0.0 ( 0 )

Download Cite

Added by Zakaria Mhammedi

Publication date 2020

fields Informatics Engineering Mathematical Statistics

and research's language is English

Authors Zakaria Mhammedi - Wouter M. Koolen

Machine Learning Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We study Online Convex Optimization in the unbounded setting where neither predictions nor gradient are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present t

rate research

Lipschitz Adaptivity with Multiple Learning Rates in Online Learning

100 - Zakaria Mhammedi , Wouter M. Koolen , Tim van Erven 2019

We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc. A recent technique promises to overcome this difficulty by maintaining multiple learning rates in parallel. This technique has been applied in the MetaGrad algorithm for online convex optimization and the Squint algorithm for prediction with expert advice. However, in both cases the user still has to provide in advance a Lipschitz hyperparameter that bounds the norm of the gradients. Although this hyperparameter is typically not available in advance, tuning it correctly is crucial: if it is set too small, the methods may fail completely; but if it is taken too large, performance deteriorates significantly. In the present work we remove this Lipschitz hyperparameter by designing n

Machine Learning Machine Learning

Nonparametric Online Learning Using Lipschitz Regularized Deep Neural Networks

73 - Guy Uziel 2019

Deep neural networks are considered to be state of the art models in many offline machine learning tasks. However, their performance and generalization abilities in online learning tasks are much less understood. Therefore, we focus on online learning and tackle the challenging problem where the underlying process is stationary and ergodic and thus removing the i.i.d. assumption and allowing observations to depend on each other arbitrarily. We prove the generalization abilities of Lipschitz regularized deep neural networks and show that by using those networks, a convergence to the best possible prediction strategy is guaranteed.

Machine Learning Machine Learning

Markov-Lipschitz Deep Learning

89 - Stan Z. Li , Zelin Zang , Lirong Wu 2020

We propose a novel framework, called Markov-Lipschitz deep learning (MLDL), to tackle geometric deterioration caused by collapse, twisting, or crossing in vector-based neural network transformations for manifold-based representation learning and manifold data generation. A prior constraint, called locally isometric smoothness (LIS), is imposed across-layers and encoded into a Markov random field (MRF)-Gibbs distribution. This leads to the best possible solutions for local geometry preservation and robustness as measured by locally geometric distortion and locally bi-Lipschitz continuity. Consequently, the layer-wise vector transformations are enhanced into well-behaved, LIS-constrained metric homeomorphisms. Extensive experiments, comparisons, and ablation study demonstrate significant advantages of MLDL for manifold learning and manifold data generation. MLDL is general enough to enhance any vector transformation-based networks. The code is available at https://github.com/westlake-cairi/Markov-Lipschitz-Deep-Learning.

Machine Learning Machine Learning

Learning piecewise Lipschitz functions in changing environments

77 - Maria-Florina Balcan , Travis Dick , Dravyansh Sharma 2019

Optimization in the presence of sharp (non-Lipschitz), unpredictable (w.r.t. time and amount) changes is a challenging and largely unexplored problem of great significance. We consider the class of piecewise Lipschitz functions, which is the most general online setting considered in the literature for the problem, and arises naturally in various combinatorial algorithm selection problems where utility functions can have sharp discontinuities. The usual performance metric of $mathit{static}$ regret minimizes the gap between the payoff accumulated and that of the best fixed point for the entire duration, and thus fails to capture changing environments. Shifting regret is a useful alternative, which allows for up to $s$ environment shifts. In this work we provide an $O(sqrt{sdTlog T}+sT^{1-beta})$ regret bound for $beta$-dispersed functions, where $beta$ roughly quantifies the rate at which discontinuities appear in the utility functions in expectation (typically $betage1/2$ in problems of practical interest). We also present a lower bound tight up to sub-logarithmic factors. We further obtain improved bounds when selecting from a small pool of experts. We empirically demonstrate a key application of our algorithms to online clustering problems on popular benchmarks.

Machine Learning Machine Learning

Local Adaptivity in Federated Learning: Convergence and Consistency

125 - Jianyu Wang , Zheng Xu , Zachary Garrett 2021

The federated learning (FL) framework trains a machine learning model using decentralized data stored at edge client devices by periodically aggregating locally trained models. Popular optimization algorithms of FL use vanilla (stochastic) gradient descent for both local updates at clients and global updates at the aggregating server. Recently, adaptive optimization methods such as AdaGrad have been studied for server updates. However, the effect of using adaptive optimization methods for local updates at clients is not yet understood. We show in both theory and practice that while local adaptive methods can accelerate convergence, they can cause a non-vanishing solution bias, where the final converged solution may be different from the stationary point of the global objective function. We propose correction techniques to overcome this inconsistency and complement the local adaptive methods for FL. Extensive experiments on realistic federated training tasks show that the proposed algorithms can achieve faster convergence and higher test accuracy than the baselines without local adaptivity.

Machine Learning Distributed Parallel and Cluster Computing Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Lipschitz and Comparator-Norm Adaptivity in Online Learning

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions