Parameter-free online learning via model selection


Published by Dylan Foster

Publication date: 2017

Research field: Informatics Engineering, Mathematical Statistics

Paper language: English

Authors: Dylan J. Foster - Satyen Kale - Mehryar Mohri

Machine learning


Abstract

We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and $\mathbb{R}^{d}$ with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel multi-scale algorithm for prediction with expert advice based on random playout, which may be of independent interest.
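For orientation, the prediction-with-expert-advice setting named in the abstract can be sketched with the classical exponential weights (Hedge) update. This is the textbook baseline with a hand-tuned learning rate, not the paper's multi-scale random-playout algorithm; the function name `hedge`, the parameter `eta`, and the toy losses are illustrative assumptions.

```python
import math


def hedge(loss_rounds, eta):
    """Classical exponential weights over a fixed set of experts.

    loss_rounds: list of rounds, each a list of per-expert losses in [0, 1].
    eta: learning rate (a tuning parameter in this baseline).
    Returns (learner's expected cumulative loss, best expert's cumulative loss).
    """
    n = len(loss_rounds[0])
    log_w = [0.0] * n      # log-weights; all zero means a uniform start
    cum = [0.0] * n        # cumulative loss of each expert
    learner_loss = 0.0
    for losses in loss_rounds:
        # Normalise in log-space for numerical stability.
        m = max(log_w)
        w = [math.exp(v - m) for v in log_w]
        z = sum(w)
        p = [v / z for v in w]
        # Expected loss of sampling an expert from p.
        learner_loss += sum(pi * li for pi, li in zip(p, losses))
        # Multiplicative update: penalise each expert by its loss.
        for i, li in enumerate(losses):
            log_w[i] -= eta * li
            cum[i] += li
    return learner_loss, min(cum)


# Toy run (assumed data): expert 0 is always right, the other two always wrong.
rounds = [[0.0, 1.0, 1.0] for _ in range(200)]
learner, best = hedge(rounds, eta=0.21)
print("regret:", learner - best)
```

The need to pick `eta` in advance is the kind of tuning that parameter-free methods aim to remove.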


Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar 2020

Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would like to automatically identify the minimal dimension of the approximation that is sufficient to encode an optimal policy. Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees. The learner's goal is to adapt to the complexity of the optimal algorithm without knowing it a priori. We present a meta-algorithm that successively rejects increasingly complex models using a simple statistical test. Given at least one candidate that satisfies realizability, we prove the meta-algorithm adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret compared to the optimal candidate's $\tilde{O}(\sqrt{T})$ regret, where $T$ is the number of episodes and $L$ is the number of algorithms. The dimension and horizon dependencies remain optimal with respect to the best candidate, and our meta-algorithmic approach is flexible to incorporate multiple candidate algorithms and models. Finally, we show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds that depend on the gaps between the maximal values attainable by the candidates.

Machine learning

Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret

Asaf Cassel 2021

We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.

Machine learning

MARTHE: Scheduling the Learning Rate Via Online Hypergradients

Michele Donini, Luca Franceschi, Massimiliano Pontil 2019

We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule, known as the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses past information from the optimization trajectory to simulate future behaviour. It interpolates between two recent techniques, RTHO (Franceschi et al., 2017) and HD (Baydin et al., 2018), and is able to produce learning rate schedules that are more stable, leading to models that generalize better.
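As a rough illustration of one of the two techniques the abstract mentions, the sketch below applies the hypergradient descent (HD) rule as it is commonly stated: the learning rate is nudged by the product of the current and previous gradients. It is not MARTHE itself, it uses the training gradient rather than a validation error, and the scalar setting, the function name `hypergradient_descent`, and the constants are assumptions made for the example.

```python
def hypergradient_descent(grad, theta, alpha, beta, steps):
    """Gradient descent on a scalar parameter with an adaptive learning rate.

    grad: function returning the gradient at theta.
    alpha: initial learning rate; beta: step size for the learning rate itself.
    Returns (final theta, final alpha).
    """
    prev_g = 0.0  # no previous gradient on the first step, so alpha is unchanged
    for _ in range(steps):
        g = grad(theta)
        # Consecutive gradients that agree in sign push alpha up;
        # gradients that disagree (overshooting) pull it down.
        alpha += beta * g * prev_g
        theta -= alpha * g
        prev_g = g
    return theta, alpha


# Toy objective (assumed): f(theta) = theta ** 2, whose gradient is 2 * theta.
theta, alpha = hypergradient_descent(lambda t: 2.0 * t, theta=5.0,
                                     alpha=0.01, beta=0.001, steps=100)
print("theta:", theta, "alpha:", alpha)
```

Starting from a deliberately small `alpha`, the rule raises the learning rate during the early steps, when successive gradients point the same way.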

Machine learning

Secure Mobile Edge Computing in IoT via Collaborative Online Learning

Bingcong Li, Tianyi Chen 2018

To accommodate heterogeneous tasks in the Internet of Things (IoT), a new communication and computing paradigm termed mobile edge computing has emerged that extends computing services from the cloud to the edge, but at the same time exposes new security challenges. The present paper studies online security-aware edge computing under jamming attacks. Leveraging online learning tools, novel algorithms abbreviated as SAVE-S and SAVE-A are developed to cope with the stochastic and adversarial forms of jamming, respectively. Without utilizing extra resources such as spectrum and transmission power to evade jamming attacks, SAVE-S and SAVE-A can select the most reliable server to offload computing tasks with minimal privacy and security concerns. It is analytically established that without any prior information on future jamming and server security risks, the proposed schemes can achieve $\mathcal{O}\big(\sqrt{T}\big)$ regret. Information sharing among devices can accelerate the security-aware computing tasks. Incorporating the information shared by other devices, SAVE-S and SAVE-A offer impressive improvements on the sublinear regret, which is guaranteed by what is termed value of cooperation. Effectiveness of the proposed schemes is tested on both synthetic and real datasets.

Machine learning

Online structural kernel selection for mobile health

Eura Shin, Pedja Klasnja, Susan Murphy 2021

Motivated by the need for efficient and personalized learning in mobile health, we investigate the problem of online kernel selection for Gaussian Process regression in the multi-task setting. We propose a novel generative process on the kernel composition for this purpose. Our method demonstrates that trajectories of kernel evolutions can be transferred between users to improve learning and that the kernels themselves are meaningful for an mHealth prediction goal.

Machine learning