The Nonstochastic Control Problem

Published by: Karan Singh

Publication date: 2019

Research field: Informatics engineering, mathematical statistics

Language: English

Authors: Elad Hazan, Sham M. Kakade, Karan Singh

Machine learning

Abstract

We consider the problem of controlling an unknown linear dynamical system in the presence of (nonstochastic) adversarial perturbations and adversarial convex loss functions. In contrast to classical control, the a priori determination of an optimal controller here is hindered by the latter's dependence on the yet unknown perturbations and costs. Instead, we measure regret against an optimal linear policy in hindsight, and give the first efficient algorithm that guarantees a sublinear regret bound, scaling as $T^{2/3}$, in this setting.
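For orientation, the setting described in the abstract can be written out roughly as follows. This is a sketch in commonly used notation, not a formula quoted from the abstract: the symbols $A$, $B$, $x_t$, $u_t$, $w_t$, $c_t$, the policy class $\mathcal{K}$, and the sign convention $u = -Kx$ are all assumed names.

```latex
% Assumed notation: state x_t, control u_t, adversarial perturbation w_t,
% unknown system matrices (A, B), adversarially chosen convex costs c_t.
\begin{align*}
x_{t+1} &= A x_t + B u_t + w_t, \\
\mathrm{Regret}_T &= \sum_{t=1}^{T} c_t(x_t, u_t)
  \;-\; \min_{K \in \mathcal{K}} \sum_{t=1}^{T} c_t\bigl(x_t^{K}, u_t^{K}\bigr),
  \qquad u_t^{K} = -K x_t^{K}.
\end{align*}
```

Under this reading, a regret bound of order $T^{2/3}$ means the average per-step gap to the best linear policy in hindsight shrinks like $T^{-1/3}$ as the horizon grows.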

Tobias Sommer Thune, Nicolò Cesa-Bianchi, Yevgeny Seldin, 2019

We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded. We first prove that delayed Exp3 achieves the $O(\sqrt{(KT + D)\ln K})$ regret bound conjectured by Cesa-Bianchi et al. [2019] in the case of variable, but bounded delays. Here, $K$ is the number of actions and $D$ is the total delay over $T$ rounds. We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays. The new algorithm maintains the same regret bound, but similar to its predecessor requires prior knowledge of $D$ and $T$. For this algorithm we then construct a novel doubling scheme that forgoes the prior knowledge requirement under the assumption that the delays are available at action time (rather than at loss observation time). This assumption is satisfied in a broad range of applications, including interaction with servers and service providers. The resulting oracle regret bound is of order $\min_\beta (|S_\beta| + \beta \ln K + (KT + D_\beta)/\beta)$, where $|S_\beta|$ is the number of observations with delay exceeding $\beta$, and $D_\beta$ is the total delay of observations with delay below $\beta$. The bound relaxes to $O(\sqrt{(KT + D)\ln K})$, but we also provide examples where $D_\beta \ll D$ and the oracle bound has a polynomially better dependence on the problem parameters.
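The idea of running Exp3 on late-arriving feedback, with a threshold that skips rounds whose delay is too large, can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' algorithm: the function name `delayed_exp3`, the fixed learning rate `eta`, the fixed threshold `beta`, and the precomputed loss table are all hypothetical, and the doubling scheme from the abstract is not shown.

```python
import math
import random


def delayed_exp3(losses, delays, eta, beta=None, seed=0):
    """Run a simplified delayed Exp3 on a fixed loss table.

    losses[t][a] is the loss in [0, 1] of action a at round t (hypothetical input).
    delays[t] is the number of rounds before round t's loss becomes available.
    beta, if given, discards feedback whose delay exceeds beta (skipping sketch).
    Returns the list of chosen actions and the total incurred loss.
    """
    rng = random.Random(seed)
    T, K = len(losses), len(losses[0])
    cum_est = [0.0] * K   # cumulative importance-weighted loss estimates
    pending = {}          # arrival round -> list of (action, loss estimate)
    actions, total = [], 0.0

    for t in range(T):
        # Incorporate feedback that has arrived by the start of round t.
        for a, est in pending.pop(t, []):
            cum_est[a] += est

        # Exponential weights over the estimated cumulative losses
        # (shifted by the minimum for numerical stability).
        m = min(cum_est)
        weights = [math.exp(-eta * (c - m)) for c in cum_est]
        z = sum(weights)
        probs = [w / z for w in weights]

        a_t = rng.choices(range(K), weights=probs)[0]
        actions.append(a_t)
        total += losses[t][a_t]

        # Skipping: feedback with an excessive delay is never used.
        if beta is not None and delays[t] > beta:
            continue

        # Importance-weighted estimate, usable only after the delay has passed.
        est = losses[t][a_t] / probs[a_t]
        pending.setdefault(t + 1 + delays[t], []).append((a_t, est))

    return actions, total
```

For example, `delayed_exp3(losses, delays, eta=0.05, beta=10)` ignores every observation delayed by more than 10 rounds. The sketch assumes each delay is known when the action is taken, which matches the assumption the abstract states for the doubling scheme.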

Frank-Wolfe Algorithm for the Exact Sparse Problem

Farah Cherfaoui, 2018

In this paper, we study the properties of the Frank-Wolfe algorithm to solve the ExactSparse reconstruction problem. We prove that when the dictionary is quasi-incoherent, at each iteration, the Frank-Wolfe algorithm picks up an atom indexed by the support. We also prove that when the dictionary is quasi-incoherent, there exists an iteration beyond which the algorithm converges exponentially fast.
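To make the "picks up an atom" step concrete, here is a minimal Frank-Wolfe sketch for sparse reconstruction. It assumes the problem is posed as least squares over an l1 ball, which the abstract does not spell out; the function name `frank_wolfe_l1`, the radius `beta`, and the exact line search are illustrative choices rather than the paper's stated setup.

```python
def frank_wolfe_l1(D, y, beta, iters=50):
    """Minimize 0.5 * ||y - D x||^2 over the l1 ball {x : ||x||_1 <= beta}.

    D is a list of rows (m x n), y is a list of length m (assumed inputs).
    Returns (x, picked), where picked lists the atom (column) index
    selected at each iteration.
    """
    m, n = len(D), len(D[0])
    x = [0.0] * n
    picked = []
    for _ in range(iters):
        # Residual r = y - D x and gradient g = -D^T r.
        r = [y[i] - sum(D[i][j] * x[j] for j in range(n)) for i in range(m)]
        g = [-sum(D[i][j] * r[i] for i in range(m)) for j in range(n)]

        # Linear minimization over the l1 ball: a signed, scaled basis vector
        # at the atom most correlated with the residual.
        j_star = max(range(n), key=lambda j: abs(g[j]))
        picked.append(j_star)
        s = [0.0] * n
        s[j_star] = -beta if g[j_star] > 0 else beta

        # Exact line search for the quadratic objective, clipped to [0, 1].
        d = [s[j] - x[j] for j in range(n)]
        Dd = [sum(D[i][j] * d[j] for j in range(n)) for i in range(m)]
        denom = sum(v * v for v in Dd)
        if denom == 0.0:
            break
        gamma = sum(r[i] * Dd[i] for i in range(m)) / denom
        gamma = max(0.0, min(1.0, gamma))
        x = [x[j] + gamma * d[j] for j in range(n)]
    return x, picked
```

With an orthonormal dictionary (the simplest incoherent case) and a signal supported on two atoms, every index in `picked` falls inside that support, which is the behavior the abstract describes for quasi-incoherent dictionaries.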

Bandit Linear Control

Asaf Cassel, 2020

We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback. Unlike the full feedback setting where the entire cost function is revealed after each decision, here only the cost incurred by the learner is observed. We present a new and efficient algorithm that, for strongly convex and smooth costs, obtains regret that grows with the square root of the time horizon $T$. We also give extensions of this result to general convex, possibly non-smooth costs, and to non-stochastic system noise. A key component of our algorithm is a new technique for addressing bandit optimization of loss functions with memory.

Online Linear Quadratic Control

Alon Cohen, Avinatan Hassidim, Tomer Koren, 2018

We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee $O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. Crucially, and in contrast to previously proposed relaxations, the feasible solutions of our SDP all correspond to strongly stable policies that mix exponentially fast to a steady state.

Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving

Imanol Schlag, Paul Smolensky, Roland Fernandez, 2019

We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently introduced Mathematics Dataset containing 56 categories of free-form math word problems. The essential component of the model is a novel attention mechanism, called TP-Attention, which explicitly encodes the relations between each Transformer cell and the other cells from which values have been retrieved by attention. TP-Attention goes beyond linear combination of retrieved values, strengthening representation-building and resolving ambiguities introduced by multiple layers of standard attention. The TP-Transformer's attention maps give better insights into how it is capable of solving the Mathematics Dataset's challenging problems. Pretrained models and code will be made available after publication.
