Information Theoretic Regret Bounds for Online Nonlinear Control

Posted by Wen Sun

Publication date: 2020

Research field: Information Engineering

Paper language: English

Authors: Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey

Machine Learning, Robotics, Optimization and Control

Abstract

This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space (RKHS). This framework yields a general setting that permits discrete and continuous control inputs as well as non-smooth, non-differentiable dynamics. Our main result, the Lower Confidence-based Continuous Control ($LC^3$) algorithm, enjoys a near-optimal $O(\sqrt{T})$ regret bound against the optimal controller in episodic settings, where $T$ is the number of episodes. The bound has no explicit dependence on the dimension of the system dynamics, which could be infinite, but instead depends only on information-theoretic quantities. We empirically demonstrate its application to a number of nonlinear control tasks and show the benefit of exploration for learning model dynamics.
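To make the RKHS modelling assumption concrete, here is a minimal sketch (not the $LC^3$ algorithm itself) of two ingredients that kernel-based analyses of this kind typically rely on: a kernel ridge regression estimate of the dynamics whose posterior variance measures uncertainty, and the log-determinant information gain $\frac{1}{2}\log\det(I + K/\lambda)$ of the data collected so far, a standard information-theoretic quantity of the sort such bounds depend on. The Gaussian (RBF) kernel, the ridge parameter `lam`, and all function names are assumptions chosen for illustration; the inputs would be state-action pairs and the targets one coordinate of the next state.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B (an assumed kernel choice)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq / lengthscale**2)


def information_gain(X, lam=1.0, lengthscale=1.0):
    """0.5 * log det(I + K / lam) for the inputs X observed so far."""
    K = rbf_kernel(X, X, lengthscale)
    _sign, logdet = np.linalg.slogdet(np.eye(len(X)) + K / lam)
    return 0.5 * logdet


def posterior(X, Y, Xq, lam=1.0, lengthscale=1.0):
    """Kernel ridge regression mean and posterior variance at query points Xq."""
    K = rbf_kernel(X, X, lengthscale)
    Kq = rbf_kernel(Xq, X, lengthscale)
    A = K + lam * np.eye(len(X))
    mean = Kq @ np.linalg.solve(A, Y)
    # k(x, x) = 1 for the RBF kernel, so the prior variance at every point is 1
    var = 1.0 - np.sum(Kq * np.linalg.solve(A, Kq.T).T, axis=1)
    return mean, var


# Hypothetical usage: random (state, action) pairs and a made-up next-state coordinate
rng = np.random.default_rng(0)
Z = rng.uniform(-1, 1, size=(20, 3))
Y = np.tanh(Z[:, 0] + Z[:, 2])
mean, var = posterior(Z, Y, Z[:2])
print(information_gain(Z), var)
```

Adding a new input $x$ multiplies $\det(I + K/\lambda)$ by $1 + \sigma^2(x)/\lambda$, where $\sigma^2(x)$ is the posterior variance above, so the information gain grows only when the data are genuinely new. This is why a bound stated in terms of it need not depend on the (possibly infinite) dimension of the feature space.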

Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam (2021)

We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations to the final output. Our key technical tool combines the information-theoretic generalization bounds previously used for analyzing randomized variants of SGD with a perturbation analysis of the iterates.

Machine Learning

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

Seyed Mohammad Asghari, Yi Ouyang (2020)

Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL) primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can achieve regret sublinear in $T$, where $T$ is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed MARL algorithm serves as an implicit coordination mechanism between the two learning agents. This allows the agents to achieve a regret within $O(\sqrt{T})$ of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound (here $\tilde{O}(\cdot)$ hides constants and logarithmic factors). Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.

Machine Learning, Multiagent Systems, Optimization and Control

Bounds for Approximate Regret-Matching Algorithms

Ryan D'Orazio, Dustin Morrill, James R. Wright (2019)

A dominant approach to solving large imperfect-information games is Counterfactual Regret Minimization (CFR). In CFR, many regret minimization problems are combined to solve the game. For very large games, abstraction is typically needed to render CFR tractable. Abstractions are often manually tuned, possibly removing important strategic differences in the full game and harming performance. Function approximation provides a natural solution to finding good abstractions to approximate the full game. A common approach to incorporating function approximation is to learn the inputs needed for a regret-minimizing algorithm, allowing for generalization across many regret minimization problems. This paper gives regret bounds when a regret-minimizing algorithm uses estimates instead of true values. This form of analysis is the first to generalize to a larger class of $(\Phi, f)$-regret matching algorithms, and includes different forms of regret such as swap, internal, and external regret. We demonstrate how these results give a slightly tighter bound for Regression Regret-Matching (RRM), and present a novel bound for combining regression with Hedge.

Machine Learning, Computer Science and Game Theory
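The simplest member of the regret-matching family is external regret matching: play each action with probability proportional to its positive cumulative regret, and uniformly when no regret is positive. The minimal sketch below adds Gaussian noise to the cumulative regrets as a stand-in for the learned estimates the paper analyzes; that noise model, the fixed opponent, and all function names are hypothetical, and the paper's $(\Phi, f)$ framework and RRM are more general than this.

```python
import random


def regret_matching(cum_regrets):
    """Map (possibly estimated) cumulative regrets to a mixed strategy."""
    positive = [max(r, 0.0) for r in cum_regrets]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret: fall back to the uniform strategy
        n = len(cum_regrets)
        return [1.0 / n] * n
    return [p / total for p in positive]


def run(utility, opponent, T=1000, noise=0.0, seed=0):
    """Regret matching against a fixed mixed opponent; returns the average strategy.

    `noise` is the std-dev of Gaussian error added to the regrets before they are
    used, a hypothetical stand-in for regret estimates produced by a regressor.
    """
    rng = random.Random(seed)
    n = len(utility)
    cum = [0.0] * n  # true cumulative regrets
    avg = [0.0] * n  # running average of the played strategies
    for _ in range(T):
        estimate = [r + rng.gauss(0.0, noise) for r in cum]
        sigma = regret_matching(estimate)
        # Expected utility of each pure action against the opponent's mix
        values = [sum(utility[a][b] * opponent[b] for b in range(len(opponent)))
                  for a in range(n)]
        played = sum(s * v for s, v in zip(sigma, values))
        for a in range(n):
            cum[a] += values[a] - played
            avg[a] += sigma[a] / T
    return avg


RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rows/cols: rock, paper, scissors
avg = run(RPS, opponent=[0.5, 0.25, 0.25], T=1000, noise=0.1)
print([round(p, 3) for p in avg])
```

Against an opponent that overplays rock, the average strategy concentrates on paper even when the regrets are only known up to noise, a small instance of the robustness to estimation error that the paper's bounds quantify.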

Learning Model Predictive Control for Competitive Autonomous Racing

Lukas Brunke (2020)

The goal of this thesis is to design a learning model predictive controller (LMPC) that allows multiple agents to race competitively on a predefined race track in real time. This thesis addresses two major shortcomings in the existing single-agent formulation. Previously, the agent determined a locally optimal trajectory but did not explore the state space, which may be necessary for overtaking maneuvers. Additionally, obstacle avoidance for LMPC has previously been achieved by using a non-convex terminal set, which increases the complexity of solving the optimization problem. The proposed algorithm for multi-agent racing explores the state space by executing the LMPC for multiple different initializations, which yields a richer terminal safe set. Furthermore, a new method for selecting states in the terminal set is developed, which preserves the convexity of the terminal safe set and allows for taking suboptimal states.

Machine Learning, Robotics, Optimization and Control

Online Control with Adversarial Disturbances

Naman Agarwal, Brian Bullins, Elad Hazan (2019)

We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is one of regret: we desire an online control procedure that can do nearly as well as a procedure that has full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this problem. From a technical standpoint, this work generalizes previous work in two main aspects: our model allows for adversarial noise in the dynamics, and it allows for general convex costs.

Machine Learning, Systems and Control, Optimization and Control
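The regret objective can be made concrete on a toy problem. The sketch below assumes a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with quadratic cost $x_t^2 + u_t^2$ and, purely for illustration, takes the comparator class to be a finite grid of static gains $u_t = -k x_t$ evaluated with the whole disturbance sequence known in hindsight. The paper's comparator class and its efficient algorithm are not reproduced here, and every name below is hypothetical.

```python
def rollout_cost(gain, disturbances, a=1.0, b=1.0, x0=0.0):
    """Total cost sum(x_t^2 + u_t^2) of the static controller u_t = -gain * x_t
    on the scalar system x_{t+1} = a * x_t + b * u_t + w_t."""
    x, cost = x0, 0.0
    for w in disturbances:
        u = -gain * x
        cost += x * x + u * u
        x = a * x + b * u + w
    return cost


def regret(online_cost, disturbances, gains):
    """Cost of an online controller minus that of the best fixed gain in hindsight.

    The comparator (a finite grid of static gains) is an illustrative assumption.
    """
    best = min(rollout_cost(k, disturbances) for k in gains)
    return online_cost - best


ws = [1.0, 1.0, 1.0, 1.0]  # a constant, adversarially chosen disturbance
gains = [0.0, 0.5, 1.0]
print(regret(rollout_cost(0.0, ws), ws, gains))  # prints 8.0
```

Here the open-loop choice $k = 0$ pays 14 while the best grid gain $k = 1$ pays 6. An online controller is scored the same way, against a comparator that, unlike it, sees the entire disturbance sequence in advance.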