
Online Control with Adversarial Disturbances

Published by: Karan Singh
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is one of regret: we desire an online control procedure that does nearly as well as a procedure that has full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this problem. From a technical standpoint, this work generalizes previous work in two main aspects: our model allows for adversarial noise in the dynamics, and it allows for general convex costs.
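
For concreteness, the regret objective described above can be written out as follows. This is only a sketch in notation commonly used for this setting; the symbols $A$, $B$, $w_t$, $c_t$, and the comparator class $\Pi$ are assumptions introduced here for illustration and do not appear in the abstract.

```latex
% Assumed notation (not from the abstract): state x_t, control u_t,
% adversarial disturbance w_t, convex cost c_t, comparator policy class \Pi.
x_{t+1} = A x_t + B u_t + w_t,
\qquad
\mathrm{Regret}_T
  = \sum_{t=1}^{T} c_t(x_t, u_t)
  - \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\bigl(x_t^{\pi}, u_t^{\pi}\bigr)
```

Here $(x_t^{\pi}, u_t^{\pi})$ denotes the trajectory that a fixed policy $\pi$ would generate under the same disturbance sequence, so the second term is the cost of the best policy in $\Pi$ chosen in hindsight.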




Read also

This paper presents competitive algorithms for a novel class of online optimization problems with memory. We consider a setting where the learner seeks to minimize the sum of a hitting cost and a switching cost that depends on the previous $p$ decisions. This setting generalizes Smoothed Online Convex Optimization. The proposed approach, Optimistic Regularized Online Balanced Descent, achieves a constant, dimension-free competitive ratio. Further, we show a connection between online optimization with memory and online control with adversarial disturbances. This connection, in turn, leads to a new constant-competitive policy for a rich class of online control problems.
This paper presents a trajectory tracking control strategy that modulates the active power injected by geographically distributed inverter-based resources to support transient stability. Each resource is independently controlled, and its response drives the local bus voltage angle toward a trajectory that tracks the angle of the center of inertia. The center-of-inertia angle is estimated in real time from wide-area measurements. The main objectives are to stabilize transient disturbances and increase the amount of power that can be safely transferred over key transmission paths without loss of synchronism. Here we envision the actuators as utility-scale energy storage systems; however, equivalent examples could be developed for partially-curtailed photovoltaic generation and/or Type 4 wind turbine generators. The strategy stems from a time-varying linearization of the equations of motion for a synchronous machine. The control action produces synchronizing torque in a special reference frame that accounts for the motion of the center of inertia. This drives the system states toward the desired trajectory and promotes rotor angle stability. For testing we employ a reduced-order dynamic model of the North American Western Interconnection. The results show that this approach improves system reliability and can increase capacity utilization on stability-limited transmission corridors.
This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space. This framework yields a general setting that permits discrete and continuous control inputs as well as non-smooth, non-differentiable dynamics. Our main result, the Lower Confidence-based Continuous Control ($LC^3$) algorithm, enjoys a near-optimal $O(\sqrt{T})$ regret bound against the optimal controller in episodic settings, where $T$ is the number of episodes. The bound has no explicit dependence on the dimension of the system dynamics, which could be infinite, but instead only depends on information theoretic quantities. We empirically show its application to a number of nonlinear control tasks and demonstrate the benefit of exploration for learning model dynamics.
In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected by a prior. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves $\widetilde{\mathcal{O}}(H^2 d^{3/2}K^{1/2}+KH^{3/2}dN_1^{-1/2})$ regret. Here $N_1$ represents the number of trajectories of the expert demonstration, $d$ is the feature dimension, and $K$ is the number of episodes. For offline GAIL, we propose a pessimistic generative adversarial policy optimization algorithm (PGAP). For an arbitrary additional dataset, we obtain the optimality gap of PGAP, achieving the minimax lower bound in the utilization of the additional dataset. Assuming sufficient coverage on the additional dataset, we show that PGAP achieves $\widetilde{\mathcal{O}}(H^{2}dK^{-1/2} +H^2d^{3/2}N_2^{-1/2}+H^{3/2}dN_1^{-1/2})$ optimality gap. Here $N_2$ represents the number of trajectories of the additional dataset with sufficient coverage.
Adversarial attacks expose important vulnerabilities of deep learning models, yet little attention has been paid to settings where data arrives as a stream. In this paper, we formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases: attackers must operate under partial knowledge of the target model, and the decisions made by the attacker are irrevocable since they operate on a transient data stream. We first rigorously analyze a deterministic variant of the online threat model by drawing parallels to the well-studied $k$-secretary problem in theoretical computer science and propose Virtual+, a simple yet practical online algorithm. Our main theoretical result shows that Virtual+ provably yields the best competitive ratio over all single-threshold algorithms for $k<5$ -- extending previous analysis of the $k$-secretary problem. We also introduce the stochastic $k$-secretary problem -- effectively reducing online blackbox transfer attacks to a $k$-secretary problem under noise -- and prove theoretical bounds on the performance of any online algorithm adapted to this setting. Finally, we complement our theoretical results by conducting experiments on both MNIST and CIFAR-10 with both vanilla and robust classifiers, revealing not only the necessity of online algorithms in achieving near-optimal performance but also the rich interplay of a given attack strategy towards online attack selection, enabling simple strategies like FGSM to outperform classically strong whitebox adversaries.
