
Online Optimization and Learning in Uncertain Dynamical Environments with Performance Guarantees

Posted by: Dan Li
Publication date: 2021
Paper language: English





We propose a new framework for solving online optimization and learning problems in unknown and uncertain dynamical environments. The framework enables us to learn the uncertain dynamical environment while simultaneously making online decisions in a quantifiably robust manner. The main technical approach relies on the theory of distributionally robust optimization and leverages adaptive probabilistic ambiguity sets. However, as defined, these ambiguity sets usually lead to intractable online problems, so the first part of our work is devoted to deriving reformulations as online convex problems for two sub-classes of objective functions. To solve the resulting problems, we then introduce an online version of the Nesterov accelerated-gradient algorithm and show that, under certain conditions, the resulting solution system achieves a probabilistic regret bound. Two applications illustrate the applicability of the proposed framework.
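As a rough illustration of the kind of update an online accelerated-gradient scheme performs, the sketch below runs a FISTA-style momentum loop over a stream of per-round gradient oracles. The constant step size, the oracle interface, and the quadratic example losses are assumptions for illustration only; the paper's algorithm, its step-size conditions, and the ambiguity-set reformulation are not reproduced here.

```python
import numpy as np

def online_nesterov(grad_fns, x0, step=0.1):
    """Minimal online accelerated-gradient (FISTA-style) sketch.

    grad_fns : per-round gradient oracles of the convex losses (assumed given,
               e.g. produced by an ambiguity-set reformulation).
    x0       : initial decision.
    step     : constant step size (a simplifying assumption).
    """
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()
    t = 1.0
    plays = []
    for grad in grad_fns:
        plays.append(y.copy())                        # decision played this round
        x = y - step * grad(y)                        # gradient step at extrapolated point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x + ((t - 1.0) / t_next) * (x - x_prev)   # Nesterov momentum extrapolation
        x_prev, t = x, t_next
    return plays

# Example: a stream of quadratic losses f_t(x) = 0.5 * ||x - c_t||^2.
rng = np.random.default_rng(0)
centers = [rng.normal(size=2) for _ in range(20)]
grads = [lambda x, c=c: x - c for c in centers]
decisions = online_nesterov(grads, x0=np.zeros(2))
```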




Read also

The combination of machine learning with control offers many opportunities, in particular for robust control. However, due to strong safety and reliability requirements in many real-world applications, providing rigorous statistical and control-theoretic guarantees is of utmost importance, yet difficult to achieve for learning-based control schemes. We present a general framework for learning-enhanced robust control that allows for systematic integration of prior engineering knowledge, is fully compatible with modern robust control and still comes with rigorous and practically meaningful guarantees. Building on the established Linear Fractional Representation and Integral Quadratic Constraints framework, we integrate Gaussian Process Regression as a learning component and state-of-the-art robust controller synthesis. In a concrete robust control example, our approach is demonstrated to yield improved performance with more data, while guarantees are maintained throughout.
The need for robust control laws is especially important in safety-critical applications. We propose robust hybrid control barrier functions as a means to synthesize control laws that ensure robust safety. Based on this notion, we formulate an optimization problem for learning robust hybrid control barrier functions from data. We identify sufficient conditions on the data such that feasibility of the optimization problem ensures correctness of the learned robust hybrid control barrier functions. Our techniques allow us to safely expand the region of attraction of a compass gait walker that is subject to model uncertainty.
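As a loose illustration of learning a barrier-like certificate from data, the sketch below fits a linear-in-features function that is positive on sampled safe states, negative on sampled unsafe states, and satisfies a discrete-time decrease condition on sampled transitions. The feature map, the margins, and the data format are assumptions for illustration; the paper's robust hybrid formulation and its sufficient data conditions are not reproduced here.

```python
import cvxpy as cp
import numpy as np

def fit_barrier(phi, safe_x, unsafe_x, transitions, margin=0.1, alpha=0.5):
    """Fit h(x) = w^T phi(x) from labeled samples (illustrative sketch only)."""
    d = phi(safe_x[0]).size
    w = cp.Variable(d)
    cons = [w @ phi(x) >= margin for x in safe_x]        # positive on safe samples
    cons += [w @ phi(x) <= -margin for x in unsafe_x]    # negative on unsafe samples
    # discrete-time decrease condition on sampled transitions (x, x_next)
    cons += [w @ phi(xn) >= (1 - alpha) * (w @ phi(x)) for x, xn in transitions]
    prob = cp.Problem(cp.Minimize(cp.norm(w, 2)), cons)
    prob.solve()
    return w.value
```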
Systematic design and verification of advanced control strategies for complex systems under uncertainty largely remains an open problem. Despite the promise of black-box optimization methods for automated controller tuning, they generally lack formal guarantees on the solution quality, which is especially important in the control of safety-critical systems. This paper focuses on obtaining closed-loop performance guarantees for automated controller tuning, which can be formulated as a black-box optimization problem under uncertainty. We use recent advances in non-convex scenario theory to provide a distribution-free bound on the probability of the closed-loop performance measures. To mitigate the computational complexity of the data-driven scenario optimization method, we restrict ourselves to a discrete set of candidate tuning parameters. We propose to generate these candidates using constrained Bayesian optimization run multiple times from different random seed points. We apply the proposed method for tuning an economic nonlinear model predictive controller for a semibatch reactor modeled by seven highly nonlinear differential equations.
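The following sketch conveys the flavor of selecting a tuning from a finite candidate set by sampling uncertainty scenarios and ranking candidates by their worst-case closed-loop cost. The candidate list, the scenario sampler, and the cost function are placeholders; the non-convex scenario bound and the Bayesian-optimization candidate generation used in the paper are not reproduced here.

```python
import numpy as np

def select_tuning(candidates, sample_scenario, closed_loop_cost,
                  n_scenarios=200, seed=0):
    """Pick the candidate with the best worst-case cost over sampled scenarios."""
    rng = np.random.default_rng(seed)
    scenarios = [sample_scenario(rng) for _ in range(n_scenarios)]
    worst = []
    for theta in candidates:
        costs = [closed_loop_cost(theta, w) for w in scenarios]
        worst.append(max(costs))                 # worst observed scenario cost
    best = int(np.argmin(worst))
    return candidates[best], worst[best]
```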
In the current control design of safety-critical autonomous systems, formal verification techniques are typically applied after the controller is designed to evaluate whether the required properties (e.g., safety) are satisfied. However, due to the increasing system complexity and the fundamental hardness of designing a controller with formal guarantees, such an open-loop process of design-then-verify often results in many iterations and fails to provide the necessary guarantees. In this paper, we propose a correct-by-construction control learning framework that integrates the verification into the control design process in a closed-loop manner, i.e., design-while-verify. Specifically, we leverage the verification results (computed reachable set of the system state) to construct feedback metrics for control learning, which measure how likely the current design of control parameters can meet the required reach-avoid property for safety and goal-reaching. We formulate an optimization problem based on such metrics for tuning the controller parameters, and develop an approximated gradient descent algorithm with a difference method to solve the optimization problem and learn the controller. The learned controller is formally guaranteed to meet the required reach-avoid property. By treating verifiability as a first-class objective and effectively leveraging the verification results during the control learning process, our approach can significantly improve the chance of finding a control design with formal property guarantees. This is demonstrated via a set of experiments on both linear and non-linear systems that use model-based or neural-network-based controllers.
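A minimal sketch of the difference-based tuning idea, assuming the verification feedback metric is available as a scalar black-box function of the controller parameters; the reachability analysis that produces this metric in the paper is not reproduced here.

```python
import numpy as np

def tune_by_difference(metric, theta0, step=0.05, eps=1e-3, iters=50):
    """Descend a verification feedback metric with forward-difference gradients."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        base = metric(theta)
        grad = np.zeros_like(theta)
        for i in range(theta.size):              # one forward difference per parameter
            pert = theta.copy()
            pert[i] += eps
            grad[i] = (metric(pert) - base) / eps
        theta = theta - step * grad              # move toward a smaller metric value
    return theta
```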
Yiwen Lu, Yilin Mo (2021)
This paper considers the data-driven linear-quadratic regulation (LQR) problem where the system parameters are unknown and need to be identified in real time. Contrary to existing system identification and data-driven control methods, which typically require either offline data collection or multiple resets, we propose an online non-episodic algorithm that gains knowledge about the system from a single trajectory. The algorithm guarantees that both the identification error and the suboptimality gap of control performance in this trajectory converge to zero almost surely. Furthermore, we characterize the almost sure convergence rates of identification and control, and reveal an optimal trade-off between exploration and exploitation. We provide a numerical example to illustrate the effectiveness of our proposed strategy.
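A generic sketch of the two ingredients such a single-trajectory scheme combines: a recursive least-squares update of the state-space matrices from streaming data, and a certainty-equivalent LQR gain computed from the current estimates. The exploration signal and the almost-sure convergence guarantees established in the paper are not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def rls_update(theta, P, z, x_next, lam=1.0):
    """One recursive least-squares step for x_next ≈ theta^T z, with z = [x; u]."""
    z = z.reshape(-1, 1)
    K = P @ z / (lam + (z.T @ P @ z).item())     # gain for the new data point
    theta = theta + K @ (x_next.reshape(1, -1) - z.T @ theta)
    P = (P - K @ z.T @ P) / lam                  # covariance update
    return theta, P

def ce_lqr_gain(A_hat, B_hat, Q, R):
    """Certainty-equivalent LQR gain from the current estimates (A_hat, B_hat)."""
    Pare = solve_discrete_are(A_hat, B_hat, Q, R)
    return np.linalg.solve(R + B_hat.T @ Pare @ B_hat, B_hat.T @ Pare @ A_hat)
```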
