Online Model-Free Reinforcement Learning for the Automatic Control of a Flexible Wing Aircraft


Published by: Wail Gueaieb

Publication date: 2021

Research field: Electronic Engineering, Informatics Engineering

Language: English

Authors: Mohammed Abouheaf - Wail Gueaieb - Frank Lewis

Systems and Control, Artificial Intelligence, Machine Learning


Abstract

The control problem of the flexible wing aircraft is challenging due to the prevailing and highly nonlinear deformations in the flexible wing system. This calls for new control mechanisms that are robust to the real-time variations in the wing's aerodynamics. An online control mechanism based on a value iteration reinforcement learning process is developed for flexible wing aerial structures. It employs a model-free control policy framework and a guaranteed convergent adaptive learning architecture to solve the system's Bellman optimality equation. A Riccati equation is derived and shown to be equivalent to solving the underlying Bellman equation. The online reinforcement learning solution is implemented by means of an adaptive-critic mechanism. The controller is proven to be asymptotically stable in the Lyapunov sense. It is assessed through computer simulations, and its superior performance is demonstrated in two scenarios under different operating conditions.
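The link between value iteration and a Riccati equation can be illustrated on a toy problem. The sketch below is not the authors' model-free adaptive-critic method: it assumes a known scalar linear system with quadratic cost, and the function name and the values of `a`, `b`, `q`, and `r` are hypothetical, chosen only for illustration. With a quadratic value function V(x) = p·x², each Bellman backup reduces to one step of the Riccati recursion on p.

```python
def riccati_value_iteration(a, b, q, r, tol=1e-12, max_iters=10_000):
    """Value iteration for a scalar discrete-time linear-quadratic problem.

    Assumed dynamics: x[k+1] = a*x[k] + b*u[k]
    Assumed stage cost: q*x^2 + r*u^2
    The value function is V(x) = p*x^2, so the Bellman backup becomes
    a scalar Riccati recursion on the coefficient p.
    """
    p = 0.0  # start from the zero value function
    for _ in range(max_iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    gain = a * b * p / (r + b * b * p)  # greedy policy: u = -gain * x
    return p, gain


# Hypothetical open-loop unstable plant (|a| > 1), for illustration only.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p, gain = riccati_value_iteration(a, b, q, r)

# Residual of the algebraic Riccati equation at the converged p.
residual = abs(p - (q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)))
# Closed-loop pole under the greedy policy; |pole| < 1 means stable.
closed_loop_pole = a - b * gain
```

In this toy setting the fixed point of the recursion satisfies the algebraic Riccati equation, and the resulting feedback gain stabilizes the assumed plant. A model-free scheme such as the one the abstract describes would have to learn the equivalent quantities from measured data rather than from `a` and `b`.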


208 - Mohammed Abouheaf, Nathaniel Mailhot, Wail Gueaieb 2021

The autonomous operation of flexible-wing aircraft is technically challenging and has not previously been presented in the literature. The lack of an exact modeling framework is due to the complex nonlinear aerodynamic relationships governed by the deformations in the flexible-wing shape, which in turn complicates the controls and instrumentation setup of the navigation system. This calls for innovative approaches to interface affordable instrumentation platforms to autonomously control this type of aircraft. This work leverages ideas from the instrumentation and measurements, machine learning, and optimization fields in order to develop an autonomous navigation system for a flexible-wing aircraft. A novel machine learning process based on a guiding search mechanism is developed to interface real-time measurements of wing-orientation dynamics into control decisions. This process is realized using an online value iteration algorithm that decides on two improved and interacting model-free control strategies in real time. The first strategy is concerned with achieving the tracking objectives, while the second supports the stability of the system. A neural network platform that employs adaptive critics is utilized to approximate the control strategies while approximating the assessments of their values. An experimental actuation system is utilized to test the validity of the proposed platform. The experimental results are shown to be aligned with the stability features of the proposed model-free adaptive learning approach.

Systems and Control

Lifelong Control of Off-grid Microgrid with Model Based Reinforcement Learning

92 - Simone Totaro, Ioannis Boukas, Anders Jonsson 2020

The lifelong control problem of an off-grid microgrid is composed of two tasks, namely estimation of the condition of the microgrid devices and operational planning accounting for the uncertainties by forecasting the future consumption and the renewable production. The main challenge for effective control arises from the various changes that take place over time. In this paper, we present an open-source reinforcement framework for the modeling of an off-grid microgrid for rural electrification. The lifelong control problem of an isolated microgrid is formulated as a Markov Decision Process (MDP). We categorize the set of changes that can occur into progressive and abrupt changes. We propose a novel model-based reinforcement learning algorithm that is able to address both types of changes. In particular, the proposed algorithm demonstrates generalisation properties, transfer capabilities and better robustness in case of fast-changing system dynamics. The proposed algorithm is compared against a rule-based policy and a model predictive controller with look-ahead. The results show that the trained agent is able to outperform both benchmarks in the lifelong setting where the system dynamics are changing over time.

Systems and Control, Artificial Intelligence, Machine Learning

Model-Reference Reinforcement Learning for Collision-Free Tracking Control of Autonomous Surface Vehicles

239 - Qingrui Zhang, Wei Pan, Vasso Reppa 2020

This paper presents a novel model-reference reinforcement learning algorithm for the intelligent tracking control of uncertain autonomous surface vehicles with collision avoidance. The proposed control algorithm combines a conventional control method with reinforcement learning to enhance control accuracy and intelligence. In the proposed control design, a nominal system is considered for the design of a baseline tracking controller using a conventional control approach. The nominal system also defines the desired behaviour of uncertain autonomous surface vehicles in an obstacle-free environment. Thanks to reinforcement learning, the overall tracking controller is capable of compensating for model uncertainties and achieving collision avoidance at the same time in environments with obstacles. In comparison to traditional deep reinforcement learning methods, our proposed learning-based control can provide stability guarantees and better sample efficiency. We demonstrate the performance of the new algorithm using an example of autonomous surface vehicles.

Systems and Control, Machine Learning, Robotics

Bi-level Off-policy Reinforcement Learning for Volt/VAR Control Involving Continuous and Discrete Devices

87 - Haotian Liu, Wenchuan Wu 2021

In Volt/VAR control (VVC) of active distribution networks (ADNs), both slow timescale discrete devices (STDDs) and fast timescale continuous devices (FTCDs) are involved. The STDDs, such as on-load tap changers (OLTC), and FTCDs, such as distributed generators, should be coordinated in time sequence. Such VVC is formulated as a two-timescale optimization problem to jointly optimize FTCDs and STDDs in ADNs. Traditional optimization methods rely heavily on accurate models of the system, but are sometimes impractical because of the unaffordable modelling effort they require. In this paper, a novel bi-level off-policy reinforcement learning (RL) algorithm is proposed to solve this problem in a model-free manner. A bi-level Markov decision process (BMDP) is defined to describe the two-timescale VVC problem, and separate agents are set up for the slow and fast timescale sub-problems. For the fast timescale sub-problem, we adopt an off-policy RL method, soft actor-critic, with high sample efficiency. For the slow one, we develop an off-policy multi-discrete soft actor-critic (MDSAC) algorithm to address the curse of dimensionality with various STDDs. To mitigate the non-stationarity issue in the two agents' learning processes, we propose a multi-timescale off-policy correction (MTOPC) method by adopting an importance sampling technique. Comprehensive numerical studies not only demonstrate that the proposed method can achieve stable and satisfactory optimization of both STDDs and FTCDs without any model information, but also support that the proposed method outperforms existing two-timescale VVC methods.

Systems and Control, Artificial Intelligence, Machine Learning

Flow Rate Control in Smart District Heating Systems Using Deep Reinforcement Learning

96 - Tinghao Zhang, Jing Luo, Ping Chen 2019

At high latitudes, many cities adopt a centralized heating system to improve the energy generation efficiency and to reduce pollution. In multi-tier systems, so-called district heating, there are a few efficient approaches for the flow rate control during the heating process. In this paper, we describe the theoretical methods to solve this problem by deep reinforcement learning and propose a cloud-based heating control system for implementation. A real-world case study shows the effectiveness and practicability of the proposed system controlled by humans, and the simulated experiments for deep reinforcement learning show that about 1985.01 gigajoules of heat quantity and 42276.45 tons of water are saved per hour compared with manual control.

Systems and Control, Artificial Intelligence, Machine Learning