
A Domain-Knowledge-Aided Deep Reinforcement Learning Approach for Flight Control Design

 Added by Hyo-Sang Shin PhD
 Publication date: 2019
 Research language: English





This paper examines the potential of emerging deep reinforcement learning techniques in flight control. Instead of learning from scratch, we propose leveraging available domain knowledge during learning to improve learning efficiency and generalisability. More specifically, the proposed approach fixes the autopilot structure as a typical three-loop autopilot and uses deep reinforcement learning to learn the autopilot gains. To solve the flight control problem, we then formulate a Markov decision process with a suitable reward function that enables the application of reinforcement learning theory. A second type of domain knowledge is exploited in defining the reward function: reference inputs are shaped to reflect important control objectives, and the shaped reference inputs are then used in the reward function. The state-of-the-art deep deterministic policy gradient algorithm is used to learn an action policy that maps the observed states to the autopilot gains. Extensive numerical simulations are performed to validate the proposed computational control algorithm.
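The idea of building the reward from a shaped reference input can be illustrated with a minimal Python sketch. It is not the paper's implementation. The first-order lag used as the reference model, the quadratic tracking penalty, the assumption that the policy outputs values in [-1, 1], and all names (`shape_reference`, `tracking_reward`, `scale_gains`, `tau`, `gain_bounds`) are hypothetical choices made here for illustration only.

```python
def shape_reference(command, tau, dt, n_steps):
    """Shape a step command with a first-order lag (assumed reference model).

    tau is a hypothetical time constant encoding the desired response speed.
    """
    shaped = []
    y = 0.0
    alpha = dt / tau
    for _ in range(n_steps):
        # Forward-Euler step of  dy/dt = (command - y) / tau
        y += alpha * (command - y)
        shaped.append(y)
    return shaped


def tracking_reward(actual, shaped_ref):
    """Assumed reward: negative squared error to the shaped reference."""
    return -(actual - shaped_ref) ** 2


def scale_gains(raw_action, gain_bounds):
    """Map policy outputs in [-1, 1] onto autopilot gain ranges.

    gain_bounds is a hypothetical list of (low, high) pairs, one per gain.
    """
    return [
        low + 0.5 * (a + 1.0) * (high - low)
        for a, (low, high) in zip(raw_action, gain_bounds)
    ]


if __name__ == "__main__":
    ref = shape_reference(command=1.0, tau=0.5, dt=0.1, n_steps=3)
    print(ref)
    print(tracking_reward(actual=0.5, shaped_ref=1.0))
    print(scale_gains([-1.0, 0.0, 1.0], [(0.0, 2.0)] * 3))
```

Penalising the error to the shaped reference, rather than to the raw command, means the agent is not rewarded for responses that are faster or more aggressive than the designer intends, which is one way a control objective could enter the reward.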




187 - Dattaraj Rao 2019
Traditional Reinforcement Learning (RL) problems depend on an exhaustive simulation environment that models the real-world physics of the problem and trains the RL agent by observing this environment. In this paper, we present a novel approach to creating an environment by modeling the reward function based on empirical rules extracted from human domain knowledge of the system under study. Using this empirical reward function, we build an environment and train the agent. We first create an environment that emulates the effect of setting cabin temperature through a thermostat. This is typically done in RL problems by creating an exhaustive model of the system with a detailed thermodynamic study. Instead, we propose an empirical approach to model the reward function based on human domain knowledge. We document some rules of thumb that humans usually apply when setting the thermostat temperature and model these in our reward function. This modeling of empirical human domain rules into a reward function for RL is the unique aspect of this paper. This is a continuous action space problem, and we use the deep deterministic policy gradient (DDPG) method to maximize the reward function. We create a policy network that predicts the optimal temperature setpoint given external temperature and humidity.
207 - Haoran Su , Kejian Shi , Li Jin 2020
Emergency vehicle (EMV) service is a key function of cities and is exceedingly challenging due to urban traffic congestion. A main reason behind EMV service delay is the lack of communication and cooperation between vehicles blocking EMVs. In this paper, we study the improvement of EMV service under V2I connectivity. We consider the establishment of dynamic queue jump lanes (DQJLs) based on real-time coordination of connected vehicles. We develop a novel Markov decision process formulation for the DQJL problem, which explicitly accounts for the uncertainty of drivers' reactions to approaching EMVs. We propose a deep neural network-based reinforcement learning algorithm that efficiently computes the optimal coordination instructions. We also validate our approach on a micro-simulation testbed using Simulation of Urban Mobility (SUMO). Validation results show that with our proposed methodology, the centralized control system reduces EMV passing time by approximately 15% compared with the benchmark system.
128 - Yufei Ye , Xiaoqin Ren , Jin Wang 2018
With the rapid development of deep learning, deep reinforcement learning (DRL) has begun to appear in the field of resource scheduling in recent years. Building on previous research on DRL in the literature, we introduce the online resource scheduling algorithm DeepRM2 and the offline resource scheduling algorithm DeepRM_Off. Compared with the state-of-the-art DRL algorithm DeepRM and heuristic algorithms, our proposed algorithms converge faster and schedule more efficiently with regard to average slowdown time, job completion time and rewards.
Deep reinforcement learning has been recognized as an efficient technique to design optimal strategies for different complex systems without prior knowledge of the control landscape. To achieve fast and precise control for quantum systems, we propose a novel deep reinforcement learning approach by constructing a curriculum consisting of a set of intermediate tasks defined by a fidelity threshold. Tasks in a curriculum can be statically determined using empirical knowledge or adaptively generated during the learning process. By transferring knowledge between two successive tasks and sequencing tasks according to their difficulties, the proposed curriculum-based deep reinforcement learning (CDRL) method enables the agent to focus on easy tasks in the early stage, then move on to difficult tasks, and eventually approach the final task. Numerical simulations on closed quantum systems and open quantum systems demonstrate that the proposed method exhibits improved control performance for quantum systems and also provides an efficient way to identify optimal strategies with fewer control pulses.
353 - Shaoyang Wang , Tiejun Lv , Wei Ni 2021
This paper presents a novel and effective deep reinforcement learning (DRL)-based approach to addressing joint resource management (JRM) in a practical multi-carrier non-orthogonal multiple access (MC-NOMA) system, where hardware sensitivity and imperfect successive interference cancellation (SIC) are considered. We first formulate the JRM problem to maximize the weighted-sum system throughput. Then, the JRM problem is decoupled into two iterative subtasks: subcarrier assignment (SA, including user grouping) and power allocation (PA). Each subtask is a sequential decision process. Invoking a deep deterministic policy gradient algorithm, our proposed DRL-based JRM (DRL-JRM) approach jointly performs the two subtasks, where the optimization objective and constraints of the subtasks are addressed by a new joint reward and internal reward mechanism. A multi-agent structure and a convolutional neural network are adopted to reduce the complexity of the PA subtask. We also tailor the neural network structure for the stability and convergence of DRL-JRM. Corroborated by extensive experiments, the proposed DRL-JRM scheme is superior to existing alternatives in terms of system throughput and resistance to interference, especially in the presence of many users and strong inter-cell interference. DRL-JRM can flexibly meet individual service requirements of users.
