No Arabic abstract
This paper proposes a reinforcement learning approach for traffic control with the adaptive horizon. To build the controller for the traffic network, a Q-learning-based strategy that controls the green light passing time at the network intersections is applied. The controller includes two components: the regular Q-learning controller that controls the traffic light signal, and the adaptive controller that continuously optimizes the action space for the Q-learning algorithm in order to improve the efficiency of the Q-learning algorithm. The regular Q-learning controller uses the control cost function as a reward function to determine the action to choose. The adaptive controller examines the control cost and updates the action space of the controller by determining the subset of actions that are most likely to obtain optimal results and shrinking the action space to that subset. Uncertainties in traffic influx and turning rate are introduced to test the robustness of the controller under a stochastic environment. Compared with those with model predictive control (MPC), the results show that the proposed Q-learning-based controller outperforms the MPC method by reaching a stable solution in a shorter period and achieves lower control costs. The proposed Q-learning-based controller is also robust under 30% traffic demand uncertainty and 15% turning rate uncertainty.
This paper presents the results of a new deep learning model for traffic signal control. In this model, a novel state space approach is proposed to capture the main attributes of the control environment and the underlying temporal traffic movement patterns, including time of day, day of the week, signal status, and queue lengths. The performance of the model was examined over nine weeks of simulated data on a single intersection and compared to a semi-actuated and fixed time traffic controller. The simulation analysis shows an average delay reductions of 32% when compared to actuated control and 37% when compared to fixed time control. The results highlight the potential for deep reinforcement learning as a signal control optimization method.
This paper develops a reinforcement learning (RL) scheme for adaptive traffic signal control (ATSC), called CVLight, that leverages data collected only from connected vehicles (CV). Seven types of RL models are proposed within this scheme that contain various state and reward representations, including incorporation of CV delay and green light duration into state and the usage of CV delay as reward. To further incorporate information of both CV and non-CV into CVLight, an algorithm based on actor-critic, A2C-Full, is proposed where both CV and non-CV information is used to train the critic network, while only CV information is used to update the policy network and execute optimal signal timing. These models are compared at an isolated intersection under various CV market penetration rates. A full model with the best performance (i.e., minimum average travel delay per vehicle) is then selected and applied to compare with state-of-the-art benchmarks under different levels of traffic demands, turning proportions, and dynamic traffic demands, respectively. Two case studies are performed on an isolated intersection and a corridor with three consecutive intersections located in Manhattan, New York, to further demonstrate the effectiveness of the proposed algorithm under real-world scenarios. Compared to other baseline models that use all vehicle information, the trained CVLight agent can efficiently control multiple intersections solely based on CV data and can achieve a similar or even greater performance when the CV penetration rate is no less than 20%.
We present one of the first algorithms on model based reinforcement learning and trajectory optimization with free final time horizon. Grounded on the optimal control theory and Dynamic Programming, we derive a set of backward differential equations that propagate the value function and provide the optimal control policy and the optimal time horizon. The resulting policy generalizes previous results in model based trajectory optimization. Our analysis shows that the proposed algorithm recovers the theoretical optimal solution on linear low dimensional problem. Finally we provide application results on nonlinear systems.
We consider networked control systems consisting of multiple independent controlled subsystems, operating over a shared communication network. Such systems are ubiquitous in cyber-physical systems, Internet of Things, and large-scale industrial systems. In many large-scale settings, the size of the communication network is smaller than the size of the system. In consequence, scheduling issues arise. The main contribution of this paper is to develop a deep reinforcement learning-based emph{control-aware} scheduling (textsc{DeepCAS}) algorithm to tackle these issues. We use the following (optimal) design strategy: First, we synthesize an optimal controller for each subsystem; next, we design a learning algorithm that adapts to the chosen subsystems (plants) and controllers. As a consequence of this adaptation, our algorithm finds a schedule that minimizes the emph{control loss}. We present empirical results to show that textsc{DeepCAS} finds schedules with better performance than periodic ones.
This work considers the problem of control and resource scheduling in networked systems. We present DIRA, a Deep reinforcement learning based Iterative Resource Allocation algorithm, which is scalable and control-aware. Our algorithm is tailored towards large-scale problems where control and scheduling need to act jointly to optimize performance. DIRA can be used to schedule general time-domain optimization based controllers. In the present work, we focus on control designs based on suitably adapted linear quadratic regulators. We apply our algorithm to networked systems with correlated fading communication channels. Our simulations show that DIRA scales well to large scheduling problems.