A car-following strategy based on deep deterministic policy gradient (DDPG) can break through the constraints of differential-equation models thanks to its ability to explore complex environments. However, the car-following performance of DDPG is usually degraded by unreasonable reward function design, insufficient training, and low sampling efficiency. To address these problems, a hybrid car-following strategy based on DDPG and cooperative adaptive cruise control (CACC) is proposed. First, the car-following process is modeled as a Markov decision process so that CACC and DDPG can be computed simultaneously at each frame. Given the current state, two actions are obtained, one from CACC and one from DDPG. The action offering the larger reward is then chosen as the output of the hybrid strategy. Meanwhile, a rule is designed to ensure that the rate of change of acceleration stays below the desired value. The proposed strategy therefore not only guarantees basic car-following performance through CACC, but also makes full use of DDPG's ability to explore complex environments. Finally, simulation results show that the car-following performance of the proposed strategy is significantly improved compared with that of DDPG and CACC over the whole state space.
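The selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `hybrid_action`, the reward callable `reward_fn`, the symmetric jerk clamp, and the tie-break in favor of CACC are all assumptions made for the example, since the abstract does not specify them.

```python
def hybrid_action(state, cacc_action, ddpg_action, reward_fn,
                  prev_accel, max_jerk, dt):
    """Return the acceleration command of a hypothetical hybrid strategy.

    cacc_action, ddpg_action: candidate accelerations (m/s^2) proposed
        by the two controllers for the same state.
    reward_fn: assumed callable (state, action) -> float.
    max_jerk: assumed bound on |d(accel)/dt| in m/s^3.
    dt: frame duration in seconds.
    """
    # Keep the candidate with the larger reward. max() returns the first
    # maximal element, so ties go to CACC (an assumption of this sketch).
    best = max((cacc_action, ddpg_action),
               key=lambda a: reward_fn(state, a))

    # Assumed form of the jerk rule: clamp the change in acceleration
    # over one frame to +/- max_jerk * dt.
    max_delta = max_jerk * dt
    delta = min(max(best - prev_accel, -max_delta), max_delta)
    return prev_accel + delta


if __name__ == "__main__":
    # Toy reward that prefers accelerations close to 1.0 m/s^2.
    toy_reward = lambda s, a: -abs(a - 1.0)
    print(hybrid_action(None, 0.5, 0.9, toy_reward,
                        prev_accel=0.0, max_jerk=2.0, dt=0.1))
```

In the toy call, the DDPG candidate has the larger reward, but the commanded acceleration can move by at most `max_jerk * dt` from the previous value within one frame.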
The paper evaluates the influence of the maximum vehicle acceleration and variable proportions of ACC/CACC vehicles on the throughput of an intersection. Two cases are studied: (1) free road downstream of the intersection; and (2) red light at some distance downstream of the intersection. Simulation of a 4-mile stretch of an arterial with 13 signalized intersections is used to evaluate the impact of (C)ACC vehicles on the mean and standard deviation of travel time as the proportion of (C)ACC vehicles is increased. The results suggest a very high urban mobility benefit of (C)ACC vehicles at little or no cost in infrastructure.
Traffic light timing optimization is still an active line of research despite the wealth of scientific literature on the topic, and the problem remains unsolved for any non-toy scenario. One of the key issues with traffic light optimization is the large scale of the input information available to the controlling agent, namely all the traffic data that is continually sampled by the traffic detectors covering the urban network. This issue has in the past forced researchers to focus on agents that work on localized parts of the traffic network, typically individual intersections, and to coordinate the individual agents in a multi-agent setup. To overcome the large scale of the available state information, we propose to rely on the ability of deep learning approaches to handle large input spaces, in the form of the Deep Deterministic Policy Gradient (DDPG) algorithm. We performed several experiments with a range of models, from a very simple one (a single intersection) to a more complex one (a large city section).
To properly assess the impact of adaptive cruise control (ACC) and cooperative adaptive cruise control (CACC), one has to model vehicle dynamics. First of all, one has to choose the car-following model, as it determines the vehicle flow as vehicles accelerate from standstill or decelerate because of an obstacle ahead. The other factor significantly affecting intersection throughput is the maximal vehicle acceleration rate. In this paper, we analyze three car-following behaviors: the Gipps model, the Improved Intelligent Driver Model (IIDM), and the Helly model. The Gipps model exhibits rather aggressive acceleration behavior. If used for intersection throughput estimation, this model would lead to overly optimistic results. The Helly model is convenient to analyze due to its linear nature, but its deceleration behavior in the presence of obstacles ahead is unrealistically abrupt. Showing the most realistic acceleration and deceleration behavior of the three models, IIDM is better suited for ACC/CACC impact evaluation than the other two. We discuss the influence of the maximal vehicle acceleration rate and of different proportions of ACC/CACC vehicles on intersection throughput in the context of the three car-following models. The analysis is done for two cases: (1) free road downstream of the intersection; and (2) red light at some distance downstream of the intersection. Finally, we introduce the platoon model and evaluate ACC and CACC with platooning in terms of travel time and network throughput using a SUMO simulation of the 4-mile stretch of the Colorado Boulevard / Huntington Drive arterial with 13 signalized intersections in Arcadia, Southern California.
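To illustrate the "linear nature" attributed to the Helly model, the sketch below uses a commonly cited simplified form without reaction delay: acceleration is a linear combination of the speed difference to the leader and the gap error relative to a speed-dependent desired gap. The function name, gains, desired-gap parameters, and acceleration limits are illustrative assumptions, not values from the paper.

```python
def helly_acceleration(gap, speed, lead_speed,
                       c1=0.5, c2=0.125,       # assumed gains
                       d0=5.0, headway=1.0,    # assumed desired-gap parameters
                       a_max=2.0, b_max=4.0):  # assumed accel/decel limits
    """Simplified Helly-type car-following law (no reaction delay).

    gap: distance to the leader in m; speed, lead_speed: in m/s.
    Returns the follower's acceleration in m/s^2.
    """
    # Desired gap grows linearly with the follower's own speed.
    desired_gap = d0 + headway * speed

    # Linear response to the speed difference and to the gap error.
    accel = c1 * (lead_speed - speed) + c2 * (gap - desired_gap)

    # Saturate at the maximal acceleration / deceleration rates.
    return max(-b_max, min(accel, a_max))


if __name__ == "__main__":
    # Same speed as the leader, gap 5 m larger than desired.
    print(helly_acceleration(gap=20.0, speed=10.0, lead_speed=10.0))
```

The `a_max` saturation is where a maximal acceleration rate, the second factor discussed in the abstract, would enter a model of this kind.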
This paper presents a deep reinforcement learning method, Double Deep Q-Networks, for designing an end-to-end vision-based adaptive cruise control (ACC) system. A simulation environment of a highway scene was set up in Unity, a game engine that provides both physical models of vehicles and feature data for training and testing. Well-designed reward functions associated with the following distance and throttle/brake force were implemented in the reinforcement learning model for both internal combustion engine (ICE) vehicles and electric vehicles (EV) to perform adaptive cruise control. The gap statistics and total energy consumption are evaluated for different vehicle types to explore the relationship between reward functions and powertrain characteristics. Compared with traditional radar-based ACC systems or human-in-the-loop simulation, the proposed vision-based ACC system can generate either a better gap-regulated trajectory or a smoother speed trajectory, depending on the preset reward function. The proposed system adapts well to different speed trajectories of the preceding vehicle and operates in real time.
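For readers unfamiliar with Double Deep Q-Networks, the defining step is how the bootstrap target is formed: the online network selects the next action and the target network evaluates it. The sketch below shows that computation for a single transition using plain lists of Q-values. The function and argument names are hypothetical, and nothing here reflects the paper's network architecture or reward design.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN bootstrap target for one transition.

    q_online_next: Q-values of the online network at the next state,
        one entry per discrete action.
    q_target_next: Q-values of the target network at the same state.
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward

    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)),
                      key=lambda a: q_online_next[a])

    # ... while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(double_dqn_target(reward=1.0, done=False,
                            q_online_next=[1.0, 2.0],
                            q_target_next=[10.0, 20.0],
                            gamma=0.5))
```

Decoupling selection from evaluation in this way is what distinguishes the method from standard DQN, where the target network does both.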
COVID-19 has impacted nations differently based on their policy implementations. Effective policy requires taking into account public information and adapting to new knowledge. Epidemiological models built to understand COVID-19 seldom provide the policymaker with the capability for adaptive pandemic control (APC). The core challenges to be overcome include (a) the inability to handle a high degree of non-homogeneity in different contributing features across the pandemic timeline, (b) the lack of an approach that enables adaptive incorporation of public health expert knowledge, and (c) the lack of transparent models that enable understanding of the decision-making process in suggesting policy. In this work, we take early steps to address these challenges using Knowledge Infused Policy Gradient (KIPG) methods. Prior work on knowledge infusion does not handle soft and hard imposition of varying forms of knowledge in disease information and guidelines that must be complied with. Furthermore, the models do not attend to non-homogeneity in feature counts, manifesting as partial observability in informing the policy. Additionally, interpretable structures are extracted post-learning instead of learning an interpretable model, as required for APC. To this end, we introduce a mathematical framework for KIPG methods that can (a) induce relevant feature counts over multi-relational features of the world, (b) handle latent non-homogeneous counts as hidden variables that are linear combinations of kernelized aggregates over the features, and (c) infuse knowledge as functional constraints in a principled manner. The study establishes a theory for imposing hard and soft constraints and simulates it through experiments. In comparison with knowledge-intensive baselines, we show quick, sample-efficient adaptation to new knowledge and interpretability in the learned policy, especially in a pandemic context.