We study the class of reach-avoid dynamic games in which multiple agents interact noncooperatively, and each wishes to satisfy a distinct target condition while avoiding a failure condition. Reach-avoid games are commonly used to express safety-critical optimal control problems found in mobile robot motion planning. While a wide variety of approaches exist for these motion planning problems, we focus on finding time-consistent solutions, in which planned future motion is still optimal despite prior suboptimal actions. Though abstract, time consistency encapsulates an extremely desirable property: namely, time-consistent motion plans remain optimal even when a robot's motion diverges from the plan early on due to, e.g., intrinsic dynamic uncertainty or extrinsic environment disturbances. Our main contribution is a computationally efficient algorithm for multi-agent reach-avoid games which renders time-consistent solutions. We demonstrate our approach in a simulated driving scenario, where we construct a two-player adversarial game to model a range of defensive driving behaviors.
444 - Yue Meng, Zengyi Qin, Chuchu Fan 2021
Reactive and safe agent modeling is important for today's traffic simulator designs and safe planning applications. In this work, we propose a reactive agent model which can ensure safety without compromising the original purposes, by learning only high-level decisions from expert data and a low-level decentralized controller guided by the jointly learned decentralized barrier certificates. Empirical results show that our learned road user simulation models achieve a significant improvement in safety compared to state-of-the-art imitation learning and pure control-based methods, while remaining similar to human agents, as shown by smaller errors relative to the expert data. Moreover, our learned reactive agents are shown to generalize better to unseen traffic conditions and to react better to other road users, and therefore can help understand challenging planning problems pragmatically.
Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, such as game AI, autonomous vehicles, robotics and finance. However, DRL and deep MARL agents are widely known to be sample-inefficient, and millions of interactions are usually needed even for relatively simple game settings, which prevents wide application in real industry scenarios. One key bottleneck is the well-known exploration problem, i.e., how to efficiently explore unknown environments and collect informative experiences that benefit policy learning most. In this paper, we conduct a comprehensive survey of existing exploration methods in DRL and deep MARL, with the aim of providing understanding and insights on the critical problems and solutions. We first identify several key challenges to achieving efficient exploration, which most exploration methods aim to address. Then we provide a systematic survey of existing approaches by classifying them into two major categories: uncertainty-oriented exploration and intrinsic motivation-oriented exploration. The essence of uncertainty-oriented exploration is to leverage the quantification of epistemic and aleatoric uncertainty to derive efficient exploration. By contrast, intrinsic motivation-oriented exploration methods usually incorporate different reward-agnostic information for intrinsic exploration guidance. Beyond these two main branches, we also cover other exploration methods which adopt sophisticated techniques but are difficult to classify into either category. In addition, we provide a comprehensive empirical comparison of exploration methods for DRL on a set of commonly used benchmarks. Finally, we summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
414 - Eshagh Kargar, Ville Kyrki 2021
Driving in a complex urban environment is a difficult task that requires a complex decision policy. In order to make informed decisions, one needs to gain an understanding of the long-range context and the importance of other vehicles. In this work, we propose to use a Vision Transformer (ViT) to learn a driving policy in urban settings with bird's-eye-view (BEV) input images. The ViT network learns the global context of the scene more effectively than previously proposed Convolutional Neural Networks (ConvNets). Furthermore, ViT's attention mechanism helps to learn an attention map for the scene, which allows the ego car to determine which surrounding cars are important to its next decision. We demonstrate that a DQN agent with a ViT backbone outperforms baseline algorithms with ConvNet backbones pre-trained in various ways. In particular, the proposed method helps reinforcement learning algorithms to learn faster, with increased performance and less data than baselines.
Autonomous Driving Systems (ADS) are critical dynamic reconfigurable agent systems whose specification and validation raise extremely challenging problems. The paper presents a multilevel semantic framework for the specification of ADS and discusses associated validation problems. The framework relies on a formal definition of maps modeling the physical environment in which vehicles evolve. Maps are directed metric graphs whose nodes represent positions and whose edges represent segments of roads. We study basic properties of maps, including their geometric consistency. Furthermore, we study position refinement and segment abstraction relations, allowing multilevel representation from purely topological to detailed geometric. We progressively define first-order logics for modeling families of maps and distributions of vehicles over maps. These are Configuration Logics, which in addition to the usual logical connectives are equipped with a coalescing operator to build configurations of models. We study their semantics and basic properties. We illustrate their use for the specification of traffic rules and scenarios characterizing sequences of scenes. We study various aspects of the validation problem, including run-time verification and satisfiability of specifications. Finally, we show links between our framework and practical validation needs for ADS and advocate its adequacy for addressing the many facets of this challenge.
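To make the map definition above concrete, here is a minimal sketch of a directed metric graph in which nodes are positions and edges are road segments with a length. The class name `RoadMap`, its methods, and the use of Dijkstra's algorithm for travel distance are illustrative assumptions; the abstract does not describe the paper's actual data structures, refinement relations, or logics.

```python
import heapq
import math


class RoadMap:
    """Hypothetical directed metric graph: positions joined by one-way segments."""

    def __init__(self):
        # position -> list of (next position, segment length)
        self.segments = {}

    def add_segment(self, src, dst, length):
        """Add a one-way road segment of the given positive length."""
        if length <= 0:
            raise ValueError("segment length must be positive")
        self.segments.setdefault(src, []).append((dst, length))
        self.segments.setdefault(dst, [])

    def travel_distance(self, start, goal):
        """Shortest directed travel distance (Dijkstra); math.inf if unreachable."""
        best = {start: 0.0}
        frontier = [(0.0, start)]
        while frontier:
            d, node = heapq.heappop(frontier)
            if node == goal:
                return d
            if d > best.get(node, math.inf):
                continue  # stale queue entry
            for nxt, length in self.segments.get(node, []):
                nd = d + length
                if nd < best.get(nxt, math.inf):
                    best[nxt] = nd
                    heapq.heappush(frontier, (nd, nxt))
        return math.inf


road_map = RoadMap()
road_map.add_segment("A", "B", 100.0)
road_map.add_segment("B", "C", 50.0)
road_map.add_segment("A", "C", 200.0)
```

Because the segments are directed, `road_map.travel_distance("A", "C")` follows the route through `B`, while the reverse query from `C` to `A` finds no route at all.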
In this paper, we investigate the problem of a last-mile delivery service that selects up to $N$ available vehicles to deliver $M$ packages from a centralized depot to $M$ delivery locations. The objective of the last-mile delivery service is to jointly maximize customer satisfaction (minimize delivery time) and minimize operating cost (minimize total travel time) by selecting the optimal number of vehicles to perform the deliveries. We model this as an assignment (vehicles to packages) and path planning (determining the delivery order and route) problem, which is equivalent to the NP-hard multiple traveling salesperson problem. We propose a scalable heuristic algorithm, which sacrifices some optimality to achieve a reasonable computational cost for a high number of packages. The algorithm combines hierarchical clustering with a greedy search. To validate our approach, we compare the results of our simulation to experiments in a $1$:$25$ scale robotic testbed for future mobility systems.
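The combination of hierarchical clustering and greedy search mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: it assumes 2D Euclidean locations, a fixed number of vehicles rather than an optimized one, centroid-based agglomerative merging, and a nearest-neighbor rule for ordering stops. All function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def cluster_packages(locations, num_vehicles):
    """Agglomerative clustering: merge the two clusters with the closest
    centroids until only num_vehicles clusters remain (assumed merge rule)."""
    clusters = [[p] for p in locations]
    while len(clusters) > num_vehicles:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def greedy_route(depot, stops):
    """Greedy search: always drive to the nearest undelivered stop."""
    route, current, remaining = [], depot, list(stops)
    while remaining:
        nxt = min(remaining, key=lambda p: dist(current, p))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route


def plan_deliveries(depot, locations, num_vehicles):
    """One greedy route per cluster, i.e. per vehicle."""
    clusters = cluster_packages(locations, num_vehicles)
    return [greedy_route(depot, c) for c in clusters]


routes = plan_deliveries((0, 0), [(1, 0), (2, 0), (10, 10), (11, 10)], 2)
```

With this toy input, the two nearby pairs of locations end up on separate vehicles. Choosing the number of vehicles that best trades delivery time against total travel time, which the abstract names as the objective, is left out of this sketch.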
Platooning has been exploited as a method for vehicles to minimize energy consumption. In this article, we present a constraint-driven optimal control framework that yields emergent platooning behavior for connected and automated vehicles operating in an open transportation system. Our approach combines recent insights in constraint-driven optimal control with the physical aerodynamic interactions between vehicles in a highway setting. The result is a set of equations that describes when platooning is an appropriate strategy, as well as a descriptive optimal control law that yields emergent platooning behavior. Finally, we demonstrate these properties in simulation and with a real-time experiment in a scaled testbed.
To accomplish complex swarm robotic missions in the real world, one needs to plan and execute a combination of single robot behaviors, group primitives such as task allocation, path planning, and formation control, and mission-specific objectives such as target search and group coverage. Most such missions are designed manually by teams of robotics experts. Recent work in automated approaches to learning swarm behavior has been limited to individual primitives, with sparse work on learning complete missions. This paper presents a systematic approach to learn tactical mission-specific policies that compose primitives in a swarm to accomplish the mission efficiently using neural networks with special input and output encoding. To learn swarm tactics in an adversarial environment, we employ a combination of 1) map-to-graph abstraction, 2) input/output encoding via Pareto filtering of points of interest and clustering of robots, and 3) learning via neuroevolution and policy gradient approaches. We illustrate this combination as critical to providing tractable learning, especially given the computational cost of simulating swarm missions of this scale and complexity. Successful mission completion outcomes are demonstrated with up to 60 robots. In addition, a close match in the performance statistics in training and testing scenarios shows the potential generalizability of the proposed framework.
244 - Libing Wu, Min Wang, Dan Wu 2021
Adaptive traffic signal control plays a significant role in the construction of smart cities. This task is challenging because of many essential factors, such as cooperation among neighboring intersections and dynamic traffic scenarios. First, to facilitate cooperation of traffic signals, existing work adopts graph neural networks to incorporate the temporal and spatial influences of the surrounding intersections into the target intersection, where spatial and temporal information are used separately. However, one drawback of these methods is that the spatial-temporal correlations are not adequately exploited to obtain a better control scheme. Second, in a dynamic traffic environment, the historical state of the intersection is also critical for predicting future signal switching. Previous work mainly addresses this problem using only the current intersection's state, neglecting both the historical state and the fact that traffic flow is continuously changing spatially and temporally. In this paper, we propose a novel neural network framework named DynSTGAT, which integrates dynamic historical state into a new spatial-temporal graph attention network to address the above two problems. More specifically, our DynSTGAT model employs a novel multi-head graph attention mechanism, which aims to adequately exploit the joint relations of spatial-temporal information. Then, to efficiently utilize the historical state information of the intersection, we design a sequence model with a temporal convolutional network (TCN) to capture the historical information and further merge it with the spatial information to improve performance. Extensive experiments conducted in the multi-intersection scenario on synthetic and real-world data confirm that our method achieves superior performance in travel time and throughput compared with state-of-the-art methods.
144 - Ziyuan Ma, Yudong Luo, Jia Pan 2021
Learning communication via deep reinforcement learning (RL) or imitation learning (IL) has recently been shown to be an effective way to solve Multi-Agent Path Finding (MAPF). However, existing communication-based MAPF solvers focus on broadcast communication, where an agent broadcasts its message to all other agents or to a predefined set of agents. This is not only impractical but also leads to redundant information that could even impair multi-agent cooperation. A succinct communication scheme should learn which information is relevant and influential to each agent's decision-making process. To address this problem, we consider a request-reply scenario and propose Decision Causal Communication (DCC), a simple yet efficient model that enables agents to select neighbors to communicate with during both training and execution. Specifically, a neighbor is determined to be relevant and influential only when the presence of this neighbor causes a decision adjustment on the central agent. This judgment is learned based only on the agent's local observation and is thus suitable for decentralized execution to handle large-scale problems. Empirical evaluation in obstacle-rich environments indicates a high success rate with low communication overhead for our method.
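The selection rule described above, where a neighbor counts as relevant only if its presence changes the central agent's decision, can be illustrated with a small sketch. Everything here is an assumption for illustration: `select_neighbors` and `toy_policy` are hypothetical names, and the hand-written grid policy stands in for the learned policy that DCC actually uses.

```python
def select_neighbors(policy, own_position, neighbors):
    """Return the neighbors whose removal from the local observation
    changes the action chosen by the central agent."""
    baseline_action = policy(own_position, neighbors)
    relevant = []
    for neighbor in neighbors:
        # Re-evaluate the decision with this one neighbor masked out.
        others = [n for n in neighbors if n != neighbor]
        if policy(own_position, others) != baseline_action:
            relevant.append(neighbor)
    return relevant


def toy_policy(own_position, neighbors):
    """Stand-in policy: move right unless another agent occupies that cell."""
    target = (own_position[0] + 1, own_position[1])
    return "wait" if target in neighbors else "right"


# The agent at (0, 0) is blocked by the neighbor at (1, 0); the one at (0, 5)
# has no influence on its decision.
chosen = select_neighbors(toy_policy, (0, 0), [(1, 0), (0, 5)])
```

In this toy case only the blocking neighbor is selected, so a request would be sent to one agent instead of being broadcast to both.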