MIDAS: Multi-agent Interaction-aware Decision-making with Adaptive Strategies for Urban Autonomous Navigation

Posted by Xiaoyi Chen

Publication date: 2020

Research field: Information Engineering

Paper language: English

Authors: Xiaoyi Chen, Pratik Chaudhari

Abstract

Autonomous navigation in crowded, complex urban environments requires interacting with other agents on the road. A common solution to this problem is to use a prediction model to guess the likely future actions of other agents. While this is reasonable, it leads to overly conservative plans because it does not explicitly model the mutual influence of the actions of interacting agents. This paper builds a reinforcement learning-based method named MIDAS in which an ego-agent learns to affect the control actions of other cars in urban driving scenarios. MIDAS uses an attention mechanism to handle an arbitrary number of other agents and includes a driver-type parameter to learn a single policy that works across different planning objectives. We build a simulation environment that enables diverse interaction experiments with a large number of agents, together with methods for quantitatively studying the safety, efficiency, and interaction among vehicles. MIDAS is validated through extensive experiments, which show that it (i) can work across different road geometries, (ii) results in an adaptive ego policy that can be tuned easily to satisfy performance criteria such as aggressive or cautious driving, (iii) is robust to changes in the driving policies of external agents, and (iv) is more efficient and safer than existing approaches to interaction-aware decision-making.
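The abstract does not describe the network architecture. As a rough illustration of how attention lets one policy accept an arbitrary number of other agents, the sketch below uses generic scaled dot-product attention, with the ego feature vector as the query and each other agent's feature vector as both key and value, so the pooled context has a fixed size however many agents are present. The feature vectors, the function names (`attend`, `policy_input`), and the encoding of the driver-type parameter as a single scalar appended to the policy input are all hypothetical assumptions, not MIDAS's actual design.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(ego: Sequence[float], others: List[Sequence[float]]) -> List[float]:
    """Generic scaled dot-product attention (hypothetical stand-in for the
    paper's attention module). The ego features are the query; each other
    agent's features serve as key and value. The output always has len(ego)
    entries, regardless of how many other agents there are."""
    d = len(ego)
    if not others:
        return [0.0] * d  # no other agents: empty context
    scores = [dot(ego, o) / math.sqrt(d) for o in others]
    weights = softmax(scores)  # non-negative, sums to 1
    return [sum(w * o[i] for w, o in zip(weights, others)) for i in range(d)]


def policy_input(
    ego: Sequence[float], others: List[Sequence[float]], driver_type: float
) -> List[float]:
    """Fixed-size policy input: ego features, attention-pooled context, and a
    scalar driver-type parameter (hypothetical encoding, e.g. 0 = cautious,
    1 = aggressive)."""
    return list(ego) + attend(ego, others) + [driver_type]


# The input size is the same with three agents or one.
three = policy_input([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], 0.3)
one = policy_input([1.0, 0.0], [[1.0, 0.0]], 0.3)
```

Because the softmax weights sum to one, the pooled vector is a weighted average of the other agents' features, so adding or removing an agent changes the weights rather than the size of the policy input.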

Kasra Mokhtari, Alan R. Wagner (2021)

Risk is traditionally described as the expected likelihood of an undesirable outcome, such as a collision for an autonomous vehicle. Accurately predicting risk or potentially risky situations is critical for the safe operation of autonomous vehicles. In our previous work, we showed that risk could be characterized by two components: 1) the probability of an undesirable outcome and 2) an estimate of how undesirable the outcome is (loss). This paper extends that work. Using our trained deep reinforcement learning model for navigating around crowds, we developed a risk-based decision-making framework for autonomous vehicles that integrates high-level risk-based path planning with reinforcement learning-based low-level control. We evaluated our method in CARLA, a high-fidelity simulator. This work could improve safety by one day allowing an autonomous vehicle to avoid and react to risky situations.

Artificial Intelligence, Robotics
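The abstract above splits risk into the probability of an undesirable outcome and its loss, but does not say how the two are combined. A common choice, assumed here, is to multiply them and sum over the possible undesirable outcomes (expected loss), then let a high-level planner pick the candidate path with the lowest risk. The path names, probabilities, and losses below are hypothetical, and the reinforcement learning low-level controller is not modeled.

```python
from typing import Dict, List, Tuple

# One (probability, loss) pair per undesirable outcome of a candidate path.
Outcomes = List[Tuple[float, float]]


def path_risk(outcomes: Outcomes) -> float:
    """Risk as probability-weighted loss, summed over undesirable outcomes
    (an assumed combination rule, not necessarily the paper's)."""
    for p, _ in outcomes:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return sum(p * loss for p, loss in outcomes)


def safest_path(candidates: Dict[str, Outcomes]) -> str:
    """High-level planning step reduced to choosing the lowest-risk path."""
    return min(candidates, key=lambda name: path_risk(candidates[name]))


# Hypothetical candidates: a short path through a crowd with a likely,
# costly outcome, versus a detour with two less severe outcomes.
candidates = {
    "through_crowd": [(0.25, 8.0)],           # risk 2.0
    "detour": [(0.5, 1.0), (0.25, 2.0)],      # risk 1.0
}
choice = safest_path(candidates)
```

Keeping probability and loss separate means a rare but severe outcome and a frequent but mild one can yield the same risk, which a probability-only measure would not capture.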

Multimodal Safety-Critical Scenarios Generation for Decision-Making Algorithms Evaluation

Wenhao Ding, Baiming Chen, Bo Li (2020)

Existing neural network-based autonomous systems have been shown to be vulnerable to adversarial attacks, so a thorough evaluation of their robustness is of great importance. However, evaluating robustness only under worst-case scenarios based on known attacks is not comprehensive, especially since some of these scenarios rarely occur in the real world. In addition, the distribution of safety-critical data is usually multimodal, while most traditional attacks and evaluation methods focus on a single modality. To address these challenges, we propose a flow-based multimodal safety-critical scenario generator for evaluating decision-making algorithms. The proposed generative model is optimized with weighted likelihood maximization, and a gradient-based sampling procedure is integrated to improve sampling efficiency. Safety-critical scenarios are generated by querying the task algorithms, and the log-likelihood of a generated scenario is proportional to its risk level. Experiments on a self-driving task demonstrate our advantages in testing efficiency and multimodal modeling capability. We evaluate six reinforcement learning algorithms with our generated traffic scenarios and provide empirical conclusions about their robustness.

Machine Learning, Robotics

Non-Parametric Behavior Learning for Multi-Agent Decision Making With Chance Constraints: A Data-Driven Predictive Control Framework

Jun Ma, Zilong Cheng, Xiaoxue Zhang (2020)

In many specific scenarios, accurate and effective system identification is a commonly encountered challenge in the model predictive control (MPC) formulation. When such accuracy is lacking, adopting the traditional MPC algorithm can significantly degrade overall system performance. To address this major shortcoming, this paper investigates a non-parametric behavior learning method for multi-agent decision making, which underpins an alternative data-driven predictive control framework. Using closed-loop input/output measurements of the unknown system, the system's behavior is learned from the collected dataset, and the resulting non-parametric predictive model is used to determine optimal control actions. A key advantage of this non-parametric predictive control framework is that it alleviates the heavy computational burden of the optimization procedures typical of existing methodologies, which require open-loop input/output data collection and parametric system identification. Then, with a conservative approximation of the probabilistic chance constraints in the MPC problem, a deterministic optimization problem is formulated and solved effectively. This data-driven approach is also shown to preserve good robustness properties, even in the presence of the parametric uncertainties that inevitably arise in typical system identification. Finally, a multi-drone system is used to demonstrate the practical appeal and effectiveness of the approach.

Multi-agent Systems, Robotics, Systems and Control
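The abstract above does not state which conservative approximation of the chance constraints is used. One standard, distribution-free option is Cantelli's inequality: for a scalar constraint quantity X with mean μ and standard deviation σ, P(X − μ ≥ t) ≤ σ² / (σ² + t²) for any t > 0. Setting t = b − μ and requiring that bound to be at most ε shows that the chance constraint P(X ≤ b) ≥ 1 − ε holds whenever μ + σ·√((1 − ε)/ε) ≤ b. For a linear constraint aᵀx ≤ b on a random state x with mean m and covariance Σ, one takes μ = aᵀm and σ² = aᵀΣa. The sketch below implements only this scalar surrogate; the function names are hypothetical, and it is not necessarily the formulation used in the paper.

```python
import math


def cantelli_margin(sigma: float, eps: float) -> float:
    """Safety margin sigma * sqrt((1 - eps) / eps) from Cantelli's inequality."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if sigma < 0.0:
        raise ValueError("sigma must be non-negative")
    return sigma * math.sqrt((1.0 - eps) / eps)


def chance_constraint_satisfied(mu: float, sigma: float, b: float, eps: float) -> bool:
    """Deterministic, conservative surrogate for P(X <= b) >= 1 - eps.

    Valid for any distribution of X with mean mu and standard deviation sigma:
    if this returns True, the chance constraint is guaranteed to hold.
    A False result does not prove the chance constraint is violated."""
    return mu + cantelli_margin(sigma, eps) <= b


# With eps = 0.1 the margin is 3 standard deviations.
ok = chance_constraint_satisfied(mu=0.0, sigma=1.0, b=3.5, eps=0.1)
tight = chance_constraint_satisfied(mu=0.0, sigma=1.0, b=2.5, eps=0.1)
```

Because the margin holds for every distribution with the given mean and variance, it is larger than a Gaussian-specific margin for the same ε; that extra slack is the price of the conservatism the abstract mentions.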

Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States

Peter Wolf, Karl Kurzer, Tobias Wingert (2018)

Making the right decision in traffic is a challenging task that is highly dependent on individual preferences as well as the surrounding environment, which makes it hard to model solely on the basis of expert knowledge. In this work we use deep reinforcement learning to learn maneuver decisions based on a compact semantic state representation. This ensures a consistent model of the environment across scenarios as well as a behavior adaptation function, enabling online changes of desired behaviors without retraining. The input to the neural network is a simulated object list similar to that of radar or lidar sensors, superimposed by a relational semantic scene description. The state and the reward are extended by a behavior adaptation function and a parameterization, respectively. With little expert knowledge and a set of mid-level actions, the agent is shown to be capable of adhering to traffic rules and learns to drive safely in a variety of situations.

Machine Learning, Robotics

Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search

Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers (2021)

In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may have only one opportunity to treat their illness. When making a decision, the expected return alone, known in reinforcement learning as the value, cannot account for the potential range of adverse or positive outcomes a decision may have. Our key insight is that we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time. In this paper, we propose Distributional Monte Carlo Tree Search, an algorithm that learns a posterior distribution over the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Moreover, our algorithm outperforms the state of the art in multi-objective reinforcement learning for the expected utility of the returns.

Machine Learning, Artificial Intelligence
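The abstract above argues that when a user experiences a single policy execution, the expected return (the value) is not enough and the distribution over returns matters. The small sketch below illustrates only that motivation, not the Distributional Monte Carlo Tree Search algorithm itself (there is no tree search or posterior here). The two return distributions and the concave utility u(r) = √r are hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Tuple

# Distribution over the return of one policy execution: (probability, return).
ReturnDistribution = List[Tuple[float, float]]


def expected_return(dist: ReturnDistribution) -> float:
    """The value: the mean return, which ignores the spread of outcomes."""
    return sum(p * r for p, r in dist)


def expected_utility(dist: ReturnDistribution, utility: Callable[[float], float]) -> float:
    """Expected utility of the returns: apply the utility to each possible
    return first, then average."""
    return sum(p * utility(r) for p, r in dist)


# Hypothetical policies: a risky one (all or nothing) and a safe one.
risky: ReturnDistribution = [(0.5, 0.0), (0.5, 10.0)]
safe: ReturnDistribution = [(1.0, 4.0)]

value_prefers_risky = expected_return(risky) > expected_return(safe)        # 5.0 > 4.0
utility_prefers_safe = expected_utility(safe, math.sqrt) > expected_utility(risky, math.sqrt)
# Applying the utility to the mean return would wrongly favor the risky policy.
utility_of_mean_prefers_risky = math.sqrt(expected_return(risky)) > expected_utility(safe, math.sqrt)
```

The two rankings disagree because a concave utility penalizes the zero-return outcome that the mean hides; an agent that only tracks the value cannot recover this, which is why the paper learns a distribution over returns.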