Multimodal Safety-Critical Scenarios Generation for Decision-Making Algorithms Evaluation

233 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Wenhao Ding

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Wenhao Ding - Baiming Chen - Bo Li

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Existing neural network-based autonomous systems are shown to be vulnerable against adversarial attacks, therefore sophisticated evaluation on their robustness is of great importance. However, evaluating the robustness only under the worst-case scenarios based on known attacks is not comprehensive, not to mention that some of them even rarely occur in the real world. In addition, the distribution of safety-critical data is usually multimodal, while most traditional attacks and evaluation methods focus on a single modality. To solve the above challenges, we propose a flow-based multimodal safety-critical scenario generator for evaluating decisionmaking algorithms. The proposed generative model is optimized with weighted likelihood maximization and a gradient-based sampling procedure is integrated to improve the sampling efficiency. The safety-critical scenarios are generated by querying the task algorithms and the log-likelihood of the generated scenarios is in proportion to the risk level. Experiments on a self-driving task demonstrate our advantages in terms of testing efficiency and multimodal modeling capability. We evaluate six Reinforcement Learning algorithms with our generated traffic scenarios and provide empirical conclusions about their robustness.

قيم البحث

187 - Xiaoyi Chen , Pratik Chaudhari 2020

Autonomous navigation in crowded, complex urban environments requires interacting with other agents on the road. A common solution to this problem is to use a prediction model to guess the likely future actions of other agents. While this is reasonab le, it leads to overly conservative plans because it does not explicitly model the mutual influence of the actions of interacting agents. This paper builds a reinforcement learning-based method named MIDAS where an ego-agent learns to affect the control actions of other cars in urban driving scenarios. MIDAS uses an attention-mechanism to handle an arbitrary number of other agents and includes a driver-type parameter to learn a single policy that works across different planning objectives. We build a simulation environment that enables diverse interaction experiments with a large number of agents and methods for quantitatively studying the safety, efficiency, and interaction among vehicles. MIDAS is validated using extensive experiments and we show that it (i) can work across different road geometries, (ii) results in an adaptive ego policy that can be tuned easily to satisfy performance criteria such as aggressive or cautious driving, (iii) is robust to changes in the driving policies of external agents, and (iv) is more efficient and safer than existing approaches to interaction-aware decision-making.

التعلم الآلي علم الروبوتات التعلم الالي

Inverse Policy Evaluation for Value-based Sequential Decision-making

85 - Alan Chan , Kris de Asis , Richard S. Sutton 2020

Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., $Q$-learning), and acting greedily with respect to the estimates with a n arbitrary degree of entropy to ensure that the state-space is sufficiently explored. Behavior based on explicit greedification assumes that the values reflect those of textit{some} policy, over which the greedy policy will be an improvement. However, value-iteration can produce value functions that do not correspond to textit{any} policy. This is especially relevant in the function-approximation regime, when the true value function cant be perfectly represented. In this work, we explore the use of textit{inverse policy evaluation}, the process of solving for a likely policy given a value function, for deriving behavior from a value function. We provide theoretical and empirical results to show that inverse policy evaluation, combined with an approximate value iteration algorithm, is a feasible method for value-based control.

التعلم الآلي الذكاء الاصطناعي

Safety and Robustness in Decision Making: Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer

280 - Geoffroy Dubourg-Felonneau , Omar Darwish , Christopher Parsons 2019

The genomic profile underlying an individual tumor can be highly informative in the creation of a personalized cancer treatment strategy for a given patient; a practice known as precision oncology. This involves next generation sequencing of a tumor sample and the subsequent identification of genomic aberrations, such as somatic mutations, to provide potential candidates of targeted therapy. The identification of these aberrations from sequencing noise and germline variant background poses a classic classification-style problem. This has been previously broached with many different supervised machine learning methods, including deep-learning neural networks. However, these neural networks have thus far not been tailored to give any indication of confidence in the mutation call, meaning an oncologist could be targeting a mutation with a low probability of being true. To address this, we present here a deep bayesian recurrent neural network for cancer variant calling, which shows no degradation in performance compared to standard neural networks. This approach enables greater flexibility through different priors to avoid overfitting to a single dataset. We will be incorporating this approach into software for oncologists to obtain safe, robust, and statistically confident somatic mutation calls for precision oncology treatment choices.

التعلم الآلي الجينوم التعلم الالي

CMTS: Conditional Multiple Trajectory Synthesizer for Generating Safety-critical Driving Scenarios

145 - Wenhao Ding , Mengdi Xu , Ding Zhao 2019

Naturalistic driving trajectories are crucial for the performance of autonomous driving algorithms. However, most of the data is collected in safe scenarios leading to the duplication of trajectories which are easy to be handled by currently develope d algorithms. When considering safety, testing algorithms in near-miss scenarios that rarely show up in off-the-shelf datasets is a vital part of the evaluation. As a remedy, we propose a near-miss data synthesizing framework based on Variational Bayesian methods and term it as Conditional Multiple Trajectory Synthesizer (CMTS). We leverage a generative model conditioned on road maps to bridge safe and collision driving data by representing their distribution in the latent space. By sampling from the near-miss distribution, we can synthesize safety-critical data crucial for understanding traffic scenarios but not shown in neither the original dataset nor the collision dataset. Our experimental results demonstrate that the augmented dataset covers more kinds of driving scenarios, especially the near-miss ones, which help improve the trajectory prediction accuracy and the capability of dealing with risky driving scenarios.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات

Experimental Evaluation of Algorithm-Assisted Human Decision-Making: Application to Pretrial Public Safety Assessment

154 - Kosuke Imai , Zhichao Jiang , James Greiner 2020

Despite an increasing reliance on fully-automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While there exists a fast-growing literature evaluating the bias and fairness of such algorithmic recommendations, an overlooked question is whether they help humans make better decisions. We develop a statistical methodology for experimentally evaluating the causal impacts of algorithmic recommendations on human decisions. We also show how to examine whether algorithmic recommendations improve the fairness of human decisions and derive the optimal decision rules under various settings. We apply the proposed methodology to preliminary data from the first-ever randomized controlled trial that evaluates the pretrial Public Safety Assessment (PSA) in the criminal justice system. A goal of the PSA is to help judges decide which arrested individuals should be released. On the basis of the preliminary data available, we find that providing the PSA to the judge has little overall impact on the judges decisions and subsequent arrestee behavior. However, our analysis yields some potentially suggestive evidence that the PSA may help avoid unnecessarily harsh decisions for female arrestees regardless of their risk levels while it encourages the judge to make stricter decisions for male arrestees who are deemed to be risky. In terms of fairness, the PSA appears to increase the gender bias against males while having little effect on any existing racial differences in judges decision. Finally, we find that the PSAs recommendations might be unnecessarily severe unless the cost of a new crime is sufficiently high.

أجهزة الكمبيوتر والمجتمع تطبيقات الإحصاء المنهجية