Minimizing the Negative Side Effects of Planning with Reduced Models

Published by: Sandhya Saisubramanian

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Sandhya Saisubramanian, Shlomo Zilberstein

Artificial Intelligence

Abstract

Reduced models of large Markov decision processes accelerate planning by considering a subset of outcomes for each state-action pair. This reduction in reachable states leads to replanning when the agent encounters states without a precomputed action during plan execution. However, not all states are suitable for replanning. In the worst case, the agent may not be able to reach the goal from the newly encountered state. Agents should be better prepared to handle such risky situations and avoid replanning in risky states. Hence, we consider replanning in states that are unsafe for deliberation as a negative side effect of planning with reduced models. While the negative side effects can be minimized by always using the full model, this defeats the purpose of using reduced models. The challenge is to plan with reduced models, but somehow account for the possibility of encountering risky situations. An agent should thus only replan in states that the user has approved as safe for replanning. To that end, we propose planning using a portfolio of reduced models, a planning paradigm that minimizes the negative side effects of planning using reduced models by alternating between different outcome selection approaches. We empirically demonstrate the effectiveness of our approach on three domains: an electric vehicle charging domain using real-world data from a university campus and two benchmark planning problems.
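The idea of keeping only a subset of outcomes per state-action pair, while switching outcome selection near risky states, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the "k most likely outcomes" rule, the renormalization, and the rule of keeping the full outcome set for any pair that starts in or can reach a risky state are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

State = str
Action = str
Outcome = Tuple[State, float]  # (next_state, probability)
# Full model: (state, action) -> every possible outcome
Transitions = Dict[Tuple[State, Action], List[Outcome]]


def reduce_outcomes(outcomes: List[Outcome], k: int) -> List[Outcome]:
    """Keep the k most likely outcomes and renormalize their probabilities."""
    kept = sorted(outcomes, key=lambda o: o[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return [(s, p / total) for s, p in kept]


def portfolio_reduce(full: Transitions, risky: Set[State], k: int = 1) -> Transitions:
    """Hypothetical portfolio: full outcomes near risky states, k outcomes elsewhere."""
    reduced: Transitions = {}
    for (state, action), outcomes in full.items():
        can_reach_risk = any(s in risky for s, _ in outcomes)
        if state in risky or can_reach_risk:
            # Plan with every outcome so no replanning is needed in a risky state.
            reduced[(state, action)] = list(outcomes)
        else:
            # Elsewhere, plan with the cheaper reduced outcome set.
            reduced[(state, action)] = reduce_outcomes(outcomes, k)
    return reduced


# Toy model (made-up numbers): "cliff" is a state marked unsafe for replanning.
full_model: Transitions = {
    ("s0", "go"): [("s1", 0.8), ("s2", 0.2)],
    ("s1", "go"): [("goal", 0.7), ("cliff", 0.3)],
}
reduced_model = portfolio_reduce(full_model, risky={"cliff"}, k=1)
```

In this toy model the pair `("s0", "go")` is reduced to its single most likely outcome, while `("s1", "go")` keeps both outcomes because one of them leads to the risky state.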

Sandhya Saisubramanian, Shlomo Zilberstein (2021)

Agents operating in unstructured environments often produce negative side effects (NSE), which are difficult to identify at design time. While the agent can learn to mitigate the side effects from human feedback, such feedback is often expensive and the rate of learning is sensitive to the agent's state representation. We examine how humans can assist an agent, beyond providing feedback, and exploit their broader scope of knowledge to mitigate the impacts of NSE. We formulate this problem as a human-agent team with decoupled objectives. The agent optimizes its assigned task, during which its actions may produce NSE. The human shapes the environment through minor reconfiguration actions so as to mitigate the impacts of the agent's side effects, without affecting the agent's ability to complete its assigned task. We present an algorithm to solve this problem and analyze its theoretical properties. Through experiments with human subjects, we assess the willingness of users to perform minor environment modifications to mitigate the impacts of NSE. Empirical evaluation of our approach shows that the proposed framework can successfully mitigate NSE, without affecting the agent's ability to complete its assigned task.

Artificial Intelligence, Multiagent Systems, Robotics

Short-term Maintenance Planning of Autonomous Trucks for Minimizing Economic Risk

Xin Tao, Jonas Mårtensson, Håkan Warnquist (2021)

New autonomous driving technologies are emerging every day and some of them have been commercially applied in the real world. While benefiting from these technologies, autonomous trucks are facing new challenges in short-term maintenance planning, which directly influences the truck operator's profit. In this paper, we implement a vehicle health management system by addressing the maintenance planning issues of autonomous trucks on a transport mission. We also present a maintenance planning model using a risk-based decision-making method, which identifies the maintenance decision with minimal economic risk for the truck company. Both availability losses and maintenance costs are considered when evaluating the economic risk. We demonstrate the proposed model by numerical experiments illustrating real-world scenarios. In the experiments, compared to three baseline methods, the expected economic risk of the proposed method is reduced by up to 47%. We also conduct sensitivity analyses of different model parameters. The analyses show that the economic risk significantly decreases when the estimation accuracy of remaining useful life, the maximal allowed time of delivery delay before order cancellation, or the number of workshops increases. The experimental results help identify future research and development priorities for autonomous trucks from an economic perspective.

Artificial Intelligence, Systems and Control

Deliberative Acting, Online Planning and Learning with Hierarchical Operational Models

Sunandita Patra, James Mason, Malik Ghallab (2020)

In AI research, synthesizing a plan of action has typically used descriptive models of the actions that abstractly specify what might happen as a result of an action, and are tailored for efficiently computing state transitions. However, executing the planned actions has needed operational models, in which rich computational control structures and closed-loop online decision-making are used to specify how to perform an action in a complex execution context, react to events and adapt to an unfolding situation. Deliberative actors, which integrate acting and planning, have typically needed to use both of these models together, which causes problems when attempting to develop the different models, verify their consistency, and smoothly interleave acting and planning. As an alternative, we define and implement an integrated acting-and-planning system in which both planning and acting use the same operational models. These rely on hierarchical task-oriented refinement methods offering rich control structures. The acting component, called Reactive Acting Engine (RAE), is inspired by the well-known PRS system. At each decision step, RAE can get advice from a planner for a near-optimal choice with respect to a utility function. The anytime planner uses a UCT-like Monte Carlo Tree Search procedure, called UPOM (UCT Procedure for Operational Models), whose rollouts are simulations of the actor's operational models. We also present learning strategies for use with RAE and UPOM that acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. We demonstrate the asymptotic convergence of UPOM towards optimal methods in static domains, and show experimentally that UPOM and the learning strategies significantly improve the acting efficiency and robustness.

Artificial Intelligence

Electron-scale reduced fluid models with gyroviscous effects

T. Passot, P.L. Sulem, E. Tassi (2017)

Reduced fluid models for collisionless plasmas including electron inertia and finite Larmor radius corrections are derived for scales ranging from the ion to the electron gyroradii. Based either on pressure balance or on the incompressibility of the electron fluid, they respectively capture kinetic Alfvén waves (KAWs) or whistler waves (WWs), and can provide suitable tools for reconnection and turbulence studies. Both isothermal regimes and Landau fluid closures permitting anisotropic pressure fluctuations are considered. For small values of the electron beta parameter $\beta_e$, a perturbative computation of the gyroviscous force valid at scales comparable to the electron inertial length is performed at order $O(\beta_e)$, which requires second-order contributions in a scale expansion. Comparisons with kinetic theory are performed in the linear regime. The spectrum of transverse magnetic fluctuations for strong and weak turbulence energy cascades is also phenomenologically predicted for both types of waves. In the case of moderate ion to electron temperature ratio, a new regime of KAW turbulence at scales smaller than the electron inertial length is obtained, where the magnetic energy spectrum decays like $k_\perp^{-13/3}$, thus faster than the $k_\perp^{-11/3}$ spectrum of WW turbulence.

Plasma Physics, Space Physics

Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems

Sandhya Saisubramanian, Shlomo Zilberstein, Ece Kamar (2020)

Autonomous agents acting in the real world often operate based on models that ignore certain aspects of the environment. The incompleteness of any given model, whether handcrafted or machine acquired, is inevitable due to practical limitations of any modeling technique for complex real-world settings. Due to the limited fidelity of its model, an agent's actions may have unexpected, undesirable consequences during execution. Learning to recognize and avoid such negative side effects of the agent's actions is critical to improving the safety and reliability of autonomous systems. This emerging research topic is attracting increased attention due to the increased deployment of AI systems and their broad societal impacts. This article provides a comprehensive overview of different forms of negative side effects and the recent research efforts to address them. We identify key characteristics of negative side effects, highlight the challenges in avoiding negative side effects, and discuss recently developed approaches, contrasting their benefits and limitations. We conclude with a discussion of open questions and suggestions for future research directions.

Computers and Society, Artificial Intelligence