Decentralized Cooperative Planning for Automated Vehicles with Continuous Monte Carlo Tree Search


Published by: Karl Kurzer

Publication date: 2018

Research field: Informatics Engineering

Paper language: English

Authors: Karl Kurzer - Florian Engelhorn - J. Marius Zollner

Artificial Intelligence, Robotics


Abstract (English)

Urban traffic scenarios often require a high degree of cooperation between traffic participants to ensure safety and efficiency. Observing the behavior of others, humans infer whether or not others are cooperating. This work aims to extend the capabilities of automated vehicles, enabling them to cooperate implicitly in heterogeneous environments. Continuous actions allow for arbitrary trajectories and hence are applicable to a much wider class of problems than existing cooperative approaches with discrete action spaces. Based on cooperative modeling of other agents, Monte Carlo Tree Search (MCTS) in conjunction with Decoupled-UCT evaluates the action-values of each agent in a cooperative and decentralized way, respecting the interdependence of actions among traffic participants. The extension to continuous action spaces is addressed by incorporating novel MCTS-specific enhancements for efficient search space exploration. The proposed algorithm is evaluated under different scenarios, showing that the algorithm is able to achieve effective cooperative planning and generate solutions egocentric planning fails to identify.
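As a rough illustration of the decoupled selection step mentioned in the abstract, the sketch below shows one common reading of Decoupled-UCT: every agent keeps its own per-action statistics at a tree node and picks its action independently with UCB1, and the joint action is the tuple of those choices. This is not the authors' implementation. The class and function names, the small discrete action set, the exploration constant, and the reward values are all assumptions made for brevity; the paper's extension to continuous action spaces is not shown.

```python
import math


def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of one action; unvisited actions get priority."""
    if visits == 0:
        return math.inf
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)


class DecoupledNode:
    """Hypothetical tree node holding separate action statistics per agent."""

    def __init__(self, actions_per_agent):
        self.visits = 0
        # stats[agent][action] = [visit_count, total_value]
        self.stats = [{a: [0, 0.0] for a in actions} for actions in actions_per_agent]

    def select_joint_action(self, c=math.sqrt(2)):
        """Each agent maximizes UCB1 over its own actions only."""
        joint = []
        for agent_stats in self.stats:
            best = max(
                agent_stats,
                key=lambda a: ucb1(agent_stats[a][1], agent_stats[a][0], self.visits, c),
            )
            joint.append(best)
        return tuple(joint)

    def update(self, joint_action, rewards):
        """Back up one simulated return per agent for the chosen joint action."""
        self.visits += 1
        for agent_stats, action, reward in zip(self.stats, joint_action, rewards):
            entry = agent_stats[action]
            entry[0] += 1
            entry[1] += reward


# Two agents with assumed discrete actions (illustration only).
node = DecoupledNode([["brake", "keep", "accelerate"], ["brake", "keep"]])
node.update(("brake", "brake"), (0.0, 1.0))
node.update(("keep", "keep"), (1.0, 0.2))
node.update(("accelerate", "brake"), (0.5, 1.0))
greedy_joint = node.select_joint_action(c=0.0)  # pure exploitation
```

Because the statistics are stored per agent rather than per joint action, the table grows with the sum of the agents' action counts instead of their product, while the shared simulation outcome still couples the agents' value estimates.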


Karl Kurzer, Chenyang Zhou, J. Marius Zollner, 2018

Today's automated vehicles lack the ability to cooperate implicitly with others. This work presents a Monte Carlo Tree Search (MCTS) based approach for decentralized cooperative planning using macro-actions for automated vehicles in heterogeneous environments. Based on cooperative modeling of other agents and Decoupled-UCT (a variant of MCTS), the algorithm evaluates the state-action-values of each agent in a cooperative and decentralized manner, explicitly modeling the interdependence of actions between traffic participants. Macro-actions allow for temporal extension over multiple time steps and increase the effective search depth requiring fewer iterations to plan over longer horizons. Without predefined policies for macro-actions, the algorithm simultaneously learns policies over and within macro-actions. The proposed method is evaluated under several conflict scenarios, showing that the algorithm can achieve effective cooperative planning with learned macro-actions in heterogeneous environments.

Artificial Intelligence

Accelerating Cooperative Planning for Automated Vehicles with Learned Heuristics and Monte Carlo Tree Search

Karl Kurzer, Marcus Fechner, J. Marius Zollner, 2020

Efficient driving in urban traffic scenarios requires foresight. The observation of other traffic participants and the inference of their possible next actions depending on one's own action is considered cooperative prediction and planning. Humans are well equipped with the capability to predict the actions of multiple interacting traffic participants and plan accordingly, without the need to directly communicate with others. Prior work has shown that it is possible to achieve effective cooperative planning without the need for explicit communication. However, the search space for cooperative plans is so large that most of the computational budget is spent on exploring the search space in unpromising regions that are far away from the solution. To accelerate the planning process, we combined learned heuristics with a cooperative planning method to guide the search towards regions with promising actions, yielding better solutions at lower computational costs.

Machine Learning, Multi-Agent Systems, Robotics

Multiple Policy Value Monte Carlo Tree Search

Li-Cheng Lan, Wei Li, Ting-Han Wei, 2019

Many of the strongest game playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNN), where the DNNs are used as policy or value evaluators. Given a limited budget, such as online playing or during the self-play phase of AlphaZero (AZ) training, a balance needs to be reached between accurate state estimation and more MCTS simulations, both of which are critical for a strong game playing agent. Typically, larger DNNs are better at generalization and accurate evaluation, while smaller DNNs are less costly, and therefore can lead to more MCTS simulations and bigger search trees with the same budget. This paper introduces a new method called the multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain advantages of each network, where two PV-NNs f_S and f_L are used in this paper. We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms single PV-NN with policy value MCTS, called PV-MCTS. Additionally, MPV-MCTS also outperforms PV-MCTS for AZ training.

Artificial Intelligence

Toward Optimal FDM Toolpath Planning with Monte Carlo Tree Search

Chanyeol Yoo, Samuel Lensgraf, Robert Fitch, 2020

The most widely used methods for toolpath planning in fused deposition 3D printing slice the input model into successive 2D layers in order to construct the toolpath. Unfortunately slicing-based methods can incur a substantial amount of wasted motion (i.e., the extruder is moving while not printing), particularly when features of the model are spatially separated. In recent years we have introduced a new paradigm that characterizes the space of feasible toolpaths using a dependency graph on the input model, along with several algorithms to search this space for toolpaths that optimize objective functions such as wasted motion or print time. A natural question that arises is, under what circumstances can we efficiently compute an optimal toolpath? In this paper, we give an algorithm for computing fused deposition modeling (FDM) toolpaths that utilizes Monte Carlo Tree Search (MCTS), a powerful general-purpose method for navigating large search spaces that is guaranteed to converge to the optimal solution. Under reasonable assumptions on printer geometry that allow us to compress the dependency graph, our MCTS-based algorithm converges to find the optimal toolpath. We validate our algorithm on a dataset of 75 models and show it performs on par with our previous best local search-based algorithm in terms of toolpath quality. In prior work we speculated that the performance of local search was near optimal, and we examine in detail the properties of the models and MCTS executions that lead to better or worse results than local search.

Robotics

Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning

Yann Labbe, Sergey Zagoruyko, Igor Kalevatykh, 2019

We address the problem of visually guided rearrangement planning with many movable objects, i.e., finding a sequence of actions to move a set of objects from an initial arrangement to a desired one, while relying on visual inputs coming from an RGB camera. To do so, we introduce a complete pipeline relying on two key contributions. First, we introduce an efficient and scalable rearrangement planning method, based on a Monte-Carlo Tree Search exploration strategy. We demonstrate that because of its good trade-off between exploration and exploitation our method (i) scales well with the number of objects while (ii) finding solutions which require a smaller number of moves compared to the other state-of-the-art approaches. Note that, contrary to many approaches, we do not require any buffer space to be available. Second, to precisely localize movable objects in the scene, we develop an integrated approach for robust multi-object workspace state estimation from a single uncalibrated RGB camera using a deep neural network trained only with synthetic data. We validate our multi-object visually guided manipulation pipeline with several experiments on a real UR-5 robotic arm by solving various rearrangement planning instances, requiring only 60 ms to compute the plan to rearrange 25 objects. In addition, we show that our system is insensitive to camera movements and can successfully recover from external perturbations. Supplementary video, source code and pre-trained models are available at https://ylabbe.github.io/rearrangement-planning.

Robotics, Computer Vision and Pattern Recognition