
Theory and Algorithms for Partial Order Based Reduction in Planning

Posted by: You Xu
Publication date: 2011
Research field: Informatics Engineering
Paper language: English





Search is a major technique for planning: it amounts to exploring the state space of a planning domain, typically modeled as a directed graph. However, the prohibitively large size of the search space makes search expensive, and developing better heuristic functions has long been the main technique for improving search efficiency. Nevertheless, recent studies have shown that improving heuristics alone faces certain fundamental limits. Partial order based reduction (POR) has recently been proposed as an alternative direction and has shown promise in speeding up search. POR has been studied extensively in model checking, where it is a key enabling technique for the scalability of model checking systems; however, it has never been developed systematically for planning, and the conditions for POR in the model checking theory are abstract and not directly applicable to planning. Previous work on POR algorithms for planning did not establish the connection between these algorithms and the existing model checking theory. In this paper, we develop a theory for POR in planning that connects the stubborn set theory in model checking with POR methods in planning. We show that previous POR algorithms in planning can be explained by the new theory, and, based on the new theory, we propose a new, stronger POR algorithm. Experimental results on various planning domains show further search cost reduction using the new algorithm.
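
The core construction that links stubborn sets to planning search can be sketched compactly. Below is a minimal, hedged illustration in Python, assuming domain-analysis callbacks (`applicable_in`, `interferes`, `enabling_set`) and a `seed` action that every plan from the current state must eventually apply; it illustrates the stubborn set closure in general, not the paper's exact algorithm.

```python
def stubborn_set(state, actions, applicable_in, interferes, enabling_set, seed):
    """Close a candidate set around `seed`: applicable members pull in
    every action they interfere with; inapplicable members pull in a
    necessary enabling set. All callbacks are assumed helpers from
    domain analysis, not a concrete planner API."""
    closure = {seed}
    frontier = [seed]
    while frontier:
        a = frontier.pop()
        if applicable_in(a, state):
            # actions that can disable `a` or conflict with its effects
            new = [b for b in actions if b is not a and interferes(a, b)]
        else:
            # actions that could make the inapplicable `a` applicable
            new = enabling_set(a, state)
        for b in new:
            if b not in closure:
                closure.add(b)
                frontier.append(b)
    # the search expands only the applicable members of the closure
    return [a for a in closure if applicable_in(a, state)]
```

During forward search, a state's successors are then generated only for this subset rather than for all applicable actions, which is the source of the reduction.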




Read also

Several recent studies have compared the relative efficiency of alternative flaw selection strategies for partial-order causal link (POCL) planning. We review this literature and present new experimental results that generalize the earlier work and explain some of the discrepancies in it. In particular, we describe the Least-Cost Flaw Repair (LCFR) strategy developed and analyzed by Joslin and Pollack (1994), and compare it with other strategies, including Gerevini and Schubert's (1996) ZLIFO strategy. LCFR and ZLIFO make very different, and apparently conflicting, claims about the most effective way to reduce search-space size in POCL planning. We resolve this conflict, arguing that much of the benefit that Gerevini and Schubert ascribe to the LIFO component of their ZLIFO strategy is better attributed to other causes. We show that for many problems, a strategy that combines least-cost flaw selection with the delay of separable threats will be effective in reducing search-space size, and will do so without excessive computational overhead. Although such a strategy thus provides a good default, we also show that certain domain characteristics may reduce its effectiveness.
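
The recommended strategy is easy to state operationally. The sketch below is a hedged illustration, assuming hypothetical helpers `repair_options(flaw)` (the refinements that resolve a flaw) and `is_separable_threat(flaw)`; it is not code from any of the planners compared.

```python
def select_flaw(flaws, repair_options, is_separable_threat):
    """Least-cost flaw selection combined with delaying separable
    threats. Both helper functions are assumed for illustration."""
    # delay separable threats: consider them only when nothing else remains
    pool = [f for f in flaws if not is_separable_threat(f)] or list(flaws)
    # least-cost repair: pick the flaw with the fewest resolutions,
    # minimizing the branching factor at this plan node
    return min(pool, key=lambda f: len(repair_options(f)))
```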
Reinforcement learning (RL) studies how an agent comes to achieve reward in an environment through interactions over time. Recent advances in machine RL have surpassed human expertise at the world's oldest board games and many classic video games, but they require vast quantities of experience to learn successfully -- none of today's algorithms accounts for the human ability to learn so many different tasks, so quickly. Here we propose a new approach to this challenge based on a particularly strong form of model-based RL which we call Theory-Based Reinforcement Learning, because it uses human-like intuitive theories -- rich, abstract, causal models of physical objects, intentional agents, and their interactions -- to explore and model an environment and plan effectively to achieve task goals. We instantiate the approach in a video game playing agent called EMPA (the Exploring, Modeling, and Planning Agent), which performs Bayesian inference to learn probabilistic generative models expressed as programs for a game-engine simulator, and runs internal simulations over these models to support efficient object-based, relational exploration and heuristic planning. EMPA closely matches human learning efficiency on a suite of 90 challenging Atari-style video games, learning new games in just minutes of game play and generalizing robustly to new game situations and new levels. The model also captures fine-grained structure in people's exploration trajectories and learning dynamics. Its design and behavior suggest a way forward for building more general human-like AI systems.
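
The plan-by-simulation loop at the heart of such an agent can be caricatured in a few lines. The sketch below is an assumption-laden stand-in, not EMPA itself: `simulate(model, state, action)` is a hypothetical one-step interface to a learned generative model, and random rollouts replace the paper's object-based heuristic planning.

```python
import random

def choose_action(models, state, actions, simulate, goal_reached,
                  horizon=20, rollouts=50):
    """Model-based action selection by internal simulation (a sketch;
    all interfaces here are assumptions). `models` is a list of world
    models sampled from the agent's posterior."""
    def score(action):
        hits = 0
        for _ in range(rollouts):
            model = random.choice(models)        # sample a candidate world model
            s = simulate(model, state, action)   # imagined effect of the action
            for _ in range(horizon):
                if goal_reached(s):
                    hits += 1
                    break
                s = simulate(model, s, random.choice(actions))
        return hits
    # pick the action whose simulated futures reach the goal most often
    return max(actions, key=score)
```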
There is an impressive body of work on developing heuristics and other reasoning algorithms to guide search in optimal and anytime planning algorithms for classical planning. However, very little effort has been directed towards developing analogous techniques to guide search towards high-quality solutions in hierarchical planning formalisms like HTN planning, which allows the use of additional domain-specific procedural control knowledge. In lieu of such techniques, this control knowledge often needs to provide the necessary search guidance to the planning algorithm, which imposes a substantial burden on the domain author and can yield brittle or error-prone domain models. We address this gap by extending recent work on a new hierarchical goal-based planning formalism called Hierarchical Goal Network (HGN) Planning to develop the Hierarchically-Optimal Goal Decomposition Planner (HOpGDP), an HGN planning algorithm that computes hierarchically-optimal plans. HOpGDP is guided by $h_{HL}$, a new HGN planning heuristic that extends existing admissible landmark-based heuristics from classical planning to compute admissible cost estimates for HGN planning problems. Our experimental evaluation across three benchmark planning domains shows that HOpGDP compares favorably both to optimal classical planners, thanks to its ability to use domain-specific procedural knowledge, and to a blind-search version of HOpGDP, thanks to the search guidance provided by $h_{HL}$.
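
The flavor of the underlying admissible landmark estimate can be shown with a uniform cost partitioning, a standard device for keeping landmark sums admissible. The helper signatures below are assumptions, and this is a classical-planning-style sketch, not $h_{HL}$ itself.

```python
def h_landmark(unachieved, achievers, cost):
    """Admissible landmark heuristic sketch: split each action's cost
    evenly over the outstanding landmarks it can achieve, then charge
    each landmark its cheapest partitioned achiever. `achievers(lm)`
    and `cost(a)` are assumed helpers."""
    def share(a):
        # number of outstanding landmarks this action helps achieve
        return sum(1 for lm in unachieved if a in achievers(lm))
    return sum(
        min(cost(a) / share(a) for a in achievers(lm))
        for lm in unachieved
    )
```

Because every plan must achieve each outstanding landmark at least once and no action's cost is counted more than once in total, the sum never overestimates the true remaining cost.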
Organizations that develop software have recognized that software process models are particularly useful for maintaining a high standard of quality. In the last decade, simulations of software processes were used in several settings and environments. This paper gives a short overview of the benefits of software process simulation and describes the development of a discrete-event model, a technique rarely used before in that field. The model introduced in this paper captures the behavior of a detailed code inspection process. It aims at reducing the risks inherent in implementing inspection processes and techniques in the overall development process. The determination of the underlying cause-effect relations using data mining techniques and empirical data is explained. Finally, the paper gives an outlook on our future work.
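
A discrete-event model of this kind reduces to a clock and a priority queue of pending events. The skeleton below is a generic sketch under assumed event names, not the paper's model: a handler for a hypothetical kind such as "inspection_done" would return follow-up events to schedule.

```python
import heapq
import itertools

def run_simulation(initial_events, handlers, until=480.0):
    """Minimal discrete-event engine. Events are (time, kind, data)
    tuples; `handlers[kind](clock, data)` returns follow-up events."""
    tie = itertools.count()  # tie-breaker so the heap never compares payloads
    queue = [(t, next(tie), kind, data) for t, kind, data in initial_events]
    heapq.heapify(queue)
    clock = 0.0
    while queue:
        clock, _, kind, data = heapq.heappop(queue)
        if clock > until:
            break
        for t, k, d in handlers[kind](clock, data):
            heapq.heappush(queue, (t, next(tie), k, d))
    return clock  # time of the last event processed
```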
Prior work on generating explanations in a planning and decision-making context has focused on providing the rationale behind an AI agent's decision making. While these methods provide the right explanations from the explainer's perspective, they fail to heed the cognitive requirement of understanding an explanation from the explainee's (the human's) perspective. In this work, we set out to address this issue by first considering the influence of information order in an explanation, or the progressiveness of explanations. Intuitively, progression builds later concepts on previous ones and is known to contribute to better learning. We aim to investigate similar effects during explanation generation when an explanation is broken into multiple parts that are communicated sequentially. The challenge lies in modeling the human's preferences for the order in which information is received, so that the explanation assists understanding. Given this sequential process, we present a formulation based on a goal-based MDP for generating progressive explanations. The reward function of this MDP is learned via inverse reinforcement learning from explanations gathered in human subject studies. We first evaluated our approach in a scavenger-hunt domain to demonstrate its effectiveness in capturing the human's preferences. Analyzing the results revealed something more fundamental: the preferences arise strongly from both domain-dependent and domain-independent features. The correlation with domain-independent features pushed us to verify this result further in an escape-room domain. The results confirmed our hypothesis that understanding an explanation is a dynamic process, and the human preferences that reflected this aspect corresponded exactly to the progression needed for knowledge assimilation deeper in our cognitive processes.
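
Operationally, generating a progressive explanation from the learned reward reduces to rolling out the MDP. The sketch below is a simplified stand-in: `reward(prefix, part)` is an assumed interface to the IRL-learned reward, and a greedy policy replaces full MDP planning.

```python
def order_explanation(parts, reward):
    """Order explanation segments progressively. The state is the
    prefix already communicated; each action picks the next part.
    `reward(prefix, part)` is an assumed, IRL-learned scoring
    function; greedy selection is a simplification of planning in
    the paper's goal-based MDP."""
    prefix, remaining = [], set(parts)
    while remaining:
        best = max(remaining, key=lambda p: reward(tuple(prefix), p))
        prefix.append(best)
        remaining.remove(best)
    return prefix  # every order reaches the goal of communicating all parts
```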
