In this work, we introduce a new approach for the efficient solution of autonomous decision and planning problems, with a special focus on decision making under uncertainty and belief space planning (BSP) in high-dimensional state spaces. Usually, to solve the decision problem, we identify the optimal action, according to some objective function. We claim that we can sometimes generate and solve an analogous yet simplified decision problem, which can be solved more efficiently; a wise simplification method can lead to the same action selection, or one for which the maximal loss can be guaranteed. Furthermore, such simplification is separated from the state inference, and does not compromise its accuracy, as the selected action would finally be applied on the original state. First, we present the concept for general decision problems, and provide a theoretical framework for a coherent formulation of the approach. We then practically apply these ideas to BSP problems, which can be simplified by considering a sparse approximation of the initial (Gaussian) belief. The scalable belief sparsification algorithm we provide is able to yield solutions which are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach in the solution of a highly realistic active-SLAM problem, and manage to significantly reduce computation time, with practically no loss in the quality of solution. This work is conceptual and fundamental, and holds numerous possible extensions.
Partially Observable Markov Decision Processes (POMDPs) are notoriously hard to solve. Most advanced state-of-the-art online solvers leverage ideas from Monte Carlo Tree Search (MCTS). These solvers rapidly converge to the most promising branches of the belief tree, avoiding the suboptimal sections. Most of these algorithms are designed to utilize straightforward access to the state reward and assume that the belief-dependent reward is nothing but the expectation over the state reward. Thus, they are inapplicable to the more general and essential setting of belief-dependent rewards. One example of such a reward is differential entropy approximated using a set of weighted particles of the belief. Such an information-theoretic reward introduces a significant computational burden. In this paper, we embed the paradigm of simplification into the MCTS algorithm. In particular, we present Simplified Information-Theoretic Particle Filter Tree (SITH-PFT), a novel variant of the MCTS algorithm that considers information-theoretic rewards but avoids the need to calculate them completely. We replace the costly calculation of information-theoretic rewards with adaptive upper and lower bounds. These bounds are easy to calculate and are tightened only when our algorithm demands it. Crucially, we guarantee precisely the same belief tree and solution that would be obtained by MCTS, which explicitly calculates the original information-theoretic rewards. Our approach is general; namely, any bounds that converge to the reward can be easily plugged in to achieve a substantial speedup without any loss in performance.
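The idea of replacing an expensive reward with adaptive bounds that are tightened only on demand can be illustrated with a small sketch. This is not the SITH-PFT algorithm itself: it is a toy, single-step action selection, and the names `select_action`, `toy_bounds`, `EXACT`, and `MAX_LEVEL` are hypothetical. It assumes each action has a bound function whose interval shrinks as a refinement level grows and collapses to the exact value at the maximal level.

```python
from typing import Callable, Dict, List, Tuple

Bounds = Tuple[float, float]  # (lower, upper) bound on an action's value


def select_action(
    actions: List[str],
    bounds_fn: Callable[[str, int], Bounds],
    max_level: int,
) -> Tuple[str, Dict[str, int]]:
    """Pick the best action using interval bounds, refining only when needed.

    Assumption: bounds_fn(action, level) returns an interval that contains the
    exact value, tightens as level grows, and is exact at max_level.
    """
    level = {a: 0 for a in actions}
    while True:
        b = {a: bounds_fn(a, level[a]) for a in actions}
        # Candidate winner: the action with the highest lower bound.
        best = max(actions, key=lambda a: b[a][0])
        # Rivals are actions whose upper bound still exceeds that lower bound.
        rivals = [a for a in actions if a != best and b[a][1] > b[best][0]]
        if not rivals:
            return best, level  # the intervals already separate the winner
        refinable = [a for a in rivals + [best] if level[a] < max_level]
        if not refinable:
            return best, level  # nothing left to tighten
        # Tighten only the widest interval among the actions still in contention.
        widest = max(refinable, key=lambda a: b[a][1] - b[a][0])
        level[widest] += 1


# Hypothetical toy problem: exact values that would be costly to compute.
EXACT = {"left": 1.0, "stay": 0.25, "right": 0.75}
MAX_LEVEL = 4


def toy_bounds(action: str, level: int) -> Bounds:
    # The slack halves with every refinement level and vanishes at MAX_LEVEL.
    slack = 0.0 if level == MAX_LEVEL else 0.5 / 2 ** level
    return EXACT[action] - slack, EXACT[action] + slack


best, levels = select_action(list(EXACT), toy_bounds, MAX_LEVEL)
print(best, levels)
```

In this toy setting the selected action coincides with the one obtained from the exact values, while no action has to be refined all the way to the exact computation, which is the kind of saving the abstract describes.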
Classifying textured images assumes that the images are considered in terms of areas with the same texture. In an uncertain environment, it can be better to take an imprecise decision or to reject an area that corresponds to an unlearned class. Moreover, the areas that serve as classification units can contain more than one texture. These considerations allow us to develop a belief decision model that makes it possible to reject an area as unlearned and to decide on unions and intersections of learned classes. The proposed approach is fully justified by an application to seabed characterization from sonar images, which serves as an illustration.
The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions, and thus fail when paired with humans or independently trained agents at test time. To address this, we present off-belief learning (OBL). At each timestep, OBL agents follow a policy $\pi_1$ that is optimized assuming past actions were taken by a given, fixed policy ($\pi_0$), but assuming that future actions will be taken by $\pi_1$. When $\pi_0$ is uniform random, OBL converges to an optimal policy that does not rely on inferences based on other agents' behavior (an optimal grounded policy). OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next, thereby introducing multi-level cognitive reasoning in a controlled manner. Unlike existing approaches, which may converge to any equilibrium policy, OBL converges to a unique policy, making it suitable for zero-shot coordination (ZSC). OBL can be scaled to high-dimensional settings with a fictitious transition mechanism and shows strong performance in both a toy setting and the benchmark human-AI & ZSC problem Hanabi.
The focus of this paper is on solving multi-robot planning problems in continuous spaces with partial observability. Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for multi-robot coordination problems, but representing and solving Dec-POMDPs is often intractable for large problems. To allow for a high-level representation that is natural for multi-robot problems and scalable to large discrete and continuous problems, this paper extends the Dec-POMDP model to the decentralized partially observable semi-Markov decision process (Dec-POSMDP). The Dec-POSMDP formulation allows asynchronous decision-making by the robots, which is crucial in multi-robot domains. We also present an algorithm for solving this Dec-POSMDP that is much more scalable than previous methods, since it can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed method's performance is evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent multi-robot problems and provide high-quality solutions for large-scale problems.
This paper combines two studies: a topological semantics for epistemic notions and abstract argumentation theory. In our combined setting, we use a topological semantics to represent the structure of an agent's collection of evidence, and we use argumentation theory to single out the relevant sets of evidence through which a notion of beliefs grounded on arguments is defined. We discuss the formal properties of this newly defined notion, and also provide a formal language with a matching modality together with a sound and complete axiom system for it. Despite the fact that our agent can combine her evidence in a rational way (captured via the topological structure), argument-based beliefs are not closed under conjunction. This illustrates the difference between an agent's reasoning abilities (i.e., the way she is able to combine her available evidence) and the closure properties of her beliefs. We use this point to argue that the failure of closure under conjunction of belief should not bear the burden of the failure of rationality.