We consider the problem of designing policies for partially observable Markov decision processes (POMDPs) with dynamic coherent risk objectives. Synthesizing risk-averse optimal policies for POMDPs requires infinite memory and is thus undecidable. To overcome this difficulty, we propose a method based on bounded policy iteration for designing stochastic but finite-state (memory) controllers, which takes advantage of standard convex optimization methods. Given a memory budget and an optimality criterion, the proposed method modifies the stochastic finite-state controller, leading to sub-optimal solutions with lower coherent risk.
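To make the notion of coherent risk concrete, the following is a minimal sketch (not the paper's bounded-policy-iteration method) computing the conditional value-at-risk (CVaR) of a sampled cost distribution; CVaR is a standard coherent risk measure, and dynamic objectives of the kind above compose such one-step measures over time. The function name `cvar`, the confidence level `alpha`, and the lognormal cost model are illustrative assumptions.

```python
import numpy as np

def cvar(costs, alpha=0.95):
    """Conditional value-at-risk: mean cost over the worst (1 - alpha) tail."""
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)   # value-at-risk at level alpha
    tail = costs[costs >= var]        # rare but costly realizations
    return tail.mean()

# A risk-neutral expectation can hide a heavy tail that CVaR exposes.
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
print("expected cost:", samples.mean())
print("CVaR at 0.95: ", cvar(samples))
```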
We consider the stochastic shortest path planning problem in MDPs, i.e., the problem of designing policies that ensure reaching a goal state from a given initial state with minimum accrued cost. In order to account for rare but important realizations…
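As a reference point for the stochastic shortest path (SSP) formulation above, here is a minimal sketch of risk-neutral value iteration on a toy SSP MDP; the three-state transition model, costs, and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy SSP MDP: states 0..2, state 2 is the absorbing, cost-free goal.
# P[a][s, s'] = transition probability; c[s, a] = one-step cost.
P = [np.array([[0.8, 0.2, 0.0],
               [0.1, 0.7, 0.2],
               [0.0, 0.0, 1.0]]),   # action 0: safe but slow
     np.array([[0.5, 0.3, 0.2],
               [0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0]])]   # action 1: faster but riskier
c = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [0.0, 0.0]])

V = np.zeros(3)
for _ in range(1000):   # Bellman iteration until convergence
    Q = np.stack([c[:, a] + P[a] @ V for a in range(2)], axis=1)
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print("expected cost-to-go:", V, "greedy policy:", Q.argmin(axis=1))
```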
Collision avoidance is an essential concern for the autonomous operations of aerial vehicles in dynamic and uncertain urban environments. This paper introduces a risk-bounded path planning algorithm for unmanned aerial vehicles (UAVs) operating in su…
This paper introduces an intermediary between conditional expectation and conditional sublinear expectation, called R-conditioning. The R-conditioning of a random vector in $L^2$ is defined as the best $L^2$-estimate, given a $\sigma$-subalgebra and a…
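For context, the classical baseline that R-conditioning refines: in $L^2$, the conditional expectation given a $\sigma$-subalgebra $\mathcal{G}$ is exactly the best $L^2$-estimate, i.e., the orthogonal projection onto the subspace of $\mathcal{G}$-measurable square-integrable random vectors. A sketch of this standard characterization (not the paper's sublinear generalization):

```latex
% Conditional expectation as the best L^2-estimate given \mathcal{G}:
% the orthogonal projection of X onto L^2(\Omega, \mathcal{G}, \mathbb{P}).
\mathbb{E}[X \mid \mathcal{G}]
  = \operatorname*{arg\,min}_{Y \in L^2(\Omega, \mathcal{G}, \mathbb{P})}
    \mathbb{E}\bigl[\lVert X - Y \rVert^2\bigr]
```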
Imitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavi…
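As rough intuition for GAIL's adversarial setup, here is a minimal sketch of the discriminator update and the surrogate reward it induces for the policy; the linear-logistic discriminator, function names, and learning rate are illustrative simplifications (the published algorithm uses a neural discriminator and optimizes the policy with TRPO).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_step(w, expert_sa, policy_sa, lr=0.1):
    """One ascent step on E_expert[log D] + E_policy[log(1 - D)],
    with D(s, a) = sigmoid(w . [s, a]) for simplicity."""
    d_exp = sigmoid(expert_sa @ w)   # D on expert state-action pairs
    d_pol = sigmoid(policy_sa @ w)   # D on policy rollouts
    grad = expert_sa.T @ (1.0 - d_exp) / len(expert_sa) \
         - policy_sa.T @ d_pol / len(policy_sa)
    return w + lr * grad

def surrogate_reward(w, sa):
    """Reward handed to the policy optimizer: high where the
    discriminator mistakes policy samples for expert behavior."""
    return -np.log(1.0 - sigmoid(sa @ w) + 1e-8)
```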
This paper considers safe robot mission planning in uncertain dynamical environments. This problem arises in applications such as surveillance, emergency rescue, and autonomous driving. It is a challenging problem due to modeling and integrating dyna…