In this paper, we consider the problem of controlling a partially observed Markov decision process (POMDP) in order to actively estimate its state trajectory over a fixed horizon with minimal uncertainty. We pose a novel active smoothing problem in which the objective is to directly minimise the smoother entropy, that is, the conditional entropy of the (joint) state trajectory distribution of concern in fixed-interval Bayesian smoothing. Our formulation contrasts with prior active approaches that minimise the sum of conditional entropies of the (marginal) state estimates provided by Bayesian filters. By establishing a novel form of the smoother entropy in terms of the POMDP belief (or information) state, we show that our active smoothing problem can be reformulated as a (fully observed) Markov decision process with a value function that is concave in the belief state. The concavity of the value function is of particular importance since it enables the approximate solution of our active smoothing problem using piecewise-linear function approximations in conjunction with standard POMDP solvers. We illustrate the approximate solution of our active smoothing problem in simulation and compare its performance to alternative approaches based on minimising marginal state estimate uncertainties.
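To make the belief-state machinery concrete, the following is a minimal Python sketch (array names and shapes are our own illustrative choices, not the paper's implementation) of the Bayes-filter belief update for a finite-state POMDP together with the entropy of a marginal belief, i.e. the quantity summed by the filter-based approaches the paper contrasts against. The smoother entropy itself is a joint quantity over the whole state trajectory, so it is not captured by this marginal computation.

```python
import numpy as np

def belief_update(b, u, y, T, O):
    """One Bayes-filter step for a finite-state POMDP.

    b    : current belief over states, shape (S,)
    u, y : applied control and received observation (integer indices)
    T[u] : transition matrix, T[u][i, j] = P(x' = j | x = i, u)
    O[u] : observation matrix, O[u][j, y] = P(y | x' = j, u)
    """
    predicted = b @ T[u]                   # prediction step
    unnormalised = predicted * O[u][:, y]  # measurement update
    return unnormalised / unnormalised.sum()

def marginal_entropy(b):
    """Shannon entropy (in nats) of a single marginal belief."""
    p = b[b > 0]
    return -np.sum(p * np.log(p))
```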
In this work, we study the problem of actively classifying the attributes of dynamical systems characterized as a finite set of Markov decision process (MDP) models. We are interested in finding strategies that actively interact with the dynamical system and observe its reactions so that the attribute of interest is classified efficiently and with high confidence. We present a decision-theoretic framework based on partially observable Markov decision processes (POMDPs). The proposed framework relies on assigning a classification belief (a probability distribution) to the attributes of interest. Given an initial belief, a confidence level above which a classification decision can be made, a cost bound, safe belief sets, and a finite time horizon, we compute POMDP strategies leading to classification decisions. We present two algorithms to compute such strategies. The first algorithm computes the optimal strategy exactly by value iteration. To overcome the computational complexity of computing exact solutions, the second algorithm is based on adaptive sampling and approximates the optimal probability of reaching a classification decision. We illustrate the proposed methodology using examples from medical diagnosis and privacy-preserving advertising.
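As a rough illustration of the classification-belief mechanism (the array layout and the simple confidence-threshold rule below are our own assumptions, not taken from the paper), a Bayesian update of the belief over candidate MDP models after one observed transition might look like:

```python
import numpy as np

def classification_update(belief, models, s, a, s_next):
    """Update the classification belief over K candidate MDP models
    after observing a single transition (s, a, s_next).

    belief : prior probability of each candidate model, shape (K,)
    models : list of K transition tensors, models[k][a][s, s_next]
    """
    likelihood = np.array([m[a][s, s_next] for m in models])
    posterior = belief * likelihood
    return posterior / posterior.sum()

def decide(belief, threshold=0.95):
    """Declare a class once the belief exceeds a confidence threshold."""
    k = int(np.argmax(belief))
    return k if belief[k] >= threshold else None
```

The paper's framework additionally enforces a cost bound and safe belief sets and synthesizes POMDP strategies that actively steer this belief; the snippet only shows the underlying belief recursion and threshold decision.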
We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of an agent's trajectories to an outside observer while guaranteeing the completion of a task expressed by a reward function. We first prove that an agent with partial observations can achieve an entropy at most as high as that of an agent with perfect observations. Then, focusing on finite-state controllers (FSCs) with deterministic memory transitions, we show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of the parametric Markov chain (pMC) induced by such FSCs. This relationship allows us to recast the entropy maximization problem as a so-called parameter synthesis problem for the induced pMC. We then present an algorithm to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In numerical examples, we illustrate the relationship between the maximum entropy, the number of memory states in the FSC, and the expected reward.
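For context on why the induced pMC is useful, recall a standard identity (the notation here is ours, not the paper's): for a finite Markov chain with transition matrix $P$ whose trajectories reach absorbing states with probability one, the entropy of the trajectory distribution decomposes as $H(X_{0:\infty}) = \sum_{s} \xi(s) L(s)$, where $\xi(s)$ is the expected number of visits to state $s$ and $L(s) = -\sum_{s'} P(s,s') \log P(s,s')$ is the local entropy of the outgoing distribution at $s$. Instantiating the FSC parameters fixes the entries of $P$ in the induced pMC, and hence both $\xi$ and $L$, which is what makes entropy maximization amenable to treatment as a parameter synthesis problem.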
For hybrid Markov decision processes, UPPAAL Stratego can compute strategies that are safe for a given safety property and (in the limit) optimal for a given cost function. Unfortunately, these strategies cannot be exported easily since they are computed as a very long list. In this paper, we demonstrate methods to learn compact representations of the strategies in the form of decision trees. These decision trees are much smaller, more understandable, and can easily be exported as code that can be loaded into embedded systems. Despite the size compression and actual differences from the original strategy, we provide guarantees on both safety and optimality of the decision-tree strategy. On top of that, we show how to obtain yet smaller representations, which are still guaranteed safe but achieve a desired trade-off between size and optimality.
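As a sketch of the general idea (using scikit-learn and made-up data; this is not the paper's tool chain and carries none of its guarantees), a compact decision tree can be fitted to a state-to-action strategy table and exported as readable rules:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical strategy table: each row is a (continuous) state vector,
# each label is the action the synthesized strategy picks in that state.
states = np.random.rand(10_000, 3)                    # placeholder states
actions = (states[:, 0] > states[:, 1]).astype(int)   # placeholder actions

tree = DecisionTreeClassifier(max_depth=5).fit(states, actions)

# The fitted tree is typically far smaller than the raw state-action list
# and can be exported as nested if-statements for embedded code.
print(export_text(tree, feature_names=["x0", "x1", "x2"]))
```

The paper goes further than such a plain classifier fit: its decision-tree strategies remain provably safe and come with guarantees on optimality.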
We consider a fundamental remote state estimation problem for discrete-time linear time-invariant (LTI) systems. A smart sensor forwards its local state estimate to a remote estimator over a time-correlated $M$-state Markov fading channel, where the packet drop probability is time-varying and depends on the current fading channel state. We establish a necessary and sufficient condition for mean-square stability of the remote estimation error covariance as $\rho^2(\mathbf{A})\rho(\mathbf{D}\mathbf{M})<1$, where $\rho(\cdot)$ denotes the spectral radius, $\mathbf{A}$ is the state transition matrix of the LTI system, $\mathbf{D}$ is a diagonal matrix containing the packet drop probabilities in the different channel states, and $\mathbf{M}$ is the transition probability matrix of the Markov channel states. To derive this result, we propose a novel estimation-cycle-based approach and provide new element-wise bounds on matrix powers. The stability condition is verified by numerical results and is shown to be more effective than existing sufficient conditions in the literature. We observe that the stability region in terms of the packet drop probabilities in the different channel states can be either convex or concave depending on the transition probability matrix $\mathbf{M}$. Our numerical results suggest that the stability conditions for remote estimation may coincide for setups with a smart sensor and with a conventional one (which sends raw measurements to the remote estimator), although the smart sensor setup achieves better estimation performance.
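The condition is straightforward to check numerically; the sketch below uses made-up parameters (the matrices are illustrative only, not from the paper):

```python
import numpy as np

def spectral_radius(X):
    """Largest absolute eigenvalue of a square matrix."""
    return np.abs(np.linalg.eigvals(X)).max()

# Illustrative 2-state LTI system over a 2-state Markov fading channel.
A = np.array([[1.1, 0.2],
              [0.0, 0.9]])      # LTI state transition matrix
M = np.array([[0.8, 0.2],
              [0.3, 0.7]])      # channel-state transition probabilities
D = np.diag([0.1, 0.6])         # packet drop probability in each channel state

stable = spectral_radius(A) ** 2 * spectral_radius(D @ M) < 1
print("mean-square stable:", stable)
```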
We consider remote state estimation of multiple discrete-time linear time-invariant (LTI) systems over multiple wireless time-varying communication channels. Each system state is measured by a sensor, and the measurements from sensors are sent to a remote estimator over the shared wireless channels in a scheduled manner. We answer the following open problem: what is the fundamental requirement on the multi-sensor-multi-channel system to guarantee the existence of a sensor scheduling policy that can stabilize the remote estimation system? To tackle the problem, we propose a novel policy construction method, and develop a new analytical approach by applying the asymptotic theory of spectral radii of products of non-negative matrices. A necessary and sufficient stability condition is derived in terms of the LTI system parameters and the channel statistics, which is more effective than existing sufficient conditions available in the literature. Explicit scheduling policies with stability guarantees are presented as well. We further extend the analytical framework to cover remote estimation with four alternative network setups and obtain corresponding necessary and sufficient stability conditions.
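To give a feel for the setting (a heavily simplified toy model using a common remote-estimation abstraction; it is not the paper's construction or policy), one can simulate a round-robin schedule of two sensors sharing one Markov fading channel and track the remote error covariances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-sensor, one-channel example; all parameters are made up.
A  = [np.array([[1.2]]), np.array([[1.05]])]   # per-system dynamics
Q  = [np.eye(1), np.eye(1)]                    # process noise covariances
P0 = [0.5 * np.eye(1), 0.5 * np.eye(1)]        # placeholder local-estimate covariances
P  = [P0[0].copy(), P0[1].copy()]              # remote error covariances
drop = [0.2, 0.6]                              # drop probability per channel state
M  = np.array([[0.9, 0.1],
               [0.4, 0.6]])                    # Markov channel-state transitions
channel = 0

traces = []
for k in range(10_000):
    scheduled = k % 2                          # round-robin scheduling policy
    channel = rng.choice(2, p=M[channel])      # channel-state evolution
    for i in range(2):
        if i == scheduled and rng.random() > drop[channel]:
            P[i] = P0[i].copy()                # delivery: reset to local covariance
        else:
            P[i] = A[i] @ P[i] @ A[i].T + Q[i] # no delivery: open-loop growth
    traces.append(sum(float(np.trace(p)) for p in P))

print("average total error trace:", np.mean(traces))
```

In this toy model the covariance resets on a successful delivery and grows open-loop otherwise; the paper's analysis characterizes exactly when some scheduling policy keeps the expected error covariance bounded.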