No Arabic abstract
We study the problem of controlling a partially observed Markov decision process (POMDP) to either aid or hinder the estimation of its state trajectory by optimising the conditional entropy of the state trajectory given measurements and controls, a quantity we dub the smoother entropy. Our consideration of the smoother entropy contrasts with previous active state estimation and obfuscation approaches that instead resort to measures of marginal (or instantaneous) state uncertainty due to tractability concerns. By establishing novel expressions of the smoother entropy in terms of the usual POMDP belief state, we show that our active estimation and obfuscation problems can be reformulated as Markov decision processes (MDPs) that are fully observed in the belief state. Surprisingly, we identify belief-state MDP reformulations of both active estimation and obfuscation with concave cost and cost-to-go functions, which enables the use of standard POMDP techniques to construct tractable bounded-error (approximate) solutions. We show in simulations that optimisation of the smoother entropy leads to superior trajectory estimation and obfuscation compared to alternative approaches.
In this paper, we consider the problem of controlling a partially observed Markov decision process (POMDP) in order to actively estimate its state trajectory over a fixed horizon with minimal uncertainty. We pose a novel active smoothing problem in which the objective is to directly minimise the smoother entropy, that is, the conditional entropy of the (joint) state trajectory distribution of concern in fixed-interval Bayesian smoothing. Our formulation contrasts with prior active approaches that minimise the sum of conditional entropies of the (marginal) state estimates provided by Bayesian filters. By establishing a novel form of the smoother entropy in terms of the POMDP belief (or information) state, we show that our active smoothing problem can be reformulated as a (fully observed) Markov decision process with a value function that is concave in the belief state. The concavity of the value function is of particular importance since it enables the approximate solution of our active smoothing problem using piecewise-linear function approximations in conjunction with standard POMDP solvers. We illustrate the approximate solution of our active smoothing problem in simulation and compare its performance to alternative approaches based on minimising marginal state estimate uncertainties.
In this work, we investigate a state estimation problem for a full-car semi-active suspension system. To account for the complex calculation and optimization problems, a vehicle-to- cloud-to-vehicle (V2C2V) scheme is utilized. Moving horizon estimation is introduced for the state estimation system design. All the optimization problems are solved in a remotely-embedded agent with high computational ability. Measurements and state estimates are transmitted between the vehicle and the remote agent via networked communication channels. The effectiveness of the proposed method is illustrated via a set of simulations.
Learning cooperative policies for multi-agent systems is often challenged by partial observability and a lack of coordination. In some settings, the structure of a problem allows a distributed solution with limited communication. Here, we consider a scenario where no communication is available, and instead we learn local policies for all agents that collectively mimic the solution to a centralized multi-agent static optimization problem. Our main contribution is an information theoretic framework based on rate distortion theory which facilitates analysis of how well the resulting fully decentralized policies are able to reconstruct the optimal solution. Moreover, this framework provides a natural extension that addresses which nodes an agent should communicate with to improve the performance of its individual policy.
We introduce a formal model for the information leakage of probability distributions and define a notion called distribution privacy as the local differential privacy for probability distributions. Roughly, the distribution privacy of a local obfuscation mechanism means that the attacker cannot significantly gain any information on the distribution of the mechanisms input by observing its output. Then we show that existing local mechanisms can hide input distributions in terms of distribution privacy, while deteriorating the utility by adding too much noise. For example, we prove that the Laplace mechanism needs to add a large amount of noise proportionally to the infinite Wasserstein distance between the two distributions we want to make indistinguishable. To improve the tradeoff between distribution privacy and utility, we introduce a local obfuscation mechanism, called a tupling mechanism, that adds random dummy data to the output. Then we apply this mechanism to the protection of user attributes in location based services. By experiments, we demonstrate that the tupling mechanism outperforms popular local mechanisms in terms of attribute obfuscation and service quality.
This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.