No Arabic abstract
This paper targets control problems that exhibit specific safety and performance requirements. In particular, the aim is to ensure that an agent, operating under uncertainty, will at runtime strictly adhere to such requirements. Previous works create so-called shields that correct an existing controller for the agent if it is about to take unbearable safety risks. However, so far, shields do not consider that an environment may not be fully known in advance and may evolve for complex control and learning tasks. We propose a new method for the efficient computation of a shield that is adaptive to a changing environment. In particular, we base our method on problems that are sufficiently captured by potentially infinite Markov decision processes (MDP) and quantitative specifications such as mean payoff objectives. The shield is independent of the controller, which may, for instance, take the form of a high-performing reinforcement learning agent. At runtime, our method builds an internal abstract representation of the MDP and constantly adapts this abstraction and the shield based on observations from the environment. We showcase the applicability of our method via an urban traffic control problem.
Planning under model uncertainty is a fundamental problem across many applications of decision making and learning. In this paper, we propose the Robust Adaptive Monte Carlo Planning (RAMCP) algorithm, which allows computation of risk-sensitive Bayes-adaptive policies that optimally trade off exploration, exploitation, and robustness. RAMCP formulates the risk-sensitive planning problem as a two-player zero-sum game, in which an adversary perturbs the agents belief over the models. We introduce t
We put forward a model of action-based randomization mechanisms to analyse quantitative information flow (QIF) under generic leakage functions, and under possibly adaptive adversaries. This model subsumes many of the QIF models proposed so far. Our main contributions include the following: (1) we identify mild general conditions on the leakage function under which it is possible to derive general and significant results on adaptive QIF; (2) we contrast the efficiency of adaptive and non-adaptive strategies, showing that the latter are as efficient as the former in terms of length up to an expansion factor bounded by the number of available actions; (3) we show that the maximum information leakage over strategies, given a finite time horizon, can be expressed in terms of a Bellman equation. This can be used to compute an optimal finite strategy recursively, by resorting to standard methods like backward induction.
We develop a logical framework for reasoning about knowledge and evidence in which the agent may be uncertain about how to interpret their evidence. Rather than representing an evidential state as a fixed subset of the state space, our models allow the set of possible worlds that a piece of evidence corresponds to to vary from one possible world to another, and therefore itself be the subject of uncertainty. Such structures can be viewed as (epistemically motivated) generalizations of topological spaces. In this context, there arises a natural distinction between what is actually entailed by the evidence and what the agent knows is entailed by the evidence -- with the latter, in general, being much weaker. We provide a sound and complete axiomatization of the corresponding bi-modal logic of knowledge and evidence entailment, and investigate some natural extensions of this core system, including the addition of a belief modality and its interaction with evidence interpretation and entailment, and the addition of a knowability modality interpreted via a (generalized) interior operator.
We study dynamic allocation problems for discrete time multi-armed bandits under uncertainty, based on the the theory of nonlinear expectations. We show that, under strong independence of the bandits and with some relaxation in the definition of optimality, a Gittins allocation index gives optimal choices. This involves studying the interaction of our uncertainty with controls which determine the filtration. We also run a simple numerical example which illustrates the interaction between the willingness to explore and uncertainty aversion of the agent when making decisions.
Unpaired image-to-image translation refers to learning inter-image-domain mapping in an unsupervised manner. Existing methods often learn deterministic mappings without explicitly modelling the robustness to outliers or predictive uncertainty, leading to performance degradation when encountering unseen out-of-distribution (OOD) patterns at test time. To address this limitation, we propose a novel probabilistic method called Uncertainty-aware Generalized Adaptive Cycle Consistency (UGAC), which models the per-pixel residual by generalized Gaussian distribution, capable of modelling heavy-tailed distributions. We compare our model with a wide variety of state-of-the-art methods on two challenging tasks: unpaired image denoising in the natural image and unpaired modality prorogation in medical image domains. Experimental results demonstrate that our model offers superior image generation quality compared to recent methods in terms of quantitative metrics such as signal-to-noise ratio and structural similarity. Our model also exhibits stronger robustness towards OOD test data.