This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view of the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on quantiles of posterior distributions, is asymptotically optimal when the reward distributions belong to a one-dimensional exponential family, for a large class of prior distributions. We also show that the Bayesian literature gives new insight into what kind of exploration rates could be used in frequentist, UCB-type algorithms. Indeed, approximations of the Bayesian optimal solution or of the Finite Horizon Gittins indices provide a justification for the kl-UCB+ and kl-UCB-H+ algorithms, whose asymptotic optimality is also established.
In this paper we introduce a class of Markov decision processes that arise as a natural model for many renewable resource allocation problems. By extending results from the inventory control literature, we prove that they admit a closed-form solution, and we show how to exploit this structure to speed up its computation. We apply the proposed framework to several problems arising in very different domains and, as part of the ongoing effort in the emerging field of Computational Sustainability, we discuss in detail its application to the Northern Pacific Halibut marine fishery. Applied to a model based on real-world data, our approach yields a policy that has a guaranteed lower bound on the utility function and is structurally very different from the policy currently employed.
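The abstract does not spell out the model, so as a hedged illustration here is a generic finite-horizon harvesting MDP solved by plain backward induction. This is the brute-force baseline that a closed-form solution would speed up, not the paper's method. The logistic growth function, the two-point shock distribution, the unit harvest price, and the discount factor are all hypothetical.

```python
import math


def logistic_growth(x: int, rate: float = 0.8, capacity: int = 20) -> float:
    """Hypothetical logistic growth of the stock left unharvested over one season."""
    return x + rate * x * (1.0 - x / capacity)


def backward_induction(s_max, horizon, growth, shocks, discount=0.95):
    """Solve a toy harvesting MDP by finite-horizon dynamic programming.

    State: integer stock s in [0, s_max]. Action: harvest h in [0, s], reward h.
    Next stock: floor(growth(s - h) * m), capped at s_max, with m drawn from shocks.
    shocks: list of (multiplier, probability) pairs.
    """
    value = [0.0] * (s_max + 1)  # terminal value after the last period
    policies = []
    for _ in range(horizon):
        new_value = [0.0] * (s_max + 1)
        policy = [0] * (s_max + 1)
        for s in range(s_max + 1):
            best_val, best_h = -math.inf, 0
            for h in range(s + 1):
                escapement = s - h  # stock left to regrow
                expected_next = sum(
                    p * value[min(s_max, int(growth(escapement) * m))]
                    for m, p in shocks
                )
                val = h + discount * expected_next
                if val > best_val:
                    best_val, best_h = val, h
            new_value[s] = best_val
            policy[s] = best_h
        value = new_value
        policies.append(policy)
    policies.reverse()  # policies[t] is the harvest rule for period t
    return value, policies


value, policies = backward_induction(20, 10, logistic_growth, [(0.7, 0.5), (1.3, 0.5)])
print(policies[0])
```

The cost grows with the product of horizon, number of states, and number of actions, which is why exploiting structure in the optimal policy matters for realistic stock sizes.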
We introduce a dynamic generative model, the Bayesian allocation model (BAM), which establishes explicit connections between nonnegative tensor factorization (NTF), graphical models of discrete probability distributions and their Bayesian extensions, and topic models such as latent Dirichlet allocation. BAM is based on a Poisson process whose events are marked using a Bayesian network, where the conditional probability tables of this network are then integrated out analytically. We show that the resulting marginal process turns out to be a Polya urn, an integer-valued self-reinforcing process. These urn processes, which we name Polya-Bayes processes, obey certain conditional independence properties that provide further insight into the nature of NTF. These insights also let us develop space-efficient simulation algorithms that respect the potential sparsity of data: we propose a class of sequential importance sampling algorithms for computing NTF and approximating the marginal likelihood, which can be useful for model selection. The resulting methods can also be viewed as a model scoring method for topic models and discrete Bayesian networks with hidden variables. The new algorithms have favourable properties in the sparse data regime when contrasted with variational algorithms, which become more accurate as the total sum of the elements of the observed tensor goes to infinity. We illustrate the performance on several examples and numerically study the behaviour of the algorithms in various data regimes.
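The simplest instance of the marginalization described above: integrating out a single Dirichlet(alpha)-distributed probability table leaves a Polya urn, in which each draw makes its colour more likely. This sketch covers only that one-table case, not the full Bayesian-network marking used in BAM; alpha and the function names are illustrative assumptions. It checks that the urn's sequential predictive probabilities multiply to the closed-form Dirichlet-multinomial sequence probability.

```python
import math
import random


def simulate_polya_urn(alpha, n_draws, rng):
    """Draw n_draws colours from a Polya urn with initial pseudo-counts alpha."""
    counts = [0] * len(alpha)
    seq = []
    for _ in range(n_draws):
        weights = [a + c for a, c in zip(alpha, counts)]
        k = rng.choices(range(len(alpha)), weights=weights)[0]
        counts[k] += 1  # self-reinforcement: the drawn colour becomes more likely
        seq.append(k)
    return seq


def urn_sequence_log_prob(seq, alpha):
    """Log-probability of a colour sequence as a product of urn predictive probabilities."""
    counts = [0] * len(alpha)
    total_alpha = sum(alpha)
    log_p = 0.0
    for i, k in enumerate(seq):
        log_p += math.log((alpha[k] + counts[k]) / (total_alpha + i))
        counts[k] += 1
    return log_p


def dirichlet_multinomial_log_prob(counts, alpha):
    """Closed form: log of Gamma(A)/Gamma(A+N) * prod_k Gamma(alpha_k+n_k)/Gamma(alpha_k)."""
    n, a = sum(counts), sum(alpha)
    log_p = math.lgamma(a) - math.lgamma(a + n)
    for c, ak in zip(counts, alpha):
        log_p += math.lgamma(ak + c) - math.lgamma(ak)
    return log_p


alpha = [0.5, 1.0, 2.0]
lp = urn_sequence_log_prob([0, 1, 0, 2], alpha)
seq = simulate_polya_urn(alpha, 100, random.Random(1))
```

Since the sequence probability depends only on the colour counts, any reordering of the same draws has the same probability; this exchangeability is what makes conditional independence arguments tractable.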
Due to spectrum scarcity, the coexistence of radar and wireless communication has recently attracted substantial research interest. Among many scenarios, the heterogeneously distributed joint radar-communication system is promising because of its flexibility and its compatibility with existing architectures. In this paper, we focus on a heterogeneous radar and communication network (HRCN), which consists of various generic radars for multiple target tracking (MTT) and wireless communications for multiple users. We aim to improve MTT performance while maintaining good throughput for communication users through well-designed resource allocation. The problem is formulated as the minimization of a Bayesian Cramér-Rao bound (CRB) subject to resource budgets and throughput constraints. The resulting nonconvex problem is solved with an alternating descent-ascent approach. Numerical results demonstrate the efficacy of the proposed allocation scheme for this heterogeneous network.
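As a hedged illustration of budget-constrained Bayesian CRB minimization, consider a hypothetical scalar linear-Gaussian case: target i has prior information a_i, and allocating power p_i adds measurement information g_i * p_i, so its Bayesian CRB is 1 / (a_i + g_i * p_i). Minimizing the sum under a total power budget is then convex, and this sketch solves it by bisection on the Lagrange multiplier (water-filling). That is a swapped-in technique, not the alternating descent-ascent method of the abstract, and throughput constraints are omitted.

```python
import math


def bcrb(a, g, p):
    """Bayesian CRB of one scalar target: inverse of prior plus measurement information."""
    return 1.0 / (a + g * p)


def allocate_power(a, g, budget, iters=200):
    """Minimize sum_i bcrb(a_i, g_i, p_i) subject to sum_i p_i <= budget, p_i >= 0."""

    def powers(lam):
        # Stationarity of the Lagrangian: g_i / (a_i + g_i p_i)^2 = lam, clipped at p_i >= 0.
        return [max(0.0, (math.sqrt(gi / lam) - ai) / gi) for ai, gi in zip(a, g)]

    lo = 1e-12  # tiny multiplier: total power far exceeds a moderate budget
    hi = max(gi / ai ** 2 for ai, gi in zip(a, g))  # every p_i is 0 at this multiplier
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if sum(powers(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return powers(hi)


a = [1.0, 2.0, 0.5]  # hypothetical prior information per target
g = [1.0, 0.5, 2.0]  # hypothetical information gained per unit power
budget = 3.0
p = allocate_power(a, g, budget)
optimal = sum(bcrb(ai, gi, pi) for ai, gi, pi in zip(a, g, p))
equal_split = sum(bcrb(ai, gi, budget / len(a)) for ai, gi in zip(a, g))
print(p, optimal, equal_split)
```

Adding more power always lowers each bound, so the budget is active at the optimum; a nonconvex coupling such as a throughput constraint is what would call for an iterative primal-dual scheme instead.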
We introduce Deep Adaptive Design (DAD), a method for amortizing the cost of adaptive Bayesian experimental design that allows experiments to be run in real time. Traditional sequential Bayesian optimal experimental design approaches require substantial computation at each stage of the experiment. This makes them unsuitable for most real-world applications, where decisions must typically be made quickly. DAD addresses this limitation by learning an amortized design network upfront and then using it to rapidly run (multiple) adaptive experiments at deployment time. This network represents a design policy that takes as input the data from previous steps and outputs the next design using a single forward pass; these design decisions can be made in milliseconds during the live experiment. To train the network, we introduce contrastive information bounds that are suitable objectives for the sequential setting, and propose a customized network architecture that exploits key symmetries. We demonstrate that DAD successfully amortizes the process of experimental design, outperforming alternative strategies on a number of problems.
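To make "contrastive information bound" concrete, the sketch below estimates a single-step prior contrastive bound on the expected information gain (EIG) of one fixed design: simulate y from a sampled parameter, then contrast its likelihood against L extra prior samples. It does not train a design network or handle the sequential case. The linear-Gaussian model y = theta * design + noise, the sample sizes, and the function names are assumptions; for this model the exact EIG is 0.5 * log(1 + design^2 * prior_sd^2 / noise_sd^2), which gives a reference value.

```python
import math
import random


def log_gauss(y, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2


def log_mean_exp(values):
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def pce_lower_bound(design, n_outer=1000, n_contrastive=100,
                    prior_sd=1.0, noise_sd=1.0, seed=0):
    """Monte Carlo contrastive lower bound on the EIG for y = theta * design + noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        thetas = [rng.gauss(0.0, prior_sd) for _ in range(n_contrastive + 1)]
        y = thetas[0] * design + rng.gauss(0.0, noise_sd)  # outcome generated by theta_0
        log_liks = [log_gauss(y, th * design, noise_sd) for th in thetas]
        # The denominator includes theta_0 itself, so each term is <= log(L + 1).
        total += log_liks[0] - log_mean_exp(log_liks)
    return total / n_outer


estimates = {d: pce_lower_bound(d) for d in (0.0, 0.1, 2.0)}
print(estimates)
```

The bound saturates at log(L + 1), so larger information gains need more contrastive samples; this trade-off is one reason the choice of training objective matters.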
The Kalman filter (KF) is used in a variety of applications to compute the posterior distribution of latent states in a state space model. The model requires linear relationships between states and observations. Extensions to the Kalman filter that approximate nonlinear models have been proposed, such as the extended Kalman filter (EKF) and the unscented Kalman filter (UKF). However, we argue that when the dimensionality of the observed variables greatly exceeds that of the state variables, a model for $p(\text{state} \mid \text{observation})$ proves both easier to learn and more accurate for latent space estimation. We derive and validate what we call the discriminative Kalman filter (DKF): a closed-form discriminative version of Bayesian filtering that readily incorporates off-the-shelf discriminative learning techniques. Further, we demonstrate that, under mild assumptions, highly nonlinear models for $p(\text{state} \mid \text{observation})$ can be specified. We motivate and validate the approach on synthetic datasets and on neural decoding from non-human primates, showing substantial increases in decoding performance over the standard Kalman filter.
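A minimal scalar sketch of one filtering step in the spirit of the DKF, following our reading of the construction: approximate $p(\text{state} \mid \text{observation})$ by a Gaussian N(f(x), Q(x)), assume a stationary Gaussian state prior N(0, S) and linear-Gaussian dynamics, and combine the Kalman prediction with p(z | x) / p(z) by adding precisions and removing the prior's. The scalar restriction, the parameter names, and the example values are assumptions; f_x and q_x would come from whatever discriminative regressor is used.

```python
def dkf_step(mu_prev, var_prev, f_x, q_x, a, gamma, s):
    """One scalar discriminative-filter update.

    mu_prev, var_prev: previous posterior mean and variance.
    f_x, q_x: discriminative model output, p(z_t | x_t) approximated by N(f_x, q_x).
    a, gamma: state dynamics z_t = a * z_{t-1} + N(0, gamma).
    s: stationary prior variance, p(z_t) = N(0, s).
    """
    # Predict step, identical to the Kalman filter.
    nu = a * mu_prev
    m = a * a * var_prev + gamma
    # Multiply N(nu, m) by N(f_x, q_x) / N(0, s): precisions add, prior precision cancels.
    precision = 1.0 / q_x + 1.0 / m - 1.0 / s
    if precision <= 0.0:
        raise ValueError("q_x too large relative to s: posterior precision is not positive")
    var = 1.0 / precision
    mu = var * (f_x / q_x + nu / m)
    return mu, var


a, gamma = 0.9, 0.19
s = gamma / (1.0 - a * a)  # stationary variance of this AR(1) state
mu, var = 0.0, s
for f_x in [0.3, 0.5, 0.4]:  # hypothetical regressor outputs with q_x = 0.1
    mu, var = dkf_step(mu, var, f_x, 0.1, a, gamma, s)
print(mu, var)
```

When the regressor is uninformative (f_x = 0 and q_x = s), the update returns the prediction unchanged, which is a quick sanity check on the division by the prior.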