Policy Design for Active Sequential Hypothesis Testing using Deep Learning

105 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ekraam Sabir

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Dhruva Kartik - Ekraam Sabir - Urbashi Mitra

نظرية المعلومات الذكاء الاصطناعي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Information theory has been very successful in obtaining performance limits for various problems such as communication, compression and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general and thus, heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP and bounds on its value function exist in literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and another based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions and it is demonstrated using numerical experiments that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.

قيم البحث

114 - Achilleas Anastasopoulos , Sandeep Pradhan 2020

We consider the problem of decentralized sequential active hypothesis testing (DSAHT), where two transmitting agents, each possessing a private message, are actively helping a third agent--and each other--to learn the message pair over a discrete mem oryless multiple access channel (DM-MAC). The third agent (receiver) observes the noisy channel output, which is also available to the transmitting agents via noiseless feedback. We formulate this problem as a decentralized dynamic team, show that optimal transmission policies have a time-invariant domain, and characterize the solution through a dynamic program. Several alternative formulations are discussed involving time-homogenous cost functions and/or variable-length codes, resulting in solutions described through fixed-point, Bellman-type equations. Subsequently, we make connections with the problem of simplifying the multi-letter capacity expressions for the noiseless feedback capacity of the DM-MAC. We show that restricting attention to distributions induced by optimal transmission schemes for the DSAHT problem, without loss of optimality, transforms the capacity expression, so that it can be thought of as the average reward received by an appropriately defined stochastic dynamical system with time-invariant state space.

نظرية المعلومات نظرية المعلومات

Active Privacy-utility Trade-off Against a Hypothesis Testing Adversary

271 - Ecenaz Erdemir , Pier Luigi Dragotti , Deniz Gunduz 2021

We consider a user releasing her data containing some personal information in return of a service. We model users personal information as two correlated random variables, one of them, called the secret variable, is to be kept private, while the other , called the useful variable, is to be disclosed for utility. We consider active sequential data release, where at each time step the user chooses from among a finite set of release mechanisms, each revealing some information about the users personal information, i.e., the true hypotheses, albeit with different statistics. The user manages data release in an online fashion such that maximum amount of information is revealed about the latent useful variable, while the confidence for the sensitive variable is kept below a predefined level. For the utility, we consider both the probability of correct detection of the useful variable and the mutual information (MI) between the useful variable and released data. We formulate both problems as a Markov decision process (MDP), and numerically solve them by advantage actor-critic (A2C) deep reinforcement learning (RL).

نظرية المعلومات التشفير والأمن التعلم الآلي

Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design

351 - Adam Foster , Desi R. Ivanova , Ilyas Malik 2021

We introduce Deep Adaptive Design (DAD), a method for amortizing the cost of adaptive Bayesian experimental design that allows experiments to be run in real-time. Traditional sequential Bayesian optimal experimental design approaches require substant ial computation at each stage of the experiment. This makes them unsuitable for most real-world applications, where decisions must typically be made quickly. DAD addresses this restriction by learning an amortized design network upfront and then using this to rapidly run (multiple) adaptive experiments at deployment time. This network represents a design policy which takes as input the data from previous steps, and outputs the next design using a single forward pass; these design decisions can be made in milliseconds during the live experiment. To train the network, we introduce contrastive information bounds that are suitable objectives for the sequential setting, and propose a customized network architecture that exploits key symmetries. We demonstrate that DAD successfully amortizes the process of experimental design, outperforming alternative strategies on a number of problems.

التعلم الالي الذكاء الاصطناعي التعلم الآلي

Optimal Adaptive Strategies for Sequential Quantum Hypothesis Testing

75 - Yonglong Li , Vincent Y. F. Tan , 2021

We consider sequential hypothesis testing between two quantum states using adaptive and non-adaptive strategies. In this setting, samples of an unknown state are requested sequentially and a decision to either continue or to accept one of the two hyp otheses is made after each test. Under the constraint that the number of samples is bounded, either in expectation or with high probability, we exhibit adaptive strategies that minimize both types of misidentification errors. Namely, we show that these errors decrease exponentially (in the stopping time) with decay rates given by the measured relative entropies between the two states. Moreover, if we allow joint measurements on multiple samples, the rates are increased to the respective quantum relative entropies. We also fully characterize the achievable error exponents for non-adaptive strategies and provide numerical evidence showing that adaptive measurements are necessary to achieve our bounds under some additional assumptions.

فيزياء الكم نظرية المعلومات الفيزياء الرياضية

Distributed Learning for Time-varying Networks: A Scalable Design

102 - Jian Wang , Yourui Huangfu , Rong Li 2021

The wireless network is undergoing a trend from onnection of things to connection of intelligence. With data spread over the communication networks and computing capability enhanced on the devices, distributed learning becomes a hot topic in both ind ustrial and academic communities. Many frameworks, such as federated learning and federated distillation, have been proposed. However, few of them takes good care of obstacles such as the time-varying topology resulted by the characteristics of wireless networks. In this paper, we propose a distributed learning framework based on a scalable deep neural network (DNN) design. By exploiting the permutation equivalence and invariance properties of the learning tasks, the DNNs with different scales for different clients can be built up based on two basic parameter sub-matrices. Further, model aggregation can also be conducted based on these two sub-matrices to improve the learning convergence and performance. Finally, simulation results verify the benefits of the proposed framework by compared with some baselines.

نظرية المعلومات الذكاء الاصطناعي التعلم الآلي