
Decoding from Pooled Data: Sharp Information-Theoretic Bounds


Added by Ahmed El Alaoui

Publication date 2016

Field: Informatics Engineering

Research language: English

Authors Ahmed El Alaoui - Aaditya Ramdas - Florent Krzakala


Consider a population consisting of n individuals, each of whom has one of d types (e.g. their blood type, in which case d=4). We are allowed to query this database by specifying a subset of the population, and in response we observe a noiseless histogram (a d-dimensional vector of counts) of types of the pooled individuals. This measurement model arises in practical situations such as pooling of genetic data and may also be motivated by privacy considerations. We are interested in the number of queries one needs to unambiguously determine the type of each individual. In this paper, we study this information-theoretic question under the random, dense setting where in each query, a random subset of individuals of size proportional to n is chosen. This makes the problem a particular example of a random constraint satisfaction problem (CSP) with a planted solution. We establish almost matching upper and lower bounds on the minimum number of queries m such that there is no solution other than the planted one with probability tending to 1 as n tends to infinity. Our proof relies on the computation of the exact annealed free energy of this model in the thermodynamic limit, which corresponds to the exponential rate of decay of the expected number of solutions to this planted CSP. As a by-product of the analysis, we show an identity of independent interest relating the Gaussian integral over the space of Eulerian flows of a graph to its spanning tree polynomial.
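The measurement model in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' code: the function names (`histogram`, `random_pools`, `count_solutions`), the choice of fixed-size pools of size `round(alpha * n)`, and the tiny parameters are all hypothetical, and the brute-force count is feasible only for very small n. It draws a planted assignment of types, answers m random pooled queries with noiseless histograms, and counts how many assignments remain consistent with all of them.

```python
import itertools
import random


def histogram(types, pool, d):
    """Return the d-dimensional vector of type counts for the pooled individuals."""
    counts = [0] * d
    for i in pool:
        counts[types[i]] += 1
    return counts


def random_pools(n, m, alpha, rng):
    """Draw m pools, each a uniformly random subset of size round(alpha * n).

    Assumption: fixed-size pools; the abstract only says the size is
    proportional to n.
    """
    size = round(alpha * n)
    return [rng.sample(range(n), size) for _ in range(m)]


def count_solutions(n, d, pools, observed):
    """Brute-force count of assignments consistent with every observed histogram."""
    return sum(
        1
        for candidate in itertools.product(range(d), repeat=n)
        if all(histogram(candidate, p, d) == h for p, h in zip(pools, observed))
    )


rng = random.Random(0)
n, d, alpha = 6, 3, 0.5  # illustrative values only
planted = [rng.randrange(d) for _ in range(n)]
for m in (0, 2, 4, 8):
    pools = random_pools(n, m, alpha, rng)
    observed = [histogram(planted, p, d) for p in pools]
    # The planted assignment is always consistent, so the count is at least 1.
    print(m, count_solutions(n, d, pools, observed))
```

With no queries every one of the d^n assignments is consistent; the planted assignment is uniquely determined exactly when the count reaches 1, which is the event whose probability the bounds on m concern.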


Decoding from Pooled Data: Phase Transitions of Message Passing

Ahmed El Alaoui, Aaditya Ramdas, Florent Krzakala (2017)

We consider the problem of decoding a discrete signal of categorical variables from the observation of several histograms of pooled subsets of it. We present an Approximate Message Passing (AMP) algorithm for recovering the signal in the random dense setting where each observed histogram involves a random subset of entries of size proportional to n. We characterize the performance of the algorithm in the asymptotic regime where the number of observations m tends to infinity proportionally to n, by deriving the corresponding State Evolution (SE) equations and studying their dynamics. We initiate the analysis of the multi-dimensional SE dynamics by proving their convergence to a fixed point, along with some further properties of the iterates. The analysis reveals sharp phase transition phenomena where the behavior of AMP changes from exact recovery to weak correlation with the signal as m/n crosses a threshold. We derive formulae for the threshold in some special cases and show that they accurately match experimental behavior.

Information Theory, Data Structures and Algorithms

Information-theoretic bounds on quantum advantage in machine learning

Hsin-Yuan Huang, Richard Kueng, John Preskill (2021)

We study the performance of classical and quantum machine learning (ML) models in predicting outcomes of physical experiments. The experiments depend on an input parameter $x$ and involve execution of a (possibly unknown) quantum process $\mathcal{E}$. Our figure of merit is the number of runs of $\mathcal{E}$ required to achieve a desired prediction performance. We consider classical ML models that perform a measurement and record the classical outcome after each run of $\mathcal{E}$, and quantum ML models that can access $\mathcal{E}$ coherently to acquire quantum data; the classical or quantum data is then used to predict outcomes of future experiments. We prove that for any input distribution $\mathcal{D}(x)$, a classical ML model can provide accurate predictions on average by accessing $\mathcal{E}$ a number of times comparable to the optimal quantum ML model. In contrast, for achieving accurate prediction on all inputs, we prove that exponential quantum advantage is possible. For example, to predict expectations of all Pauli observables in an $n$-qubit system $\rho$, classical ML models require $2^{\Omega(n)}$ copies of $\rho$, but we present a quantum ML model using only $\mathcal{O}(n)$ copies. Our results clarify where quantum advantage is possible and highlight the potential for classical ML models to address challenging quantum problems in physics and chemistry.

Quantum Physics, Information Theory, Machine Learning

A Bayesian Framework for Information-Theoretic Probing

Tiago Pimentel, Ryan Cotterell (2021)

Pimentel et al. (2020) recently analysed probing from an information-theoretic perspective. They argue that probing should be seen as approximating a mutual information. This led to the rather unintuitive conclusion that representations encode exactly the same information about a target task as the original sentences. The mutual information, however, assumes the true probability distribution of a pair of random variables is known, leading to unintuitive results in settings where it is not. This paper proposes a new framework to measure what we term Bayesian mutual information, which analyses information from the perspective of Bayesian agents -- allowing for more intuitive findings in scenarios with finite data. For instance, under Bayesian MI we have that data can add information, processing can help, and information can hurt, which makes it more intuitive for machine learning applications. Finally, we apply our framework to probing where we believe Bayesian mutual information naturally operationalises ease of extraction by explicitly limiting the available background knowledge to solve a task.

Computation and Language, Information Theory

An Information-theoretic Approach to Distribution Shifts

Marco Federici, Ryota Tomioka, Patrick Forre (2021)

Safely deploying machine learning models to the real world is often a challenging process. Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a simulation can struggle to adapt when deployed in the real world or novel environments, and neural networks that are fit to a subset of the population might carry some selection bias into their decision process. In this work, we describe the problem of data shift from a novel information-theoretic perspective by (i) identifying and describing the different sources of error, and (ii) comparing some of the most promising objectives explored in the recent domain generalization and fair classification literature. From our theoretical analysis and empirical evaluation, we conclude that the model selection procedure needs to be guided by careful considerations regarding the observed data, the factors used for correction, and the structure of the data-generating process.

Machine Learning, Information Theory

Information-theoretic and algorithmic thresholds for group testing

Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth (2019)

In the group testing problem we aim to identify a small number of infected individuals within a large population. We avail ourselves of a procedure that can test a group of multiple individuals, with the test result coming out positive iff at least one individual in the group is infected. With all tests conducted in parallel, what is the least number of tests required to identify the status of all individuals? In a recent test design [Aldridge et al. 2016] the individuals are assigned to test groups randomly, with every individual joining an equal number of groups. We pinpoint the sharp threshold for the number of tests required in this randomised design so that it is information-theoretically possible to infer the infection status of every individual. Moreover, we analyse two efficient inference algorithms. These results settle conjectures from [Aldridge et al. 2014, Johnson et al. 2019].

Discrete Mathematics, Information Theory
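The group testing model in the abstract above (a test is positive iff at least one group member is infected, with every individual joining an equal number of randomly chosen groups) can be sketched as follows. This is an illustration under assumptions, not the design or the inference algorithms analysed in the paper: the function names, the choice of distinct tests per individual, and the parameters are hypothetical. The only inference shown is the elementary deduction that anyone appearing in a negative test is healthy.

```python
import random


def assign_groups(n, num_tests, tests_per_individual, rng):
    """Each individual joins the same number of distinct tests, chosen at random.

    Assumption: tests are drawn without replacement for each individual.
    """
    groups = [[] for _ in range(num_tests)]
    for person in range(n):
        for t in rng.sample(range(num_tests), tests_per_individual):
            groups[t].append(person)
    return groups


def run_tests(groups, infected):
    """A test is positive iff at least one member of its group is infected."""
    return [any(p in infected for p in group) for group in groups]


def surely_healthy(groups, results):
    """Everyone appearing in at least one negative test must be healthy."""
    healthy = set()
    for group, positive in zip(groups, results):
        if not positive:
            healthy.update(group)
    return healthy


rng = random.Random(1)
n, num_tests, k = 20, 10, 3  # illustrative values only
infected = set(rng.sample(range(n), 2))
groups = assign_groups(n, num_tests, k, rng)
results = run_tests(groups, infected)
cleared = surely_healthy(groups, results)
print(sorted(infected), sorted(cleared))
```

Individuals who are not cleared this way remain ambiguous; deciding their status is where the number of tests and the choice of inference algorithm matter.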
