
Optimal distributed testing in high-dimensional Gaussian models

 Added by Lasse Vuursteen
 Publication date 2020
 Language: English





In this paper we study the problem of signal detection in Gaussian noise in a distributed setting. We derive a lower bound on the size the signal needs to have in order to be detectable. Moreover, we exhibit optimal distributed testing strategies that attain this lower bound.
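The setting in the abstract can be illustrated with a minimal simulation. This is a hypothetical sketch, not the paper's actual procedure: assume m machines each hold n i.i.d. N(mu, 1) observations, each machine communicates only its standardized sample mean, and a central server aggregates these into one z-statistic. All sizes and the threshold are illustrative.

```python
# Hypothetical sketch of a distributed signal-detection test (not the
# paper's method). Assumptions: m machines each hold n i.i.d. N(mu, 1)
# observations; under H0, mu = 0. Each machine sends one number.
import math
import random

def local_statistic(samples):
    """Machine-side summary: standardized sample mean, ~ N(0,1) under H0."""
    n = len(samples)
    return sum(samples) / math.sqrt(n)

def distributed_test(machine_data, threshold=3.0):
    """Server-side test: average the m local statistics. Under H0 the
    aggregate is again N(0,1), so reject H0 when |aggregate| > threshold."""
    m = len(machine_data)
    stats = [local_statistic(d) for d in machine_data]
    aggregate = sum(stats) / math.sqrt(m)
    return abs(aggregate) > threshold

random.seed(0)
m, n, mu = 20, 100, 0.5   # illustrative sizes; mu is the signal strength
null_data = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
alt_data = [[random.gauss(mu, 1.0) for _ in range(n)] for _ in range(m)]
print(distributed_test(null_data))  # typically False: no signal present
print(distributed_test(alt_data))   # typically True: signal detected
```

The communication constraint is what makes the problem distributed: each machine transmits a single real number rather than its raw sample.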



Related research

Lin Zhou, Yun Wei, Alfred Hero (2020)
We revisit universal outlier hypothesis testing (Li et al., TIT 2014) and derive fundamental limits for the optimal test. In outlier hypothesis testing, one is given multiple observed sequences, where most sequences are generated i.i.d. from a nominal distribution. The task is to discern the set of outlying sequences that are generated according to anomalous distributions. The nominal and anomalous distributions are unknown. We study the tradeoff among the probabilities of misclassification error, false alarm, and false reject for tests that satisfy weak conditions on the rate of decrease of these error probabilities as a function of sequence length. Specifically, we propose a threshold-based universal test that ensures exponential decay of the misclassification error and false alarm probabilities. We study two constraints on the false reject probability: that it be a non-vanishing constant, and that it have an exponential decay rate. For both cases, we characterize bounds on the false reject probability, as a function of the threshold, for each pair of nominal and anomalous distributions, and demonstrate the optimality of our test in the generalized Neyman-Pearson sense. We first consider the case of at most one outlier and then generalize our results to the case of multiple outliers, where the number of outliers is unknown and each outlier can follow a different anomalous distribution.
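The threshold-based idea can be sketched as follows. This is a hypothetical illustration in the spirit of the abstract, not the paper's test: flag a sequence as an outlier when the KL divergence between its empirical distribution and the pooled empirical distribution of the remaining sequences exceeds a threshold. The alphabet, data, and threshold are made up.

```python
# Hypothetical threshold-based outlier test: sequences whose empirical
# distribution is KL-far from the pooled empirical distribution of the
# rest get flagged. Not the paper's exact test statistic.
from collections import Counter
import math

def empirical(seq, alphabet):
    """Empirical distribution of a sequence over a finite alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts.get(a, 0) / n for a in alphabet}

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q), smoothed to avoid division by zero."""
    return sum(p[a] * math.log((p[a] + eps) / (q[a] + eps))
               for a in p if p[a] > 0)

def flag_outliers(sequences, alphabet, threshold):
    """Flag index i when its empirical law is far from the pooled rest."""
    flagged = []
    for i, seq in enumerate(sequences):
        rest = [x for j, s in enumerate(sequences) if j != i for x in s]
        score = kl(empirical(seq, alphabet), empirical(rest, alphabet))
        if score > threshold:
            flagged.append(i)
    return flagged

alphabet = [0, 1]
nominal = [[0] * 80 + [1] * 20 for _ in range(4)]  # nominal law (0.8, 0.2)
outlier = [[0] * 20 + [1] * 80]                    # anomalous law (0.2, 0.8)
print(flag_outliers(nominal + outlier, alphabet, threshold=0.3))
```

The test is universal in the sense that neither the nominal nor the anomalous distribution appears anywhere in the decision rule; only empirical distributions are compared.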
This paper presents results pertaining to sequential methods for support recovery of sparse signals in noise. Specifically, we show that any sequential measurement procedure fails provided the average number of measurements per dimension grows slower than log s / D(f0||f1), where s is the level of sparsity and D(f0||f1) is the Kullback-Leibler divergence between the underlying distributions. For comparison, we show any non-sequential procedure fails provided the number of measurements grows at a rate less than log n / D(f1||f0), where n is the total dimension of the problem. Lastly, we show that a simple procedure termed sequential thresholding guarantees exact support recovery provided the average number of measurements per dimension grows faster than (log s + log log n) / D(f0||f1), a mere additive factor more than the lower bound.
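The three rates quoted above can be compared numerically. The sketch below is illustrative only: it assumes a Gaussian mean-shift model f0 = N(0, 1), f1 = N(theta, 1), for which D(f0||f1) = D(f1||f0) = theta^2 / 2, and the problem sizes n, s, theta are made-up values, not ones from the paper.

```python
# Illustrative comparison of the sample-complexity rates quoted above,
# assuming a Gaussian mean shift: f0 = N(0,1), f1 = N(theta,1), so that
# D(f0||f1) = D(f1||f0) = theta**2 / 2. Problem sizes are hypothetical.
import math

def kl_gauss_mean_shift(theta):
    """KL divergence between N(0,1) and N(theta,1)."""
    return theta ** 2 / 2.0

n, s, theta = 10_000, 10, 1.0   # dimension, sparsity, signal strength
d = kl_gauss_mean_shift(theta)

# Lower bound: any sequential procedure needs more than this many
# measurements per dimension on average.
seq_lower = math.log(s) / d

# Sufficient rate for sequential thresholding: only an additive
# log log n more than the lower bound.
seq_upper = (math.log(s) + math.log(math.log(n))) / d

# Non-sequential procedures instead need measurements growing like log n.
non_seq = math.log(n) / d

print(seq_lower, seq_upper, non_seq)
```

For sparse problems (s much smaller than n), log s + log log n is far below log n, which is the advantage of sequential measurement the abstract describes.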
The distributed hypothesis testing problem with full side-information is studied. The trade-off (reliability function) between the two types of error exponents under limited rate is studied in the following way. First, the problem is reduced to the problem of determining the reliability function of channel codes designed for detection (in analogy to a similar result which connects the reliability function of distributed lossless compression and ordinary channel codes). Second, a single-letter random-coding bound based on a hierarchical ensemble, as well as a single-letter expurgated bound, are derived for the reliability of channel-detection codes. Both bounds are derived for a system which employs the optimal detection rule. We conjecture that the resulting random-coding bound is ensemble-tight, and consequently optimal within the class of quantization-and-binning schemes.
In this paper, we analyze the operational information rate distortion function (RDF) $R_{S;Z|Y}(\Delta_X)$, introduced by Draper and Wornell, for a triple of jointly independent and identically distributed, multivariate Gaussian random variables (RVs), $(X^n, S^n, Y^n) = \{(X_t, S_t, Y_t): t = 1, 2, \ldots, n\}$, where $X^n$ is the source, $S^n$ is a measurement of $X^n$ available to the encoder, $Y^n$ is side information available to the decoder only, and $Z^n$ is the auxiliary RV available to the decoder, with respect to the square-error fidelity between the source $X^n$ and its reconstruction $\widehat{X}^n$. We also analyze the RDF $R_{S;\widehat{X}|Y}(\Delta_X)$ that corresponds to the above setup when side information $Y^n$ is available to both the encoder and decoder. The main results include: (1) structural properties of test channel realizations that induce distributions which achieve the two RDFs; (2) water-filling solutions of the two RDFs, based on parallel channel realizations of test channels; (3) a proof of the equality $R_{S;Z|Y}(\Delta_X) = R_{S;\widehat{X}|Y}(\Delta_X)$, i.e., side information $Y^n$ at both the encoder and decoder does not incur smaller compression; and (4) relations to other RDFs, as degenerate cases, which show that past literature contains oversights related to the optimal test channel realizations and the value of the RDF $R_{S;Z|Y}(\Delta_X)$.
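The water-filling solutions mentioned in result (2) rest on the classical reverse water-filling computation for a parallel Gaussian source. The sketch below shows that generic computation, not the paper's specific test-channel realization: given per-component variances and a distortion budget, find the water level by bisection and sum the per-component rates. The variances and budget are made-up numbers.

```python
# Generic reverse water-filling for a parallel Gaussian source (the
# classical computation underlying water-filling RDF solutions); the
# variances and distortion budget below are illustrative.
import math

def reverse_water_filling(variances, distortion):
    """Find the water level lam with sum_i min(lam, var_i) = distortion,
    then return (lam, rate) where rate = sum_i 0.5*log(var_i / D_i)
    and D_i = min(lam, var_i) is the distortion spent on component i."""
    lo, hi = 0.0, max(variances)
    for _ in range(100):                      # bisection on the water level
        lam = (lo + hi) / 2.0
        total = sum(min(lam, v) for v in variances)
        if total < distortion:
            lo = lam                          # too little distortion spent
        else:
            hi = lam                          # too much: lower the level
    lam = (lo + hi) / 2.0
    rate = sum(0.5 * math.log(v / min(lam, v)) for v in variances)
    return lam, rate

lam, rate = reverse_water_filling([4.0, 2.0, 1.0, 0.5], distortion=1.0)
print(lam, rate)   # rate is in nats
```

Components with variance below the water level are allotted their full variance as distortion and contribute zero rate; the remaining budget is split evenly across the rest, which is the "water" picture behind the name.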
Consider the problem of estimating parameters $X^n \in \mathbb{R}^n$, generated by a stationary process, from $m$ response variables $Y^m = AX^n + Z^m$, under the assumption that the distribution of $X^n$ is known. This is the most general version of the Bayesian linear regression problem. The lack of computationally feasible algorithms that can employ generic prior distributions and provide a good estimate of $X^n$ has limited the set of distributions researchers use to model the data. In this paper, a new scheme called Q-MAP is proposed. The new method has the following properties: (i) It has similarities to the popular MAP estimation under the noiseless setting. (ii) In the noiseless setting, it achieves the asymptotically optimal performance when $X^n$ has independent and identically distributed components. (iii) It scales favorably with the dimensions of the problem and therefore is applicable to high-dimensional setups. (iv) The solution of the Q-MAP optimization can be found via a proposed iterative algorithm which is provably robust to the error (noise) in the response variables.
