No Arabic abstract
The problem of verifying whether a multi-component system has anomalies or not is addressed. Each component can be probed over time in a data-driven manner to obtain noisy observations that indicate whether the selected component is anomalous or not. The aim is to minimize the probability of incorrectly declaring the system to be free of anomalies while ensuring that the probability of correctly declaring it to be safe is sufficiently large. This problem is modeled as an active hypothesis testing problem in the Neyman-Pearson setting. Component-selection and inference strategies are designed and analyzed in the non-asymptotic regime. For a specific class of homogeneous problems, stronger (with respect to prior work) non-asymptotic converse and achievability bounds are provided.
Opportunistic detection rules (ODRs) are variants of fixed-sample-size detection rules in which the statistician is allowed to make an early decision on the alternative hypothesis opportunistically based on the sequentially observed samples. From a sequential decision perspective, ODRs are also mixtures of one-sided and truncated sequential detection rules. Several results regarding ODRs are established in this paper. In the finite regime, the maximum sample size is modeled either as a fixed finite number, or a geometric random variable with a fixed finite mean. For both cases, the corresponding Bayesian formulations are investigated. The former case is a slight variation of the well-known finite-length sequential hypothesis testing procedure in the literature, whereas the latter case is new, for which the Bayesian optimal ODR is shown to be a sequence of likelihood ratio threshold tests with two different thresholds: a running threshold, which is determined by solving a stationary state equation, is used when future samples are still available, and a terminal threshold (simply the ratio between the priors scaled by costs) is used when the statistician reaches the final sample and thus has to make a decision immediately. In the asymptotic regime, the tradeoff among the exponents of the (false alarm and miss) error probabilities and the normalized expected stopping time under the alternative hypothesis is completely characterized and proved to be tight, via an information-theoretic argument. Within the tradeoff region, one noteworthy fact is that the performance of the Stein-Chernoff Lemma is attainable by ODRs.
We investigate the nonparametric, composite hypothesis testing problem for arbitrary unknown distributions in the asymptotic regime where both the sample size and the number of hypotheses grow exponentially large. Such asymptotic analysis is important in many practical problems, where the number of variations that can exist within a family of distributions can be countably infinite. We introduce the notion of emph{discrimination capacity}, which captures the largest exponential growth rate of the number of hypotheses relative to the sample size so that there exists a test with asymptotically vanishing probability of error. Our approach is based on various distributional distance metrics in order to incorporate the generative model of the data. We provide analyses of the error exponent using the maximum mean discrepancy (MMD) and Kolmogorov-Smirnov (KS) distance and characterize the corresponding discrimination rates, i.e., lower bounds on the discrimination capacity, for these tests. Finally, an upper bound on the discrimination capacity based on Fanos inequality is developed. Numerical results are presented to validate the theoretical results.
The goal of threshold group testing is to identify up to $d$ defective items among a population of $n$ items, where $d$ is usually much smaller than $n$. A test is positive if it has at least $u$ defective items and negative otherwise. Our objective is to identify defective items in sublinear time the number of items, e.g., $mathrm{poly}(d, ln{n}),$ by using the number of tests as low as possible. In this paper, we reduce the number of tests to $O left( h times frac{d^2 ln^2{n}}{mathsf{W}^2(d ln{n})} right)$ and the decoding time to $O left( mathrm{dec}_0 times h right),$ where $mathrm{dec}_0 = O left( frac{d^{3.57} ln^{6.26}{n}}{mathsf{W}^{6.26}(d ln{n})} right) + O left( frac{d^6 ln^4{n}}{mathsf{W}^4(d ln{n})} right)$, $h = Oleft( frac{d_0^2 ln{frac{n}{d_0}}}{(1-p)^2} right)$ , $d_0 = max{u, d - u }$, $p in [0, 1),$ and $mathsf{W}(x) = Theta left( ln{x} - ln{ln{x}} right).$ If the number of tests is increased to $Oleft( h times frac{d^2ln^3{n}}{mathsf{W}^2(d ln{n})} right),$ the decoding complexity is reduced to $O left(mathrm{dec}_1 times h right),$ where $mathrm{dec}_1 = max left{ frac{d^2 ln^3{n}}{mathsf{W}^2(d ln{n})}, frac{ud ln^4{n}}{mathsf{W}^3(d ln{n})} right}.$ Moreover, our proposed scheme is capable of handling errors in test outcomes.
The task of non-adaptive group testing is to identify up to $d$ defective items from $N$ items, where a test is positive if it contains at least one defective item, and negative otherwise. If there are $t$ tests, they can be represented as a $t times N$ measurement matrix. We have answered the question of whether there exists a scheme such that a larger measurement matrix, built from a given $ttimes N$ measurement matrix, can be used to identify up to $d$ defective items in time $O(t log_2{N})$. In the meantime, a $t times N$ nonrandom measurement matrix with $t = O left(frac{d^2 log_2^2{N}}{(log_2(dlog_2{N}) - log_2{log_2(dlog_2{N})})^2} right)$ can be obtained to identify up to $d$ defective items in time $mathrm{poly}(t)$. This is much better than the best well-known bound, $t = O left( d^2 log_2^2{N} right)$. For the special case $d = 2$, there exists an efficient nonrandom construction in which at most two defective items can be identified in time $4log_2^2{N}$ using $t = 4log_2^2{N}$ tests. Numerical results show that our proposed scheme is more practical than existing ones, and experimental results confirm our theoretical analysis. In particular, up to $2^{7} = 128$ defective items can be identified in less than $16$s even for $N = 2^{100}$.
We consider the problem of decentralized sequential active hypothesis testing (DSAHT), where two transmitting agents, each possessing a private message, are actively helping a third agent--and each other--to learn the message pair over a discrete memoryless multiple access channel (DM-MAC). The third agent (receiver) observes the noisy channel output, which is also available to the transmitting agents via noiseless feedback. We formulate this problem as a decentralized dynamic team, show that optimal transmission policies have a time-invariant domain, and characterize the solution through a dynamic program. Several alternative formulations are discussed involving time-homogenous cost functions and/or variable-length codes, resulting in solutions described through fixed-point, Bellman-type equations. Subsequently, we make connections with the problem of simplifying the multi-letter capacity expressions for the noiseless feedback capacity of the DM-MAC. We show that restricting attention to distributions induced by optimal transmission schemes for the DSAHT problem, without loss of optimality, transforms the capacity expression, so that it can be thought of as the average reward received by an appropriately defined stochastic dynamical system with time-invariant state space.