Efficient and error-tolerant schemes for non-adaptive complex group testing and its application in complex disease genetics


الملخص بالإنكليزية

The goal of combinatorial group testing is to efficiently identify up to $d$ defective items in a large population of $n$ items, where $d ll n$. Defective items satisfy certain properties while the remaining items in the population do not. To efficiently identify defective items, a subset of items is pooled and then tested. In this work, we consider complex group testing (CmplxGT) in which a set of defective items consists of subsets of positive items (called textit{positive complexes}). CmplxGT is classified into two categories: classical CmplxGT (CCmplxGT) and generalized CmplxGT (GCmplxGT). In CCmplxGT, the outcome of a test on a subset of items is positive if the subset contains at least one positive complex, and negative otherwise. In GCmplxGT, the outcome of a test on a subset of items is positive if the subset has a certain number of items of some positive complex, and negative otherwise. For CCmplxGT, we present a scheme that efficiently identifies all positive complexes in time $t times mathrm{poly}(d, ln{n})$ in the presence of erroneous outcomes, where $t$ is a predefined parameter. As $d ll n$, this is significantly better than the currently best time of $mathrm{poly}(t) times O(n ln{n})$. Moreover, in specific cases, the number of tests in our proposed scheme is smaller than previous work. For GCmplxGT, we present a scheme that efficiently identifies all positive complexes. These schemes are directly applicable in various areas such as complex disease genetics, molecular biology, and learning a hidden graph.

تحميل البحث