ترغب بنشر مسار تعليمي؟ اضغط هنا

Private DNA Sequencing: Hiding Information in Discrete Noise

160   0   0.0 ( 0 )
 نشر من قبل Kayvon Mazooji
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

When an individuals DNA is sequenced, sensitive medical information becomes available to the sequencing laboratory. A recently proposed way to hide an individuals genetic information is to mix in DNA samples of other individuals. We assume these samples are known to the individual but unknown to the sequencing laboratory. Thus, these DNA samples act as noise to the sequencing laboratory, but still allow the individual to recover their own DNA samples afterward. Motivated by this idea, we study the problem of hiding a binary random variable X (a genetic marker) with the additive noise provided by mixing DNA samples, using mutual information as a privacy metric. This is equivalent to the problem of finding a worst-case noise distribution for recovering X from the noisy observation among a set of feasible discrete distributions. We characterize upper and lower bounds to the solution of this problem, which are empirically shown to be very close. The lower bound is obtained through a convex relaxation of the original discrete optimization problem, and yields a closed-form expression. The upper bound is computed via a greedy algorithm for selecting the mixing proportions.



قيم البحث

اقرأ أيضاً

We investigate the problem of semantic private information retrieval (semantic PIR). In semantic PIR, a user retrieves a message out of $K$ independent messages stored in $N$ replicated and non-colluding databases without revealing the identity of th e desired message to any individual database. The messages come with emph{different semantics}, i.e., the messages are allowed to have emph{non-uniform a priori probabilities} denoted by $(p_i>0,: i in [K])$, which are a proxy for their respective popularity of retrieval, and emph{arbitrary message sizes} $(L_i,: i in [K])$. This is a generalization of the classical private information retrieval (PIR) problem, where messages are assumed to have equal a priori probabilities and equal message sizes. We derive the semantic PIR capacity for general $K$, $N$. The results show that the semantic PIR capacity depends on the number of databases $N$, the number of messages $K$, the a priori probability distribution of messages $p_i$, and the message sizes $L_i$. We present two achievable semantic PIR schemes: The first one is a deterministic scheme which is based on message asymmetry. This scheme employs non-uniform subpacketization. The second scheme is probabilistic and is based on choosing one query set out of multiple options at random to retrieve the required message without the need for exponential subpacketization. We derive necessary and sufficient conditions for the semantic PIR capacity to exceed the classical PIR capacity with equal priors and sizes. Our results show that the semantic PIR capacity can be larger than the classical PIR capacity when longer messages have higher popularities. However, when messages are equal-length, the non-uniform priors cannot be exploited to improve the retrieval rate over the classical PIR capacity.
We introduce the problem of emph{timely} private information retrieval (PIR) from $N$ non-colluding and replicated servers. In this problem, a user desires to retrieve a message out of $M$ messages from the servers, whose contents are continuously up dating. The retrieval process should be executed in a timely manner such that no information is leaked about the identity of the message. To assess the timeliness, we use the emph{age of information} (AoI) metric. Interestingly, the timely PIR problem reduces to an AoI minimization subject to PIR constraints under emph{asymmetric traffic}. We explicitly characterize the optimal tradeoff between the PIR rate and the AoI metric (peak AoI or average AoI) for the case of $N=2$, $M=3$. Further, we provide some structural insights on the general problem with arbitrary $N$, $M$.
We study the problem of private set intersection (PSI). In this problem, there are two entities $E_i$, for $i=1, 2$, each storing a set $mathcal{P}_i$, whose elements are picked from a finite field $mathbb{F}_K$, on $N_i$ replicated and non-colluding databases. It is required to determine the set intersection $mathcal{P}_1 cap mathcal{P}_2$ without leaking any information about the remaining elements to the other entity with the least amount of downloaded bits. We first show that the PSI problem can be recast as a multi-message symmetric private information retrieval (MM-SPIR) problem. Next, as a stand-alone result, we derive the information-theoretic sum capacity of MM-SPIR, $C_{MM-SPIR}$. We show that with $K$ messages, $N$ databases, and the size of the desired message set $P$, the exact capacity of MM-SPIR is $C_{MM-SPIR} = 1 - frac{1}{N}$ when $P leq K-1$, provided that the entropy of the common randomness $S$ satisfies $H(S) geq frac{P}{N-1}$ per desired symbol. This result implies that there is no gain for MM-SPIR over successive single-message SPIR (SM-SPIR). For the MM-SPIR problem, we present a novel capacity-achieving scheme that builds on the near-optimal scheme of Banawan-Ulukus originally proposed for the multi-message PIR (MM-PIR) problem without database privacy constraints. Surprisingly, our scheme here is exactly optimal for the MM-SPIR problem for any $P$, in contrast to the scheme for the MM-PIR problem, which was proved only to be near-optimal. Our scheme is an alternative to the SM-SPIR scheme of Sun-Jafar. Based on this capacity result for MM-SPIR, and after addressing the added requirements in its conversion to the PSI problem, we show that the optimal download cost for the PSI problem is $minleft{leftlceilfrac{P_1 N_2}{N_2-1}rightrceil, leftlceilfrac{P_2 N_1}{N_1-1}rightrceilright}$, where $P_i$ is the cardinality of set $mathcal{P}_i$
We consider the problem of private information retrieval from $N$ emph{storage-constrained} databases. In this problem, a user wishes to retrieve a single message out of $M$ messages (of size $L$) without revealing any information about the identity of the message to individual databases. Each database stores $mu ML$ symbols, i.e., a $mu$ fraction of the entire library, where $frac{1}{N} leq mu leq 1$. Our goal is to characterize the optimal tradeoff curve for the storage cost (captured by $mu$) and the normalized download cost ($D/L$). We show that the download cost can be reduced by employing a hybrid storage scheme that combines emph{MDS coding} ideas with emph{uncoded partial replication} ideas. When there is no coding, our scheme reduces to Attia-Kumar-Tandon storage scheme, which was initially introduced by Maddah-Ali-Niesen in the context of the caching problem, and when there is no uncoded partial replication, our scheme reduces to Banawan-Ulukus storage scheme; in general, our scheme outperforms both.
106 - Tao Guo , Ruida Zhou , Chao Tian 2019
We consider information leakage to the user in private information retrieval (PIR) systems. Information leakage can be measured in terms of individual message leakage or total leakage. Individual message leakage, or simply individual leakage, is defi ned as the amount of information that the user can obtain on any individual message that is not being requested, and the total leakage is defined as the amount of information that the user can obtain about all the other messages except the one being requested. In this work, we characterize the tradeoff between the minimum download cost and the individual leakage, and that for the total leakage, respectively. New codes are proposed to achieve these optimal tradeoffs, which are also shown to be optimal in terms of the message size. We further characterize the optimal tradeoff between the minimum amount of common randomness and the total leakage. Moreover, we show that under individual leakage, common randomness is in fact unnecessary when there are more than two messages.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا