
Privacy-Preserving Identification of Target Patients from Outsourced Patient Data

Posted by Xiaojie Zhu
Publication date: 2021
Paper language: English





With the increasing affordability and availability of patient data, hospitals tend to outsource their data to cloud service providers (CSPs) for storage and analytics. However, concerns about data privacy significantly limit the data owners' choices. In this work, we propose the first solution, to the best of our knowledge, that allows a CSP to perform efficient identification of target patients (e.g., pre-processing for a genome-wide association study - GWAS) over multi-tenant encrypted phenotype data (owned by multiple hospitals or data owners). We first propose an encryption mechanism for phenotype data, where each data owner is allowed to encrypt its data with a unique secret key. Moreover, the ciphertext supports privacy-preserving search and, consequently, enables the selection of the target group of patients (e.g., case and control groups). In addition, we provide a per-query authorization mechanism for a client to access and operate on the data stored at the CSP. Based on the identified patients, the proposed scheme can either (i) directly conduct GWAS (i.e., computation of statistics about genomic variants) at the CSP or (ii) provide the identified groups to the client, which can then directly query the corresponding data owners and conduct GWAS using existing distributed solutions. We implement the proposed scheme and run experiments over a real-life genomic dataset to show its effectiveness. The results show that the proposed solution can efficiently identify the case/control groups in a privacy-preserving way.
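As a rough illustration of step (i), the sketch below (not the paper's protocol) computes a standard chi-square association statistic for a single genomic variant from case/control allele counts, assuming the privacy-preserving identification has already yielded the aggregate counts in the clear; the function name and counts are hypothetical.

```python
# Minimal sketch (not the paper's protocol): chi-square association test for one
# variant, assuming the identified case/control groups' allele counts are already
# available in the clear. Function name and counts are hypothetical.
from scipy.stats import chi2_contingency

def gwas_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square statistic and p-value for a 2x2 allele-count table."""
    table = [[case_alt, case_ref],
             [control_alt, control_ref]]
    stat, p_value, _, _ = chi2_contingency(table)
    return stat, p_value

# Hypothetical counts for a single variant.
stat, p = gwas_chi_square(case_alt=120, case_ref=80, control_alt=70, control_ref=130)
print(f"chi2 = {stat:.2f}, p = {p:.4g}")
```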




Read also

Convolutional neural networks are machine-learning models widely applied in various prediction tasks, such as computer vision and medical image analysis. Their great predictive power requires extensive computation, which encourages model owners to host the prediction service on a cloud platform. Recent research focuses on the privacy of the query and results, but does not provide model privacy against the model-hosting server and may leak partial information about the results. Some works further require frequent interactions with the querier or heavy computation overheads, which discourages queriers from using the prediction service. This paper proposes a new scheme for privacy-preserving neural network prediction in the outsourced setting, i.e., the server cannot learn the query, (intermediate) results, or the model. Similar to SecureML (S&P17), a representative work that provides model privacy, we leverage two non-colluding servers with secret sharing and triplet generation to minimize the usage of heavyweight cryptography. Further, we adopt asynchronous computation to improve the throughput, and design garbled circuits for the non-polynomial activation function to keep the same accuracy as the underlying network (instead of approximating it). Our experiments on the MNIST dataset show that our scheme achieves an average of 122x, 14.63x, and 36.69x reduction in latency compared to SecureML, MiniONN (CCS17), and EzPC (EuroS&P19), respectively. For communication costs, our scheme outperforms SecureML by 1.09x, MiniONN by 36.69x, and EzPC by 31.32x on average. On the CIFAR dataset, our scheme achieves lower latency by a factor of 7.14x and 3.48x compared to MiniONN and EzPC, respectively. Our scheme also provides 13.88x and 77.46x lower communication costs than MiniONN and EzPC on the CIFAR dataset.
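For readers unfamiliar with the two-server building block mentioned above, the following toy sketch shows additive secret sharing over a prime field and multiplication with a Beaver triple produced by a trusted dealer; it is illustrative only, not the paper's protocol, and all names and parameters are ours.

```python
# Toy sketch of additive secret sharing and Beaver-triple multiplication over a
# prime field, illustrating the two-server building block mentioned in the
# abstract. Not the paper's protocol; names and parameters are illustrative.
import secrets

P = 2**61 - 1  # prime modulus for the toy field

def share(x):
    """Split x into two additive shares held by server 0 and server 1."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

def beaver_triple():
    """A trusted dealer generates shares of (a, b, c) with c = a*b mod P."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share((a * b) % P)

def mul_shares(x_sh, y_sh):
    """Multiply secret-shared x and y using one Beaver triple."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # Each server locally masks its shares, then the masked values are opened.
    d = reconstruct((x_sh[0] - a0) % P, (x_sh[1] - a1) % P)  # d = x - a
    e = reconstruct((y_sh[0] - b0) % P, (y_sh[1] - b1) % P)  # e = y - b
    # Server i computes its share of x*y; only server 0 adds the public d*e term.
    z0 = (c0 + d * b0 + e * a0 + d * e) % P
    z1 = (c1 + d * b1 + e * a1) % P
    return z0, z1

x_sh, y_sh = share(7), share(6)
print(reconstruct(*mul_shares(x_sh, y_sh)))  # 42
```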
Di Zhuang, J. Morris Chang, 2020
In the big data era, more and more cloud-based data-driven applications are developed that leverage individual data to provide certain valuable services (the utilities). On the other hand, since the same set of individual data could be utilized to infer individuals' sensitive information, it creates new channels to snoop on individuals' privacy. Hence it is of great importance to develop techniques that enable data owners to release privatized data that can still be utilized for a certain intended purpose. Existing data releasing approaches, however, are either privacy-emphasized (no consideration of utility) or utility-driven (no guarantees on privacy). In this work, we propose a two-step perturbation-based utility-aware privacy-preserving data releasing framework. First, certain predefined privacy and utility problems are learned from public domain data (background knowledge). Later, our approach leverages the learned knowledge to precisely perturb the data owner's data into privatized data that can be successfully utilized for a certain intended purpose (learning to succeed), without jeopardizing certain predefined privacy (training to fail). Extensive experiments have been conducted on the Human Activity Recognition, Census Income and Bank Marketing datasets to demonstrate the effectiveness and practicality of our framework.
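As a crude, hypothetical stand-in for the learned perturbation described above (not the paper's framework), the sketch below removes the weight direction of a linear "privacy" classifier from the features, then checks that a "utility" classifier still performs well while the privacy classifier degrades; the dataset and column roles are invented.

```python
# Crude illustrative stand-in (not the paper's framework): project out the
# direction a linear "privacy" classifier relies on, then verify that a "utility"
# classifier still works. Dataset and column roles are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y_utility = (X[:, 0] + X[:, 1] > 0).astype(int)   # intended task
y_privacy = (X[:, 2] > 0).astype(int)             # sensitive attribute

# Step 1: learn the privacy task from "public" data.
priv_clf = LogisticRegression().fit(X, y_privacy)
w = priv_clf.coef_[0]
w = w / np.linalg.norm(w)

# Step 2: perturb by projecting each record onto the nullspace of w
# ("training to fail" on privacy while "learning to succeed" on utility).
X_private = X - np.outer(X @ w, w)

for name, labels in [("utility", y_utility), ("privacy", y_privacy)]:
    acc = LogisticRegression().fit(X_private, labels).score(X_private, labels)
    print(f"{name} accuracy on privatized data: {acc:.2f}")
```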
The k-nearest neighbors (k-NN) algorithm is a popular and effective classification algorithm. Due to its large storage and computational requirements, it is suitable for cloud outsourcing. However, k-NN is often run on sensitive data such as medical records, user images, or personal information. It is important to protect the privacy of data in an outsourced k-NN system. Prior works have all assumed the data owners (who submit data to the outsourced k-NN system) are a single trusted party. However, we observe that in many practical scenarios, there may be multiple mutually distrusting data owners. In this work, we present the first framing and exploration of privacy preservation in an outsourced k-NN system with multiple data owners. We consider the various threat models introduced by this modification. We discover that under a particularly practical threat model that covers numerous scenarios, there exists a set of adaptive attacks that breach the data privacy of any exact k-NN system. The vulnerability is a result of the mathematical properties of k-NN and its output. Thus, we propose a privacy-preserving alternative system supporting kernel density estimation using a Gaussian kernel, a classification algorithm from the same family as k-NN. In many applications, this similar algorithm serves as a good substitute for k-NN. We additionally investigate solutions for other threat models, often through extensions on prior single data owner systems.
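The following is a minimal plaintext sketch of the Gaussian-kernel density classifier proposed above as a k-NN substitute; none of the outsourcing or privacy machinery is shown, and the data and bandwidth are hypothetical.

```python
# Plaintext sketch of the Gaussian-kernel density classifier that the abstract
# proposes as a k-NN substitute; the outsourcing/privacy machinery is omitted.
import numpy as np

def kde_classify(query, class_points, bandwidth=1.0):
    """Return the class whose mean Gaussian kernel density at `query` is highest."""
    best_class, best_density = None, -np.inf
    for label, points in class_points.items():
        sq_dists = np.sum((points - query) ** 2, axis=1)
        density = np.mean(np.exp(-sq_dists / (2.0 * bandwidth ** 2)))
        if density > best_density:
            best_class, best_density = label, density
    return best_class

# Toy example with two classes in 2D.
rng = np.random.default_rng(1)
data = {0: rng.normal(loc=[-2, 0], size=(50, 2)),
        1: rng.normal(loc=[+2, 0], size=(50, 2))}
print(kde_classify(np.array([1.5, 0.2]), data))  # expected: 1
```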
As machine learning becomes a practice and commodity, numerous cloud-based services and frameworks are provided to help customers develop and deploy machine learning applications. While it is prevalent to outsource model training and serving tasks in the cloud, it is important to protect the privacy of sensitive samples in the training dataset and prevent information leakage to untrusted third parties. Past work has shown that a malicious machine learning service provider or end user can easily extract critical information about the training samples from the model parameters or even just the model outputs. In this paper, we propose a novel and generic methodology to preserve the privacy of training data in machine learning applications. Specifically, we introduce an obfuscate function and apply it to the training data before feeding them to the model training task. This function adds random noise to existing samples, or augments the dataset with new samples. By doing so, sensitive information about the properties of individual samples, or statistical properties of a group of samples, is hidden. Meanwhile, the model trained from the obfuscated dataset can still achieve high accuracy. With this approach, customers can safely disclose the data or models to third-party providers or end users without the need to worry about data privacy. Our experiments show that this approach can effectively defeat four existing types of machine learning privacy attacks at negligible accuracy cost.
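A minimal sketch of the noise-adding flavor of such an obfuscate function is shown below; the noise scale, dataset, and model are our own illustrative choices, not the paper's.

```python
# Minimal sketch of the noise-adding variant of an "obfuscate" function described
# in the abstract; the noise scale, dataset, and model are hypothetical choices.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def obfuscate(X, noise_scale=0.1, rng=None):
    """Add Gaussian noise scaled by each feature's standard deviation."""
    rng = rng or np.random.default_rng(0)
    return X + rng.normal(scale=noise_scale * X.std(axis=0), size=X.shape)

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clean = LogisticRegression(max_iter=5000).fit(X_tr, y_tr).score(X_te, y_te)
obf = LogisticRegression(max_iter=5000).fit(obfuscate(X_tr), y_tr).score(X_te, y_te)
print(f"accuracy clean={clean:.3f} obfuscated={obf:.3f}")
```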
Releasing full data records is one of the most challenging problems in data privacy. On the one hand, many of the popular techniques such as data de-identification are problematic because of their dependence on the background knowledge of adversaries. On the other hand, rigorous methods such as the exponential mechanism for differential privacy are often computationally impractical to use for releasing high-dimensional data or cannot preserve high utility of the original data due to their extensive data perturbation. This paper presents a criterion called plausible deniability that provides a formal privacy guarantee, notably for releasing sensitive datasets: an output record can be released only if a certain number of input records are indistinguishable, up to a privacy parameter. This notion does not depend on the background knowledge of an adversary. Also, it can be efficiently checked by privacy tests. We present mechanisms to generate synthetic datasets with similar statistical properties to the input data and the same format. We study this technique both theoretically and experimentally. A key theoretical result shows that, with proper randomization, the plausible deniability mechanism generates differentially private synthetic data. We demonstrate the efficiency of this generative technique on a large dataset; it is shown to preserve the utility of the original data with respect to various statistical analysis and machine learning measures.
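The toy privacy test below illustrates the plausible-deniability check for a hand-made generative mechanism in which each attribute of a seed record is kept with probability 1-p or resampled from the column marginal; it is not the paper's mechanism, and the parameters (k, gamma, p) are illustrative.

```python
# Toy plausible-deniability test for a hand-made generative mechanism: each
# attribute of the seed record is kept with probability 1-p or resampled from the
# column marginal with probability p. Illustrative only; not the paper's mechanism.
import numpy as np

def gen_prob(output, seed, marginals, p):
    """P(output | seed) under the toy resampling mechanism."""
    prob = 1.0
    for j, value in enumerate(output):
        m = marginals[j][value]
        prob *= (1 - p) + p * m if value == seed[j] else p * m
    return prob

def plausibly_deniable(output, seed, dataset, marginals, p, k=3, gamma=2.0):
    """Release `output` only if at least k dataset records could have generated it
    with probability within a factor gamma of the true seed's probability."""
    p_seed = gen_prob(output, seed, marginals, p)
    probs = np.array([gen_prob(output, row, marginals, p) for row in dataset])
    plausible = (probs >= p_seed / gamma) & (probs <= p_seed * gamma)
    return plausible.sum() >= k

# Toy categorical dataset: 100 records, 4 attributes, 3 possible values each.
rng = np.random.default_rng(2)
data = rng.integers(0, 3, size=(100, 4))
marginals = [{v: float(np.mean(col == v)) for v in range(3)} for col in data.T]

seed = data[0]
output = seed.copy()
output[1] = (output[1] + 1) % 3   # the mechanism resampled one attribute
print(plausibly_deniable(output, seed, data, marginals, p=0.3))
```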