Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Enabling Privacy-Preserving GWAS in Heterogeneous Human Populations

85 0 0.0 ( 0 )

Download Cite

Added by Sean Simmons

Publication date 2016

fields Biology Mathematical Statistics

and research's language is English

Authors Sean Simmons - Cenk Sahinalp -

Quantitative Methods Applications

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The projected increase of genotyping in the clinic and the rise of large genomic databases has led to the possibility of using patient medical data to perform genomewide association studies (GWAS) on a larger scale and at a lower cost than ever before. Due to privacy concerns, however, access to this data is limited to a few trusted individuals, greatly reducing its impact on biomedical research. Privacy preserving methods have been suggested as a way of allowing more people access to this precious data while protecting patients. In particular, there has been growing interest in applying the concept of differential privacy to GWAS results. Unfortunately, previous approaches for performing differentially private GWAS are based on rather simple statistics that have some major limitations. In particular, they do not correct for population stratification, a major issue when dealing with the genetically diverse populations present in modern GWAS. To address this concern we introduce a novel computational framework for performing GWAS that tailors ideas from differential privacy to protect private phenotype information, while at the same time correcting for population stratification. This framework allows us to produce privacy preserving GWAS results based on two of the most commonly used GWAS statistics: EIGENSTRAT and linear mixed model (LMM) based statistics. We test our differentially private statistics, PrivSTRAT and PrivLMM, on both simulated and real GWAS datasets and find that they are able to protect privacy while returning meaningful GWAS results.

rate research

DIMY: Enabling Privacy-preserving Contact Tracing

91 - Nadeem Ahmed , Regio A. Michelin , Wanli Xue 2021

The infection rate of COVID-19 and lack of an approved vaccine has forced governments and health authorities to adopt lockdowns, increased testing, and contact tracing to reduce the spread of the virus. Digital contact tracing has become a supplement to the traditional manual contact tracing process. However, although there have been a number of digital contact tracing apps proposed and deployed, these have not been widely adopted owing to apprehensions surrounding privacy and security. In this paper, we propose a blockchain-based privacy-preserving contact tracing protocol, Did I Meet You (DIMY), that provides full-lifecycle data privacy protection on the devices themselves as well as on the back-end servers, to address most of the privacy concerns associated with existing protocols. We have employed Bloom filters to provide efficient privacy-preserving storage, and have used the Diffie-Hellman key exchange for secret sharing among the participants. We show that DIMY provides resilience against many well known attacks while introducing negligible overheads. DIMYs footprint on the storage space of clients devices and back-end servers is also significantly lower than other similar state of the art apps.

Cryptography and Security

Statistical Methods and Workflow for Analyzing Human Metabolomics Data

94 - Joseph Antonelli , Brian Claggett , Mir Henglin 2017

High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity and mechanisms underlying human health and disease. Large-scale metabolomics data, generated using targeted or nontargeted platforms, are increasingly more common. Appropriate statistical analysis of these complex high-dimensional data is critical for extracting meaningful results from such large-scale human metabolomics studies. Herein, we consider the main statistical analytical approaches that have been employed in human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we propose a step-by-step framework for pursuing statistical analyses of human metabolomics data. We discuss the range of options and potential approaches that may be employed at each stage of data management, analysis, and interpretation, and offer guidance on analytical considerations that are important for implementing an analysis workflow. Certain pervasive analytical challenges facing human metabolomics warrant ongoing research. Addressing these challenges will allow for more standardization in the field and lead to analytical advances in metabolomics investigations with the potential to elucidate novel mechanisms underlying human health and disease.

Quantitative Methods Applications

Quantitative Comparison of Statistical Methods for Analyzing Human Metabolomics Data

80 - Brian L. Claggett , Joseph Antonelli , Mir Henglin 2017

Background. Emerging technologies now allow for mass spectrometry based profiling of up to thousands of small molecule metabolites (metabolomics) in an increasing number of biosamples. While offering great promise for revealing insight into the pathogenesis of human disease, standard approaches have yet to be established for statistically analyzing increasingly complex, high-dimensional human metabolomics data in relation to clinical phenotypes including disease outcomes. To determine optimal statistical approaches for metabolomics analysis, we sought to formally compare traditional statistical as well as newer statistical learning methods across a range of metabolomics dataset types. Results. In simulated and experimental metabolomics data derived from large population-based human cohorts, we observed that with an increasing number of study subjects, univariate compared to multivariate methods resulted in a higher false discovery rate due to substantial correlations among metabolites. In scenarios wherein the number of assayed metabolites increases, as in the application of nontargeted versus targeted metabolomics measures, multivariate methods performed especially favorably across a range of statistical operating characteristics. In nontargeted metabolomics datasets that included thousands of metabolite measures, sparse multivariate models demonstrated greater selectivity and lower potential for spurious relationships. Conclusion. When the number of metabolites was similar to or exceeded the number of study subjects, as is common with nontargeted metabolomics analysis of relatively small sized cohorts, sparse multivariate models exhibited the most robust statistical power with more consistent results. These findings have important implications for the analysis of metabolomics studies of human disease.

Quantitative Methods Applications

Flimma: a federated and privacy-preserving tool for differential gene expression analysis

77 - Olga Zolotareva 2020

Aggregating transcriptomics data across hospitals can increase sensitivity and robustness of differential expression analyses, yielding deeper clinical insights. As data exchange is often restricted by privacy legislation, meta-analyses are frequently employed to pool local results. However, if class labels are inhomogeneously distributed between cohorts, their accuracy may drop. Flimma (https://exbio.wzw.tum.de/flimma/) addresses this issue by implementing the state-of-the-art workflow limma voom in a privacy-preserving manner, i.e. patient data never leaves its source site. Flimma results are identical to those generated by limma voom on combined datasets even in imbalanced scenarios where meta-analysis approaches fail.

Quantitative Methods Genomics

Privacy-Preserving Self-Taught Federated Learning for Heterogeneous Data

105 - Kai-Fung Chu , Lintao Zhang 2021

Many application scenarios call for training a machine learning model among multiple participants. Federated learning (FL) was proposed to enable joint training of a deep learning model using the local data in each party without revealing the data to others. Among various types of FL methods, vertical FL is a category to handle data sources with the same ID space and different feature spaces. However, existing vertical FL methods suffer from limitations such as restrictive neural network structure, slow training speed, and often lack the ability to take advantage of data with unmatched IDs. In this work, we propose an FL method called self-taught federated learning to address the aforementioned issues, which uses unsupervised feature extraction techniques for distributed supervised deep learning tasks. In this method, only latent variables are transmitted to other parties for model training, while privacy is preserved by storing the data and parameters of activations, weights, and biases locally. Extensive experiments are performed to evaluate and demonstrate the validity and efficiency of the proposed method.

Machine Learning Cryptography and Security

comments

Fetching comments

Arab International University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Enabling Privacy-Preserving GWAS in Heterogeneous Human Populations

Ask ChatGPT about the research

No Arabic abstract

Read More