ترغب بنشر مسار تعليمي؟ اضغط هنا

Integrating omics and MRI data with kernel-based tests and CNNs to identify rare genetic markers for Alzheimers disease

80   0   0.0 ( 0 )
 نشر من قبل Stefan Konigorski
 تاريخ النشر 2018
والبحث باللغة English




اسأل ChatGPT حول البحث

For precision medicine and personalized treatment, we need to identify predictive markers of disease. We focus on Alzheimers disease (AD), where magnetic resonance imaging scans provide information about the disease status. By combining imaging with genome sequencing, we aim at identifying rare genetic markers associated with quantitative traits predicted from convolutional neural networks (CNNs), which traditionally have been derived manually by experts. Kernel-based tests are a powerful tool for associating sets of genetic variants, but how to optimally model rare genetic variants is still an open research question. We propose a generalized set of kernels that incorporate prior information from various annotations and multi-omics data. In the analysis of data from the Alzheimers Disease Neuroimaging Initiative (ADNI), we evaluate whether (i) CNNs yield precise and reliable brain traits, and (ii) the novel kernel-based tests can help to identify loci associated with AD. The results indicate that CNNs provide a fast, scalable and precise tool to derive quantitative AD traits and that new kernels integrating domain knowledge can yield higher power in association tests of very rare variants.

قيم البحث

اقرأ أيضاً

As societies around the world are ageing, the number of Alzheimers disease (AD) patients is rapidly increasing. To date, no low-cost, non-invasive biomarkers have been established to advance the objectivization of AD diagnosis and progression assessm ent. Here, we utilize Bayesian neural networks to develop a multivariate predictor for AD severity using a wide range of quantitative EEG (QEEG) markers. The Bayesian treatment of neural networks both automatically controls model complexity and provides a predictive distribution over the target function, giving uncertainty bounds for our regression task. It is therefore well suited to clinical neuroscience, where data sets are typically sparse and practitioners require a precise assessment of the predictive uncertainty. We use data of one of the largest prospective AD EEG trials ever conducted to demonstrate the potential of Bayesian deep learning in this domain, while comparing two distinct Bayesian neural network approaches, i.e., Monte Carlo dropout and Hamiltonian Monte Carlo.
Multi-view data refers to a setting where features are divided into feature sets, for example because they correspond to different sources. Stacked penalized logistic regression (StaPLR) is a recently introduced method that can be used for classifica tion and automatically selecting the views that are most important for prediction. We show how this method can easily be extended to a setting where the data has a hierarchical multi-view structure. We apply StaPLR to Alzheimers disease classification where different MRI measures have been calculated from three scan types: structural MRI, diffusion-weighted MRI, and resting-state fMRI. StaPLR can identify which scan types and which MRI measures are most important for classification, and it outperforms elastic net regression in classification performance.
Predictive modeling based on genomic data has gained popularity in biomedical research and clinical practice by allowing researchers and clinicians to identify biomarkers and tailor treatment decisions more efficiently. Analysis incorporating pathway information can boost discovery power and better connect new findings with biological mechanisms. In this article, we propose a general framework, Pathway-based Kernel Boosting (PKB), which incorporates clinical information and prior knowledge about pathways for prediction of binary, continuous and survival outcomes. We introduce appropriate loss functions and optimization procedures for different outcome types. Our prediction algorithm incorporates pathway knowledge by constructing kernel function spaces from the pathways and use them as base learners in the boosting procedure. Through extensive simulations and case studies in drug response and cancer survival datasets, we demonstrate that PKB can substantially outperform other competing methods, better identify biological pathways related to drug response and patient survival, and provide novel insights into cancer pathogenesis and treatment response.
128 - Xiuyuan Cheng , Yao Xie 2021
We present a study of kernel MMD two-sample test statistics in the manifold setting, assuming the high-dimensional observations are close to a low-dimensional manifold. We characterize the property of the test (level and power) in relation to the ker nel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold. Specifically, we show that when data densities are supported on a $d$-dimensional sub-manifold $mathcal{M}$ embedded in an $m$-dimensional space, the kernel MMD two-sample test for data sampled from a pair of distributions $(p, q)$ that are Holder with order $beta$ is consistent and powerful when the number of samples $n$ is greater than $delta_2(p,q)^{-2-d/beta}$ up to certain constant, where $delta_2$ is the squared $ell_2$-divergence between two distributions on manifold. Moreover, to achieve testing consistency under this scaling of $n$, our theory suggests that the kernel bandwidth $gamma$ scales with $n^{-1/(d+2beta)}$. These results indicate that the kernel MMD two-sample test does not have a curse-of-dimensionality when the data lie on the low-dimensional manifold. We demonstrate the validity of our theory and the property of the MMD test for manifold data using several numerical experiments.
Comparing and aligning large datasets is a pervasive problem occurring across many different knowledge domains. We introduce and study MREC, a recursive decomposition algorithm for computing matchings between data sets. The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matching itself is done using black box matching procedures that are too expensive to run on the entire data set. Using an absolute measure of the quality of a matching, the framework supports optimization over parameters including partitioning procedures and matching algorithms. By design, MREC can be applied to extremely large data sets. We analyze the procedure to describe when we can expect it to work well and demonstrate its flexibility and power by applying it to a number of alignment problems arising in the analysis of single cell molecular data.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا