Identification and correction of sample mix-ups in expression genetic data: A case study

Published by Karl Broman

Publication date: 2014

Research field: Mathematical Statistics

Language: English

Authors: Karl W. Broman, Mark P. Keller, Aimee Teo Broman

Applications of Statistics

In a mouse intercross with more than 500 animals and genome-wide gene expression data on six tissues, we identified a high proportion (18%) of sample mix-ups in the genotype data. Local expression quantitative trait loci (eQTL; genetic loci influencing gene expression) with extremely large effect were used to form a classifier to predict an individual's eQTL genotype based on expression data alone. By considering multiple eQTL and their related transcripts, we identified numerous individuals whose predicted eQTL genotypes (based on their expression data) did not match their observed genotypes, and then went on to identify other individuals whose genotypes did match the predicted eQTL genotypes. The concordance of predictions across six tissues indicated that the problem was due to mix-ups in the genotypes (though we further identified a small number of sample mix-ups in each of the six panels of gene expression microarrays). Consideration of the plate positions of the DNA samples indicated a number of off-by-one and off-by-two errors, likely the result of pipetting errors. Such sample mix-ups can be a problem in any genetic study, but eQTL data allow us to identify, and even correct, such problems. Our methods have been implemented in an R package, R/lineup.
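The matching idea in the abstract can be illustrated with a small simulation. The sketch below is not the R/lineup implementation, and the abstract does not say which classifier the authors used. It assumes a simple nearest-centroid rule, genotypes coded 0/1/2, one strongly linked transcript per eQTL, and a hypothetical off-by-one shift of five DNA samples. All function names, sample sizes, and the 0.3 threshold are illustrative choices.

```python
import numpy as np

def simulate(n=60, n_eqtl=40, noise=0.2, seed=0):
    """Toy data: genotypes coded 0/1/2 and one strongly linked transcript per eQTL."""
    rng = np.random.default_rng(seed)
    geno = rng.integers(0, 3, size=(n, n_eqtl))
    expr = 2.0 * geno + rng.normal(0.0, noise, size=geno.shape)
    return geno, expr

def predict_genotypes(expr, geno_obs):
    """Nearest-centroid prediction of each eQTL genotype from expression alone.

    Centroids are estimated from the observed (possibly mislabeled) genotypes;
    this is an assumed stand-in for the classifier described in the abstract.
    """
    n, m = expr.shape
    pred = np.empty((n, m), dtype=int)
    for j in range(m):
        centroids = np.full(3, np.inf)  # an empty class can never be nearest
        for g in range(3):
            mask = geno_obs[:, j] == g
            if mask.any():
                centroids[g] = expr[mask, j].mean()
        pred[:, j] = np.abs(expr[:, [j]] - centroids).argmin(axis=1)
    return pred

def mismatch_matrix(pred, geno_obs):
    """d[i, k]: share of eQTL where expression sample i's prediction differs from DNA sample k."""
    return (pred[:, None, :] != geno_obs[None, :, :]).mean(axis=2)

geno_true, expr = simulate()

# Hypothetical off-by-one error: DNA samples 10..14 shifted by one position (cyclically).
geno_obs = geno_true.copy()
geno_obs[10:15] = geno_true[[11, 12, 13, 14, 10]]

pred = predict_genotypes(expr, geno_obs)
d = mismatch_matrix(pred, geno_obs)
self_mismatch = np.diag(d)   # disagreement with the DNA sample carrying the same label
best = d.argmin(axis=1)      # DNA sample that agrees best with each expression sample
flagged = np.flatnonzero(self_mismatch > 0.3)  # arbitrary threshold for this toy example

for i in flagged:
    print(f"expression sample {i}: labeled DNA mismatch {self_mismatch[i]:.2f}, "
          f"best DNA match is sample {best[i]}")
```

In this toy setting, a sample whose own label mismatches at most eQTL but which agrees closely with some other DNA sample is a candidate for correction rather than removal. That is the distinction the abstract draws between identifying and correcting mix-ups.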

J. Gimenez, A.C. Frery, Ana Georgina Flesia (2014)

The Potts model is frequently used to describe the behavior of image classes, since it incorporates contextual information linking neighboring pixels in a simple way. Its isotropic version has only one real parameter, beta, known as the smoothness parameter or inverse temperature, which regulates the homogeneity of the class map. The classes are unobserved, and estimating them is central to important image processing procedures such as image classification. Methods for estimating the classes that stem from a Bayesian approach under the Potts model require an adequately specified value of beta. This parameter can be estimated efficiently by solving the pseudo maximum likelihood (PML) equations in two different schemes, using the prior or the posterior model. With only radiometric data available, the first scheme requires computing an initial segmentation, while the second uses both the segmentation and the radiometric data for the estimation. In this paper, we compare these two PML estimators by computing the mean square error (MSE), bias, and sensitivity to deviations from the model hypothesis. We conclude that using the extra data does not improve the accuracy of the PML estimator; moreover, under gross deviations from the model, this extra information introduces unpredictable distortions and bias.

Applications of Statistics

Prediction of Alzheimer's disease-associated genes by integration of GWAS summary data and expression data

Sicheng Hao, Rui Wang, Yu Zhang (2018)

Alzheimer's disease is the most common cause of dementia and the fifth-leading cause of death among elderly people. Given its high genetic heritability (79%), finding disease causal genes is a crucial step toward finding treatments for AD. Following the International Genomics of Alzheimer's Project (IGAP), many disease-associated genes have been identified; however, we do not yet know enough about how those genes affect gene expression and disease-related pathways. We integrated GWAS summary data from IGAP with five different expression-level data sets using the TWAS method and identified 15 disease causal genes under strict multiple testing (alpha < 0.05), 4 of which are newly identified; we identified an additional 29 potential disease causal genes under false discovery rate control (alpha < 0.05), 21 of which are newly identified. Many of the genes we identified are also associated with some autoimmune disorders.

Applications of Statistics, Genomics

Data-driven Fair Resource Allocation For Novel Emerging Epidemics: A COVID-19 Convalescent Plasma Case Study

Maryam Akbari-Moghaddam, Na Li, Douglas G. Down (2021)

Epidemics are a serious public health threat, and the resources for mitigating their effects are typically limited. Decision-makers face challenges in forecasting the demand for these resources because prior information about the disease is often not available, and the behaviour of the disease can change periodically (either naturally or as a result of public health policies) and can differ by geographical region. In this work, we discuss a model that is suitable for short-term real-time supply and demand forecasting during emerging outbreaks without having to rely on demographic information. We propose a data-driven mixed-integer programming (MIP) resource allocation model that assigns available resources to maximize a notion of fairness among the resource-demanding entities. Numerical results from applying our MIP model to a COVID-19 Convalescent Plasma (CCP) case study suggest that our approach can help balance the supply and demand of limited products such as CCP and minimize the unmet demand ratios of the demand entities.

Applications of Statistics

Estimating cellular redundancy in networks of genetic expression

Raffaella Mulas, Michael J. Casey (2021)

Networks of genetic expression can be modelled by hypergraphs with the additional structure that real coefficients are given to each vertex-edge incidence. The spectra, i.e. the multisets of eigenvalues, of such hypergraphs are known to encode structural information of the data. We show how these spectra can be used, in particular, to estimate the cellular redundancy of the network. We analyze simulated and real gene expression data sets to illustrate the new method proposed here.

Molecular Networks, Spectral Theory

Variable Prioritization in Nonlinear Black Box Methods: A Genetic Association Case Study

Lorin Crawford, Seth R. Flaxman, Daniel E. Runcie (2018)

The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the RelATive cEntrality (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other black box methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and two real data association mapping studies, we show that applying RATE enables an explanation for this improved performance.

Methodology, Quantitative Methods, Applications of Statistics
