
Essential guidelines for computational method benchmarking

Published by Lukas Weber
Publication date: 2018
Research language: English





In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.
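
As a sketch of the kind of comparison such studies formalize, the following Python snippet evaluates competing methods under identical conditions on benchmark datasets with known ground truth. The dataset names, methods, and the AUC metric are illustrative placeholders, not the paper's benchmark design.

```python
# A minimal sketch of a benchmarking loop, assuming scikit-learn-style
# methods and synthetic stand-ins for well-characterized benchmark
# datasets. Dataset names, methods, and the AUC metric are illustrative
# choices, not the paper's benchmark design.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Benchmark datasets with known ground truth (synthetic placeholders).
datasets = {
    "easy": make_classification(n_samples=200, n_informative=5, random_state=0),
    "hard": make_classification(n_samples=200, n_informative=2, flip_y=0.2, random_state=1),
}

# Competing methods, all evaluated under identical conditions.
methods = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for ds_name, (X, y) in datasets.items():
    for m_name, model in methods.items():
        # Cross-validation avoids judging methods on a single lucky split.
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{ds_name} | {m_name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```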


Read also

The canine lymphoma blood test detects the levels of two biomarkers, the acute phase proteins C-Reactive Protein (CRP) and Haptoglobin. This test can be used for diagnosis, for screening, and for remission monitoring. We analyze clinical data, test various machine learning methods, and select the best approach to these problems. Three families of methods, decision trees, kNN (including advanced and adaptive kNN), and probability density evaluation with radial basis functions, are used for classification and risk estimation. Several pre-processing approaches were implemented and compared; the best of them are used to create the diagnostic system. For differential diagnosis, the best solution achieves a sensitivity of 83.5% and a specificity of 77% (using three input features: CRP, Haptoglobin, and a standard clinical symptom). For the screening task, the decision tree method provides the best result, with a sensitivity of 81.4% and a specificity of >99% (using the same input features). If the clinical symptom (Lymphadenopathy) is treated as unknown, then a decision tree using CRP and Haptoglobin alone provides a sensitivity of 69% and a specificity of 83.5%. The lymphoma risk evaluation problem is formulated and solved. The best models are selected as the system for computational lymphoma diagnosis and for evaluating lymphoma risk. These methods are implemented in web-accessible software and are applied to the problem of monitoring dogs with lymphoma after treatment, detecting recurrence of lymphoma up to two months before the appearance of clinical signs. The risk map visualization provides a friendly tool for exploratory data analysis. A minimal code sketch of the core classification step follows below.
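
For illustration only (not the authors' code or data), the sketch below trains a depth-limited decision tree on two biomarker features and reports sensitivity and specificity; the synthetic data stand in for CRP and Haptoglobin measurements with a binary lymphoma label.

```python
# Illustrative sketch: a depth-limited decision tree on two biomarker
# features, scored by sensitivity and specificity. The synthetic data
# stand in for [CRP, Haptoglobin] with a binary lymphoma label; they
# are placeholders, not the study's clinical data.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)                       # 0 = healthy, 1 = lymphoma
X = rng.normal(size=(n, 2)) + y[:, None] * 1.2  # cases shifted upward

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print(f"sensitivity = {tp / (tp + fn):.3f}")  # true positive rate
print(f"specificity = {tn / (tn + fp):.3f}")  # true negative rate
```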
Objectives: Current standards for comparing stunting across human populations assume a universal model of child growth. Such comparisons ignore population differences that are independent of deprivation and health outcomes. This paper partitions variation in height-for-age that is specifically associated with deprivation and health outcomes to provide a basis for cross-population comparisons. Materials & Methods: Using a multi-level model with a sigmoid relationship between resources and growth, we partition variation in height-for-age z-scores (HAZ) from 1,522,564 children across 70 countries into two components: 1) accrued HAZ shaped by environmental inputs (e.g., undernutrition, infectious disease, inadequate sanitation, poverty), and 2) a country-specific basal HAZ independent of such inputs. We validate these components against population-level infant mortality rates and assess how these basal differences may affect cross-population comparisons of stunting. Results: Basal HAZ differs reliably across countries (range of 1.5 SD) and is independent of measures of infant mortality. By contrast, accrued HAZ captures stunting as impaired growth due to deprivation and is more closely associated with infant mortality than observed HAZ. Ranking populations by accrued HAZ suggests that populations in West Africa and the Caribbean suffer much greater levels of stunting than observed HAZ indicates. Discussion: Current universal standards may dramatically underestimate stunting in populations with taller basal HAZ. Relying on observed HAZ rather than accrued HAZ may also lead to inappropriate cross-population comparisons, such as concluding that Haitian children enjoy better conditions for growth than Indian or Guatemalan children. A sketch of the partition idea appears below.
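
A sketch of the partition idea under simplifying assumptions: observed HAZ is modelled as a country-specific basal level plus a sigmoid-shaped accrued component of an environmental-resource score. The single-country curve fit, parameter names, and data below are illustrative; the paper uses a multi-level model across countries.

```python
# Sketch under stated assumptions: observed HAZ = basal level + sigmoid
# response to an environmental-resource score r. Synthetic data and a
# single-country fit stand in for the paper's multi-level model.
import numpy as np
from scipy.optimize import curve_fit

def haz_model(r, basal, a, b, c):
    # Basal HAZ plus a sigmoid-shaped accrued component of resources r.
    return basal + a / (1.0 + np.exp(-b * (r - c)))

rng = np.random.default_rng(42)
r = rng.uniform(-2.0, 2.0, 500)                  # resource score
haz_obs = haz_model(r, -1.7, 2.0, 1.5, 0.2) + rng.normal(0.0, 0.3, 500)

params, _ = curve_fit(haz_model, r, haz_obs, p0=[0.0, 1.0, 1.0, 0.0])
basal_hat = params[0]
accrued = haz_obs - basal_hat                    # deprivation-linked part
print(f"estimated basal HAZ: {basal_hat:.2f}")
print(f"mean accrued HAZ:    {accrued.mean():.2f}")
```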
High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity and mechanisms underlying human health and disease. Large-scale metabolomics data, generated using targeted or nontargeted platforms, are increasingly common. Appropriate statistical analysis of these complex high-dimensional data is critical for extracting meaningful results from such large-scale human metabolomics studies. Herein, we consider the main statistical analytical approaches that have been employed in human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we propose a step-by-step framework for pursuing statistical analyses of human metabolomics data. We discuss the range of options and potential approaches that may be employed at each stage of data management, analysis, and interpretation, and offer guidance on analytical considerations that are important for implementing an analysis workflow; a generic example of such a sequence is sketched below. Certain pervasive analytical challenges facing human metabolomics warrant ongoing research. Addressing these challenges will allow for more standardization in the field and lead to analytical advances in metabolomics investigations with the potential to elucidate novel mechanisms underlying human health and disease.
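
To make the staged workflow concrete, here is a hedged sketch of one common sequence (transform and scale, screen with PCA, test with multiplicity correction). The steps and thresholds are generic defaults chosen for illustration, not the framework's specific prescriptions.

```python
# A hedged sketch of one generic analysis sequence: data management ->
# quality screening -> analysis with false discovery rate control.
# Steps and thresholds are common defaults, not the paper's framework.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
X = np.exp(rng.normal(size=(100, 50)))  # skewed metabolite abundances
y = rng.integers(0, 2, 100)             # binary clinical phenotype

# 1) Data management: log-transform and autoscale to tame skew.
Xs = StandardScaler().fit_transform(np.log(X))

# 2) Quality screening: PCA to check for outliers or batch structure.
pca = PCA(n_components=2).fit(Xs)
print(f"PC1+PC2 variance explained: {pca.explained_variance_ratio_.sum():.0%}")

# 3) Analysis: per-metabolite tests with multiplicity correction.
pvals = [stats.ttest_ind(Xs[y == 0, j], Xs[y == 1, j])[1] for j in range(50)]
reject, qvals, _, _ = multipletests(pvals, method="fdr_bh")
print(f"metabolites passing FDR < 0.05: {reject.sum()}")
```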
Apoptosis is essential for numerous processes, such as development, resistance to infections, and suppression of tumorigenesis. Here, we investigate the influence of the nutrient-sensing and longevity-assuring enzyme SIRT6 on the dynamics of apoptosis triggered by serum starvation. Specifically, we characterize the progression of apoptosis in wild type and SIRT6-deficient mouse embryonic fibroblasts using time-lapse flow cytometry and computational modelling based on rate equations and cell distribution analysis. We find that SIRT6-deficient cells resist apoptosis by delaying its initiation. Interestingly, once apoptosis is initiated, the rate of its progression is higher in SIRT6 null cells compared to identically cultured wild type cells. However, SIRT6 null cells succumb to apoptosis more slowly, not only in response to nutrient deprivation but also in response to other stresses. Our data suggest that SIRT6 plays a role in several distinct steps of apoptosis. Overall, we demonstrate the utility of our computational model to describe stages of apoptosis progression and the integrity of the cellular membrane. Such measurements will be useful in a broad range of biological applications. We describe a computational method to evaluate the progression of apoptosis through different stages. Using this method, we describe how cells devoid of the SIRT6 longevity gene respond to apoptosis stimuli, specifically how they respond to starvation. We find that SIRT6-deficient cells resist apoptosis initiation; however, once apoptosis is initiated, they progress through it at a faster rate. These data are the first of their kind and suggest that SIRT6 activities may play different roles at different stages of apoptosis. The model we propose can be used to quantitatively evaluate the progression of apoptosis and will be useful in studies of cancer treatments and other areas where apoptosis is involved.
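
A minimal rate-equation sketch of staged apoptosis, assuming cells move through Live -> Initiated -> Dead compartments with first-order rates; slower initiation but faster progression mimics the SIRT6-null behaviour described above. The two-stage structure and rate constants are illustrative assumptions, not the fitted model from the paper.

```python
# Rate-equation sketch of staged apoptosis: Live -> Initiated -> Dead
# with first-order rates. Rate constants are illustrative, not fitted
# to the paper's time-lapse flow cytometry data.
import numpy as np
from scipy.integrate import solve_ivp

def staged_apoptosis(t, state, k_init, k_prog):
    live, initiated, dead = state
    return [-k_init * live,                      # initiation step
            k_init * live - k_prog * initiated,  # progression step
            k_prog * initiated]                  # accumulation of dead cells

t_eval = np.linspace(0.0, 48.0, 100)             # hours of serum starvation
for label, k_init, k_prog in [("wild type", 0.10, 0.20),
                              ("SIRT6 null", 0.05, 0.40)]:
    # Delayed initiation (small k_init) but faster progression (large k_prog)
    # reproduces the qualitative SIRT6-null phenotype described above.
    sol = solve_ivp(staged_apoptosis, (0.0, 48.0), [1.0, 0.0, 0.0],
                    args=(k_init, k_prog), t_eval=t_eval)
    print(f"{label}: dead fraction at 48 h = {sol.y[2, -1]:.2f}")
```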
Background. Emerging technologies now allow for mass spectrometry based profiling of up to thousands of small molecule metabolites (metabolomics) in an increasing number of biosamples. While offering great promise for revealing insight into the pathogenesis of human disease, standard approaches have yet to be established for statistically analyzing increasingly complex, high-dimensional human metabolomics data in relation to clinical phenotypes, including disease outcomes. To determine optimal statistical approaches for metabolomics analysis, we sought to formally compare traditional statistical as well as newer statistical learning methods across a range of metabolomics dataset types. Results. In simulated and experimental metabolomics data derived from large population-based human cohorts, we observed that with an increasing number of study subjects, univariate compared to multivariate methods resulted in a higher false discovery rate due to substantial correlations among metabolites. In scenarios wherein the number of assayed metabolites increases, as in the application of nontargeted versus targeted metabolomics measures, multivariate methods performed especially favorably across a range of statistical operating characteristics. In nontargeted metabolomics datasets that included thousands of metabolite measures, sparse multivariate models demonstrated greater selectivity and lower potential for spurious relationships. Conclusion. When the number of metabolites was similar to or exceeded the number of study subjects, as is common with nontargeted metabolomics analysis of relatively small cohorts, sparse multivariate models exhibited the most robust statistical power with more consistent results. These findings have important implications for the analysis of metabolomics studies of human disease.
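
The univariate-versus-sparse-multivariate contrast can be sketched as follows: correlated metabolites inflate nominal univariate hits, while a cross-validated lasso selects a sparser set. The data-generating process and dimensions below are illustrative assumptions, not the study's simulation design.

```python
# Sketch of the contrast described above: univariate screening vs a
# sparse multivariate (lasso) model on correlated metabolites. The
# synthetic design is illustrative, not the study's simulation setup.
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)
n, p, k_true = 150, 300, 5              # more metabolites than subjects
latent = rng.normal(size=(n, 1))        # shared factor induces correlation
X = 0.7 * latent + rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k_true] = 1.0                     # only five truly associated
y = X @ beta + rng.normal(size=n)

# Univariate screening: correlated null metabolites pile up at p < 0.05.
uni_hits = sum(stats.pearsonr(X[:, j], y)[1] < 0.05 for j in range(p))

# Sparse multivariate model: penalty chosen by cross-validation.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
print(f"univariate hits: {uni_hits}; lasso nonzero: {(lasso.coef_ != 0).sum()}")
```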