Identifying individuals who are at high risk of cancer due to inherited germline mutations is critical for effective implementation of personalized prevention strategies. Most existing models to identify these individuals focus on specific syndromes by including family and personal history for a small number of cancers. Recent evidence from multi-gene panel testing has shown that many syndromes once thought to be distinct are overlapping, motivating the development of models that incorporate family history information on several cancers and predict mutations for more comprehensive panels of genes. One such class of models is Mendelian risk prediction models, which use family history information and Mendelian laws of inheritance to estimate the probability of carrying genetic mutations, as well as the future risk of developing associated cancers. To flexibly model the complexity of many cancer-mutation associations, we present a new software tool called PanelPRO, an R package that extends the previously developed BayesMendel R package to user-selected lists of susceptibility genes and associated cancers. The model identifies individuals at an increased risk of carrying cancer susceptibility gene mutations and predicts future risk of developing hereditary cancers associated with those genes. Additional functionalities adjust for prophylactic interventions, known genetic testing results, and risk modifiers such as race and ancestry. The package comes with a customizable database with default parameter values estimated from published studies. The PanelPRO package is open-source and provides a fast and flexible back-end for multi-gene, multi-cancer risk modeling with pedigree data. The software enables the identification of high-risk individuals, which will have an impact on personalized prevention strategies for cancer and individualized decision making about genetic testing.
Risk evaluation to identify individuals who are at greater risk of cancer as a result of heritable pathogenic variants is a valuable component of individualized clinical management. Using principles of Mendelian genetics, Bayesian probability theory, and variant-specific knowledge, Mendelian models derive the probability of carrying a pathogenic variant and developing cancer in the future, based on family history. Existing Mendelian models are widely employed, but are generally limited to specific genes and syndromes. However, the upsurge of multi-gene panel germline testing has spurred the discovery of many new gene-cancer associations that are not presently accounted for in these models. We have developed PanelPRO, a flexible, efficient Mendelian risk prediction framework that can incorporate an arbitrary number of genes and cancers, overcoming the computational challenges that arise because of the increased model complexity. We implement an eleven-gene, eleven-cancer model, the largest Mendelian model created thus far, based on this framework. Using simulations and a clinical cohort with germline panel testing data, we evaluate model performance, validate the backward compatibility of our approach with existing Mendelian models, and illustrate its usage. Our implementation is freely available for research use in the PanelPRO R package.
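The core idea of a Mendelian model (a population prior, Mendelian transmission, and a Bayesian update on family history) can be illustrated with a deliberately tiny sketch. It is not the PanelPRO algorithm or API: it assumes a single gene, a single cancer, a two-generation family, heterozygous carriers only, and lifetime risks that ignore age. The function names and all numeric values (allele frequency, carrier and non-carrier risks) are hypothetical.

```python
def carrier_prior(allele_freq):
    """P(at least one pathogenic allele) under Hardy-Weinberg equilibrium."""
    return 1.0 - (1.0 - allele_freq) ** 2


def bayes_update(prior, risk_carrier, risk_noncarrier, affected):
    """Posterior carrier probability after observing one person's cancer status."""
    like_carrier = risk_carrier if affected else 1.0 - risk_carrier
    like_noncarrier = risk_noncarrier if affected else 1.0 - risk_noncarrier
    numerator = prior * like_carrier
    return numerator / (numerator + (1.0 - prior) * like_noncarrier)


def child_carrier_prob(p_mother, p_father):
    """Mendelian transmission, assuming every carrier parent is heterozygous
    and therefore passes the variant on with probability 1/2."""
    return 1.0 - (1.0 - 0.5 * p_mother) * (1.0 - 0.5 * p_father)


# Hypothetical illustrative parameters, not estimates from any study.
ALLELE_FREQ = 0.001
RISK_CARRIER = 0.5      # assumed lifetime cancer risk for carriers
RISK_NONCARRIER = 0.1   # assumed lifetime cancer risk for non-carriers

prior = carrier_prior(ALLELE_FREQ)

# Family history: the mother is affected, the father's status is unknown.
p_mother = bayes_update(prior, RISK_CARRIER, RISK_NONCARRIER, affected=True)
p_counselee = child_carrier_prob(p_mother, prior)

# If the counselee is also affected, update once more with their own status.
p_counselee_affected = bayes_update(
    p_counselee, RISK_CARRIER, RISK_NONCARRIER, affected=True
)

print(f"population prior:            {prior:.5f}")
print(f"mother, given affected:      {p_mother:.5f}")
print(f"counselee, given mother:     {p_counselee:.5f}")
print(f"counselee, if also affected: {p_counselee_affected:.5f}")
```

The sequential update is valid here because, given the counselee's genotype, their own cancer status is independent of the mother's. With more relatives, genes, and cancers, the genotype configurations can no longer be enumerated by hand, which is the computational challenge the abstract refers to.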
Modeling the diameter distribution of trees in forest stands is a common forestry task that supports key biologically and economically relevant management decisions. The choice of model used to represent the diameter distribution and how to estimate its parameters has received much attention in the forestry literature; however, accessible software that facilitates comprehensive comparison of the myriad modeling approaches is not available. To this end, we developed an R package called ForestFit that simplifies estimation of common probability distributions used to model tree diameter distributions, including the two- and three-parameter Weibull distributions, Johnson's SB distribution, Birnbaum-Saunders distribution, and finite mixture distributions. Frequentist and Bayesian techniques are provided for individual tree diameter data, as well as grouped data. Additional functionality facilitates fitting growth curves to height-diameter data. The package also provides a set of functions for computing probability distributions and simulating random realizations from common finite mixture models.
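As a minimal illustration of one estimation task mentioned above, the following sketch fits a two-parameter Weibull distribution to individual tree diameters by maximum likelihood. It is written in Python using only the standard library and does not reproduce ForestFit's interface; the function name, the bisection bracket for the shape parameter (0.001 to 100), and the simulated diameters are all assumptions made for the example.

```python
import math
import random


def fit_weibull_mle(diameters, lo=1e-3, hi=100.0, tol=1e-10):
    """Maximum-likelihood fit of a two-parameter Weibull (shape, scale).

    The shape k solves the profile score equation
        sum(x^k * ln x) / sum(x^k) - 1/k - mean(ln x) = 0,
    whose left-hand side increases with k, so bisection suffices.
    Assumes all diameters are positive and the true shape lies in [lo, hi].
    """
    logs = [math.log(d) for d in diameters]
    n = len(logs)
    mean_log = sum(logs) / n
    max_log = max(logs)

    def scaled_powers(k):
        # x^k divided by max(x)^k, which avoids overflow for large k.
        return [math.exp(k * (l - max_log)) for l in logs]

    def score(k):
        powers = scaled_powers(k)
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)

    # scale = (mean of x^k)^(1/k), computed on the rescaled powers.
    mean_power = sum(scaled_powers(shape)) / n
    scale = math.exp(max_log) * mean_power ** (1.0 / shape)
    return shape, scale


# Simulated diameters in cm (hypothetical stand): scale 20, shape 2.5.
random.seed(0)
sample = [random.weibullvariate(20.0, 2.5) for _ in range(5000)]
shape_hat, scale_hat = fit_weibull_mle(sample)
print(f"shape = {shape_hat:.3f}, scale = {scale_hat:.3f}")
```

Bisection is used rather than Newton's method because it needs no derivative and cannot diverge once the root is bracketed; the three-parameter Weibull, Johnson's SB, and mixture fits named in the abstract require more involved procedures that this sketch does not attempt.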
In recent years, many applications have aimed to assess the causal effect of treatments assigned at the community level, while data are still collected at the individual level within each community. In many cases, one wants to evaluate the effect of a stochastic intervention on the community, where all communities in the target population receive probabilistically assigned treatments based on a known, specified mechanism (e.g., implementing a community-level intervention policy that targets stochastic changes in the behavior of a target population of communities). The tmleCommunity package was recently developed to implement targeted minimum loss-based estimation (TMLE) of the effect of community-level intervention(s) at a single time point on an individual-based outcome of interest, including the average causal effect. Implementations of inverse-probability-of-treatment weighting (IPTW) and the G-computation formula (GCOMP) are also available. The package supports multivariate arbitrary (i.e., static, dynamic or stochastic) interventions with a binary or continuous outcome. It also allows user-specified data-adaptive machine learning algorithms through the SuperLearner, sl3 and h2oEnsemble packages. The usage of the tmleCommunity package, along with a few examples, is described in this paper.
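To make the IPTW and G-computation estimators concrete, here is a heavily simplified sketch: a single binary treatment, a single binary confounder, a static intervention, and nonparametric (cell-frequency) estimates. It ignores the community-level structure, stochastic interventions, and TMLE step that tmleCommunity addresses, and it is not that package's API. The data are a hypothetical table of counts constructed so that the true average treatment effect is 0.3.

```python
from collections import Counter


def make_records():
    """Hypothetical data as (w, a, y) triples: confounder, treatment, outcome."""
    cells = [  # (w, a, y, count)
        (0, 1, 1, 20), (0, 1, 0, 20), (0, 0, 1, 12), (0, 0, 0, 48),
        (1, 1, 1, 64), (1, 1, 0, 16), (1, 0, 1, 10), (1, 0, 0, 10),
    ]
    return [(w, a, y) for w, a, y, count in cells for _ in range(count)]


def naive_difference(records):
    """Unadjusted difference in mean outcome between treated and untreated."""
    treated = [y for _, a, y in records if a == 1]
    control = [y for _, a, y in records if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def gcomp_ate(records):
    """G-computation: average the stratum-specific effect over P(W)."""
    n = len(records)
    n_w = Counter(w for w, _, _ in records)
    n_wa = Counter((w, a) for w, a, _ in records)
    y_wa = Counter()
    for w, a, y in records:
        y_wa[(w, a)] += y
    return sum(
        n_w[w] / n
        * (y_wa[(w, 1)] / n_wa[(w, 1)] - y_wa[(w, 0)] / n_wa[(w, 0)])
        for w in n_w
    )


def iptw_ate(records):
    """IPTW: weight each outcome by 1 / P(A = observed a | W)."""
    n = len(records)
    n_w = Counter(w for w, _, _ in records)
    n_wa = Counter((w, a) for w, a, _ in records)
    total = 0.0
    for w, a, y in records:
        propensity = n_wa[(w, a)] / n_w[w]
        total += y / propensity if a == 1 else -y / propensity
    return total / n


records = make_records()
print(f"naive difference: {naive_difference(records):.3f}")
print(f"G-computation:    {gcomp_ate(records):.3f}")
print(f"IPTW:             {iptw_ate(records):.3f}")
```

With saturated nonparametric estimates the two adjusted estimators coincide, while the unadjusted difference is biased upward because the confounder raises both the treatment probability and the outcome. In realistic settings with many covariates, the outcome regression and treatment mechanism must be modeled, which is where the data-adaptive learners mentioned in the abstract come in.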
Microbiome data analyses require statistical tools that can simultaneously decode microbes' reactions to the environment and interactions among microbes. We introduce CARlasso, the first user-friendly, open-source and publicly available R package to fit a chain graph model for the inference of sparse microbial networks that represent both interactions among nodes and effects of a set of predictors. Unlike in standard regression approaches, the edges represent the correct conditional structure among responses and predictors, which allows the incorporation of prior knowledge from controlled experiments. In addition, CARlasso 1) enforces sparsity in the network via LASSO; 2) allows for an adaptive extension that applies different shrinkage to different edges; 3) is computationally inexpensive through an efficient Gibbs sampling algorithm, so it can handle small and big data equally well; 4) allows for continuous, binary, count and compositional responses via an appropriate hierarchical structure; and 5) has a syntax similar to lm for ease of use. The package also supports Bayesian graphical LASSO and several of its hierarchical models, as well as lower-level one-step sampling functions of the CAR-LASSO model for users to extend.
BACKGROUND: The uncoupling protein (UCP) genes belong to the superfamily of electron transport carriers of the mitochondrial inner membrane. Members of the uncoupling protein family are involved in thermogenesis, and determining the functional evolution of UCP genes is important to understand the evolution of thermoregulation in vertebrates. RESULTS: Sequence similarity searches of genome and scaffold data identified homologues of UCP in eutherians, teleosts and the first squamate uncoupling proteins. Phylogenetic analysis was used to characterize the family's evolutionary history by identifying two duplications early in vertebrate evolution and two losses in the avian lineage (excluding duplications within a species, losses due to incompletely sequenced taxa, and losses and duplications inferred through mismatch of species and gene trees). Estimates of the ratio of nonsynonymous to synonymous substitution rates (dN/dS) and more complex branch and site models suggest that the duplication events were not associated with positive Darwinian selection and that the UCP is constrained by strong purifying selection except for a single site which has undergone positive Darwinian selection, demonstrating that the UCP gene family must be highly conserved. CONCLUSION: We present a phylogeny describing the evolutionary history of the UCP gene family and show that the genes have evolved through duplications followed by purifying selection except for a single site in the mitochondrial matrix between the 5th and 6th alpha-helices which has undergone positive selection.