
Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder

Published by Laurent Najman
Publication date: 2021
Language: English
Authored by Quentin Garrido





Single-cell RNA sequencing (scRNA-seq) data makes it possible to study the development of cells at unparalleled resolution. Given that many cellular differentiation processes are hierarchical, their scRNA-seq data is expected to be approximately tree-shaped in gene expression space. Inference and representation of this tree structure in two dimensions is highly desirable for biological interpretation and exploratory analysis. Our two contributions are an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data, and a visualization method respecting the tree structure. We extract the tree structure by means of a density-based minimum spanning tree on a vector quantization of the data and show that it captures biological information well. We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low-dimensional space. We compare DTAE with other dimension reduction methods and demonstrate the success of our method experimentally. Our implementation, which relies on PyTorch and Higra, is available at github.com/hci-unihd/DTAE.
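
The tree-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the DTAE implementation (which relies on PyTorch and Higra); it uses only NumPy, and several choices are assumptions made for illustration: plain k-means stands in for the vector quantization, the density of an edge between two centroids is taken to be the number of points whose two nearest centroids are that pair, and the tree is obtained as a maximum spanning tree over those densities. The function names (`kmeans`, `edge_densities`, `max_spanning_tree`) and the toy Y-shaped data are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, used here as a stand-in vector quantization."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def edge_densities(X, centroids):
    """Assumed density: count points whose two nearest centroids are (a, b)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    nearest_two = np.argsort(dists, axis=1)[:, :2]
    k = len(centroids)
    W = np.zeros((k, k))
    for a, b in nearest_two:
        W[a, b] += 1
        W[b, a] += 1
    return W


def max_spanning_tree(W):
    """Prim's algorithm, always adding the densest edge leaving the tree."""
    k = len(W)
    in_tree = [0]
    edges = []
    while len(in_tree) < k:
        best = None
        for u in in_tree:
            for v in range(k):
                if v in in_tree:
                    continue
                if best is None or W[u, v] > best[2]:
                    best = (u, v, float(W[u, v]))
        edges.append(best)
        in_tree.append(best[1])
    return edges


# Toy data: three noisy branches meeting at the origin (a Y shape).
rng = np.random.default_rng(0)
n_points, k = 300, 8
t = rng.uniform(0.0, 1.0, size=(n_points, 1))
directions = np.array([[1.0, 0.0], [-0.5, 0.9], [-0.5, -0.9]])
branch = rng.integers(0, 3, size=n_points)
X = t * directions[branch] + rng.normal(scale=0.02, size=(n_points, 2))

centroids = kmeans(X, k)
W = edge_densities(X, centroids)
tree_edges = max_spanning_tree(W)
for u, v, density in tree_edges:
    print(f"centroid {u} -- centroid {v} (density {density:.0f})")
```

Connecting centroids by density rather than by Euclidean distance alone favours edges that pass through populated regions of the data, which is the intuition behind a density-based tree; the exact density definition used by DTAE may differ from the one assumed here.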




Read also

Identification and quantification of condition-specific transcripts using RNA-Seq is vital in transcriptomics research. While initial efforts using mathematical or statistical modeling of read counts or per-base exonic signal have been successful, they may suffer from model overfitting, since not all the reference transcripts in a database are expressed under a specific biological condition. Standard shrinkage approaches, such as Lasso, shrink all the transcript abundances to zero in a non-discriminative manner, and thus do not necessarily yield the set of condition-specific transcripts. Informed shrinkage approaches, using the observed exonic coverage signal, are thus desirable. Motivated by ubiquitous uncovered exonic regions in RNA-Seq data, termed naked exons, we propose a new computational approach that first filters out the reference transcripts not supported by splicing and paired-end reads, and then fits a new mathematical model of the per-base exonic coverage signal and the underlying transcript structure. We introduce a tuning parameter to penalize the specific regions of the selected transcripts that are not supported by the naked exons. Our approach compares favorably with the selected competing methods in terms of both time complexity and accuracy using simulated and real-world data. Our method is implemented in SAMMate, a GUI software suite freely available from http://sammate.sourceforge.net
Rule-based modeling is a powerful way to model kinetic interactions in biochemical systems. Rules enable a precise encoding of biochemical interactions at the resolution of sites within molecules, but obtaining an integrated global view from sets of rules remains challenging. Current automated approaches to rule visualization fail to address the complexity of interactions between rules, limiting either the types of rules that are allowed or the set of interactions that can be visualized simultaneously. There is a need for scalable visualization approaches that present the information encoded in rules in an intuitive and useful manner at different levels of detail. We have developed new automated approaches for visualizing both individual rules and complete rule-based models. We find that a more compact representation of an individual rule promotes understanding of the model assumptions underlying each rule. For global visualization of rule interactions, we have developed a method to synthesize a network of interactions between sites and processes from a rule-based model and then use a combination of user-defined and automated approaches to compress this network into a readable form. The resulting diagrams enable modelers to identify signaling motifs such as cascades, feedback loops, and feed-forward loops in complex models, as we demonstrate using several large-scale models. These capabilities are implemented within the BioNetGen framework, but the approach is equally applicable to rule-based models specified in other formats.
Motivated by applications in systems biology, we seek a probabilistic framework based on Markov processes to represent intracellular processes. We review the formal relationships between different stochastic models referred to in the systems biology literature. As part of this review, we present a novel derivation of the differential Chapman-Kolmogorov equation for a general multidimensional Markov process made up of both continuous and jump processes. We start with the definition of a time-derivative for a probability density but place no restrictions on the probability distribution; in particular, we do not assume it to be confined to a region that has a surface (on which the probability is zero). In our derivation, the master equation gives the jump part of the Markov process while the Fokker-Planck equation gives the continuous part. We thereby sketch a "family tree" for stochastic models in systems biology, providing explicit derivations of their formal relationship and clarifying the assumptions involved.
RNA-Seq technology allows for studying the transcriptional state of the cell at an unprecedented level of detail. Beyond quantification of whole-gene expression, it is now possible to disentangle the abundance of individual alternatively spliced transcript isoforms of a gene. A central question is to understand the regulatory processes that lead to differences in relative abundance variation due to external and genetic factors. Here, we present a mixed model approach that allows for (i) joint analysis and genetic mapping of multiple transcript isoforms and (ii) mapping of isoform-specific effects. Central to our approach is to comprehensively model the causes of variation and correlation between transcript isoforms, including the genomic background and technical quantification uncertainty. As a result, our method allows accurate testing for shared as well as transcript-specific genetic regulation of transcript isoforms and achieves substantially improved calibration of these statistical tests. Experiments on genotype and RNA-Seq data from 126 human HapMap individuals demonstrate that our model can help to obtain a more fine-grained picture of the genetic basis of gene expression variation.
Carlo R. Contaldi, 2020
Timely estimation of the current value of the COVID-19 reproduction factor $R$ has become a key aim of efforts to inform management strategies. $R$ is an important metric used by policy-makers in setting mitigation levels and is also important for accurate modelling of epidemic progression. This brief paper introduces a method for estimating $R$ from biased case testing data. Using testing data, rather than hospitalisation or death data, provides a much earlier metric along the symptomatic progression scale. This can be hugely important when fighting the exponential nature of an epidemic. We develop a practical estimator and apply it to Scottish case testing data to infer a current (20 May 2020) $R$ value of $0.74$ with $95\%$ confidence interval $[0.48 - 0.86]$.