Quantitative methods for studying biodiversity have traditionally been rooted in the classical theory of finite frequency table analysis. However, with the help of modern experimental tools, like high-throughput sequencing, we now begin to unlock the outstanding diversity of genomic data in plants and animals reflective of the long evolutionary history of our planet. This molecular data often defies the classical frequency/contingency table assumptions and seems to require sparse tables with a very large number of categories and highly unbalanced cell counts, e.g., following heavy-tailed distributions (for instance, power laws). Motivated by the molecular diversity studies, we propose here a frequency-based framework for biodiversity analysis in the asymptotic regime where the number of categories grows with sample size (an infinite contingency table). Our approach is rooted in information theory and based on the Gaussian limit results for the effective number of species (the Hill numbers) and the empirical Rényi entropy and divergence. We argue that when applied to molecular biodiversity analysis our methods can properly account for the complicated data frequency patterns on one hand and the practical sample size limitations on the other. We illustrate this principle with two specific RNA sequencing examples: a comparative study of T-cell receptor populations and a validation of some preselected molecular hepatocellular carcinoma (HCC) markers.
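As a rough illustration of the quantities named in this abstract, the sketch below computes the plug-in (empirical) Hill number of order q and the corresponding Rényi entropy from a vector of category counts. It uses the standard textbook definitions only; the function names `hill_number` and `renyi_entropy` are hypothetical, and the Gaussian limit results and large-table asymptotics described in the abstract are not implemented here.

```python
import math

def hill_number(counts, q):
    """Plug-in Hill number (effective number of species) of order q.

    counts: raw counts per category; zero counts are ignored.
    Assumption: naive empirical frequencies, no under-sampling correction.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in probs))
    return sum(p ** q for p in probs) ** (1.0 / (1.0 - q))

def renyi_entropy(counts, q):
    """Empirical Renyi entropy of order q, i.e. the log of the Hill number."""
    return math.log(hill_number(counts, q))

# A perfectly even table with four categories has four effective species
# at every order, while an unbalanced table has fewer as q grows.
even = [5, 5, 5, 5]
skewed = [90, 5, 5]
print(hill_number(even, 2), hill_number(skewed, 1), hill_number(skewed, 2))
```

For sparse tables with many rare categories, these plug-in values can be strongly biased, which is the situation the abstract's asymptotic framework is meant to address.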
In most of the recent immunological literature the differences across antigen receptor populations are examined via non-parametric statistical measures of species overlap and diversity borrowed from ecological studies. While this approach is robust in a wide range of situations, it seems to provide little insight into the underlying clonal size distribution and the overall mechanism differentiating the receptor populations. As a possible alternative, the current paper presents a parametric method which adjusts for data under-sampling and provides a unifying approach to the simultaneous comparison of multiple receptor groups by means of the modern statistical tools of unsupervised learning. The parametric model is based on a flexible multivariate Poisson-lognormal distribution and is seen to be a natural generalization of the univariate Poisson-lognormal models used in ecological studies of biodiversity patterns. The procedure for evaluating model fit is described along with the public domain software developed to perform the necessary diagnostics. The model-driven analysis is seen to compare favorably vis-à-vis traditional methods when applied to the data from T-cell receptors in transgenic mice populations.
We present herein an extension of an algebraic statistical method for inferring biochemical reaction networks from experimental data, proposed recently in [3]. This extension allows us to analyze reaction networks that are not necessarily full-dimensional, i.e., the dimension of their stoichiometric space is smaller than the number of species. Specifically, we propose to augment the original algebraic-statistical algorithm for network inference with a preprocessing step that identifies the subspace spanned by the correct reaction vectors, within the space spanned by the species. This dimension reduction step is based on principal component analysis of the input data and its relationship with various subspaces generated by sets of candidate reaction vectors. Simulated examples are provided to illustrate the main ideas involved in implementing this method, and to assess its performance.
We present a novel method for identifying a biochemical reaction network based on multiple sets of estimated reaction rates in the corresponding reaction rate equations, obtained from various (possibly different) experiments. The current method, unlike some of the graphical approaches proposed in the literature, uses the values of the experimental measurements only relative to the geometry of the biochemical reactions under the assumption that the underlying reaction network is the same for all the experiments. The proposed approach utilizes algebraic statistical methods in order to parametrize the set of possible reactions so as to identify the most likely network structure, and is easily scalable to very complicated biochemical systems involving a large number of species and reactions. The method is illustrated with a numerical example of a hypothetical network arising from a mass transfer-type model.
We derive herein the limiting laws for certain stationary distributions of birth-and-death processes related to the classical model of chemical adsorption-desorption reactions due to Langmuir. The model has recently been considered in the context of a hybridization reaction on an oligonucleotide DNA microarray. Our results imply that the truncated gamma- and beta-type distributions can be used as approximations to the observed distributions of the fluorescence readings of the oligo-probes on a microarray. These findings might be useful in developing new model-based, probe-specific methods of extracting target concentrations from array fluorescence readings.
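To make the notion of a stationary distribution of a birth-and-death process concrete, here is a minimal sketch that computes it from the detailed-balance relation. The rate choice in `langmuir_rates` (adsorption proportional to the number of free sites, desorption proportional to the number of occupied sites) is an assumption made for illustration, as are all function and parameter names; the abstract does not specify the exact rates, and the limiting gamma- and beta-type laws it refers to are not derived here.

```python
from math import comb

def stationary_distribution(birth, death):
    """Stationary law of a finite birth-and-death chain on states 0..n.

    birth[k]: rate of the jump k -> k+1, for k = 0..n-1.
    death[k]: rate of the jump k+1 -> k, for k = 0..n-1.
    Uses detailed balance: pi[k+1] * death[k] = pi[k] * birth[k].
    """
    weights = [1.0]
    for b, d in zip(birth, death):
        weights.append(weights[-1] * b / d)
    total = sum(weights)
    return [w / total for w in weights]

def langmuir_rates(n_sites, k_on, k_off):
    """Hypothetical linear rates for n_sites independent binding sites."""
    birth = [k_on * (n_sites - k) for k in range(n_sites)]  # free sites adsorb
    death = [k_off * (k + 1) for k in range(n_sites)]       # bound sites desorb
    return birth, death

def binomial_pmf(n, p):
    """Binomial(n, p) probabilities, used only as a cross-check."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

# With these linear rates the stationary law is binomial with
# success probability k_on / (k_on + k_off).
pi = stationary_distribution(*langmuir_rates(10, 2.0, 3.0))
reference = binomial_pmf(10, 2.0 / (2.0 + 3.0))
print(max(abs(a - b) for a, b in zip(pi, reference)))
```

The same `stationary_distribution` helper accepts any positive rate sequences, so other rate choices can be explored by swapping out `langmuir_rates`.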