Quantitative methods for studying biodiversity have traditionally been rooted in the classical theory of finite frequency table analysis. However, with the help of modern experimental tools such as high-throughput sequencing, we are now beginning to unlock the outstanding diversity of genomic data in plants and animals, which reflects the long evolutionary history of our planet. These molecular data often defy the assumptions of classical frequency/contingency tables and seem to require sparse tables with a very large number of categories and highly unbalanced cell counts, e.g., counts following heavy-tailed distributions such as power laws. Motivated by molecular diversity studies, we propose here a frequency-based framework for biodiversity analysis in the asymptotic regime where the number of categories grows with sample size (an infinite contingency table). Our approach is rooted in information theory and is based on Gaussian limit results for the effective number of species (the Hill numbers) and for the empirical Rényi entropy and divergence. We argue that, when applied to molecular biodiversity analysis, our methods can properly account for the complicated frequency patterns in the data on the one hand and for practical sample size limitations on the other. We illustrate this principle with two specific RNA sequencing examples: a comparative study of T-cell receptor populations and a validation of some preselected molecular markers of hepatocellular carcinoma (HCC).
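To make the quantities named above concrete, the following is a minimal sketch of the standard plug-in (empirical) estimates of the Hill number, the Rényi entropy, and the Rényi divergence computed from category counts. It uses the textbook definitions only; the function names are hypothetical, and it does not implement the Gaussian limit results or any inference procedure from the paper.

```python
import math


def hill_number(counts, q):
    """Plug-in Hill number (effective number of species) of order q >= 0."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        # The q -> 1 limit is the exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in probs))
    return sum(p ** q for p in probs) ** (1.0 / (1.0 - q))


def renyi_entropy(counts, q):
    """Plug-in Renyi entropy of order q: the log of the Hill number."""
    return math.log(hill_number(counts, q))


def renyi_divergence(counts_p, counts_q, alpha):
    """Plug-in Renyi divergence of order alpha (alpha > 0, alpha != 1).

    Both count vectors are assumed to be indexed by the same categories.
    """
    if alpha <= 0 or abs(alpha - 1.0) < 1e-12:
        raise ValueError("alpha must be positive and different from 1")
    n_p, n_q = sum(counts_p), sum(counts_q)
    acc = 0.0
    for cp, cq in zip(counts_p, counts_q):
        if cp == 0:
            continue  # a zero in P contributes nothing to the sum
        if cq == 0:
            if alpha > 1:
                return math.inf  # P puts mass where Q has none
            continue  # for alpha < 1 the term vanishes
        acc += (cp / n_p) ** alpha * (cq / n_q) ** (1.0 - alpha)
    if acc == 0.0:
        return math.inf  # disjoint supports
    return math.log(acc) / (alpha - 1.0)


# Hypothetical clone counts for two samples over the same four categories.
sample_a = [90, 5, 3, 2]
sample_b = [25, 25, 25, 25]
print(hill_number(sample_a, 2), hill_number(sample_b, 2))
print(renyi_divergence(sample_a, sample_b, 2))
```

For a perfectly even sample the Hill number equals the number of observed categories at every order, while a sample dominated by one category gives a value close to 1 at higher orders. Note that plug-in estimates of this kind can be biased in sparse tables with many rare categories, which is the regime the abstract is concerned with.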
In most of the recent immunological literature the differences across antigen receptor populations are examined via non-parametric statistical measures of species overlap and diversity borrowed from ecological studies. While this approach is robust i
This paper studies forward and reverse projections for the Rényi divergence of order $\alpha \in (0, \infty)$ on $\alpha$-convex sets. The forward projection on such a set is motivated by some works of Tsallis et al. in statistical physics, and th
It is largely taken for granted that differential abundance analysis is, by default, the best first step when analyzing genomic data. We argue that this is not necessarily the case. In this article, we identify key limitations that are intrinsic to d
Background: High-throughput techniques bring novel tools but also statistical challenges to genomic research. Identifying genes with differential expression between different species is an effective way to discover evolutionarily conserved transcript
Fawzi and Fawzi recently defined the sharp Rényi divergence, $D_\alpha^\#$, for $\alpha \in (1, \infty)$, as an additional quantum Rényi divergence with nice mathematical properties and applications in quantum channel discrimination and quantum communicat