Linear functions of the site frequency spectrum (SFS) play a major role in understanding and investigating genetic diversity. Estimators of the mutation rate (e.g. based on the total number of segregating sites or the average number of pairwise differences) and tests for neutrality (e.g. Tajima's D) are perhaps the most well-known examples. The distribution of linear functions of the SFS is important for constructing confidence intervals for the estimators and for determining significance thresholds for neutrality tests. These distributions are often approximated using simulation procedures. In this paper we use multivariate phase-type theory to specify, characterize and calculate the distribution of linear functions of the site frequency spectrum. In particular, we show that many of the classical estimators of the mutation rate are distributed according to a discrete phase-type distribution. Neutrality tests, however, are generally not discrete phase-type distributed. For neutrality tests we derive the probability generating function using continuous multivariate phase-type theory, and numerically invert the function to obtain the distribution. A main result is an analytically tractable formula for the probability generating function of the SFS. A software implementation of the phase-type methodology is available in the R package phasty, and R code for the reproduction of our results is available as an accompanying vignette.
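To make "linear function of the SFS" concrete, the following minimal sketch writes two classical mutation-rate estimators as weighted sums of the SFS entries. It is an illustration only, not the paper's method and not the API of the R package: the function names, the sample size `n = 5`, and the SFS counts are hypothetical. It assumes the standard textbook weights, $1/a_n$ with $a_n = \sum_{i=1}^{n-1} 1/i$ for the estimator based on segregating sites, and $i(n-i)/\binom{n}{2}$ for the estimator based on pairwise differences.

```python
from fractions import Fraction
from math import comb


def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the normalizing constant for sample size n."""
    return sum(Fraction(1, i) for i in range(1, n))


def segregating_sites_weights(n):
    """Weights for the estimator based on the total number of segregating sites."""
    a_n = harmonic(n)
    return [1 / a_n for _ in range(1, n)]


def pairwise_weights(n):
    """Weights for the estimator based on the average number of pairwise differences."""
    return [Fraction(i * (n - i), comb(n, 2)) for i in range(1, n)]


def linear_sfs(weights, sfs):
    """Evaluate the linear function sum_i weights[i] * sfs[i]."""
    return sum(w * x for w, x in zip(weights, sfs))


n = 5                  # hypothetical sample size
sfs = [4, 2, 1, 1]     # hypothetical counts: sfs[i-1] = sites with i derived copies

theta_s = linear_sfs(segregating_sites_weights(n), sfs)
theta_pi = linear_sfs(pairwise_weights(n), sfs)

# The difference of the two estimators (the numerator of Tajima's D)
# is again a linear function of the SFS.
d_numerator = theta_pi - theta_s

print(theta_s, theta_pi, d_numerator)
```

Exact fractions are used so the weights can be checked by hand. The sketch only evaluates point values; obtaining the distributions of these quantities is the subject of the phase-type theory described above.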
Let $X_{nr}$ be the $r$th largest of a random sample of size $n$ from a distribution $F(x) = 1 - \sum_{i=0}^{\infty} c_i x^{-\alpha - i\beta}$ for $\alpha > 0$ and $\beta > 0$. An inversion theorem is proved and used to derive an expansion for the quanti
A forward diffusion equation describing the evolution of the allele frequency spectrum is presented. The influx of mutations is accounted for by imposing a suitable boundary condition. For a Wright-Fisher diffusion with or without selection and varyi
Graphical models express conditional independence relationships among variables. Although methods for vector-valued data are well established, functional data graphical models remain underdeveloped. We introduce a notion of conditional independence b
In this paper the method of simulated quantiles (MSQ) of Dominicy and Veredas (2013) and Dominicy et al. (2013) is extended to a general multivariate framework (MMSQ) and used to provide a sparse estimator of the scale matrix (sparse-MMSQ). The MSQ, like
Regression models describing the joint distribution of multivariate response variables conditional on covariate information have become an important aspect of contemporary regression analysis. However, a limitation of such models is that they often r