No Arabic abstract
Admixed populations are formed by the merging of two or more ancestral populations, and the ancestry of each locus in an admixed genome derives from either source. Consider a simple pulse admixture model, where populations A and B merged t generations ago without subsequent gene flow. We derive the distribution of the proportion of an admixed chromosome that has A (or B) ancestry, as a function of the chromosome length L, t, and the initial contribution of the A source, m. We demonstrate that these results can be used for inference of the admixture parameters. For more complex admixture models, we derive an expression in Laplace space for the distribution of ancestry proportions that depends on having the distribution of the lengths of segments of each ancestry. We obtain explicit results for the special case of a two-wave admixture model, where population A contributed additional migrants in one of the generations between the present and the initial admixture event. Specifically, we derive formulas for the distribution of A and B segment lengths and numerical results for the distribution of ancestry proportions. We show that for recent admixture, data generated under a two-wave model can hardly be distinguished from that generated under a pulse model.
The recent explosion in available genetic data has led to significant advances in understanding the demographic histories of and relationships among human populations. It is still a challenge, however, to infer reliable parameter values for complicated models involving many populations. Here we present MixMapper, an efficient, interactive method for constructing phylogenetic trees including admixture events using single nucleotide polymorphism (SNP) genotype data. MixMapper implements a novel two-phase approach to admixture inference using moment statistics, first building an unadmixed scaffold tree and then adding admixed populations by solving systems of equations that express allele frequency divergences in terms of mixture parameters. Importantly, all features of the model, including topology, sources of gene flow, branch lengths, and mixture proportions, are optimized automatically from the data and include estimates of statistical uncertainty. MixMapper also uses a new method to express branch lengths in easily interpretable drift units. We apply MixMapper to recently published data for HGDP individuals genotyped on a SNP array designed especially for use in population genetics studies, obtaining confident results for 30 populations, 20 of them admixed. Notably, we confirm a signal of ancient admixture in European populations---including previously undetected admixture in Sardinians and Basques---involving a proportion of 20--40% ancient northern Eurasian ancestry.
Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.
We scanned through the genomes of 29,141 African Americans, searching for loci where the average proportion of African ancestry deviates significantly from the genome-wide average. We failed to find any genome-wide significant deviations, and conclude that any selection in African Americans since admixture is sufficiently weak that it falls below the threshold of our power to detect it using a large sample size. These results stand in contrast to the findings of a recent study of selection in African Americans. That study, which had 15 times fewer samples, reported six loci with significant deviations. We show that the discrepancy is likely due to insufficient correction for multiple hypothesis testing in the previous study. The same study reported 14 loci that showed greater population differentiation between African Americans and Nigerian Yoruba than would be expected in the absence of natural selection. Four such loci were previously shown to be genome-wide significant and likely to be affected by selection, but we show that most of the 10 additional loci are likely to be false positives. Additionally, the most parsimonious explanation for the loci that have significant evidence of unusual differentiation in frequency between Nigerians and Africans Americans is selection in Africa prior to their forced migration to the Americas.
The genetic structure of human populations is extraordinarily complex and of fundamental importance to studies of anthropology, evolution, and medicine. As increasingly many individuals are of mixed origin, there is an unmet need for tools that can infer multiple origins. Misclassification of such individuals can lead to incorrect and costly misinterpretations of genomic data, primarily in disease studies and drug trials. We present an advanced tool to infer ancestry that can identify the biogeographic origins of highly mixed individuals. reAdmix is an online tool available at http://chcb.saban-chla.usc.edu/reAdmix/.
Recent statistical and computational analyses have shown that a genealogical most recent common ancestor (MRCA) may have lived in the recent past. However, coalescent-based approaches show that genetic most recent common ancestors for a given non-recombining locus are typically much more ancient. It is not immediately clear how these two perspectives interact. This paper investigates relationships between the number of descendant alleles of an ancestor allele and the number of genealogical descendants of the individual who possessed that allele for a simple diploid genetic model extending the genealogical model of Joseph Chang.