Reconstructing pedigrees: a stochastic perspective


Added by Bhalchandra Thatte

Publication date 2007

Field Biology

Language English

Authors Bhalchandra D. Thatte - Mike Steel

Populations and Evolution


Abstract

A pedigree is a directed graph that describes how individuals are related through ancestry in a sexually reproducing population. In this paper we explore the question of whether one can reconstruct a pedigree by just observing sequence data for present-day individuals. This is motivated by the increasing availability of genomic sequences, but in this paper we take a more theoretical approach and consider what models of sequence evolution might allow pedigree reconstruction (given sufficiently long sequences). Our results complement recent work that showed that pedigree reconstruction may be fundamentally impossible if one uses just the degrees of relatedness between different extant individuals. We find that for certain stochastic processes, pedigrees can be recovered up to isomorphism from sufficiently long sequences.
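As a rough illustration of the object the abstract describes, the sketch below stores a pedigree as a mapping from each non-founder to its two parents. The toy data and the helper names (`PEDIGREE`, `individuals`, `extant`, `ancestors`) are hypothetical and not taken from the paper; treating individuals without children as the extant ones is also an assumption made only for this example.

```python
from collections import deque

# Hypothetical toy pedigree: each non-founder maps to its two parents.
# Founders (individuals with no recorded parents) do not appear as keys.
PEDIGREE = {
    "c1": ("f1", "f2"),
    "c2": ("f1", "f2"),
    "c3": ("f2", "f3"),
    "g1": ("c1", "c3"),
}

def individuals(pedigree):
    """Return everyone mentioned either as a child or as a parent."""
    people = set(pedigree)
    for parents in pedigree.values():
        people.update(parents)
    return people

def extant(pedigree):
    """Return individuals with no children (assumed here to be present-day)."""
    all_parents = {p for pair in pedigree.values() for p in pair}
    return individuals(pedigree) - all_parents

def ancestors(pedigree, person):
    """Return all ancestors of `person` via breadth-first search on parent edges."""
    seen = set()
    queue = deque(pedigree.get(person, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(pedigree.get(current, ()))
    return seen
```

In this representation, reconstruction "up to isomorphism" would mean recovering the parent-child structure above the extant individuals without necessarily recovering the labels of the unobserved ancestors.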


Reconstructing pedigrees: some identifiability questions for a recombination-mutation model

Bhalchandra D. Thatte, 2010

Pedigrees are directed acyclic graphs that represent ancestral relationships between individuals in a population. Based on a schematic recombination process, we describe two simple Markov models for sequences evolving on pedigrees - Model R (recombinations without mutations) and Model RM (recombinations with mutations). For these models, we ask an identifiability question: is it possible to construct a pedigree from the joint probability distribution of extant sequences? We present partial identifiability results for general pedigrees: we show that when the crossover probabilities are sufficiently small, certain spanning subgraph sequences can be counted from the joint distribution of extant sequences. We demonstrate how pedigrees that earlier seemed difficult to distinguish are distinguished by counting their spanning subgraph sequences.
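A recombination-only process of the kind the abstract calls Model R can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's definition: each individual is assumed to carry a single sequence, a child copies sites left to right from one parent and switches parents between consecutive sites with probability `crossover_prob`, and the function names `recombine` and `simulate_model_r` are hypothetical.

```python
import random

def recombine(seq_a, seq_b, crossover_prob, rng):
    """Build a child sequence by copying sites left to right from one parent,
    switching to the other parent with probability `crossover_prob`
    between consecutive sites. No mutations are applied."""
    parents = (seq_a, seq_b)
    source = rng.randrange(2)  # which parent supplies the first site
    child = []
    for site in range(len(seq_a)):
        if site > 0 and rng.random() < crossover_prob:
            source = 1 - source
        child.append(parents[source][site])
    return "".join(child)

def simulate_model_r(pedigree, order, founder_seqs, crossover_prob, seed=0):
    """Assign a sequence to every individual in `order`.

    `pedigree` maps each non-founder to its two parents, and `order`
    must list parents before their children (an assumption of this sketch).
    All founder sequences are assumed to have the same length.
    """
    rng = random.Random(seed)
    seqs = dict(founder_seqs)
    for child in order:
        first_parent, second_parent = pedigree[child]
        seqs[child] = recombine(
            seqs[first_parent], seqs[second_parent], crossover_prob, rng
        )
    return seqs
```

With `crossover_prob` set to 0 a child is an exact copy of one parent, which gives some intuition for why the small-crossover regime mentioned in the abstract keeps long inherited blocks intact.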

Populations and Evolution Combinatorics

Efficient Reconstruction of Stochastic Pedigrees

Younhun Kim, Elchanan Mossel, Govind Ramnarayan, 2020

We introduce a new algorithm called Rec-Gen for reconstructing the genealogy or pedigree of an extant population purely from its genetic data. We justify our approach by giving a mathematical proof of the effectiveness of Rec-Gen when applied to pedigrees from an idealized generative model that replicates some of the features of real-world pedigrees. Our algorithm is iterative and provides an accurate reconstruction of a large fraction of the pedigree while having relatively low sample complexity, measured in terms of the length of the genetic sequences of the population. We propose our approach as a prototype for further investigation of the pedigree reconstruction problem toward the goal of applications to real-world examples. As such, our results have some conceptual bearing on the increasingly important issue of genomic privacy.

Data Structures and Algorithms Machine Learning Populations and Evolution

Reconstructing Roma history from genome-wide data

Priya Moorjani, Nick Patterson, Po-Ru Loh, 2012

The Roma people, living throughout Europe, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1000-1500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry, deriving from a combination of European and South Asian sources, and that the date of admixture of South Asian and European ancestry was about 850 years ago. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which we hypothesize was followed by a major demographic expansion once the population arrived in Europe.

Populations and Evolution Genomics

A new perspective on the dynamics of fragmented populations

Anders Eriksson, 2008

Understanding the time evolution of fragmented animal populations and their habitats, connected by migration, is a problem of both theoretical and practical interest. This paper presents a method for calculating the time evolution of the habitats' population size distribution from a general stochastic dynamic within each habitat, using a deterministic approximation which becomes exact for an infinite number of habitats. Fragmented populations are usually thought to be characterized by a separation of time scale between, on the one hand, colonization and extinction of habitats and, on the other hand, the local population dynamics within each habitat. The analysis in this paper suggests an alternative view: the effective population dynamic stems from a law of large numbers, where stochastic fluctuations in population size of single habitats are buffered through the dispersal pool so that the global population dynamic remains approximately smooth. For illustration, the deterministic approximation is compared to simulations of a stochastic model with density-dependent local recruitment and mortality. The article concludes with a discussion of the general implications of the results and possible extensions of the method.

Populations and Evolution

Reconstructing subclonal composition and evolution from whole genome sequencing of tumors

Amit G. Deshwar, Shankar Vembu, Christina K. Yung, 2014

Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, that can be applied to WGS data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods.

Populations and Evolution Machine Learning