-
Proceedings of the National Academy of... Nov 2020
Genealogical tree modeling is essential for estimating evolutionary parameters in population genetics and phylogenetics. Recent mathematical results concerning ranked genealogies without leaf labels unlock opportunities in the analysis of evolutionary trees. In particular, comparisons between ranked genealogies facilitate the study of evolutionary processes of different organisms sampled at multiple time periods. We propose metrics on ranked tree shapes and ranked genealogies for lineages isochronously and heterochronously sampled. Our proposed tree metrics make it possible to conduct statistical analyses of ranked tree shapes and timed ranked tree shapes or ranked genealogies. Such analyses allow us to assess differences in tree distributions, quantify estimation uncertainty, and summarize tree distributions. We show the utility of our metrics via simulations and an application in infectious diseases.
Topics: Biological Evolution; Computer Simulation; Genetics, Population; Models, Genetic; Pedigree; Phylogeny; Sequence Analysis, DNA
PubMed: 33139566
DOI: 10.1073/pnas.1922851117 -
American Journal of Primatology Mar 2018 (Review)
Knowing the density or abundance of primate populations is essential for their conservation management and contextualizing socio-demographic and behavioral observations. When direct counts of animals are not possible, genetic analysis of non-invasive samples collected from wildlife populations allows estimates of population size with higher accuracy and precision than is possible using indirect signs. Furthermore, in contrast to traditional indirect survey methods, prolonged or periodic genetic sampling across months or years enables inference of group membership, movement, dynamics, and some kin relationships. Data may also be used to estimate sex ratios, sex differences in dispersal distances, and detect gene flow among locations. Recent advances in capture-recapture models have further improved the precision of population estimates derived from non-invasive samples. Simulations using these methods have shown that the confidence interval of point estimates includes the true population size when assumptions of the models are met, and therefore this range of population size minima and maxima should be emphasized in population monitoring studies. Innovations such as the use of sniffer dogs or anti-poaching patrols for sample collection are important to ensure adequate sampling, and the expected development of efficient and cost-effective genotyping by sequencing methods for DNAs derived from non-invasive samples will automate and speed analyses.
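As a rough illustration of the capture-recapture idea mentioned in the abstract above, the following sketch computes Chapman's bias-corrected form of the two-session Lincoln-Petersen estimator, treating each distinct genotype as a "captured" individual. This is a deliberately simple closed-population estimator, not one of the recent models the review refers to; the function name, the example counts, and the normal-approximation interval are assumptions made for illustration.

```python
import math


def chapman_estimate(n_first, n_second, n_recaptured):
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size.

    n_first      -- distinct genotypes detected in the first sampling session
    n_second     -- distinct genotypes detected in the second sampling session
    n_recaptured -- genotypes detected in both sessions
    Returns (estimate, variance) under closed-population assumptions.
    """
    if n_recaptured > min(n_first, n_second):
        raise ValueError("recaptures cannot exceed either session's count")
    n_hat = (n_first + 1) * (n_second + 1) / (n_recaptured + 1) - 1
    variance = (
        (n_first + 1) * (n_second + 1)
        * (n_first - n_recaptured) * (n_second - n_recaptured)
    ) / ((n_recaptured + 1) ** 2 * (n_recaptured + 2))
    return n_hat, variance


def normal_interval(n_hat, variance, z=1.96):
    """Approximate 95% interval (normal approximation, an assumption here)."""
    half_width = z * math.sqrt(variance)
    return n_hat - half_width, n_hat + half_width


# Hypothetical counts: 40 genotypes in session 1, 50 in session 2, 20 shared.
estimate, var = chapman_estimate(40, 50, 20)
lower, upper = normal_interval(estimate, var)
print(f"N ~ {estimate:.1f} (approx. interval {lower:.1f} to {upper:.1f})")
```

Reporting the interval alongside the point estimate follows the abstract's emphasis on the range of plausible population sizes rather than a single number.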
Topics: Animals; Censuses; Conservation of Natural Resources; Genetics, Population; Population Density; Primates
PubMed: 29457631
DOI: 10.1002/ajp.22743 -
Current Opinion in Microbiology Feb 2015 (Review)
Parasites, defined as eukaryotic microbes and parasitic worms that cause global diseases of human and veterinary importance, span many lineages in the eukaryotic Tree of Life. Historically challenging to study due to their complicated life-cycles and association with impoverished settings, their inherent complexities are now being elucidated by genome sequencing. Over the course of the last decade, projects in large sequencing centers, and increasingly frequently in individual research labs, have sequenced dozens of parasite reference genomes and field isolates from patient populations. This 'tsunami' of genomic data is answering questions about parasite genetic diversity, signatures of evolution orchestrated through anti-parasitic drug and host immune pressure, and the characteristics of populations. This brief review focuses on the state of the art of parasitic protist genomics, how the peculiar genomes of parasites are driving creative methods for their sequencing, and the impact that next-generation sequencing is having on our understanding of parasite population genomics and control of the diseases they cause.
Topics: Animals; Evolution, Molecular; Genetic Variation; Genetics, Population; Genomics; Humans; Parasites; Parasitic Diseases
PubMed: 25461572
DOI: 10.1016/j.mib.2014.11.001 -
PLoS Genetics Jan 2021
FST and kinship are key parameters often estimated in modern population genetics studies in order to quantitatively characterize structure and relatedness. Kinship matrices have also become a fundamental quantity used in genome-wide association studies and heritability estimation. The most frequently-used estimators of FST and kinship are method-of-moments estimators whose accuracies depend strongly on the existence of simple underlying forms of structure, such as the independent subpopulations model of non-overlapping, independently evolving subpopulations. However, modern data sets have revealed that these simple models of structure likely do not hold in many populations, including humans. In this work, we analyze the behavior of these estimators in the presence of arbitrarily-complex population structures, which results in an improved estimation framework specifically designed for arbitrary population structures. After generalizing the definition of FST to arbitrary population structures and establishing a framework for assessing bias and consistency of genome-wide estimators, we calculate the accuracy of existing FST and kinship estimators under arbitrary population structures, characterizing biases and estimation challenges unobserved under their originally-assumed models of structure. We then present our new approach, which consistently estimates kinship and FST when the minimum kinship value in the dataset is estimated consistently. We illustrate our results using simulated genotypes from an admixture model, constructing a one-dimensional geographic scenario that departs nontrivially from the independent subpopulations model. Our simulations reveal the potential for severe biases in estimates of existing approaches that are overcome by our new framework. This work may significantly improve future analyses that rely on accurate kinship and FST estimates.
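To make the notion of a method-of-moments FST estimator concrete, here is a minimal sketch of the widely used Hudson-type ratio-of-averages estimator for two subpopulations. It is an example of the existing estimators that assume simple structure, not the new framework the abstract describes; the function name and the example allele frequencies are assumptions made for illustration.

```python
def hudson_fst(freqs_pop1, freqs_pop2, n1, n2):
    """Hudson-type FST estimate for two subpopulations (ratio of averages).

    freqs_pop1, freqs_pop2 -- per-SNP sample allele frequencies in each population
    n1, n2                 -- number of sampled chromosomes in each population
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        # Squared frequency difference, corrected for finite-sample noise.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator


# Hypothetical frequencies at three SNPs, 100 chromosomes per population.
fst = hudson_fst([0.10, 0.50, 0.80], [0.30, 0.45, 0.60], n1=100, n2=100)
print(f"FST estimate: {fst:.4f}")
```

Summing numerator and denominator across loci before dividing, rather than averaging per-SNP ratios, is the usual choice for genome-wide estimates.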
Topics: Genetics, Population; Genome-Wide Association Study; Genotype; Humans; Inbreeding; Models, Genetic; Pedigree; Polymorphism, Single Nucleotide
PubMed: 33465078
DOI: 10.1371/journal.pgen.1009241 -
Methods in Molecular Biology (Clifton,... 2020
Coalescent simulation is a fundamental tool in modern population genetics. The msprime library provides unprecedented scalability in terms of both the simulations that can be performed and the efficiency with which the results can be processed. We show how coalescent models for population structure and demography can be constructed using a simple Python API, as well as how we can process the results of such simulations to efficiently calculate statistics of interest. We illustrate msprime's flexibility by implementing a simple (but functional) approximate Bayesian computation inference method in just a few tens of lines of code.
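For readers unfamiliar with coalescent simulation, the sketch below draws the time to the most recent common ancestor under the basic Kingman coalescent using only the Python standard library. It does not use msprime or reproduce its API; it only illustrates the kind of process such a simulator samples from. Function names, the seed, and the replicate count are assumptions made for illustration.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Draw one time to the most recent common ancestor for n_samples lineages.

    Time is in coalescent units, in which each pair of lineages
    coalesces at rate 1.
    """
    time = 0.0
    lineages = n_samples
    while lineages > 1:
        # With k lineages there are k*(k-1)/2 pairs, so the waiting time
        # to the next coalescence is exponential with that rate.
        rate = lineages * (lineages - 1) / 2
        time += rng.expovariate(rate)
        lineages -= 1
    return time


def mean_tmrca(n_samples, n_replicates, seed=1):
    """Average simulated TMRCA over independent replicates."""
    rng = random.Random(seed)
    total = sum(simulate_tmrca(n_samples, rng) for _ in range(n_replicates))
    return total / n_replicates


# The theoretical expectation is 2 * (1 - 1/n) coalescent units.
n = 10
print(mean_tmrca(n, 20000), "vs expected", 2 * (1 - 1 / n))
```

Comparing a simulated summary against its known expectation, as in the last line, is the same pattern that underlies simulation-based inference such as approximate Bayesian computation.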
Topics: Algorithms; Bayes Theorem; Computational Biology; Genetics, Population; Models, Genetic
PubMed: 31975169
DOI: 10.1007/978-1-0716-0199-0_9 -
Molecular Ecology Jan 2020 (Review)
Genetic time-series data from historical samples greatly facilitate inference of past population dynamics and species evolution. Yet, although climate and landscape change are often touted as post-hoc explanations of biological change, our understanding of past climate and landscape change influences on evolutionary processes is severely hindered by the limited application of methods that directly relate environmental change to species dynamics through time. Increased integration of spatiotemporal environmental and genetic data will revolutionize the interpretation of environmental influences on past population processes and the quantification of recent anthropogenic impacts on species, and vastly improve prediction of species responses under future climate change scenarios, yielding widespread revelations across evolutionary biology, landscape ecology and conservation genetics. This review encourages greater use of spatiotemporal landscape genetic analyses that explicitly link landscape, climate and genetic data through time by providing an overview of analytical approaches for integrating historical genetic and environmental data in five key research areas: population genetic structure, demography, phylogeography, metapopulation connectivity and adaptation. We also include a tabular summary of key methodological information, suggest approaches for mitigating the particular difficulties in applying these techniques to ancient DNA and palaeoclimate data, and highlight areas for future methodological development.
Topics: Climate Change; Ecology; Genetics, Population; Phylogeography; Population Dynamics
PubMed: 31758601
DOI: 10.1111/mec.15315 -
Molecular Ecology Resources Aug 2022
The measurement of biodiversity at all levels of organization is an essential first step to understand the ecological and evolutionary processes that drive spatial patterns of biodiversity. Ecologists have explored the use of a large range of different summary statistics and have come to the view that information-based summary statistics, and in particular so-called Hill numbers, are a useful tool to measure biodiversity. Population geneticists, on the other hand, have focused largely on summary statistics based on heterozygosity and measures of allelic richness. However, recent studies proposed the adoption of information-based summary statistics in population genetics studies. Here, we performed a comprehensive assessment of the power of this family of summary statistics to inform regarding spatial patterns of genetic diversity and we compared it with that of traditional population genetics approaches, namely measures based on allelic richness and heterozygosity. To give an unbiased evaluation, we used three machine learning methods to test the performance of different sets of summary statistics to discriminate between spatial scenarios. We defined three distinct sets: (i) one based on allelic richness measures, which included the Jaccard index; (ii) a set based on heterozygosity that included FST; and (iii) a set based on Hill numbers derived from Shannon entropy, which included the recently proposed Shannon differentiation, ΔD. The results showed that the last of these performed as well as or, under some specific spatial scenarios, even better than the traditional population genetics measures. Interestingly, we found that a rarely or never used genetic differentiation measure based on allelic richness, Jaccard dissimilarity (J), showed the highest power to discriminate among spatial scenarios, followed by Shannon differentiation ΔD. We concluded, therefore, that information-based measures as well as Jaccard dissimilarity represent excellent additions to the population genetics toolkit.
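Two of the quantities named in the abstract above have short textbook definitions, sketched here: the Hill number of order q computed from allele frequencies, and Jaccard dissimilarity between the allele sets of two populations. The exact variants used in the study may differ, and Shannon differentiation ΔD is not implemented; function names and the example data are assumptions made for illustration.

```python
import math


def hill_number(freqs, q):
    """Hill number (effective number of alleles) of order q.

    q = 0 counts alleles, q = 1 is the exponential of Shannon entropy,
    q = 2 is the inverse of the sum of squared frequencies.
    """
    freqs = [p for p in freqs if p > 0]
    if abs(q - 1.0) < 1e-12:
        # The general formula is undefined at q = 1; use its limit.
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))


def jaccard_dissimilarity(alleles_a, alleles_b):
    """One minus the share of alleles common to both populations."""
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical locus with four equally frequent alleles.
for order in (0, 1, 2):
    print(order, hill_number([0.25, 0.25, 0.25, 0.25], order))

# Hypothetical allele sets observed in two populations.
print(jaccard_dissimilarity({"A", "B", "C"}, {"B", "C", "D"}))
```

Because Jaccard dissimilarity uses only presence or absence of alleles, it belongs with the allelic-richness measures, whereas Hill numbers with q > 0 weight alleles by frequency.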
Topics: Alleles; Biodiversity; Genetic Drift; Genetic Variation; Genetics, Population
PubMed: 35255178
DOI: 10.1111/1755-0998.13606 -
Molecular Ecology Jun 2017 (Review)
In populations occupying discrete habitat patches, gene flow between habitat patches may form an intricate population structure. In such structures, the evolutionary dynamics resulting from interaction of gene-flow patterns with other evolutionary forces may be exceedingly complex. Several models describing gene flow between discrete habitat patches have been presented in the population-genetics literature; however, these models have usually addressed relatively simple settings of habitable patches and have stopped short of providing general methodologies for addressing nontrivial gene-flow patterns. In the last decades, network theory - a branch of discrete mathematics concerned with complex interactions between discrete elements - has been applied to address several problems in population genetics by modelling gene flow between habitat patches using networks. Here, we present the idea and concepts of modelling complex gene flows in discrete habitats using networks. Our goal is to raise awareness of existing network theory applications in molecular ecology studies, as well as to outline the current and potential contribution of network methods to the understanding of evolutionary dynamics in discrete habitats. We review the main branches of network theory that have been, or that we believe potentially could be, applied to population genetics and molecular ecology research. We address applications to theoretical modelling and to empirical population-genetic studies, and we highlight future directions for extending the integration of network science with molecular ecology.
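The network view of gene flow described above can be sketched with a small weighted graph in which nodes are habitat patches and edge weights are migration rates. The example computes two basic descriptors, node strength and connected components; it is not taken from the review, and the patch names, rates, and function names are assumptions made for illustration.

```python
from collections import defaultdict, deque


def build_network(migration_edges):
    """Build an undirected weighted graph from (patch_a, patch_b, rate) tuples."""
    graph = defaultdict(dict)
    for patch_a, patch_b, rate in migration_edges:
        graph[patch_a][patch_b] = rate
        graph[patch_b][patch_a] = rate
    return graph


def node_strength(graph):
    """Total migration rate attached to each patch (weighted degree)."""
    return {patch: sum(nbrs.values()) for patch, nbrs in graph.items()}


def connected_components(graph):
    """Groups of patches linked by gene flow, found by breadth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            patch = queue.popleft()
            component.add(patch)
            for neighbour in graph[patch]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical patches and symmetric migration rates.
edges = [("A", "B", 0.10), ("B", "C", 0.05), ("D", "E", 0.20)]
network = build_network(edges)
print(node_strength(network))
print(connected_components(network))
```

Patches in different components exchange no migrants in this representation, so components give a first coarse picture of population subdivision.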
Topics: Biological Evolution; Ecology; Ecosystem; Gene Flow; Genetics, Population; Models, Genetic; Population Dynamics
PubMed: 28207956
DOI: 10.1111/mec.14059 -
The Journal of Heredity 2014
Alaska caribou (Rangifer tarandus granti) in southwestern Alaska are a poorly understood system, with differing descriptions of their regional population structure, population abundance that has varied greatly through time and instances of the release of domestic reindeer (R. t. tarandus) into their range. Here, we use 21 microsatellites and 297 individuals to investigate the genetic population structure of herds and test for population bottlenecks. Then, using genetic characteristics of existing reindeer populations, we examine introgression into the wild caribou populations. Caribou of the area are genetically diverse (H_E between 0.69 and 0.84), with diversity decreasing along the Alaska Peninsula (AP). Using G_ST and Jost's D, we find extensive structuring among all herds; Migrate-n finds that AP herds share few effective migrants with other herds, with Southern AP and Unimak Island herds having the least. Bayesian clustering techniques are able to resolve all but Denali and Mulchatna caribou herds. Using a conservative assignment threshold of q_reindeer ≥ 0.2, 3% of caribou show signs of domestic introgression. Denali herd has the most introgressed individuals (6.9%); those caribou herds that were historically adjacent to smaller reindeer herds, or were historically without adjacent herding, show no admixture. This domestic introgression persists despite the lack of managed reindeer in the region since the 1940s. Our results suggest that despite previous movement data indicating metapopulation-like dispersal in this region, there may be unknown barriers to reproduction by dispersing individuals. Finally, our results support findings that wild and domestic Rangifer can hybridize and show this introgression may persist dozens of generations after domestics are no longer present.
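The diversity and differentiation statistics named in the abstract above have simple single-locus definitions, sketched here: expected heterozygosity (H_E), Nei's G_ST, and Jost's D. The sketch weights subpopulations equally and omits the sample-size corrections that dedicated software applies, so it is not the study's analysis; function names and the example frequencies are assumptions made for illustration.

```python
def expected_heterozygosity(freqs):
    """H_E = 1 - sum of squared allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def differentiation(subpops):
    """Return (H_S, H_T, G_ST, Jost's D) for one locus.

    subpops -- list of dicts mapping allele name to frequency,
               one dict per subpopulation (at least two).
    """
    n = len(subpops)
    # Mean within-subpopulation heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in subpops) / n
    # Heterozygosity of the pooled population (equal weights).
    alleles = set().union(*subpops)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in subpops) / n for a in alleles}
    h_t = expected_heterozygosity(mean_freqs)
    g_st = (h_t - h_s) / h_t if h_t > 0 else 0.0
    jost_d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1)) if h_s < 1 else 0.0
    return h_s, h_t, g_st, jost_d


# Hypothetical microsatellite allele frequencies in two herds.
herd_1 = {"a120": 0.6, "a124": 0.3, "a128": 0.1}
herd_2 = {"a120": 0.2, "a124": 0.3, "a128": 0.5}
print(differentiation([herd_1, herd_2]))
```

G_ST is bounded by within-population diversity, so it stays low at highly variable loci such as microsatellites, which is one reason studies often report Jost's D alongside it.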
Topics: Alaska; Alleles; Animals; Bayes Theorem; Genetic Loci; Genetic Variation; Genetics, Population; Microsatellite Repeats; Phylogeography; Reindeer
PubMed: 24842565
DOI: 10.1093/jhered/esu030 -
Current Protocols in Bioinformatics Dec 2019
Many evolutionary biologists collect genetic data from natural populations and then need to investigate the relationship among these populations to compare different biogeographic hypotheses. MIGRATE, a useful tool for exploring relationships between populations and comparing hypotheses, has existed since 1998. Throughout the years, it has steadily improved in both the quality of algorithms used and in the efficiency of carrying out those calculations, thus allowing for a larger number of loci to be evaluated. This efficiency has been enhanced, as MIGRATE has been developed to perform many of its calculations concurrently when running on a computer cluster. The program is based on the coalescence theory and uses Bayesian inference to estimate posterior probability densities of all the parameters of a user-specified population model. Complex models, which include migration and colonization parameters, can be specified. These models can be evaluated using marginal likelihoods, thus allowing a user to compare the merits of different hypotheses. The three presented protocols will help novice users to develop sophisticated analysis techniques useful for their research projects. © 2019 The Authors.
Basic Protocol 1: First steps with MIGRATE
Basic Protocol 2: Population model specification
Basic Protocol 3: Prior distribution specification
Basic Protocol 4: Model selection
Support Protocol 1: Installing the program MIGRATE
Support Protocol 2: Installation of parallel MIGRATE
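Comparing models by marginal likelihood, as mentioned in the abstract above, reduces to simple arithmetic once the log marginal likelihoods are available. The sketch below converts them into log Bayes factors and model probabilities, assuming equal prior probability for every model. It does not call MIGRATE or parse its output; the model names and numerical values are hypothetical.

```python
import math


def log_bayes_factor(log_ml_a, log_ml_b):
    """Log Bayes factor of model A over model B."""
    return log_ml_a - log_ml_b


def model_probabilities(log_marginal_likelihoods):
    """Posterior model probabilities, assuming equal prior model probabilities.

    log_marginal_likelihoods -- dict mapping model name to log marginal likelihood
    """
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    best = max(log_marginal_likelihoods.values())
    weights = {
        name: math.exp(value - best)
        for name, value in log_marginal_likelihoods.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical log marginal likelihoods for three migration models.
log_mls = {
    "full_migration": -4520.3,
    "one_direction": -4518.9,
    "panmixia": -4560.1,
}
for name, prob in model_probabilities(log_mls).items():
    print(f"{name}: {prob:.4f}")
print(log_bayes_factor(log_mls["one_direction"], log_mls["full_migration"]))
```

Working on the log scale matters here because marginal likelihoods for genetic data are typically far too small to represent directly as floating-point numbers.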
Topics: Algorithms; Bayes Theorem; Cluster Analysis; Computer Simulation; Genetics, Population; Humans; Likelihood Functions; Models, Genetic; Phylogeny; Software
PubMed: 31756024
DOI: 10.1002/cpbi.87