-
Genome Biology and Evolution 2013
Evolution of prokaryotes involves extensive loss and gain of genes, which lead to substantial differences in the gene repertoires even among closely related organisms. Through a wide range of phylogenetic depths, gene frequency distributions in prokaryotic pangenomes bear a characteristic, asymmetrical U-shape, with a core of (nearly) universal genes, a "shell" of moderately common genes, and a "cloud" of rare genes. We employ mathematical modeling to investigate evolutionary processes that might underlie this universal pattern. Gene frequency distributions for almost 400 groups of 10 bacterial or archaeal species each over a broad range of evolutionary distances were fit to steady-state, infinite allele models based on the distribution of gene replacement rates and the phylogenetic tree relating the species in each group. The fits of the theoretical frequency distributions to the empirical ones yield model parameters and estimates of the goodness of fit. Using the Akaike Information Criterion, we show that the neutral model of genome evolution, with the same replacement rate for all genes, can be confidently rejected. Of the three tested models with purifying selection, the one in which the distribution of replacement rates is derived from a stochastic population model with additive per-gene fitness yields the best fits to the data. The selection strength estimated from the fits declines with evolutionary divergence while staying well outside the neutral regime. These findings indicate that, unlike some other universal distributions of genomic variables, for example, the distribution of paralogous gene family membership, the gene frequency distribution is substantially affected by selection.
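The model comparison described in this abstract relies on the Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. The snippet below is a minimal sketch of that ranking step only. The model names, log-likelihoods, and parameter counts are hypothetical placeholders, not values from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(fits):
    """Rank models by AIC.

    fits maps a model name to (maximized log-likelihood, parameter count).
    Returns (name, AIC, delta AIC relative to the best model), best first.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    ranked = [(name, score, score - best) for name, score in scores.items()]
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical fits, for illustration only (not taken from the paper).
fits = {
    "neutral": (-120.0, 1),
    "selection": (-100.0, 2),
}
ranked = rank_models(fits)
for name, score, delta in ranked:
    print(f"{name}: AIC={score:.1f}, delta={delta:.1f}")
```

A large delta AIC for a model relative to the best one is the kind of evidence that would support rejecting it, as the abstract reports for the neutral model.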
Topics: Archaea; Bacteria; Evolution, Molecular; Gene Frequency; Genome, Archaeal; Genome, Bacterial; Models, Genetic; Selection, Genetic
PubMed: 23315380
DOI: 10.1093/gbe/evt002
-
Nature Reviews. Genetics Jun 2013
Review
As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population genomic data sets. Such data hold the potential to resolve long-standing questions in evolutionary biology about the role of gene exchange in species formation. In principle, the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However, there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here.
Topics: Animals; Evolution, Molecular; Gene Flow; Gene Frequency; Genetic Speciation; Genome, Human; Humans; Likelihood Functions; Linkage Disequilibrium; Models, Genetic; Polymorphism, Genetic
PubMed: 23657479
DOI: 10.1038/nrg3446
-
Genetics Jan 2022
The Patterson F- and D-statistics are commonly used measures for quantifying population relationships and for testing hypotheses about demographic history. These statistics make use of allele frequency information across populations to infer different aspects of population history, such as population structure and introgression events. Inclusion of related or inbred individuals can bias such statistics, which may often lead to the filtering of such individuals. Here, we derive statistical properties of the F- and D-statistics, including their biases due to the inclusion of related or inbred individuals, their variances, and their corresponding mean squared errors. Moreover, for those statistics that are biased, we develop unbiased estimators and evaluate the variances of these new quantities. Comparisons of the new unbiased statistics to the originals demonstrate that our newly derived statistics often have lower error across a wide population parameter space. Furthermore, we apply these unbiased estimators using several global human populations with the inclusion of related individuals to highlight their application on an empirical dataset. Finally, we implement these unbiased estimators in the open-source software package funbiased for easy application by the scientific community.
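For readers unfamiliar with the D-statistic mentioned here, the following is a minimal sketch of its standard frequency-based (ABBA-BABA) form for populations P1, P2, P3 and an outgroup. It is the plain estimator computed from derived-allele frequencies, not the bias-corrected estimators derived in the paper, and it does not use or imitate the funbiased package. The function name and argument layout are assumptions made for illustration.

```python
def patterson_d(p1, p2, p3, p_out):
    """Plain frequency-based D-statistic, D(P1, P2; P3, Outgroup).

    Each argument is a sequence of derived-allele frequencies, one per site,
    for the corresponding population. Positive D indicates an excess of
    ABBA patterns (P2 sharing derived alleles with P3).
    """
    abba = 0.0
    baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, p_out):
        abba += (1 - a) * b * c * (1 - o)
        baba += a * (1 - b) * c * (1 - o)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative sites: ABBA + BABA is zero")
    return (abba - baba) / total


# Hypothetical frequencies: P1 and P2 are identical, so D is 0.
d_symmetric = patterson_d([0.1, 0.2], [0.1, 0.2], [0.5, 0.7], [0.0, 0.0])

# Hypothetical single site where only P2 and P3 carry the derived allele.
d_abba_only = patterson_d([0.0], [1.0], [1.0], [0.0])
```

Because this version works directly on sample frequencies, it is subject to the biases from related or inbred individuals that the abstract describes.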
Topics: Gene Frequency
PubMed: 34849832
DOI: 10.1093/genetics/iyab090
-
Nature Reviews. Genetics Jun 2015
Review
Next-generation sequencing technology has facilitated the discovery of millions of genetic variants in human genomes. A sizeable fraction of these variants are predicted to be deleterious. Here, we review the pattern of deleterious alleles as ascertained in genome sequencing data sets and ask whether human populations differ in their predicted burden of deleterious alleles - a phenomenon known as mutation load. We discuss three demographic models that are predicted to affect mutation load and relate these models to the evidence (or the lack thereof) for variation in the efficacy of purifying selection in diverse human genomes. We also emphasize why accurate estimation of mutation load depends on assumptions regarding the distribution of dominance and selection coefficients - quantities that remain poorly characterized for current genomic data sets.
Topics: Founder Effect; Gene Frequency; Genes, Dominant; Genetic Drift; Genome, Human; Human Migration; Humans; Models, Genetic; Mutation; Selection, Genetic
PubMed: 25963372
DOI: 10.1038/nrg3931
-
Theoretical Population Biology Apr 2016
With the great advances in ancient DNA extraction, genetic data are now obtained from geographically separated individuals from both present and past. However, population genetics theory about the joint effect of space and time has not been thoroughly studied. Based on the classical stepping-stone model, we develop the theory of Isolation by distance and time. We derive the correlation of allele frequencies between demes in the case where ancient samples are present, and investigate the impact of edge effects with forward-in-time simulations. We also derive results about coalescent times in circular and toroidal models. As one of the most common ways to investigate population structure is principal components analysis (PCA), we evaluate the impact of our theory on PCA plots. Our results demonstrate that time between samples is an important factor. Ancient samples tend to be drawn to the center of a PCA plot.
Topics: Gene Flow; Gene Frequency; Genetics, Population; Humans; Models, Genetic; Principal Component Analysis
PubMed: 26592162
DOI: 10.1016/j.tpb.2015.11.003
-
Genes Dec 2022
Due to their continuing geographic isolation, the Amerindian populations of the Brazilian Amazon present a different genetic profile when compared to other continental populations. Few studies have investigated genetic variants present in these populations, especially in the context of next-generation sequencing. Knowledge of the molecular profile of a population is one of the bases for inferences about human evolutionary history. In addition, it can assist in the validation of molecular biomarkers of susceptibility to complex and rare diseases, and in the improvement of specific precision medicine protocols applied to these populations and to populations with high Amerindian ancestry, such as Brazilians. DNA polymerases play essential roles in DNA replication, repair, recombination, or damage repair, and their influence on various clinical phenotypes has been demonstrated in the specialized literature. Thus, the aim of this study is to characterize the molecular profile of , , , , and genes in Amerindian populations from the Brazilian Amazon, comparing these findings with genomic data from five continental populations described in the gnomAD database, and with data from the Brazilian population described in ABraOM. We performed whole exome sequencing (WES) of 63 Indigenous individuals. Our study described for the first time the allele frequency of 45 variants already described in the other continental populations, but never before described in the investigated Amerindian populations. Our results also describe eight variants unique to the investigated Amerindian populations, with predictions of moderate, modifier and high clinical impact.
Our findings demonstrate the unique genetic profile of the Indigenous population of the Brazilian Amazon, reinforcing the need for further studies on these populations, and may contribute to the creation of public policies that optimize not only the quality of life of this population, but also of the Brazilian population.
Topics: Humans; Quality of Life; Gene Frequency; DNA-Directed DNA Polymerase; Brazil; DNA-Binding Proteins
PubMed: 36672794
DOI: 10.3390/genes14010053
-
American Journal of Human Genetics Dec 1996
Topics: Forensic Medicine; Gene Frequency; Humans; Likelihood Functions; Models, Genetic; Repetitive Sequences, Nucleic Acid
PubMed: 8940288
DOI: No ID Found
-
Hematology/oncology and Stem Cell... Sep 2016
Review
Hematologic myeloid neoplasms represent a heterogeneous group of disorders with defined clinical and pathologic characteristics. However, intensive investigation into the genetic abnormalities of these diseases has not only significantly advanced our understanding, but also revolutionized our diagnostic and prognostic capabilities. Moreover, more recent discovery on the impact of clonal burden has highlighted the critical and dynamic role of clonal evolution over time, which is integrally linked to a patient's clinical trajectory. This review will highlight the evidence supporting the incorporation of allelic burden of somatic mutations into clinical practice for the diagnosis and prognosis of myeloid neoplasms.
Topics: Gene Frequency; High-Throughput Nucleotide Sequencing; Humans; Mutation; Myeloproliferative Disorders; Phenotype; Treatment Outcome
PubMed: 27187622
DOI: 10.1016/j.hemonc.2016.04.003
-
Genetics Mar 2010
Review
Sewall Wright and R. A. Fisher often differed, including on the meaning of inbreeding and random gene frequency drift. Fisher regarded them as quite distinct processes, whereas Wright thought that because his inbreeding coefficient measured both they should be regarded as the same. Since the effective population numbers for inbreeding and random drift are different, this would argue for the Fisher view.
Topics: Gene Frequency; Genetic Drift; Genetics, Population; History, 20th Century; History, 21st Century; Inbreeding; Models, Genetic
PubMed: 20332416
DOI: 10.1534/genetics.109.110023
-
American Journal of Human Genetics Jul 2023
Previous studies suggested that severe epilepsies, e.g., developmental and epileptic encephalopathies (DEEs), are mainly caused by ultra-rare de novo genetic variants. For milder disease, rare genetic variants could contribute to the phenotype. To determine the importance of rare variants for different epilepsy types, we analyzed a whole-exome sequencing cohort of 9,170 epilepsy-affected individuals and 8,436 control individuals. Here, we separately analyzed three different groups of epilepsies: severe DEEs, genetic generalized epilepsy (GGE), and non-acquired focal epilepsy (NAFE). We required qualifying rare variants (QRVs) to occur in control individuals with an allele count ≥ 1 and a minor allele frequency ≤ 1:1,000, to be predicted as deleterious (CADD ≥ 20), and to have an odds ratio in individuals with epilepsy ≥ 2. We identified genes enriched with QRVs primarily in NAFE (n = 72), followed by GGE (n = 32) and DEE (n = 21). This suggests that rare variants may play a more important role for causality of NAFE than for DEE. Moreover, we found that genes harboring QRVs, e.g., HSPG2, FLNA, or TNC, encode proteins that are involved in structuring the brain extracellular matrix. The present study confirms an involvement of rare variants for NAFE that occur also in the general population, while in DEE and GGE, the contribution of such variants appears more limited.
Topics: Humans; Epilepsy, Generalized; Phenotype; Alleles; Brain; Gene Frequency
PubMed: 37369202
DOI: 10.1016/j.ajhg.2023.06.004
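
The qualifying rare variant (QRV) criteria listed in the abstract above can be read as a four-part filter. The sketch below encodes the stated thresholds only. The `Variant` record, its field names, and the example values are hypothetical, and it assumes the minor allele frequency is the one measured in control individuals, which the abstract does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary; field names are illustrative."""
    control_allele_count: int  # allele count in control individuals
    maf: float                 # minor allele frequency (assumed: in controls)
    cadd: float                # CADD deleteriousness score
    odds_ratio: float          # odds ratio in individuals with epilepsy


def is_qrv(v: Variant) -> bool:
    """Apply the four thresholds stated in the abstract."""
    return (
        v.control_allele_count >= 1
        and v.maf <= 1 / 1000
        and v.cadd >= 20
        and v.odds_ratio >= 2
    )


# Hypothetical variants, for illustration only.
candidates = [
    Variant(control_allele_count=2, maf=0.0002, cadd=25.0, odds_ratio=3.1),
    Variant(control_allele_count=0, maf=0.0, cadd=30.0, odds_ratio=5.0),
    Variant(control_allele_count=5, maf=0.004, cadd=22.0, odds_ratio=2.5),
]
qrvs = [v for v in candidates if is_qrv(v)]
```

Requiring at least one allele in controls is what distinguishes these variants from the ultra-rare de novo variants the abstract associates with DEEs.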