Heredity, Jan 2022
Linkage disequilibrium (LD) is the non-random association of alleles at different loci. The squared LD coefficients r² (for phased genotypes) and [Formula: see text] (for unphased genotypes) converge to constants determined by the sample size, the recombination frequency, the effective population size and the mating system. LD can therefore be used for gene mapping and for estimating effective population size. However, current methods work only with diploids. To resolve this problem, we extend the LD measures to polysomic inheritance. We derive the equilibrium values of r² and [Formula: see text] for various mating systems and ploidy levels. For unlinked loci, [Formula: see text] for monoecious and dioecious (with random pairing) mating systems, or [Formula: see text] for dioecious mating systems (with lifetime pairing), where f is the number of females in a half-sib family and η is a constant related to the ploidy level. We simulate the estimation of N_e from unphased genotypes. Estimating N_e in polyploids requires sample sizes and numbers of loci similar to those needed in diploids. The main source of bias is the use of 0.5 as the recombination frequency.
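The polysomic formulas are missing from this extract, but the diploid case they generalize can be sketched. A commonly cited approximation (Hill 1981) for a random-mating diploid population is E(r²) ≈ ((1 - c)² + c²) / (2 N_e c (2 - c)) + 1/S. Here c is the recombination frequency and S is the number of sampled individuals. For unlinked loci (c = 0.5) it reduces to 1/(3N_e) + 1/S. The sketch inverts that diploid formula to give a rough N_e estimate. It is not the paper's polysomic result, and `ne_from_r2` is a hypothetical name.

```python
import math

def ne_from_r2(r2_mean, sample_size, c=0.5):
    """Rough diploid N_e estimate from mean r^2 (assumed Hill 1981 approximation):
    E(r^2) ~ ((1-c)^2 + c^2) / (2 N_e c (2-c)) + 1/S."""
    drift_r2 = r2_mean - 1.0 / sample_size   # remove the sampling contribution 1/S
    if drift_r2 <= 0:
        return math.inf                      # no detectable drift signal in r^2
    return ((1 - c) ** 2 + c ** 2) / (2 * c * (2 - c) * drift_r2)
```

For example, with S = 50 and unlinked loci, a mean r² of 1/300 + 1/50 corresponds to N_e ≈ 100. This illustrates why the assumed recombination frequency matters: c enters the estimate directly.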
Topics: Genetics, Population; Genotype; Linkage Disequilibrium; Models, Genetic; Population Density
PubMed: 34983965
DOI: 10.1038/s41437-021-00482-1

Genetics, Apr 2022
The statistical associations between mutations, collectively known as linkage disequilibrium, encode important information about the evolutionary forces acting within a population. Yet in contrast to single-site analogues like the site frequency spectrum, our theoretical understanding of linkage disequilibrium remains limited. In particular, little is currently known about how mutations with different ages and fitness costs contribute to expected patterns of linkage disequilibrium, even in simple settings where recombination and genetic drift are the major evolutionary forces. Here, I introduce a forward-time framework for predicting linkage disequilibrium between pairs of neutral and deleterious mutations as a function of their present-day frequencies. I show that the dynamics of linkage disequilibrium become much simpler in the limit that mutations are rare, where they admit a simple heuristic picture based on the trajectories of the underlying lineages. I use this approach to derive analytical expressions for a family of frequency-weighted linkage disequilibrium statistics as a function of the recombination rate, the frequency scale, and the additive and epistatic fitness costs of the mutations. I find that the frequency scale can have a dramatic impact on the shapes of the resulting linkage disequilibrium curves, reflecting the broad range of time scales over which these correlations arise. I also show that the differences between neutral and deleterious linkage disequilibrium are not purely driven by differences in their mutation frequencies and can instead display qualitative features that are reminiscent of epistasis. I conclude by discussing the implications of these results for recent linkage disequilibrium measurements in bacteria. This forward-time approach may provide a useful framework for predicting linkage disequilibrium across a range of evolutionary scenarios.
Topics: Biological Evolution; Genetic Drift; Linkage Disequilibrium; Models, Genetic; Mutation; Mutation Rate; Selection, Genetic
PubMed: 35100407
DOI: 10.1093/genetics/iyac004

Bioinformatics (Oxford, England), Dec 2021
MOTIVATION
A few algorithms have been developed for splitting the genome into nearly independent blocks of linkage disequilibrium. Because of the complexity of this problem, these algorithms rely on heuristics, which makes them suboptimal.
RESULTS
Here, we develop an optimal solution for this problem using dynamic programming.
AVAILABILITY
This is now implemented as the function snp_ldsplit in the R package bigsnpr.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
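The abstract says only that the optimal split is found by dynamic programming. The sketch below is a simplified illustration of that idea, not the snp_ldsplit implementation, which has further constraints and parameters. It assumes the objective is to minimize the summed LD (for example r²) between SNPs placed in different contiguous blocks, subject only to a maximum block size. `split_ld_blocks` is a hypothetical name. Each cross-block pair can be charged to the later of its two blocks, so the total cost decomposes block by block and the recursion is exact.

```python
import numpy as np

def split_ld_blocks(ld, max_size):
    """Split SNPs 0..n-1 into contiguous blocks (each at most max_size long)
    minimizing the sum of ld[a, b] over pairs a < b lying in different blocks.
    Only the upper triangle of `ld` is read."""
    ld = np.asarray(ld, dtype=float)
    n = ld.shape[0]
    best = np.full(n + 1, np.inf)      # best[j]: minimal cost for SNPs 0..j-1
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)  # start of the last block in the optimum for prefix j
    for j in range(1, n + 1):
        for i in range(max(0, j - max_size), j):
            # New block [i, j): pay for its pairs with every earlier SNP.
            cost = best[i] + ld[:i, i:j].sum()
            if cost < best[j]:
                best[j] = cost
                prev[j] = i
    blocks, j = [], n
    while j > 0:                        # backtrack through the stored block starts
        i = int(prev[j])
        blocks.append((i, j))
        j = i
    return float(best[n]), blocks[::-1]
```

Each returned (start, end) pair is a half-open range of SNP indices. As written, each cost evaluation sums a submatrix. Genome-scale inputs would need precomputed prefix sums.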
Topics: Humans; Algorithms; Genome, Human; Linkage Disequilibrium; Software; Computational Biology
PubMed: 34260708
DOI: 10.1093/bioinformatics/btab519

Genetics, Mar 2023
Genetic sequences collected over time provide an exciting opportunity to study natural selection. In such studies, it is important to account for linkage disequilibrium to accurately measure selection. It also helps distinguish selection from other effects that can change allele frequencies, such as genetic hitchhiking or clonal interference. However, most high-throughput sequencing methods cannot directly measure linkage because of short read lengths. Here we develop a simple method to estimate linkage disequilibrium from time-series allele frequencies. This reconstructed linkage information can then be combined with other inference methods to infer the fitness effects of individual mutations. Simulations show that our approach reliably outperforms inference that ignores linkage disequilibrium. With sufficient sampling, it performs similarly to inference using the true linkage information. We also introduce two regularization methods derived from random matrix theory that help preserve the method's performance when sampling is limited. Overall, our method enables the use of linkage-aware inference methods even for data sets where only allele frequency time series are available.
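Allele-frequency time series carry linkage information for a simple reason. Under neutral Wright-Fisher resampling of N haploid genomes, the covariance of one-generation frequency changes at loci i and j equals D_ij / N. The sketch below turns that identity into an estimator. It assumes neutrality, a known N, and a D that stays roughly constant across the pooled changes. It is an illustration, not necessarily the paper's estimator. It omits the selection terms and the random-matrix regularization the abstract mentions, and `ld_from_frequency_changes` is a hypothetical name.

```python
import numpy as np

def ld_from_frequency_changes(freqs, pop_size):
    """Estimate D_ij = x_ij - x_i * x_j from allele-frequency time series.

    Assumes neutral Wright-Fisher resampling of `pop_size` haploid genomes
    per generation, so that Cov(dx_i, dx_j) = D_ij / pop_size, and that D is
    roughly constant across all pooled changes.
    freqs: array of shape (n_series, n_times, n_loci), one row per generation.
    """
    freqs = np.asarray(freqs, dtype=float)
    # One-generation frequency changes, pooled across series and time points.
    dx = np.diff(freqs, axis=1).reshape(-1, freqs.shape[-1])
    # Neutral changes have mean zero, so the mean outer product estimates the covariance.
    return pop_size * (dx.T @ dx) / dx.shape[0]
```

The diagonal of the returned matrix estimates x_i(1 - x_i), which gives a quick consistency check. Under selection the mean change is no longer zero, which is one reason a full method needs more than this.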
Topics: Linkage Disequilibrium; Gene Frequency; Selection, Genetic; Mutation; High-Throughput Nucleotide Sequencing
PubMed: 36610715
DOI: 10.1093/genetics/iyac189

Genome Biology and Evolution, Nov 2022
By revealing the influence of recombinational activity beyond what can be achieved with controlled crosses, measures of linkage disequilibrium (LD) in natural populations provide a powerful means of defining the recombinational landscape within which genes evolve. In one of the most comprehensive studies of this sort ever performed, involving whole-genome analyses on nearly 1,000 individuals of the cyclically parthenogenetic microcrustacean Daphnia pulex, the data suggest a relatively uniform pattern of recombination across the genome. Patterns of LD are quite consistent among populations; average rates of recombination are quite similar for all chromosomes; and although some chromosomal regions have elevated recombination rates, the degree of inflation is not large, and the overall spatial pattern of recombination is close to the random expectation. Contrary to expectations for models in which crossing-over is the primary mechanism of recombination, and consistent with data for other species, the distance-dependent pattern of LD indicates excessively high levels at both short and long distances and unexpectedly low levels of decay at long distances, suggesting significant roles for factors such as nonindependent mutation, population subdivision, and recombination mechanisms unassociated with crossing over. These observations raise issues regarding the classical LD equilibrium model widely applied in population genetics to infer recombination rates across various length scales on chromosomes.
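The abstract does not spell out the "classical LD equilibrium model". One widely used neutral expectation of this kind is Ohta and Kimura's (1971) equilibrium for σ²_d = E[D²] / E[p(1-p)q(1-q)], a ratio-of-expectations analogue of r². It is expressed as a function of the scaled recombination rate ρ = 4N_e c. The sketch assumes that formula, and the function name is hypothetical. Under this model, LD decays toward zero roughly as 1/ρ at long distances. The Daphnia data depart from exactly this behaviour.

```python
def ohta_kimura_sigma2_d(rho):
    """Equilibrium sigma_d^2 for two neutral loci with scaled recombination
    rate rho = 4 * N_e * c (assumed Ohta-Kimura 1971 expectation)."""
    return (10 + rho) / (22 + 13 * rho + rho ** 2)
```

At ρ = 0 this gives 10/22 ≈ 0.45, and it decreases monotonically toward 0 as ρ grows. Fitting it to observed LD-by-distance curves is one way recombination rates are inferred.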
Topics: Animals; Linkage Disequilibrium; Daphnia; Genetics, Population; Genome; Mutation
PubMed: 36170345
DOI: 10.1093/gbe/evac145

Genetics, May 2022
Archeogenetics has been revolutionary, revealing insights into demographic history and recent positive selection. However, most studies to date have ignored the nonrandom association of genetic variants at different loci (i.e. linkage disequilibrium). This may be partly because basic properties of linkage disequilibrium in samples from different times are still not well understood. Here, we derive several results for summary statistics of haplotypic variation under a model with time-stratified sampling:
(1) the correlation between the numbers of pairwise differences observed between time-staggered samples (πΔt), in models with and without strict population continuity;
(2) the product of the linkage disequilibrium coefficient, D, between ancient and modern samples, which measures haplotypic similarity between modern and ancient samples; and
(3) the expected switch rate in the Li and Stephens haplotype copying model.
The latter has implications for genotype imputation and phasing in ancient samples with modern reference panels. Overall, these results characterize how haplotype patterns are affected by sample age, recombination rates, and population sizes. We expect these results to help guide the interpretation and analysis of haplotype data from ancient and modern samples.
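An empirical plug-in version of statistic (2) can be computed directly from phased haplotypes. The sketch assumes haplotypes coded as 0/1 matrices (rows are haplotypes, columns are loci), with the same allele coding in both samples. The function names are hypothetical. The paper's contribution is the expected value of this product under its model, which the code does not compute.

```python
import numpy as np

def ld_coefficient(haplotypes, i, j):
    """D = p_AB - p_A * p_B between loci i and j, from a 0/1 haplotype matrix
    (rows = haplotypes, columns = loci; 1 marks the focal allele)."""
    h = np.asarray(haplotypes, dtype=float)
    p_a, p_b = h[:, i].mean(), h[:, j].mean()
    p_ab = (h[:, i] * h[:, j]).mean()
    return p_ab - p_a * p_b

def d_product(ancient, modern, i, j):
    """Product of D in an ancient and a modern sample at the same locus pair."""
    return ld_coefficient(ancient, i, j) * ld_coefficient(modern, i, j)
```

A positive product means the two samples carry the same allelic association. A product near zero means at least one sample shows little LD at that pair.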
Topics: Archaeology; Genetics, Population; Genotype; Haplotypes; Humans; Linkage Disequilibrium; Population Density
PubMed: 35294015
DOI: 10.1093/genetics/iyac038

PloS One, 2022
Estimating genetic diversity in rapeseed is important for sustainable breeding programs, because it provides options for developing new breeding lines. This study had two objectives. The first was to elucidate the patterns of genetic diversity within and among different structural groups. The second was to measure the extent of linkage disequilibrium (LD) in 383 globally distributed rapeseed accessions using 8,502 single nucleotide polymorphism (SNP) markers. We divided the germplasm collection into five subpopulations (P1 to P5) according to geographic and growth habit-related patterns. All subpopulations showed moderate genetic diversity (average H = 0.22 and I = 0.34). Pairwise Fst comparisons revealed a high degree of divergence (Fst > 0.24) for most combinations. The rutabaga type showed the highest divergence from the spring and winter types. Higher divergence was also found between the winter and spring types. Admixture-model-based structure analysis, principal component analysis and neighbor-joining tree analysis placed all subpopulations into three distinct clusters. Admixed genotypes constituted 29.24% of all genotypes, while the remaining 70.76% belonged to the identified clusters. Overall, mean LD was 0.03, and LD decayed to half its maximum within < 45 kb across the whole genome. LD decay was slower in the C genome (< 93 kb) than in the A genome (< 21 kb). This was confirmed by the larger haplotype blocks in the C genome than in the A genome. The findings on LD patterns and population structure will help make the collection a useful resource for two purposes: association mapping to identify genes for crop improvement, and selecting parents for hybrid breeding.
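A "half maximum" decay distance can be read off binned pairwise LD values. The sketch is one simple approach, assumed for illustration rather than taken from the study. Many studies instead fit the Hill and Weir expectation to the r² scatter. The code averages r² within fixed-width distance bins. It then returns the midpoint of the first bin, at or after the peak, whose mean falls to half the peak mean. `ld_half_decay_distance` is a hypothetical name.

```python
import numpy as np

def ld_half_decay_distance(dist_kb, r2, bin_kb):
    """Midpoint (kb) of the first distance bin whose mean r^2 is at most half
    the maximum bin mean; None if LD never decays that far in the data."""
    dist_kb = np.asarray(dist_kb, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    idx = (dist_kb // bin_kb).astype(int)      # distance bin of each SNP pair
    counts = np.bincount(idx)
    sums = np.bincount(idx, weights=r2)
    filled = np.nonzero(counts)[0]             # skip bins with no SNP pairs
    means = sums[filled] / counts[filled]
    peak = int(np.argmax(means))
    below = np.nonzero(means[peak:] <= means[peak] / 2)[0]
    if below.size == 0:
        return None
    k = filled[peak + below[0]]
    return (k + 0.5) * bin_kb
```

The result depends on the bin width. Comparing genomes, as with the A and C genomes here, requires the same binning for both.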
Topics: Linkage Disequilibrium
PubMed: 35231054
DOI: 10.1371/journal.pone.0250310

BMC Genomics, Apr 2022
BACKGROUND
The influence of linkage disequilibrium (LD), epistasis, and inbreeding on genotypic variance continues to be an important area of investigation in genetics and evolution. Current knowledge about biological pathways and gene networks indicates that epistasis is important in determining quantitative traits. Nevertheless, the empirical evidence across a range of species and traits is that the genotypic variance is mostly additive. Some recent theoretical studies have confirmed this. However, these investigations assumed linkage equilibrium, considered only additive effects, or used simplified assumptions for two- and higher-order epistatic effects. The objective of this investigation was therefore to provide additional information about the impact of LD and epistasis on genetic variances in noninbred and inbred populations, using a simulated dataset.
RESULTS
In general, the most important component of the genotypic variance was the additive variance. Because LD values were positive, all genetic variances and covariances generally decreased after 10 generations of random crosses, especially the nonepistatic variances. Thus, the epistatic variance/genotypic variance ratio is inversely proportional to the LD level. Increasing inbreeding increased the magnitude of the additive, additive x additive, additive x dominance, and dominance x additive variances. It decreased the dominance and dominance x dominance variances. Except for duplicate epistasis with 100% interacting genes, the epistatic variance/genotypic variance ratio was proportional to the inbreeding level. In general, the additive x additive variance was the most important component of the epistatic variance. The genetic covariances were generally smaller in magnitude than the genetic variances and could be positive or negative. The epistatic variance/genotypic variance ratio was maximized under duplicate and dominant epistasis and minimized under recessive and complementary epistasis. Increasing the percentage of epistatic genes from 30 to 100% increased the epistatic variance/genotypic variance ratio by a factor of 1.3 to 12.6, especially in inbred populations. The ratio was maximized in two kinds of population. The first was noninbred and inbred populations with intermediate LD and an average allelic frequency of the dominant genes of 0.3. The second was noninbred and inbred populations with low LD and an average allelic frequency of 0.5.
CONCLUSIONS
Additive variance is in general the most important component of genotypic variance. LD and inbreeding have a significant effect on the magnitude of the genetic variances and covariances. In general, the additive x additive variance is the most important component of epistatic variance. The maximization of the epistatic variance/genotypic variance ratio depends on the LD level, degree of inbreeding, epistasis type, percentage of interacting genes, and average allelic frequency.
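The classical single-locus decomposition (Falconer and Mackay) shows why additive variance tends to dominate. Assume genotypic values +a, d and -a for AA, Aa and aa, and a frequency p of allele A. Assume also Hardy-Weinberg proportions, no epistasis and no LD. Then V_A = 2pq[a + d(q - p)]² and V_D = (2pqd)². The study's simulations go beyond this by adding LD, epistasis and inbreeding. The sketch covers only that baseline case, and the function name is hypothetical.

```python
def single_locus_variances(p, a, d):
    """Additive and dominance variance at one biallelic locus under
    Hardy-Weinberg proportions (genotypic values AA = +a, Aa = d, aa = -a;
    p = frequency of A). No epistasis, LD or inbreeding."""
    q = 1 - p
    alpha = a + d * (q - p)          # average effect of an allele substitution
    v_a = 2 * p * q * alpha ** 2
    v_d = (2 * p * q * d) ** 2
    return v_a, v_d
```

With p = 0.5 and purely dominant effects (a = 0, d = 1), all genotypic variance is dominance variance (V_A = 0, V_D = 0.25). At most other allele frequencies, part of the dominance effect is absorbed into V_A.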
Topics: Epistasis, Genetic; Gene Frequency; Genetic Variation; Linkage Disequilibrium; Models, Genetic
PubMed: 35397494
DOI: 10.1186/s12864-022-08335-9

Molecular Ecology Resources, Feb 2022
In genomic-scale data sets, loci are closely packed within chromosomes and hence provide correlated information. Averaging across loci as if they were independent creates pseudoreplication. This reduces the effective degrees of freedom (df') compared to the nominal degrees of freedom, df. This issue has been known for some time, but its consequences have not been systematically quantified across the entire genome. Here, we measured pseudoreplication (quantified by the ratio df'/df) for two statistics: a common metric of genetic differentiation (F_ST) and a common measure of linkage disequilibrium between pairs of loci (r²). We simulated data using SLiM and msprime, which allow efficient forward-in-time and coalescent simulations while precisely controlling population pedigrees. We estimated df' and df'/df by measuring how fast the variance of mean F_ST and mean r² declined as more loci were used. For both indices, df' increases with N_e and genome size, as expected. However, even for large N_e and large genomes, df' for mean r² plateaus after a few thousand loci. A variance components analysis indicates that the limiting factor is uncertainty associated with sampling individuals rather than genes. Pseudoreplication is less extreme for F_ST, but df'/df ≤ 0.01 can occur in data sets using tens of thousands of loci. Commonly used block-jackknife methods consistently overestimated var(F_ST), producing very conservative confidence intervals. Predicting df' from our modelling results as a function of N_e, L, S and genome size provides a robust way to quantify the precision of genomic-scale data sets.
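A simplified model shows why df' can plateau. Suppose each of L per-locus statistics has variance σ² and every pair has correlation ρ. The variance of their mean is then σ²[1 + (L - 1)ρ]/L. The effective number of independent loci is therefore L' = L / (1 + (L - 1)ρ), which approaches 1/ρ as L grows. This equicorrelation model is an assumption for illustration and is far cruder than the paper's simulations. Real correlations fall with distance and include the shared sampling of individuals. `effective_loci` is a hypothetical name.

```python
def effective_loci(n_loci, rho):
    """Effective number of independent loci when every pair of per-locus
    statistics has correlation rho (assumed equicorrelation model):
    L' = L / (1 + (L - 1) * rho), which tends to 1 / rho for large L."""
    return n_loci / (1 + (n_loci - 1) * rho)
```

Under this model, even a small average correlation caps the information gained from adding loci. For example, ρ = 0.001 limits L' to just under 1,000, however many loci are used.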
Topics: Genome Size; Genomics; Linkage Disequilibrium; Models, Genetic; Pedigree; Population Density
PubMed: 34351073
DOI: 10.1111/1755-0998.13482

Genetics, Jul 2022
Selected mutations interfere and interact with evolutionary processes at nearby loci, distorting allele frequency trajectories and creating correlations between pairs of mutations. Recent studies have used patterns of linkage disequilibrium between selected variants to test for selective interference and epistatic interactions, with some disagreement over interpreting observations from data. Interpretation is hindered by a lack of analytic or even numerical expectations for patterns of variation between pairs of loci under the combined effects of selection, dominance, epistasis, and demography. Here, I develop a numerical approach to compute the expected two-locus sampling distribution under diploid selection with arbitrary epistasis and dominance, recombination, and variable population size. I use this to explore how epistasis and dominance affect expected signed linkage disequilibrium, including for nonsteady-state demography relevant to human populations. Using whole-genome sequencing data from humans, I explore genome-wide patterns of linkage disequilibrium within protein-coding genes. I show that positive linkage disequilibrium between missense mutations within genes is driven by strong positive allele-frequency correlations between mutations that fall within the same annotated conserved domain, pointing to compensatory mutations or antagonistic epistasis as the prevailing mode of interaction within conserved genic elements. Linkage disequilibrium between missense mutations is reduced outside of conserved domains, as expected under Hill-Robertson interference. This variation in both mutational fitness effects and selective interactions within protein-coding genes calls for more refined inferences of the joint distribution of fitness and interactive effects, and the methods presented here should prove useful in that pursuit.
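A much simpler model than the paper's can illustrate how epistasis generates signed LD. It is a deterministic, infinite-population, haploid two-locus model with selection followed by recombination. This is an assumed toy model; the paper treats diploids, dominance, drift and demography. `evolve_two_locus` is a hypothetical name. With multiplicative fitnesses, a population starting in linkage equilibrium stays there. If the double mutant is fitter than the multiplicative expectation, D becomes positive. For deleterious mutations this is antagonistic epistasis. Synergistic epistasis makes D negative.

```python
import numpy as np

def evolve_two_locus(x, w, c, generations):
    """Deterministic haploid two-locus model.
    x: haplotype frequencies ordered [ab, aB, Ab, AB]; w: fitnesses in the
    same order; c: recombination fraction.
    Returns final frequencies and D = x_AB * x_ab - x_Ab * x_aB."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    for _ in range(generations):
        x = x * w / np.dot(x, w)                  # selection
        d = x[3] * x[0] - x[2] * x[1]             # LD after selection
        x = x + c * d * np.array([-1.0, 1.0, 1.0, -1.0])  # recombination
    return x, x[3] * x[0] - x[2] * x[1]
```

Recombination shrinks D by a factor (1 - c) each generation without changing its sign. Within this toy model, the sign of D therefore reflects the sign of epistasis.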
Topics: Biological Evolution; Epistasis, Genetic; Gene Frequency; Humans; Linkage Disequilibrium; Models, Genetic; Selection, Genetic
PubMed: 35736370
DOI: 10.1093/genetics/iyac097