Bioinformatics (Oxford, England) Jan 2023
MOTIVATION
Genotyping by sequencing is a powerful tool for investigating genetic variation in plants, but many economically important plants are allopolyploids, where homoeologous similarity obscures the subgenomic origin of reads and confounds allelic and homoeologous SNPs. Recent polyploid genotyping methods use allelic frequencies, rate of heterozygosity, parental cross or other information to resolve read assignment, but good subgenomic references offer the most direct information. The typical strategy aligns reads to the joint reference, performs diploid genotyping within each subgenome, and filters the results, but persistent read misassignment results in an excess of false heterozygous calls.
RESULTS
We introduce the Comprehensive Allopolyploid Genotyper (CAPG), which formulates an explicit likelihood to weight read alignments against both subgenomic references and genotype individual allopolyploids from whole-genome resequencing data. We demonstrate CAPG in allotetraploids, where it performs better than Genome Analysis Toolkit's HaplotypeCaller applied to reads aligned to the combined subgenomic references.
AVAILABILITY AND IMPLEMENTATION
Code and tutorials are available at https://github.com/Kkulkarni1/CAPG.git.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Genotype; Genotyping Techniques; Sequence Analysis, DNA; Heterozygote; Alleles; Software; High-Throughput Nucleotide Sequencing
PubMed: 36367243
DOI: 10.1093/bioinformatics/btac729

Human Genomics Feb 2022 (Review)
CYP2D6 is a key drug-metabolizing enzyme implicated in the biotransformation of approximately 25% of currently prescribed drugs. Interindividual and interethnic differences in CYP2D6 enzymatic activity, and hence variability in substrate drug efficacy and safety, are attributed to a highly polymorphic corresponding gene. This study aims at reviewing the frequencies of the most clinically relevant CYP2D6 alleles in Arab countries. Articles published before May 2021 that reported CYP2D6 genotype and allelic frequencies in the Arab populations of the Middle East and North Africa (MENA) region were retrieved from PubMed and Google Scholar databases. This review included 15 original articles encompassing 2737 individuals from 11 countries of the 22 members of the League of Arab States. Active CYP2D6 gene duplications reached the highest frequencies of 28.3% and 10.4% in Algeria and Saudi Arabia, respectively, and the lowest in Egypt (2.41%) and Palestine (4.9%). Frequencies of the loss-of-function allele CYP2D6*4 ranged from 3.5% in Saudi Arabia to 18.8% in Egypt. The disparity in frequencies of the reduced-function CYP2D6*10 allele was perceptible, with the highest frequency reported in Jordan (14.8%) and the lowest in neighboring Palestine (2%) and in Algeria (0%). The reduced-function allele CYP2D6*41 was more prevalent in the Arabian Peninsula countries, Saudi Arabia (18.4%) and the United Arab Emirates (15.2%), than in the Northern Arab-Levantine Syria (9.7%) and Algeria (8.3%). Our study demonstrates heterogeneity of CYP2D6 alleles among Arab populations. The incongruities of the frequencies of alleles in neighboring countries with similar demographic composition emphasize the necessity for harmonizing criteria of genotype assignment and conducting comprehensive studies on larger MENA Arab populations to determine their CYP2D6 allelic makeup and improve therapeutic outcomes of CYP2D6-metabolized drugs.
Topics: Alleles; Arabs; Cytochrome P-450 CYP2D6; Gene Frequency; Humans; Polymorphism, Genetic
PubMed: 35123571
DOI: 10.1186/s40246-022-00378-z

Genome Biology and Evolution Aug 2020
Allele-specific expression is when one allele of a gene shows higher levels of expression compared with the other allele, in a diploid organism. Recent work has identified allele-specific expression in a number of Hymenopteran species. However, the molecular mechanism which drives this allelic expression bias remains unknown. In mammals, DNA methylation is often associated with genes which show allele-specific expression. DNA methylation systems have been described in species of Hymenoptera, providing a candidate mechanism. Using previously generated RNA-Seq and whole-genome bisulfite sequencing from reproductive and sterile bumblebee (Bombus terrestris) workers, we have identified genome-wide allele-specific expression and allele-specific DNA methylation. The majority of genes displaying allele-specific expression are common between reproductive and sterile workers and the proportion of allele-specific expression bias generally varies between genetically distinct colonies. We have also identified genome-wide allele-specific DNA methylation patterns in both reproductive and sterile workers, with reproductive workers showing significantly more genes with allele-specific methylation. Finally, there is no significant overlap between genes showing allele-specific expression and allele-specific methylation. These results indicate that cis-acting DNA methylation does not directly drive genome-wide allele-specific expression in this species.
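To make the idea of allelic expression bias concrete, the following is a minimal sketch of one common way to test a single heterozygous site for imbalance: an exact two-sided binomial test on allele-specific read counts. It is an illustration under stated assumptions, not the pipeline used in the study above; the function name `allelic_imbalance_p_value`, its parameters, and the 0.5 null proportion are all hypothetical choices, and real analyses typically also correct for mapping bias and multiple testing.

```python
from math import comb


def allelic_imbalance_p_value(ref_reads: int, alt_reads: int,
                              expected_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for departure from balanced expression.

    ref_reads / alt_reads are hypothetical counts of RNA-Seq reads carrying
    each allele at one heterozygous site; expected_ref is the assumed null
    proportion of reference-allele reads.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("need at least one allele-informative read")
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * expected_ref ** k * (1 - expected_ref) ** (n - k)
           for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum all outcomes that are no more likely than the observed one
    # (small relative tolerance so floating-point ties are included).
    p_value = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    print(allelic_imbalance_p_value(10, 0))  # strongly biased site
    print(allelic_imbalance_p_value(5, 5))   # perfectly balanced site
```

A small p-value suggests that one allele contributes more reads than expected under equal expression; with few reads per site the test has little power, which is one reason read-depth filters are usually applied first.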
Topics: Alleles; Animals; Bees; DNA Methylation; Female; Gene Expression; Genome, Insect
PubMed: 32597949
DOI: 10.1093/gbe/evaa132

Epigenetics Mar 2020
We previously identified sequence-dependent allele-specific methylation (sd-ASM) in adult human peripheral blood leukocytes, in which ASM occurs in cis depending on adjacent polymorphic sequences. A number of groups have identified sd-ASM sites in the human and mouse genomes, illustrating the prevalence of sd-ASM in mammalian genomes. In addition, sd-ASM can lead to sequence-dependent allele-specific expression of neighbouring genes. Imprinted genes also often exhibit parent-of-origin-dependent allele-specific methylation (pd-ASM), which causes parent-of-origin-dependent allele-specific expression. However, whether most of the already known sd-ASM and pd-ASM sites are methylated or hydroxymethylated remains unclear due to technical restrictions. Accordingly, a novel method that enables examination of allelic methylation and hydroxymethylation status and also overcomes the drawbacks of conventional methods is needed. Such a method could also be used to elucidate the mechanisms underlying polymorphism-associated inter-individual differences in disease susceptibility and the mechanism of genomic imprinting. Here, we developed a simple method to determine allelic hydroxymethylation status and identified novel sequence- and parent-of-origin-dependent allele-specific hydroxymethylation sites. Correlation analyses of TF binding sequences and methylation or hydroxymethylation between three mouse strains revealed the involvement of in strain-specific methylation and hydroxymethylation in exon 7 of .
Topics: Alleles; Animals; DNA Methylation; Epigenomics; Mice; Mice, Inbred C57BL; Protein Binding; Sequence Analysis, DNA; Transcription Factors
PubMed: 31533538
DOI: 10.1080/15592294.2019.1664228

Scientific Reports May 2022 (Meta-Analysis)
The emergence of genome-wide association studies (GWAS) has led to the creation of large repositories of human genetic variation, creating enormous opportunities for genetic research and worldwide collaboration. Methods that are based on GWAS summary statistics seek to leverage such records, overcoming barriers that often exist in individual-level data access while also offering significant computational savings. Such summary-statistics-based applications include GWAS meta-analysis, with and without sample overlap, and case-case GWAS. We compare performance of leading methods for summary-statistics-based genomic analysis and also introduce a novel framework that can unify usual summary-statistics-based implementations via the reconstruction of allelic and genotypic frequencies and counts (ReACt). First, we evaluate ASSET, METAL, and ReACt using both synthetic and real data for GWAS meta-analysis (with and without sample overlap) and find that, while all three methods are comparable in terms of power and error control, ReACt and METAL are faster than ASSET by a factor of at least one hundred. We then proceed to evaluate performance of ReACt vs an existing method for case-case GWAS and show comparable performance, with ReACt requiring minimal underlying assumptions and being more user-friendly. Finally, ReACt allows us to evaluate, for the first time, an implementation for calculating polygenic risk score (PRS) for groups of cases and controls based on summary statistics. Our work demonstrates the power of GWAS summary-statistics-based methodologies and the proposed novel method provides a unifying framework and allows further extension of possibilities for researchers seeking to understand the genetics of complex disease.
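For readers unfamiliar with summary-statistics-based meta-analysis, the sketch below shows the textbook fixed-effect, inverse-variance-weighted combination of per-study effect sizes for one variant. It assumes the studies share no samples and is not the ReACt approach described above (which reconstructs allelic and genotypic counts), nor the exact implementation of ASSET or METAL; the function name `fixed_effect_meta` and its inputs are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Combine per-study effect estimates for a single variant.

    betas and standard_errors are hypothetical per-study summary statistics
    (e.g. log odds ratios and their standard errors). Studies are assumed
    to be independent, i.e. no sample overlap.
    Returns (pooled_beta, pooled_se, z_score).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


if __name__ == "__main__":
    beta, se, z = fixed_effect_meta([0.10, 0.30], [0.10, 0.10])
    print(f"pooled beta={beta:.3f}, se={se:.4f}, z={z:.2f}")
```

Because only effect sizes and standard errors are needed, this kind of calculation avoids individual-level data access, which is the practical advantage the abstract highlights; handling sample overlap requires additional correction that this sketch does not attempt.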
Topics: Alleles; Genome-Wide Association Study; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide
PubMed: 35581276
DOI: 10.1038/s41598-022-12185-6

Proceedings of the National Academy of... Sep 2022
Selection accumulates information in the genome: it guides stochastically evolving populations toward states (genotype frequencies) that would be unlikely under neutrality. This can be quantified as the Kullback-Leibler (KL) divergence between the actual distribution of genotype frequencies and the corresponding neutral distribution. First, we show that this population-level information sets an upper bound on the information at the level of genotype and phenotype, limiting how precisely they can be specified by selection. Next, we study how the accumulation and maintenance of information is limited by the cost of selection, measured as the genetic load or the relative fitness variance, both of which we connect to the control-theoretic KL cost of control. The information accumulation rate is upper bounded by the population size times the cost of selection. This bound is very general, and applies across models (Wright-Fisher, Moran, diffusion) and to arbitrary forms of selection, mutation, and recombination. Finally, the cost of maintaining information depends on how it is encoded: Specifying a single allele out of two is expensive, but one bit encoded among many weakly specified loci (as in a polygenic trait) is cheap.
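The KL divergence mentioned above can be illustrated with a small discrete example. The sketch below computes D(p || q) in bits for two distributions over the same set of states; treating the distributions as short lists of probabilities is a simplifying assumption for illustration only, and the function name `kl_divergence_bits` is hypothetical rather than taken from the paper.

```python
import math


def kl_divergence_bits(p, q):
    """KL divergence D(p || q) in bits between two discrete distributions.

    p is the "actual" distribution (e.g. under selection) and q the
    reference distribution (e.g. under neutrality), over the same states.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same states")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # states with zero probability under p contribute nothing
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log2(p_i / q_i)
    return total


if __name__ == "__main__":
    neutral = [0.5, 0.5]    # two alleles equally likely to be fixed
    selected = [1.0, 0.0]   # selection always fixes the first allele
    print(kl_divergence_bits(selected, neutral))  # 1.0
```

In this toy case, fully specifying one allele out of two equally likely alternatives corresponds to exactly one bit, which matches the "single allele out of two" example in the abstract.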
Topics: Alleles; Biological Evolution; Gene Frequency; Genetics, Population; Models, Genetic; Selection, Genetic
PubMed: 36037343
DOI: 10.1073/pnas.2123152119

Current Opinion in Neurobiology Dec 2019 (Review)
Typically, it is assumed that the maternal and paternal alleles for most genes are equally expressed. Known exceptions include canonical imprinted genes, random X-chromosome inactivation, olfactory receptors and clustered protocadherins. Here, we highlight recent studies showing that allele-specific expression is frequent in the genome and involves subtypes of epigenetic allelic effects that differ in terms of heritability, clonality and stability over time. Different forms of epigenetic allele regulation could have different roles in brain development, function, and disease. An emerging area involves understanding allelic effects in a cell-type and developmental stage-specific manner and determining how these effects influence the impact of genetic variants and mutations on the brain. A deeper understanding of epigenetics at the allele and cellular level in the brain could help clarify the mechanisms underlying phenotypic variance.
Topics: Alleles; Brain; Epigenesis, Genetic; Genomic Imprinting; X Chromosome Inactivation
PubMed: 31153086
DOI: 10.1016/j.conb.2019.04.012

Mammalian Genome : Official Journal of... Aug 2017 (Review)
The widespread use of CRISPR/Cas and other targeted endonuclease technologies in many species has led to an explosion in the generation of new mutations and alleles. The ability to generate many different mutations from the same target sequence either by homology-directed repair with a donor sequence or non-homologous end joining-induced insertions and deletions necessitates a means for representing these mutations in literature and databases. Standardized nomenclature can be used to generate unambiguous, concise, and specific symbols to represent mutations and alleles. The research communities of a variety of species using CRISPR/Cas and other endonuclease-mediated mutation technologies have developed different approaches to naming and identifying such alleles and mutations. While some organism-specific research communities have developed allele nomenclature that incorporates the method of generation within the official allele or mutant symbol, others use metadata tags that include method of generation or mutagen. Organism-specific research community databases together with organism-specific nomenclature committees are leading the way in providing standardized nomenclature and metadata to facilitate the integration of data from alleles and mutations generated using CRISPR/Cas and other targeted endonucleases.
Topics: Alleles; Animals; CRISPR-Cas Systems; Clustered Regularly Interspaced Short Palindromic Repeats; Endonucleases; Gene Editing; Gene Targeting; Humans; Mutation; Terminology as Topic
PubMed: 28589392
DOI: 10.1007/s00335-017-9698-3

Genome Biology Sep 2021
With the recent increase in RNA sequencing efforts using large cohorts of individuals, surveying allele-specific gene expression is becoming increasingly frequent. Here, we report that, despite not containing explicit variant information, a list of genes known to be allele-specific in an individual is enough to recover key variants and link the individuals back to their genotypes and phenotypes. This creates a privacy conundrum.
Topics: Alleles; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide
PubMed: 34493313
DOI: 10.1186/s13059-021-02477-x

DNA Research : An International Journal... Feb 2019
The current RNA-Seq method analyses fragments of mRNAs, from which it is occasionally difficult to reconstruct the entire transcript structure. Here, we performed and evaluated the recent procedure for full-length cDNA sequencing using the Nanopore sequencer MinION. We applied MinION RNA-Seq for various applications, which would not always be easy using the usual RNA-Seq by Illumina. First, we examined and found that even though the sequencing accuracy was still limited to 92.3%, practically useful RNA-Seq analysis is possible. Particularly, taking advantage of the long-read nature of MinION, we demonstrate the identification of splicing patterns and their combinations as a form of full-length cDNAs without losing precise information concerning their expression levels. Transcripts of fusion genes in cancer cells can also be identified and characterized. Furthermore, the full-length cDNA information can be used for phasing of the SNPs detected by WES on the transcripts, providing essential information to identify allele-specific transcriptional events. We constructed a catalogue of full-length cDNAs in seven major organs for two particular individuals and identified allele-specific transcription and splicing. Finally, we demonstrate that single-cell sequencing is also possible. RNA-Seq on the MinION platform should provide a novel approach that is complementary to the current RNA-Seq.
Topics: Alleles; DNA, Complementary; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Humans; Polymorphism, Single Nucleotide; RNA Splicing; Sequence Analysis, RNA
PubMed: 30462165
DOI: 10.1093/dnares/dsy038