-
Genes Jan 2023
The regulatory elements in proximal and distal regions of genes are involved in the regulation of gene expression. Risk alleles in intronic and intergenic regions may alter gene expression by modifying the binding affinity and stability of diverse DNA-binding proteins implicated in gene expression regulation. By focusing on the local ancestral structure of coding and regulatory regions using the paired whole-genome sequence and tissue-wide transcriptome datasets from the Genotype-Tissue Expression project, we investigated the impact of genetic variants, in aggregate, on tissue-specific gene expression regulation. Local ancestral origins of the coding region, immediate and distant upstream regions, and distal regulatory region were determined using RFMix with the reference panel from the 1000 Genomes Project. For each tissue, inter-individual variation of gene expression levels explained by concordant or discordant local ancestry between coding and regulatory regions was estimated. Compared to European descent, African descent showed more frequent changes in local ancestral structure, with shorter haplotype blocks. The expression level of the Adenosine Deaminase Like gene was significantly associated with admixed ancestral structure in the regulatory region across multiple tissue types. Further validations are required to understand the impact of the local ancestral structure of regulatory regions on gene expression regulation in humans and other species.
Topics: Humans; Alleles; Black People; Gene Expression Regulation; Haplotypes; White People
PubMed: 36672888
DOI: 10.3390/genes14010147 -
Genetics Jul 2014
A novel haplotype association method is presented, and its power is demonstrated. Relying on a statistical model for linkage disequilibrium (LD), the method first infers ancestral haplotypes and their loadings at each marker for each individual. The loadings are then used to quantify local haplotype sharing between individuals at each marker. A statistical model was developed to link the local haplotype sharing and phenotypes to test for association. We devised a novel method to fit the LD model, reducing the complexity from putatively quadratic to linear (in the number of ancestral haplotypes). Therefore, the LD model can be fitted to all study samples simultaneously, and, consequently, our method is applicable to big data sets. Compared to existing haplotype association methods, our method integrated out phase uncertainty, avoided arbitrariness in specifying haplotypes, and had the same number of tests as the single-SNP analysis. We applied our method to data from the Wellcome Trust Case Control Consortium and discovered eight novel associations between seven gene regions and five disease phenotypes. Among these, GRIK4, which encodes a protein that belongs to the glutamate-gated ionic channel family, is strongly associated with both coronary artery disease and rheumatoid arthritis. A software package implementing methods described in this article is freely available at http://www.haplotype.org.
Topics: Algorithms; Alleles; Bayes Theorem; Case-Control Studies; Computer Simulation; Databases, Genetic; Genetic Association Studies; Genetic Predisposition to Disease; Haplotypes; Humans; Linkage Disequilibrium; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide
PubMed: 24812308
DOI: 10.1534/genetics.114.164814 -
Bioinformatics (Oxford, England) May 2021
MOTIVATION
Ancestral haplotype maps provide useful information about genomic variation and insights into biological processes. Reconstructing the descendent haplotype structure of homologous chromosomes, particularly for large numbers of individuals, can help with characterizing the recombination landscape, elucidating genotype-to-phenotype relationships, improving genomic predictions and more. Inferring haplotype maps from sparse genotype data is an efficient approach to whole-genome haplotyping, but this is a non-trivial problem. A standardized approach is needed to validate whether haplotype reconstruction software, conceived population designs and existing data for a given population provide accurate haplotype information for further inference.
RESULTS
We introduce SPEARS, a pipeline for the simulation-based appraisal of genome-wide haplotype maps constructed from sparse genotype data. Using a specified pedigree, the pipeline generates virtual genotypes (known data) with genotyping errors and missing data structure. It then proceeds to mimic analysis in practice, capturing sources of error due to genotyping, imputation and haplotype inference. Standard metrics allow researchers to assess different population designs and which features of haplotype structure or regions of the genome are sufficiently accurate for analysis. Haplotype maps for 1000 outcross progeny from a multi-parent population of maize are used to demonstrate SPEARS.
AVAILABILITY AND IMPLEMENTATION
SPEARS, the protocol and suite of scripts, are publicly available under an MIT license at GitHub (https://github.com/maizeatlas/spears).
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Computer Simulation; Genome; Genotype; Haplotypes; Humans; Polymorphism, Single Nucleotide; Software
PubMed: 32840564
DOI: 10.1093/bioinformatics/btaa749 -
Nucleic Acids Research Nov 2022
Profiling gametes of an individual enables the construction of personalised haplotypes and meiotic crossover landscapes, now achievable at larger scale than ever through the availability of high-throughput single-cell sequencing technologies. However, high-throughput single-gamete data commonly have low depth of coverage per gamete, which challenges existing gamete-based haplotype phasing methods. In addition, haplotyping a large number of single gametes from high-throughput single-cell DNA sequencing data and constructing meiotic crossover profiles using existing methods requires intensive processing. Here, we introduce efficient software tools for the essential tasks of generating personalised haplotypes and calling crossovers in gametes from single-gamete DNA sequencing data (sgcocaller), and constructing, visualising, and comparing individualised crossover landscapes from single gametes (comapr). With additional data pre-processing, the tools can also be applied to bulk-sequenced samples. We demonstrate that sgcocaller is able to generate impeccable phasing results for high-coverage datasets, on which it is more accurate and stable than existing methods, and also performs well on low-coverage single-gamete sequencing datasets for which current methods fail. Our tools achieve highly accurate results with user-friendly installation, comprehensive documentation, efficient computation times and minimal memory usage.
Topics: Algorithms; Germ Cells; Haplotypes; High-Throughput Nucleotide Sequencing; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Single-Cell Gene Expression Analysis; Software; Crossing Over, Genetic
PubMed: 36107768
DOI: 10.1093/nar/gkac764 -
International Journal of Molecular... Feb 2024
Review
In direct seeding, hypoxia is a major stress faced by rice plants. Therefore, dissecting the response mechanism of rice to hypoxia stress and the molecular regulatory network is critical to the development of hypoxia-tolerant rice varieties and direct seeding of rice. This review summarizes the morphological, physiological, and ecological changes in rice under hypoxia stress, the discovery of hypoxia-tolerant and germination-related genes/QTLs, and the latest research on candidate genes, and explores the linkage of hypoxia tolerance genes and their distribution in indica and japonica rice through population variance analysis and haplotype network analysis. Among the candidate genes, is a typical gene located on the MAPK cascade reaction for indica-japonica divergence; MHZ6 is involved in both the MAPK signaling and phytohormone transduction pathway. has three major haplotypes and one rare haplotype, with Hap3 being dominated by indica rice varieties, and promotes internode elongation in deep-water rice by activating the gene. and Adh1 have similar indica-japonica varietal differentiation, and are mainly present in indica varieties. There are three high-frequency haplotypes of , namely Hap1 (n = 1109), Hap2 (n = 1349), and Hap3 (n = 217); Hap2 is more frequent in japonica, and the genetic background of was derived from the japonica rice subpopulation. Further artificial selection, natural domestication, and other means to identify more resistance mechanisms of this gene may facilitate future research to breed superior rice cultivars. Finally, this study discusses the application of rice hypoxia-tolerant germplasm in future breeding research.
Topics: Oryza; Plant Breeding; Quantitative Trait Loci; Haplotypes; Hypoxia
PubMed: 38396854
DOI: 10.3390/ijms25042177 -
Bioinformatics (Oxford, England) Sep 2014
MOTIVATION
Accurate haplotyping (determining from which parent particular portions of the genome are inherited) is still mostly an unresolved problem in genomics. This problem has only recently started to become tractable, thanks to the development of new long-read sequencing technologies. Here, we introduce ProbHap, a haplotyping algorithm targeted at such technologies. The main algorithmic idea of ProbHap is a new dynamic programming algorithm that exactly optimizes a likelihood function specified by a probabilistic graphical model and which generalizes a popular objective called the minimum error correction. In addition to being accurate, ProbHap also provides confidence scores at phased positions.
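For readers unfamiliar with the minimum error correction (MEC) objective mentioned above, the following is a minimal sketch of how an MEC score can be computed for a candidate haplotype. It is not ProbHap's dynamic programming algorithm or its likelihood model. It assumes a diploid genome with biallelic sites coded 0/1, so the second haplotype is the complement of the first. The names `Read`, `mismatches`, and `mec_score` and the toy reads are hypothetical and chosen for illustration only.

```python
from typing import Dict, List

# A read maps SNP index -> observed allele (0 or 1); reads cover only some SNPs.
Read = Dict[int, int]


def mismatches(read: Read, hap: List[int]) -> int:
    """Count positions where the read disagrees with a haplotype."""
    return sum(1 for pos, allele in read.items() if hap[pos] != allele)


def mec_score(reads: List[Read], hap: List[int]) -> int:
    """MEC score: each read is assigned to whichever of the two
    haplotypes (hap or its complement) it matches best, and the
    remaining mismatches are summed over all reads."""
    comp = [1 - a for a in hap]  # assumption: diploid, biallelic sites
    return sum(min(mismatches(r, hap), mismatches(r, comp)) for r in reads)


# Toy data (illustrative only): four reads over four heterozygous SNPs.
reads = [
    {0: 0, 1: 0, 2: 1},
    {1: 1, 2: 0, 3: 1},
    {0: 0, 1: 0, 3: 0},
    {2: 1, 3: 1},
]
candidate = [0, 0, 1, 0]
print(mec_score(reads, candidate))
```

A phasing method that minimizes MEC searches for the haplotype with the lowest such score. Because the score treats a haplotype and its complement symmetrically, the two labelings of the same phasing always score the same.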
RESULTS
On a standard benchmark dataset, ProbHap makes 11% fewer errors than current state-of-the-art methods. This accuracy can be further increased by excluding low-confidence positions, at the cost of a small drop in haplotype completeness.
AVAILABILITY
Our source code is freely available at: https://github.com/kuleshov/ProbHap.
Topics: Algorithms; Genome, Human; Genomics; Haplotypes; Humans; Likelihood Functions; Models, Statistical; Sequence Analysis, DNA
PubMed: 25161223
DOI: 10.1093/bioinformatics/btu484 -
Nucleic Acids Research Jun 2022
Single-cell whole-genome haplotyping allows simultaneous detection of haplotypes associated with monogenic diseases, chromosome copy-numbering and subsequently, has revealed mosaicism in embryos and embryonic stem cells. Methods, such as karyomapping and haplarithmisis, were deployed as a generic and genome-wide approach for preimplantation genetic testing (PGT) and are replacing traditional PGT methods. While current methods primarily rely on single-nucleotide polymorphism (SNP) array, we envision sequencing-based methods to become more accessible and cost-efficient. Here, we developed a novel sequencing-based methodology to haplotype and copy-number profile single cells. Following DNA amplification, genomic size and complexity is reduced through restriction enzyme digestion and DNA is genotyped through sequencing. This single-cell genotyping-by-sequencing (scGBS) is the input for haplarithmisis, an algorithm we previously developed for SNP array-based single-cell haplotyping. We established technical parameters and developed an analysis pipeline enabling accurate concurrent haplotyping and copy-number profiling of single cells. We demonstrate its value in human blastomere and trophectoderm samples as application for PGT for monogenic disorders. Furthermore, we demonstrate the method to work in other species through analyzing blastomeres of bovine embryos. Our scGBS method opens up the path for single-cell haplotyping of any species with diploid genomes and could make its way into the clinic as a PGT application.
Topics: Animals; Cattle; Chromosome Aberrations; Female; Genetic Testing; Genotype; Haplotypes; Humans; Pregnancy; Preimplantation Diagnosis
PubMed: 35212381
DOI: 10.1093/nar/gkac134 -
Genetics May 2022
Archeogenetics has been revolutionary, revealing insights into demographic history and recent positive selection. However, most studies to date have ignored the nonrandom association of genetic variants at different loci (i.e. linkage disequilibrium). This may be in part because basic properties of linkage disequilibrium in samples from different times are still not well understood. Here, we derive several results for summary statistics of haplotypic variation under a model with time-stratified sampling: (1) The correlation between the number of pairwise differences observed between time-staggered samples (πΔt) in models with and without strict population continuity; (2) The product of the linkage disequilibrium coefficient, D, between ancient and modern samples, which is a measure of haplotypic similarity between modern and ancient samples; and (3) The expected switch rate in the Li and Stephens haplotype copying model. The latter has implications for genotype imputation and phasing in ancient samples with modern reference panels. Overall, these results provide a characterization of how haplotype patterns are affected by sample age, recombination rates, and population sizes. We expect these results will help guide the interpretation and analysis of haplotype data from ancient and modern samples.
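As a concrete illustration of the linkage disequilibrium coefficient D referred to in result (2), the sketch below computes the standard empirical estimate D = p_AB - p_A * p_B from phased two-locus haplotypes, and then multiplies the values from two samples. It shows only the sample quantities, not the expectations derived in the article. The function name `ld_coefficient` and the toy "ancient" and "modern" samples are hypothetical.

```python
from typing import List, Tuple

# Each haplotype is (allele at locus A, allele at locus B), coded 0/1.
Haplotype = Tuple[int, int]


def ld_coefficient(haplotypes: List[Haplotype]) -> float:
    """Empirical D = p_AB - p_A * p_B for the '1' alleles at two loci."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    return p_ab - p_a * p_b


# Toy samples (illustrative only), ten phased haplotypes each.
modern = [(1, 1)] * 4 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 4
ancient = [(1, 1)] * 3 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 3

d_modern = ld_coefficient(modern)
d_ancient = ld_coefficient(ancient)
print(d_modern, d_ancient, d_modern * d_ancient)
```

A positive product indicates that the same allele combinations are over-represented in both samples, which is the sense in which the product measures haplotypic similarity across time points.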
Topics: Archaeology; Genetics, Population; Genotype; Haplotypes; Humans; Linkage Disequilibrium; Population Density
PubMed: 35294015
DOI: 10.1093/genetics/iyac038 -
Genes May 2022
Review
Signatures of positive selection in the genome are a characteristic mark of adaptation that can reveal an ongoing, recent, or ancient response to environmental change throughout the evolution of a population. New sources of food, climate conditions, and exposure to pathogens are only some of the possible sources of selective pressure, and the rise of advantageous genetic variants is a crucial determinant of survival and reproduction. In this context, the ability to detect these signatures of selection may pinpoint genetic variants that are responsible for a significant change in gene regulation, gene expression, or protein synthesis, structure, and function. This review focuses on statistical methods that take advantage of linkage disequilibrium and haplotype determination to reveal signatures of positive selection in whole-genome sequencing data, showing that they emerge from different descriptions of the same underlying event. Moreover, considerations are provided around the application of these statistics to different species, their suitability for ancient DNA, and the usefulness of discovering variants under selection for biomedicine and public health in an evolutionary medicine framework.
Topics: Genome; Haplotypes; Linkage Disequilibrium; Selection, Genetic; Whole Genome Sequencing
PubMed: 35627311
DOI: 10.3390/genes13050926 -
Briefings in Functional Genomics Mar 2020
Genomic analysis of individuals or organisms is predicated on the availability of high-quality reference and genotype information. With the rapidly dropping costs of high-throughput DNA sequencing, this is becoming readily available for diverse organisms and for increasingly large populations of individuals. Despite these advances, there are still aspects of genome sequencing that remain challenging for existing sequencing methods. This includes the generation of long-range contiguity during genome assembly, identification of structural variants in both germline and somatic tissues, the phasing of haplotypes in diploid organisms and the resolution of genome sequence for organisms derived from complex samples. These types of information are valuable for understanding the role of genome sequence and genetic variation on genome function, and numerous approaches have been developed to address them. Recently, chromosome conformation capture (3C) experiments, such as the Hi-C assay, have emerged as powerful tools to aid in these challenges for genome reconstruction. We will review the current use of Hi-C as a tool for aiding in genome sequencing, addressing the applications, strengths, limitations and potential future directions for the use of 3C data in genome analysis. We argue that unique features of Hi-C experiments make this data type a powerful tool to address challenges in genome sequencing, and that future integration of Hi-C data with alternative sequencing assays will facilitate the continuing revolution in genomic analysis and genome sequencing.
Topics: Animals; Chromosome Mapping; Chromosomes; Genomics; Haplotypes; Humans; Metagenomics
PubMed: 31875884
DOI: 10.1093/bfgp/elz026