-
Nature Methods Feb 2021
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.
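For orientation, the standard trio binning that the abstract compares against can be sketched as follows. This is a minimal illustration of the general idea (assign each child read to a parent by counting k-mers seen in only one parent), not hifiasm's graph trio binning algorithm. The function names, the toy reads, and the choice of k are all hypothetical, and the sketch ignores reverse complements, sequencing errors, and count normalization.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(paternal_reads, maternal_reads, k):
    """Return (paternal-only, maternal-only) k-mer sets.

    Hypothetical helper: real pipelines count k-mers from parental
    short reads and filter by abundance, which is omitted here.
    """
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    return pat - mat, mat - pat


def bin_read(read, pat_only, mat_only, k):
    """Assign a child read to the parent with more specific k-mer hits."""
    read_kmers = kmers(read, k)
    pat_hits = len(read_kmers & pat_only)
    mat_hits = len(read_kmers & mat_only)
    if pat_hits > mat_hits:
        return "paternal"
    if mat_hits > pat_hits:
        return "maternal"
    return "unassigned"  # no informative k-mers, or a tie


# Toy data (assumed for illustration only).
pat_only, mat_only = parent_specific_kmers(["ACGTACGTTT"], ["ACGTACGAAA"], k=4)
labels = [bin_read(r, pat_only, mat_only, k=4)
          for r in ("TACGTTT", "TACGAAA", "ACGTACG")]
```

Reads that carry no parent-specific k-mer stay unassigned in this scheme. As the abstract describes it, hifiasm's graph trio binning builds on a phased assembly graph rather than binning reads independently like this.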
Topics: Algorithms; Genome; Haplotypes; Sequence Analysis, DNA
PubMed: 33526886
DOI: 10.1038/s41592-020-01056-5
-
Human Genetics May 2017 (Review)
The male-specific part of the human Y chromosome is widely used in forensic DNA analysis, particularly in cases where standard autosomal DNA profiling is not informative. A Y-chromosomal gene fragment is applied for inferring the biological sex of a crime scene trace donor. Haplotypes composed of Y-chromosomal short tandem repeat polymorphisms (Y-STRs) are used to characterise paternal lineages of unknown male trace donors, especially suitable when males and females have contributed to the same trace, such as in sexual assault cases. Y-STR haplotyping applied in crime scene investigation can (i) exclude male suspects from involvement in crime, (ii) identify the paternal lineage of male perpetrators, (iii) highlight multiple male contributors to a trace, and (iv) provide investigative leads for finding unknown male perpetrators. Y-STR haplotype analysis is employed in paternity disputes of male offspring and other types of paternal kinship testing, including historical cases, as well as in special cases of missing person and disaster victim identification involving men. Y-chromosome polymorphisms are applied for inferring the paternal bio-geographic ancestry of unknown trace donors or missing persons, in cases where autosomal DNA profiling is uninformative. In this overview, all different forensic applications of Y-chromosome DNA are described. To illustrate the necessity of forensic Y-chromosome analysis, the investigation of a prominent murder case is described, which initiated two changes in national forensic DNA legislation both covering Y-chromosome use, and was finally solved via an innovative Y-STR dragnet involving thousands of volunteers after 14 years. Finally, expectations for the future of forensic Y-chromosome DNA analysis are discussed.
Topics: Chromosomes, Human, Y; DNA Fingerprinting; Forensic Genetics; Genotype; Haplotypes; Humans; Male; Microsatellite Repeats; Phylogeography; Polymorphism, Single Nucleotide
PubMed: 28315050
DOI: 10.1007/s00439-017-1776-9
-
Nature Biotechnology Mar 2023
Genome instability and aberrant alterations of transcriptional programs both play important roles in cancer. Single-cell RNA sequencing (scRNA-seq) has the potential to investigate both genetic and nongenetic sources of tumor heterogeneity in a single assay. Here we present a computational method, Numbat, that integrates haplotype information obtained from population-based phasing with allele and expression signals to enhance detection of copy number variations from scRNA-seq. Numbat exploits the evolutionary relationships between subclones to iteratively infer single-cell copy number profiles and tumor clonal phylogeny. Analysis of 22 tumor samples, including multiple myeloma, gastric, breast and thyroid cancers, shows that Numbat can reconstruct the tumor copy number profile and precisely identify malignant cells in the tumor microenvironment. We identify genetic subpopulations with transcriptional signatures relevant to tumor progression and therapy resistance. Numbat requires neither sample-matched DNA data nor a priori genotyping, and is applicable to a wide range of experimental settings and cancer types.
Topics: Humans; Transcriptome; DNA Copy Number Variations; Haplotypes; Multiple Myeloma; Phylogeny; Single-Cell Analysis; Tumor Microenvironment
PubMed: 36163550
DOI: 10.1038/s41587-022-01468-y
-
Nature Genetics Apr 2017
Adjacent CpG sites in mammalian genomes can be co-methylated owing to the processivity of methyltransferases or demethylases, yet discordant methylation patterns have also been observed, which are related to stochastic or uncoordinated molecular processes. We focused on a systematic search and investigation of regions in the full human genome that show highly coordinated methylation. We defined 147,888 blocks of tightly coupled CpG sites, called methylation haplotype blocks, after analysis of 61 whole-genome bisulfite sequencing data sets and validation with 101 reduced-representation bisulfite sequencing data sets and 637 methylation array data sets. Using a metric called methylation haplotype load, we performed tissue-specific methylation analysis at the block level. Subsets of informative blocks were further identified for deconvolution of heterogeneous samples. Finally, using methylation haplotypes we demonstrated quantitative estimation of tumor load and tissue-of-origin mapping in the circulating cell-free DNA of 59 patients with lung or colorectal cancer.
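The abstract names the methylation haplotype load metric without defining it. The sketch below assumes the definition usually given for it: for each substring length i, take the fraction of length-i windows of consecutive CpGs that are fully methylated, then average those fractions with weight i. That definition, the function name, and the '0'/'1' string encoding of haplotypes are assumptions that do not come from the text above, so treat this as illustrative rather than as the authors' implementation.

```python
def methylation_haplotype_load(haplotypes):
    """Length-weighted load for one block (assumed definition).

    haplotypes: list of equal-length strings over {'0', '1'}, one per
    read, where '1' marks a methylated CpG and '0' an unmethylated one.
    """
    length = len(haplotypes[0])
    weighted_sum = 0.0
    weight_total = 0
    for i in range(1, length + 1):
        windows = 0
        fully_methylated = 0
        for hap in haplotypes:
            for start in range(length - i + 1):
                windows += 1
                if hap[start:start + i] == "1" * i:
                    fully_methylated += 1
        # Weight w_i = i (assumption) favors long co-methylated runs.
        weighted_sum += i * fully_methylated / windows
        weight_total += i
    return weighted_sum / weight_total


# Two toy blocks with the same average methylation (50%).
coordinated = methylation_haplotype_load(["1111", "0000"])
discordant = methylation_haplotype_load(["1010", "0101"])
```

Under this definition the two toy blocks have the same mean methylation level but different loads, since only the first contains runs of co-methylated sites. That difference is the kind of block-level signal the abstract refers to.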
Topics: Chromosome Mapping; CpG Islands; DNA; DNA Methylation; Genome, Human; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA
PubMed: 28263317
DOI: 10.1038/ng.3805
-
Trends in Ecology & Evolution Mar 2020 (Review)
The particular combinations of alleles that define haplotypes along individual chromosomes can be determined with increasing ease and accuracy by using current sequencing technologies. Beyond allele frequencies, haplotype data collected in population samples contain information about the history of allelic associations in gene genealogies, and this is of tremendous potential for conservation genomics. We provide an overview of how haplotype information can be used to assess historical demography, gene flow, selection, and the evolutionary outcomes of hybridization across different timescales relevant to conservation issues. We address technical aspects of applying such approaches to nonmodel species. We conclude that there is much to be gained by integrating haplotype-based analyses in future conservation genomics studies.
Topics: Alleles; Gene Flow; Gene Frequency; Genomics; Haplotypes
PubMed: 31810774
DOI: 10.1016/j.tree.2019.10.012
-
Genes May 2022 (Review)
Nowadays, the use of Y-chromosome polymorphisms forms an essential part of many forensic DNA investigations. However, this was not always the case. Only since 1992 have some forensic scientists started to take an interest in this chromosome. In this review, I will sketch a brief history focusing on the forensic use of Y-chromosome polymorphisms. Before describing the various applications of short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) on the Y chromosome, I will discuss a few often ignored aspects influencing proper use and interpretation of Y-chromosome information: (i) genotyping Y-SNPs and Y-STRs, (ii) Y-STR haplotypes shared identical by state (IBS) or identical by descent (IBD), and (iii) Y-haplotype database frequencies.
Topics: Chromosomes, Human, Y; DNA; Haplotypes; Humans; Microsatellite Repeats; Polymorphism, Single Nucleotide
PubMed: 35627283
DOI: 10.3390/genes13050898
-
Plant Biotechnology Journal Jun 2022 (Review)
Genome phasing is a recently developed assembly method that separates heterozygous eukaryotic genomic regions and builds haplotype-resolved assemblies. Because differences between haplotypes are ignored in most published de novo genomes, assemblies are available as consensus genomes consisting of haplotype mixtures, thus increasing the need for genome phasing. Here, we review the operating principles and characteristics of several freely available and widely used phasing tools (TrioCanu, FALCON-Phase, and ALLHiC). An examination of downstream analyses using haplotype-resolved genome assemblies in plants indicated significant differences among haplotypes regarding chromosomal rearrangements, sequence insertions, and expression of specific alleles that contribute to the acquisition of the biological characteristics of plant species. Finally, we suggest directions for addressing the limitations of current genome-phasing methods. This review provides insights into the current progress, limitations, and future directions of de novo genome phasing, which will enable researchers to easily access and use genome phasing in studies involving highly heterozygous complex plant genomes.
Topics: Alleles; Genome, Plant; Genomics; Haplotypes; Plants; Sequence Analysis, DNA
PubMed: 35332665
DOI: 10.1111/pbi.13815
-
Molecular Ecology Mar 2023
The term "haplotype block" is commonly used in the developing field of haplotype-based inference methods. We argue that the term should be defined based on the structure of the Ancestral Recombination Graph (ARG), which contains complete information on the ancestry of a sample. We use simulated examples to demonstrate key features of the relationship between haplotype blocks and ancestral structure, emphasizing the stochasticity of the processes that generate them. Even the simplest cases of neutrality or of a "hard" selective sweep produce a rich structure, often missed by commonly used statistics. We highlight a number of novel methods for inferring haplotype structure, based on the full ARG, or on a sequence of trees, and illustrate how they can be used to define haplotype blocks using an empirical data set. While the advent of new, computationally efficient methods makes it possible to apply these concepts broadly, they (and additional new methods) could benefit from adding features to explore haplotype blocks, as we define them. Understanding and applying the concept of the haplotype block will be essential to fully exploit long and linked-read sequencing technologies.
Topics: Haplotypes; Algorithms; Models, Genetic
PubMed: 36433653
DOI: 10.1111/mec.16793
-
Journal of Medicine and Life Feb 2022
Y-chromosome DNA profiles are promising tools in population genetics and forensic science. Y-chromosome variation was analyzed in a total of 191 unrelated males from different regions of Basrah using a 17-marker system. For this uniparental system, the large majority of the haplogroups observed in the Basrah population (R1b, E1b1b, G2a, and J1) are considered to have originated in the Middle East and to have later spread all over Western Eurasia. In all likelihood, 30% of the Y chromosomes represent arrivals from distant geographic regions. The level of haplotype diversity and its implications for statistics are evaluated. The distinctive extent of long-range genetic input observed for the Y chromosome suggests that gene flow events into this area might have involved mainly males.
Topics: Chromosomes, Human, Y; Genetics, Population; Haplotypes; Humans; Male; Microsatellite Repeats
PubMed: 35419101
DOI: 10.25122/jml-2021-0281
-
Bioinformatics (Oxford, England) Jan 2020
MOTIVATION
The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes.
RESULTS
We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.
AVAILABILITY AND IMPLEMENTATION
Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and https://github.com/jltsiren/gcsa2.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Algorithms; Genome; Haplotypes; Sequence Analysis, DNA; Software
PubMed: 31406990
DOI: 10.1093/bioinformatics/btz575
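
As background for the positional Burrows-Wheeler transform mentioned in the last entry, here is a minimal sketch of the original linear PBWT over biallelic sites. It is not the graph extension or the software described in that abstract. The function names and the toy haplotype panel are hypothetical. At each site the haplotypes are kept sorted by their reversed prefixes, and the ordering is updated with a stable partition on the allele at the current site.

```python
def pbwt_prefix_orders(haplotypes):
    """Return the haplotype ordering before each site and after the last.

    haplotypes: list of equal-length lists of 0/1 alleles.
    orders[k] sorts haplotype indices by their reversed prefix
    ending just before site k.
    """
    order = list(range(len(haplotypes)))
    orders = [order]
    n_sites = len(haplotypes[0]) if haplotypes else 0
    for site in range(n_sites):
        # Stable partition: allele 0 first, then allele 1, preserving
        # the relative order inherited from the previous site.
        zeros = [h for h in order if haplotypes[h][site] == 0]
        ones = [h for h in order if haplotypes[h][site] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


def pbwt_columns(haplotypes):
    """Return the transformed allele column for each site."""
    orders = pbwt_prefix_orders(haplotypes)
    n_sites = len(haplotypes[0]) if haplotypes else 0
    return [[haplotypes[h][site] for h in orders[site]]
            for site in range(n_sites)]


# Toy panel of four haplotypes over three sites (assumed data).
panel = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
orders = pbwt_prefix_orders(panel)
columns = pbwt_columns(panel)
```

Because haplotypes that share a recent prefix end up adjacent in the ordering, the transformed columns tend to contain long runs of identical alleles. Those runs are what make such indexes compressible.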