Human Mutation Nov 2022
Review
The advancements made in next-generation sequencing (NGS) technology over the past two decades have transformed our understanding of genetic variation in humans and had a profound impact on our ability to diagnose patients with rare genetic diseases. In this review, we discuss the recently developed application of rapid NGS techniques, used to diagnose pediatric patients with suspected rare diseases who are critically ill. We highlight the challenges associated with performing such clinical diagnostic tests in terms of the laboratory infrastructure, bioinformatic analysis pipelines, and the ethical considerations that need to be addressed. We end by looking at what future developments in this field may look like and how they can be used to augment the genetic data to further improve the diagnostic rates for these high-priority patients.
Topics: Child; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Humans; Pediatrics
PubMed: 36086948
DOI: 10.1002/humu.24466
The Veterinary Clinics of North... Mar 2023
Review
Next-generation sequencing (NGS) was initially developed to aid sequencing of the human genome. This molecular method is cost-effective for sequencing and characterizing genomes, not only those of humans or animals but also those of bacteria and other pathogens. Rather than sequencing a single organism, however, a targeted NGS method can be used to specifically amplify genetic targets from pathogens of interest in a clinical sample for detection and characterization by sequencing. Targeted NGS is an ideal method for ruminant syndromic testing because it can detect a variety of pathogens in a sample with a single test.
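The idea of detecting several pathogens with a single targeted test can be illustrated with a toy read classifier. This is a minimal sketch under stated assumptions, not the method from the review. It assumes a hypothetical panel of target sequences (`targets`), assigns each read to the target sharing the most unique k-mers, and reports targets supported by at least `min_reads` reads. Reverse complements, sequencing errors and real thresholds are deliberately ignored.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(targets: dict[str, str], k: int) -> dict[str, str]:
    """Map each k-mer to the single target it occurs in.

    K-mers shared by more than one target are dropped, because they
    cannot discriminate between pathogens.
    """
    index: dict[str, str] = {}
    shared: set[str] = set()
    for name, seq in targets.items():
        for km in kmers(seq, k):
            if km in index and index[km] != name:
                shared.add(km)
            else:
                index[km] = name
    for km in shared:
        index.pop(km, None)
    return index


def detect(reads: list[str], index: dict[str, str], k: int,
           min_hits: int = 1, min_reads: int = 2) -> dict[str, int]:
    """Count reads assigned to each target; keep targets with enough reads."""
    counts: Counter[str] = Counter()
    for read in reads:
        hits = Counter(index[km] for km in kmers(read, k) if km in index)
        if hits:
            best, n = hits.most_common(1)[0]
            if n >= min_hits:
                counts[best] += 1
    return {name: c for name, c in counts.items() if c >= min_reads}
```

In a real assay, the reporting threshold (`min_reads` here) would come from validation data rather than a fixed default.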
Topics: Humans; Animals; Bacteria; High-Throughput Nucleotide Sequencing
PubMed: 36731996
DOI: 10.1016/j.cvfa.2022.09.003
Briefings in Bioinformatics Sep 2022
Review
High-quality chromosome-scale genome sequences provide an important basis for downstream genomic analysis, especially the construction of haplotype-resolved and complete genomes, which play a key role in genome annotation, mutation detection, evolutionary analysis, gene function research, comparative genomics and other areas. However, genome-wide short-read sequencing struggles to produce a complete assembly of complex genomes with high duplication and high heterozygosity. The emergence of long-read sequencing technology has greatly improved the completeness of complex genome assembly. We review a variety of computational methods for complex genome assembly and describe in detail the theories, innovations and shortcomings of collapsed, semi-collapsed and uncollapsed assemblers based on long reads. Among the three approaches, uncollapsed assembly is the most correct and complete way to represent genomes. In addition, genome assembly is closely related to haplotype reconstruction: uncollapsed assembly realizes haplotype reconstruction, and haplotype reconstruction in turn promotes uncollapsed assembly. We hope that gapless, telomere-to-telomere and accurate assembly of complex genomes can become routinely achievable in the future using only a simple process or a single tool.
Topics: Chromosome Mapping; Genome; Genomics; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
PubMed: 35940845
DOI: 10.1093/bib/bbac305
Journal of Bioinformatics and... Dec 2019
Topics: Computational Biology; Congresses as Topic; High-Throughput Nucleotide Sequencing; Proteins
PubMed: 32019407
DOI: 10.1142/S0219720019020049
Bioinformatics (Oxford, England) May 2023
SUMMARY
We describe a compression scheme for BUS files and an implementation of the algorithm in the BUStools software. Our compression algorithm yields smaller file sizes than gzip, at significantly faster compression and decompression speeds. We evaluated our algorithm on 533 BUS files from scRNA-seq experiments with a total size of 1 TB. Our compression is 2.2× faster than the fastest gzip option and 35% slower than the fastest zstd option, and it yields files 1.5× smaller than both methods. This amounts to an 8.3× reduction in file size, giving a compressed size of 122 GB for the dataset.
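Sorted single-cell records contain long runs of repeated barcodes, which is why column-wise transforms compress them well. The sketch below shows two generic building blocks often used for this kind of data: delta encoding of a sorted integer column, followed by run-length encoding of the resulting zeros. It is only an illustration of the general technique. It is not the BUSZ format, whose actual specification is in the BUStools/BUSZ-format repository linked below. The integer barcode column is a hypothetical example.

```python
def delta_encode(values: list[int]) -> list[int]:
    """Replace each value with its difference from the previous one.

    The first value is kept as-is (difference from 0).
    """
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas: list[int]) -> list[int]:
    """Invert delta_encode by taking a running sum."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


def rle_encode(values: list[int]) -> list[tuple[int, int]]:
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs: list[list[int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, n in runs for _ in range(n)]


# Hypothetical sorted barcode column: repeated barcodes become zero deltas.
barcodes = [5, 5, 5, 9, 9, 12, 12, 12, 12]
encoded = rle_encode(delta_encode(barcodes))
```

Here `encoded` is `[(5, 1), (0, 2), (4, 1), (0, 1), (3, 1), (0, 3)]`, which is shorter than the input. A real compressor would then pass such runs to an entropy or integer coder.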
AVAILABILITY AND IMPLEMENTATION
A complete description of the format is available at https://github.com/BUStools/BUSZ-format and an implementation at https://github.com/BUStools/bustools. The code to reproduce the results of this article is available at https://github.com/pmelsted/BUSZ_paper.
Topics: High-Throughput Nucleotide Sequencing; Algorithms; Software; Data Compression; Exome Sequencing
PubMed: 37129540
DOI: 10.1093/bioinformatics/btad295
Bioinformatics (Oxford, England) Aug 2023
Review
MOTIVATION
Read alignment is an essential first step in the characterization of DNA sequence variation. The accuracy of variant-calling results depends not only on the quality of read alignment and variant-calling software but also on the interaction between these complex software tools.
RESULTS
In this review, we evaluate short-read aligner performance with the goal of optimizing germline variant-calling accuracy. We examine the performance of three general-purpose short-read aligners (BWA-MEM, Bowtie 2, and Arioc) in conjunction with three germline variant callers (DeepVariant, FreeBayes, and GATK HaplotypeCaller). We discuss the behavior of the read aligners with regard to the data elements on which the variant callers rely, and illustrate how the runtime configurations of these software tools combine to affect variant-calling performance.
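Variant-calling accuracy for an aligner/caller combination is usually summarized as precision, recall and F1 against a truth set. The following is a simplified sketch that assumes variants are already normalized and matched exactly on a hypothetical `(chrom, pos, ref, alt)` key. Dedicated benchmarking tools also handle representation differences, genotypes and confident regions, and this sketch does none of that.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt); assumed normalized


def benchmark(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Compare a call set to a truth set by exact key matching."""
    tp = len(calls & truth)   # called and present in truth
    fp = len(calls - truth)   # called but not in truth
    fn = len(truth - calls)   # in truth but missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Running the same truth set against each aligner/caller pairing makes the interaction effects described above directly comparable.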
Topics: High-Throughput Nucleotide Sequencing; Software; Germ Cells; Sequence Analysis, DNA
PubMed: 37527006
DOI: 10.1093/bioinformatics/btad480
Methods in Molecular Biology (Clifton,... 2023
The ultimate goal of de novo assembly of reads sequenced from a diploid individual is the separate reconstruction of the sequences corresponding to the two copies of each chromosome. Unfortunately, the allele linkage information needed to perform phased genome assemblies has been difficult to generate. Hence, most current genome assemblies are a haploid mixture of the two underlying chromosome copies present in the sequenced individual. Sequencing technologies providing long (20 kb) and accurate reads are the basis for generating phased genome assemblies. This chapter provides a brief overview of the main milestones in traditional genome assembly, focusing on the bioinformatic techniques developed to generate haplotype information from different specialized protocols. Using these techniques as background, the chapter reviews the current algorithms for generating phased assemblies from long reads with low error rates. Current techniques perform haplotype-aware error correction steps to increase the quality of the raw reads. In addition, variations on the traditional overlap-layout-consensus (OLC) graph have been developed in an effort to eliminate edges between reads sequenced from different chromosome copies. This allows large presence-absence variants between the chromosome copies to be taken into account. The development of these algorithms, along with improved sequencing technologies, has been crucial to finishing chromosome-level assemblies of complex genomes.
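Removing OLC edges between reads from different chromosome copies can be illustrated with a toy model. This is a minimal sketch, not any specific assembler's algorithm. It assumes each read has been reduced to a hypothetical mapping from heterozygous site to allele (0 or 1). An overlap edge is kept only if the two reads conflict at no more than `max_conflicts` shared sites. Connected components of the filtered graph then approximate haplotype-specific read groups.

```python
Alleles = dict[int, int]  # heterozygous site -> allele (0/1); hypothetical representation


def consistent(a: Alleles, b: Alleles, max_conflicts: int = 0) -> bool:
    """True if two reads disagree at no more than max_conflicts shared sites."""
    shared = a.keys() & b.keys()
    conflicts = sum(1 for s in shared if a[s] != b[s])
    return conflicts <= max_conflicts


def filter_overlaps(overlaps: list[tuple[str, str]],
                    alleles: dict[str, Alleles],
                    max_conflicts: int = 0) -> list[tuple[str, str]]:
    """Drop overlap edges that likely join reads from different haplotypes."""
    return [(u, v) for u, v in overlaps
            if consistent(alleles[u], alleles[v], max_conflicts)]


def components(nodes: list[str], edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group reads into connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups: dict[str, set[str]] = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)
```

Allowing a small nonzero `max_conflicts` is one way to tolerate residual sequencing errors, at the cost of occasionally merging haplotypes.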
Topics: Sequence Analysis, DNA; Haplotypes; Algorithms; Computational Biology; Alleles; High-Throughput Nucleotide Sequencing
PubMed: 36335504
DOI: 10.1007/978-1-0716-2819-5_16
Methods in Molecular Biology (Clifton,... 2024
Review
Detailed transcription maps of bacteriophages are rarely explored, limiting our understanding of molecular phage biology and restricting their exploitation and engineering. The ONT-cappable-seq method described here brings phage transcriptomics to the accessible nanopore sequencing platform and provides an affordable and more detailed overview of transcriptional features than traditional RNA-seq experiments. With ONT-cappable-seq, primary transcripts are specifically capped, enriched, and prepared for long-read sequencing on the nanopore sequencing platform. This enables end-to-end sequencing of unprocessed transcripts covering both the phage and host genomes, providing insight into their operons. The subsequent analysis pipeline makes it possible to rapidly identify the most important transcriptional features, such as transcription start and stop sites. The resulting data can thus provide a comprehensive overview of transcription in the phage of interest.
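Because reads from enriched primary transcripts begin at the transcription start, candidate start sites can be found by piling up read 5′ ends. The sketch below illustrates only that general idea. It is not the ONT-cappable-seq analysis pipeline. It assumes hypothetical alignments given as 0-based, half-open `(start, end, strand)` tuples and a fixed `min_reads` cutoff in place of a proper statistical model. It also does not cluster nearby positions.

```python
from collections import Counter

Alignment = tuple[int, int, str]  # (start, end, strand); 0-based, half-open


def five_prime_ends(alignments: list[Alignment]) -> Counter:
    """Count read 5' ends per (position, strand).

    On the + strand the 5' end is the leftmost base; on the - strand
    it is the rightmost base (end - 1).
    """
    counts: Counter = Counter()
    for start, end, strand in alignments:
        if strand == "+":
            counts[(start, "+")] += 1
        else:
            counts[(end - 1, "-")] += 1
    return counts


def call_tss(alignments: list[Alignment], min_reads: int = 3) -> list[tuple[int, str, int]]:
    """Report (position, strand, read_count) for positions with >= min_reads 5' ends."""
    counts = five_prime_ends(alignments)
    return sorted((pos, strand, n)
                  for (pos, strand), n in counts.items() if n >= min_reads)
```

The same counting applied to 3′ ends would give candidate transcription stop sites.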
Topics: Transcriptome; Bacteriophages; Gene Expression Profiling; Operon; Sequence Analysis, RNA; High-Throughput Nucleotide Sequencing
PubMed: 38526733
DOI: 10.1007/978-1-0716-3798-2_14
Methods in Molecular Biology (Clifton,... 2023
Clinically relevant sequencing methodologies continue to expand in number, diversity, complexity, and scale. This evolving and varied landscape requires unique implementations in all aspects of the assay, including the wet bench, bioinformatics, and reporting. After implementation, the informatics of many of these tests continue to change over time, from software and annotation source updates and changes to guidelines and knowledgebases, to changes in the underlying information technology (IT) infrastructure. Applying key principles when implementing the informatics of a new clinical test can greatly improve the lab's ability to handle these updates rapidly and reliably. In this chapter, we discuss a variety of informatics issues that span all NGS applications. In particular, we address the need for a reliable, repeatable, redundant, and version-controlled bioinformatics pipeline and architecture, and discuss common methodologies to meet these needs.
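One common way to make a clinical pipeline run repeatable and auditable is to record exactly what was run on what. The sketch below writes a run manifest containing the pipeline version, tool versions and SHA-256 checksums of the inputs. It is an assumption-laden illustration, not a methodology prescribed by the chapter. The function names, manifest fields and example version strings are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large FASTQ/BAM files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(pipeline_version: str,
                   tool_versions: dict[str, str],
                   inputs: list[Path]) -> dict:
    """Collect the provenance needed to reproduce or audit a run (hypothetical schema)."""
    return {
        "pipeline_version": pipeline_version,        # e.g. a git tag of the pipeline repo
        "tools": dict(sorted(tool_versions.items())),
        "inputs": {str(p): sha256sum(p) for p in sorted(inputs)},
    }


def write_manifest(manifest: dict, out: Path) -> None:
    """Write the manifest as stable, diff-friendly JSON."""
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

Storing the manifest alongside each report, and the pipeline code itself under version control, lets a lab show which software and annotation versions produced a given result after later updates.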
Topics: Informatics; Computational Biology; Software; High-Throughput Nucleotide Sequencing
PubMed: 37041438
DOI: 10.1007/978-1-0716-2950-5_3
Annual Review of Microbiology Sep 2020
Review
Shotgun metagenomic sequencing has revolutionized our ability to detect and characterize the diversity and function of complex microbial communities. In this review, we highlight the benefits of using metagenomics as well as the breadth of conclusions that can be drawn using currently available analytical tools, such as greater resolution of species and strains across phyla and of functional content, while also noting the challenges of metagenomic data analysis. Major challenges remain in annotating function, given the dearth of functional databases for environmental bacteria compared to model organisms, and in the technical difficulties of metagenome assembly and phasing in heterogeneous environmental samples. In the future, improvements and innovation in technology and methodology will lead to lower costs. Data integration using multiple technological platforms will lead to a better understanding of how to harness metagenomes. Ultimately, we will be able not only to characterize complex microbiomes but also to manipulate communities to achieve beneficial outcomes for health, agriculture, and environmental sustainability.
Topics: Bacteria; Computational Biology; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Microbiota
PubMed: 32603623
DOI: 10.1146/annurev-micro-012520-072314