Trends in Biotechnology, Dec 2023
Review
The impact of next-generation sequencing (NGS) cannot be overestimated. The technology has transformed the field of life science, contributing to a dramatic expansion in our understanding of human health and disease and our understanding of biology and ecology. The vast majority of the major NGS systems today are based on the concept of 'sequencing by synthesis' (SBS) with sequential detection of nucleotide incorporation using an engineered DNA polymerase. Based on this strategy, various alternative platforms have been developed, including the use of either native nucleotides or reversible terminators and different strategies for the attachment of DNA to a solid support. In this review, some of the key concepts leading to this remarkable development are discussed.
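The abstract distinguishes SBS platforms that use native nucleotides from those that use reversible terminators. The sketch below is a toy model of the native-nucleotide case. It assumes an idealized, error-free run in which nucleotides are dispensed in a fixed, repeating order. The flow order `TACG` and the function names are illustrative assumptions, not any platform's specification.

```python
from typing import List, Tuple

# Base pairing used to derive the synthesized strand from the template.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flowgram(template: str, flow_order: str = "TACG",
             max_cycles: int = 100) -> List[Tuple[str, int]]:
    """Idealized native-nucleotide SBS: one signal per dispensed nucleotide.

    `template` is written 3'->5', so synthesis proceeds left to right.
    During each flow, every consecutive position that pairs with the
    dispensed nucleotide is extended, so the signal equals the length of
    the homopolymer incorporated (0 if nothing pairs).
    """
    needed = "".join(COMPLEMENT[base] for base in template.upper())
    pos = 0
    signals: List[Tuple[str, int]] = []
    for _ in range(max_cycles):
        for nuc in flow_order:
            run = 0
            while pos < len(needed) and needed[pos] == nuc:
                run += 1
                pos += 1
            signals.append((nuc, run))
            if pos == len(needed):  # whole template copied
                return signals
    return signals


def decode(signals: List[Tuple[str, int]]) -> str:
    """Rebuild the synthesized strand by repeating each base by its signal."""
    return "".join(nuc * count for nuc, count in signals)


if __name__ == "__main__":
    sig = flowgram("AATGCCC")
    print(sig)
    print(decode(sig))
```

In this model a homopolymer is read out as a single signal intensity: the three `G`s in the example arrive in one flow with signal 3. This is why homopolymer-length errors are characteristic of native-nucleotide chemistries. With reversible terminators, exactly one base is added per cycle, so homopolymers are counted cycle by cycle.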
Topics: Humans; Sequence Analysis, DNA; DNA; Nucleotides; DNA-Directed DNA Polymerase; High-Throughput Nucleotide Sequencing
PubMed: 37482467
DOI: 10.1016/j.tibtech.2023.06.007

Bioinformatics (Oxford, England), Aug 2023
Review
MOTIVATION
Read alignment is an essential first step in the characterization of DNA sequence variation. The accuracy of variant-calling results depends not only on the quality of read alignment and variant-calling software but also on the interaction between these complex software tools.
RESULTS
In this review, we evaluate short-read aligner performance with the goal of optimizing germline variant-calling accuracy. We examine the performance of three general-purpose short-read aligners (BWA-MEM, Bowtie 2, and Arioc) in conjunction with three germline variant callers: DeepVariant, FreeBayes, and GATK HaplotypeCaller. We discuss the behavior of the read aligners with respect to the data elements on which the variant callers rely, and illustrate how the runtime configurations of these software tools combine to affect variant-calling performance.
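Variant-calling accuracy is commonly reported as precision, recall, and F1 against a truth set. The minimal sketch below shows that bookkeeping. It is not the review's evaluation pipeline. It assumes that both call sets have already been normalized so that identical variants have identical keys, and it ignores genotypes. Dedicated benchmarking tools such as hap.py also reconcile differing variant representations. The `Variant` key layout and the function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical variant key: (chromosome, 1-based position, REF, ALT).
Variant = Tuple[str, int, str, str]


def score_calls(calls: Iterable[Variant],
                truth: Iterable[Variant]) -> Dict[str, float]:
    """Precision, recall and F1 of a call set against a truth set.

    Variants match only if all four fields are identical, so both sets
    are assumed to be normalized beforehand; genotypes are ignored.
    """
    call_set, truth_set = set(calls), set(truth)
    tp = len(call_set & truth_set)  # called and in the truth set
    fp = len(call_set - truth_set)  # called but not in the truth set
    fn = len(truth_set - call_set)  # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Running the same scorer on the output of each aligner and caller combination is one way to compare configurations on equal terms.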
AVAILABILITY AND IMPLEMENTATION
Topics: High-Throughput Nucleotide Sequencing; Software; Germ Cells; Sequence Analysis, DNA
PubMed: 37527006
DOI: 10.1093/bioinformatics/btad480

Acta Myologica : Myopathies and..., 2023
Review
Massively parallel sequencing methods, such as exome, genome, and targeted DNA sequencing, have aided the molecular diagnosis of genetic diseases over the last 20 years. However, short-read sequencing methods still have several limitations, such as inaccurate genome assembly and the inability to detect large structural variants and variants located in hard-to-sequence regions like highly repetitive areas. The recently emerged PacBio single-molecule real-time (SMRT) and Oxford Nanopore Technologies (ONT) long-read sequencing (LRS) methods have been shown to overcome most of these technical issues, leading to an increase in diagnostic rate. LRS methods are contributing to the detection of repeat expansions in novel disease-causing genes (e.g., , and causing an Oculopharyngodistal myopathy or causing a Myopathy with rimmed ubiquitin-positive autophagic vacuolation), of structural variants (e.g., in ), and of single nucleotide variants in repetitive regions ( and ). Moreover, these methods have simplified the characterization of the D4Z4 repeats in , facilitating the diagnosis of Facioscapulohumeral muscular dystrophy (FSHD). We review recent studies that have used either ONT or PacBio SMRT sequencing methods and discuss different types of variants that have been detected using these approaches in individuals with neuromuscular disorders.
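A long read that spans an entire repeat expansion lets the expansion's size be estimated by counting motif copies directly. The naive sketch below illustrates that idea. It assumes exact matching, a single known motif, and hypothetical reads. It is not a method from the reviewed studies. Raw long reads contain errors that break exact runs, and dedicated tools are designed to tolerate them.

```python
def longest_tandem_run(read: str, motif: str) -> int:
    """Largest number of consecutive, exact copies of `motif` in `read`.

    Every start position is tried, so runs in any phase are found.
    Exact matching is a simplifying assumption; sequencing errors
    inside an expansion would split it into shorter runs here.
    """
    read, motif = read.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(read) - k + 1):
        copies, pos = 0, start
        while read.startswith(motif, pos):
            copies += 1
            pos += k
        best = max(best, copies)
    return best


if __name__ == "__main__":
    # Hypothetical read containing four tandem GGC copies.
    copies = longest_tandem_run("TTGGCGGCGGCGGCAT", "GGC")
    print(copies, "copies =", copies * 3, "bp of repeat")
```

The scan is quadratic in the worst case. That is acceptable for a sketch, but real tools index or align the read instead of scanning every start position.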
Topics: Humans; Sequence Analysis, DNA; Muscular Dystrophy, Facioscapulohumeral; Repetitive Sequences, Nucleic Acid; High-Throughput Nucleotide Sequencing
PubMed: 38406378
DOI: 10.36185/2532-1900-394

Journal of Clinical Microbiology, Aug 2023
Microbial cell-free DNA (mcfDNA) sequencing is an emerging infectious disease diagnostic tool which enables unbiased pathogen detection and quantification from plasma. The Karius Test, a commercial mcfDNA sequencing assay developed by and available since 2017 from Karius, Inc. (Redwood City, CA), detects and quantifies mcfDNA as molecules/μL in plasma. The commercial sample data and results for all tests conducted from April 2018 through mid-September 2021 were evaluated for laboratory quality metrics, reported pathogens, and data from test requisition forms. A total of 18,690 reports were generated from 15,165 patients in a hospital setting among 39 states and the District of Columbia. The median time from sample receipt to reported result was 26 h (interquartile range [IQR] 25 to 28), and 96% of samples had valid test results. Almost two-thirds (65%) of patients were adults, and 29% at the time of diagnostic testing had ICD-10 codes representing a diverse array of clinical scenarios. There were 10,752 (58%) reports that yielded at least one taxon for a total of 22,792 detections spanning 701 unique microbial taxa. The 50 most common taxa detected included 36 bacteria, 9 viruses, and 5 fungi. Opportunistic fungi (374 Aspergillus spp., 258 Pneumocystis jirovecii, 196 , and 33 dematiaceous fungi) comprised 861 (4%) of all detections. Additional diagnostically challenging pathogens (247 zoonotic and vector-borne pathogens, 144 Mycobacterium spp., 80 spp., 78 systemic dimorphic fungi, 69 spp., and 57 protozoan parasites) comprised 675 (3%) of all detections. This is the largest reported cohort of patients tested using plasma mcfDNA sequencing and represents the first report of a clinical grade metagenomic test performed at scale. Data reveal new insights into the breadth and complexity of potential pathogens identified.
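The summary percentages above follow directly from the reported counts. The short calculation below reproduces them. Several taxon names are missing from this excerpt, so they appear as bracketed placeholders rather than guessed names.

```python
# Counts as reported in the abstract; bracketed keys mark taxon names
# that are missing from this excerpt.
TOTAL_REPORTS = 18_690
REPORTS_WITH_TAXON = 10_752
TOTAL_DETECTIONS = 22_792

OPPORTUNISTIC_FUNGI = {
    "Aspergillus spp.": 374,
    "Pneumocystis jirovecii": 258,
    "[missing taxon]": 196,
    "dematiaceous fungi": 33,
}
OTHER_CHALLENGING = {
    "zoonotic and vector-borne pathogens": 247,
    "Mycobacterium spp.": 144,
    "[missing genus A] spp.": 80,
    "systemic dimorphic fungi": 78,
    "[missing genus B] spp.": 69,
    "protozoan parasites": 57,
}


def pct(part: int, whole: int) -> int:
    """Share of `whole`, as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


if __name__ == "__main__":
    fungi = sum(OPPORTUNISTIC_FUNGI.values())
    other = sum(OTHER_CHALLENGING.values())
    print(fungi, pct(fungi, TOTAL_DETECTIONS))
    print(other, pct(other, TOTAL_DETECTIONS))
    print(pct(REPORTS_WITH_TAXON, TOTAL_REPORTS))
```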
Topics: Adult; Humans; Fungi; Bacteria; Viruses; High-Throughput Nucleotide Sequencing; Metagenomics; Sequence Analysis, DNA
PubMed: 37439686
DOI: 10.1128/jcm.01855-22

BMB Reports, May 2024
Review
Mammalian genomes are intricately compacted into sophisticated three-dimensional structures within the tiny nucleus, a process known as 3D genome folding. Although these genomes resemble tangled yarn, the rapid development of molecular and next-generation sequencing (NGS) technologies has revealed that they are highly organized in a hierarchical order that delicately affects transcriptional activity. An increasing amount of evidence suggests that 3D genome folding is implicated in disease, offering clues for identifying novel therapeutic approaches. In this review, we examine what 3D genome folding means in epigenetics, which types of 3D genome structures exist, how they are formed, and how the technologies to explore them have developed. We also discuss the pathological implications of 3D genome folding. Finally, we discuss how to leverage 3D genome folding and engineering for future studies. [BMB Reports 2024; 57(5): 216-231].
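The abstract does not name specific methods. As an assumption, the sketch below uses a Hi-C-style contact matrix to illustrate how sequencing-derived contact data can reveal hierarchical organization. It computes a simplified insulation score: for each bin, the mean contact frequency in a square window that straddles the bin. Dips in this score mark candidate domain boundaries. The window size, the toy matrix, and all names are hypothetical. This is not a method taken from the review.

```python
from typing import List, Optional

Matrix = List[List[float]]


def insulation_scores(contacts: Matrix, window: int) -> List[Optional[float]]:
    """Mean contacts crossing each bin (None near the matrix edges).

    For bin i, average contacts[a][b] over a in [i - window, i) and
    b in (i, i + window]. Few contacts cross a domain boundary, so
    boundaries appear as local minima.
    """
    n = len(contacts)
    scores: List[Optional[float]] = []
    for i in range(n):
        if i - window < 0 or i + window >= n:
            scores.append(None)  # window would fall off the matrix
            continue
        total = 0.0
        for a in range(i - window, i):
            for b in range(i + 1, i + window + 1):
                total += contacts[a][b]
        scores.append(total / (window * window))
    return scores


def block_matrix(block_sizes: List[int], within: float = 10.0,
                 between: float = 1.0) -> Matrix:
    """Toy symmetric matrix: dense contacts inside blocks, sparse between."""
    labels = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    return [[within if la == lb else between for lb in labels]
            for la in labels]


if __name__ == "__main__":
    print(insulation_scores(block_matrix([4, 4]), window=2))
```

With two toy domains of four bins each, the score is lowest at bins 3 and 4, where the domains meet.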
Topics: Humans; Epigenomics; Animals; Epigenesis, Genetic; Genome; High-Throughput Nucleotide Sequencing
PubMed: 38627948
DOI: 10.5483/BMBRep.2023-0249

Genes, Sep 2023
In the last decade, the development of high-throughput sequencing methodologies has significantly improved the gathering of genomic information and the consequent understanding of the genetic and epigenetic background of complex and monogenic endocrine disorders [...].
Topics: Humans; Epigenomics; Epigenesis, Genetic; Genomics; Endocrine System Diseases; High-Throughput Nucleotide Sequencing
PubMed: 37761903
DOI: 10.3390/genes14091763

BMC Bioinformatics, Oct 2023
BACKGROUND
Shotgun metagenome sequencing data obtained from a host environment will usually be contaminated with sequences from the host organism. Host sequences should be removed before further analysis to avoid biases, reduce downstream computational load, or ensure privacy in the case of a human host. The tools we identified that were designed specifically for host contamination removal were either outdated, unmaintained, or complicated to use. Consequently, we have developed HoCoRT, a fast and user-friendly tool that implements several methods for optimised host sequence removal. We have evaluated the speed and accuracy of these methods.
RESULTS
HoCoRT is an open-source command-line tool for host contamination removal. It is designed to be easy to install and use, offering a one-step option for genome indexing. HoCoRT employs a variety of well-known mapping, classification, and alignment methods to classify reads. The user can select the underlying classification method and its parameters, allowing adaptation to different scenarios. Based on our investigation of various methods and parameters using synthetic human gut and oral microbiomes, and on assessment of publicly available data, we provide recommendations for typical datasets with short and long reads.
CONCLUSIONS
To decontaminate a human gut microbiome with short reads using HoCoRT, we found the optimal combination of speed and accuracy with BioBloom, Bowtie2 in end-to-end mode, and HISAT2. Kraken2 consistently demonstrated the highest speed, albeit with a trade-off in accuracy. The same applies to an oral microbiome, but here Bowtie2 was notably slower than the other tools. For long reads, the detection of human host reads is more difficult. In this case, a combination of Kraken2 and Minimap2 achieved the highest accuracy and detected 59% of human reads. In comparison to the dedicated DeconSeq tool, HoCoRT using Bowtie2 in end-to-end mode proved considerably faster and slightly more accurate. HoCoRT is available as a Bioconda package, and the source code can be accessed at https://github.com/ignasrum/hocort along with the documentation. It is released under the MIT licence and is compatible with Linux and macOS (except for the BioBloom module).
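HoCoRT wraps existing mappers and classifiers and does not implement its own classifier. The sketch below therefore does not reproduce HoCoRT's interface or the actual algorithm of any wrapped tool. As a stand-in, it shows a toy exact k-mer screen in the general spirit of k-mer-based classifiers. A read is flagged as host when a large enough share of its k-mers also occurs on either strand of the host reference. The parameters `k` and `min_shared` and all function names are hypothetical. Real tools use compact indexes rather than Python sets.

```python
from typing import Iterable, List, Set, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of `seq` (empty if shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_host_index(host_seqs: Iterable[str], k: int) -> Set[str]:
    """k-mers from both strands of every host sequence."""
    index: Set[str] = set()
    for seq in host_seqs:
        index |= kmers(seq, k) | kmers(revcomp(seq.upper()), k)
    return index


def split_reads(reads: Iterable[str], host_index: Set[str], k: int,
                min_shared: float = 0.5) -> Tuple[List[str], List[str]]:
    """Return (kept, removed); a read is removed as host if at least
    `min_shared` of its distinct k-mers occur in the host index."""
    kept: List[str] = []
    removed: List[str] = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # shorter than k: cannot classify, so keep it
            kept.append(read)
            continue
        shared = len(read_kmers & host_index) / len(read_kmers)
        (removed if shared >= min_shared else kept).append(read)
    return kept, removed
```

Raising `min_shared` keeps more borderline reads, which favours precision of host removal over recall. This mirrors the speed and accuracy trade-offs the abstract reports between tools.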
Topics: Humans; Software; Metagenome; Sequence Analysis, DNA; Microbiota; High-Throughput Nucleotide Sequencing
PubMed: 37784008
DOI: 10.1186/s12859-023-05492-w

PloS One, 2023
K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Often the use of long k-mers is preferred because they can be uniquely associated with a specific genomic region. Unfortunately, it is not possible to reliably extract long k-mers in high error rate reads with standard exact k-mer counting methods. We propose SAKE, a method to extract long k-mers from high error rate reads by utilizing strobemers and consensus k-mer generation through partial order alignment. Our experiments show that on simulated data with up to 6% error rate, SAKE can extract 97-mers with over 90% recall. Conversely, the recall of DSK, an exact k-mer counter, drops to less than 20%. Furthermore, the precision of SAKE remains similar to DSK. On real bacterial data, SAKE retrieves 97-mers with a recall of over 90% and slightly lower precision than DSK, while the recall of DSK already drops to 50%. We show that SAKE can extract more k-mers from uncorrected high error rate reads compared to exact k-mer counting. However, exact k-mer counters run on corrected reads can extract slightly more k-mers than SAKE run on uncorrected reads.
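A back-of-the-envelope model shows why exact counting struggles with long k-mers. Assume sequencing errors occur independently at rate e per base. The probability that a k-mer is error-free is then (1 - e)^k. For e = 0.06 and k = 97 this is roughly 0.25%. A single substitution also corrupts up to k overlapping k-mers. The sketch below illustrates both points with a plain exact counter. It does not implement SAKE, whose strobemer and partial-order-alignment steps are not shown. The function names are illustrative.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Exact k-mer counting: tally every length-k substring of every read."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def error_free_fraction(error_rate: float, k: int) -> float:
    """P(a k-mer has no error), assuming independent per-base errors."""
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    for k in (21, 97):
        print(k, error_free_fraction(0.06, k))
    # One substitution in the middle of a read removes k copies of "AAAA".
    clean = count_kmers(["AAAAAAAAAA"], 4)["AAAA"]
    mutated = count_kmers(["AAAAACAAAA"], 4)["AAAA"]
    print(clean - mutated)
```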
Topics: Algorithms; Sequence Analysis, DNA; Genomics; Genome; Computational Biology; High-Throughput Nucleotide Sequencing; Software
PubMed: 38019768
DOI: 10.1371/journal.pone.0294415

Journal of Virological Methods, Oct 2023
The ability of viral metagenomic next-generation sequencing (mNGS) to detect nucleic acids in a clinical sample in an unbiased manner makes it a powerful tool for the advanced diagnosis of viral infections. When clinical symptoms do not provide a clear differential diagnosis, extensive laboratory testing with virus-specific PCR and serology can be replaced by a single viral mNGS analysis. However, widespread diagnostic use of viral mNGS has so far been limited by long sample-to-result times, as most protocols rely on Illumina sequencing, which provides high and accurate sequencing output but is time-consuming and expensive. Here, we describe the development of an mNGS protocol based on the more cost-effective Nanopore Flongle sequencing, with a shorter turnaround time and a lower, yet sufficient, sequencing output for sensitive virus detection. Sample preparation (6 h) and sequencing (2 h) times are substantially reduced compared with Illumina mNGS and allow detection of DNA and RNA viruses at low input (up to a cycle threshold of 33-38 in virus-specific qPCR). Although Flongles yield lower sequencing output, direct comparison with Illumina mNGS on diverse clinical samples showed similar results. Collectively, the novel Nanopore mNGS approach is specifically tailored for use in clinical diagnostics and provides a rapid and cost-effective mNGS strategy for individual testing of severe cases.
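The detection limit above is expressed as a qPCR cycle threshold (Ct) rather than as a template amount. Under the idealized assumption that each PCR cycle multiplies product by (1 + efficiency), a sample that crosses threshold n cycles later started with (1 + efficiency)^n times less template. The helper below is a hypothetical illustration of that conversion. It is not part of the described protocol.

```python
def fold_difference(ct_low: float, ct_high: float,
                    efficiency: float = 1.0) -> float:
    """Relative starting-template ratio implied by two Ct values.

    Assumes constant amplification efficiency; efficiency = 1.0 means
    perfect doubling per cycle.
    """
    return (1.0 + efficiency) ** (ct_high - ct_low)


if __name__ == "__main__":
    # Span of the reported detection range (Ct 33 to 38).
    print(fold_difference(33, 38))
    print(fold_difference(33, 38, efficiency=0.9))
```

Under perfect doubling, the reported Ct range of 33 to 38 corresponds to a 32-fold span in starting template. Real assays often amplify at somewhat less than 100% efficiency, which narrows that span.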
Topics: Humans; Metagenomics; Nanopores; Virus Diseases; Viruses; RNA Viruses; DNA Viruses; High-Throughput Nucleotide Sequencing; Sensitivity and Specificity
PubMed: 37516367
DOI: 10.1016/j.jviromet.2023.114784

Nucleic Acids Research, Nov 2023
Small exons are pervasive in transcriptomes across organisms, and their quantification in RNA isoforms is crucial for understanding gene functions. Although long-read RNA-seq based on Oxford Nanopore Technologies (ONT) offers the advantage of covering transcripts in full length, its lower base accuracy poses challenges for identifying individual exons, particularly microexons (≤ 30 nucleotides). Here, we systematically assess small-exon quantification in synthetic and human ONT RNA-seq datasets. We demonstrate that reads containing small exons are often not properly aligned, which affects the quantification of the corresponding transcripts. We therefore develop a local-realignment method for misaligned exons (MisER), which remaps reads with misaligned exons to the transcript references. Using synthetic and simulated datasets, we demonstrate the high sensitivity and specificity of MisER for the quantification of transcripts containing small exons. Moreover, when comparing 14 neural with 16 non-neural human tissues, MisER enabled us to identify small exons, particularly neural-regulated microexons, with a higher percent spliced-in index (PSI) in neural tissues. Our work introduces an improved quantification method for long-read RNA-seq and especially facilitates studies that use ONT long reads to elucidate the regulation of genes involving small exons.
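The PSI of a cassette exon is the share of transcripts that include it. The sketch below uses a simplified long-read definition in which each full-length read counts once, either as inclusion or as exclusion. It also shows, with hypothetical counts, why misalignment matters: if some reads that truly contain a microexon are aligned as if they skip it, PSI is underestimated. The function and the numbers are illustrative and are not taken from MisER.

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """PSI (0-100) of a cassette exon from reads assigned to one of two
    isoforms: exon included, or upstream exon spliced straight to the
    downstream exon."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads span this exon")
    return 100.0 * inclusion_reads / total


if __name__ == "__main__":
    # Hypothetical: 30 inclusion and 10 exclusion reads in truth ...
    print(percent_spliced_in(30, 10))
    # ... but 12 inclusion reads misaligned as skipping the microexon.
    print(percent_spliced_in(30 - 12, 10 + 12))
```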
Topics: Humans; Exons; High-Throughput Nucleotide Sequencing; Protein Isoforms; RNA; RNA Isoforms; RNA-Seq; Sequence Analysis, RNA; Transcriptome
PubMed: 37843096
DOI: 10.1093/nar/gkad810