- Microbial Biotechnology Jan 2024
Review
The human microbiome plays a crucial role in maintaining health, with advances in high-throughput sequencing technology and reduced sequencing costs triggering a surge in microbiome research. Microbiome studies generally incorporate five key phases: design, sampling, sequencing, analysis, and reporting, with sequencing strategy being a crucial step offering numerous options. Present mainstream sequencing strategies include Amplicon sequencing, Metagenomic Next-Generation Sequencing (mNGS), and Targeted Next-Generation Sequencing (tNGS). Two recently emerged innovative technologies, namely MobiMicrobe high-throughput microbial single-cell genome sequencing technology and 2bRAD-M simplified metagenomic sequencing technology, compensate for the limitations of mainstream technologies, each boasting unique core strengths. This paper reviews the basic principles and processes of these three mainstream and two novel microbiological technologies, aiding readers in understanding the benefits and drawbacks of different technologies, thereby guiding the selection of the most suitable method for their research endeavours.
Topics: Humans; Microbiota; Metagenome; High-Throughput Nucleotide Sequencing; Metagenomics; Technology
PubMed: 37929823
DOI: 10.1111/1751-7915.14364
- Nature Nanotechnology Dec 2023
Topics: Nanopore Sequencing; Sequence Analysis, DNA; Biodiversity; Nanopores; High-Throughput Nucleotide Sequencing
PubMed: 37749223
DOI: 10.1038/s41565-023-01508-x
- Nature Genetics Oct 2023
Topics: Algorithms; High-Throughput Nucleotide Sequencing; Polymorphism, Single Nucleotide
PubMed: 37816888
DOI: 10.1038/s41588-023-01544-2
- Molecular Ecology Resources Aug 2023
Although plastid genome (plastome) structure is highly conserved across most seed plants, investigations during the past two decades have revealed several disparately related lineages that experienced substantial rearrangements. Most plastomes contain a large inverted repeat and two single-copy regions, and a few dispersed repeats; however, the plastomes of some taxa harbour long repeat sequences (>300 bp). These long repeats make it challenging to assemble complete plastomes using short-read data, leading to misassemblies and consensus sequences with spurious rearrangements. Single-molecule, long-read sequencing has the potential to overcome these challenges, yet there is no consensus on the most effective method for accurately assembling plastomes using long-read data. We generated a pipeline, plastid Genome Assembly Using Long-read data (ptGAUL), to address the problem of plastome assembly using long-read data from Oxford Nanopore Technologies (ONT) or Pacific Biosciences platforms. We demonstrated the efficacy of the ptGAUL pipeline using 16 published long-read data sets. We showed that ptGAUL quickly produces accurate and unbiased assemblies using only ~50× coverage of plastome data. Additionally, we deployed ptGAUL to assemble four new Juncus (Juncaceae) plastomes using ONT long reads. Our results revealed many long repeats and rearrangements in Juncus plastomes compared with basal lineages of Poales. The ptGAUL pipeline is available on GitHub: https://github.com/Bean061/ptgaul.
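The "~50× coverage" figure in this abstract refers to mean sequencing depth, which is total sequenced bases divided by genome length. The sketch below illustrates that arithmetic only; it is not part of ptGAUL, the function names are hypothetical, and the 150 kb plastome size and 10 kb mean read length are assumed round numbers chosen for the example.

```python
import math

def mean_coverage(read_lengths, genome_length):
    """Mean depth: total sequenced bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(read_lengths) / genome_length

def reads_needed(target_coverage, genome_length, mean_read_length):
    """Smallest number of reads of a given mean length that reaches the target depth."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_coverage * genome_length / mean_read_length)

# Assumed, illustrative values (not taken from the paper).
PLASTOME_LENGTH = 150_000   # bp, hypothetical plastome size
MEAN_READ_LENGTH = 10_000   # bp, hypothetical long-read mean length

n_reads = reads_needed(50, PLASTOME_LENGTH, MEAN_READ_LENGTH)
depth = mean_coverage([MEAN_READ_LENGTH] * n_reads, PLASTOME_LENGTH)
print(n_reads, depth)
```

Under these assumptions a few hundred long reads are enough to reach the stated depth for a genome of that size.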
Topics: Genome, Plastid; Repetitive Sequences, Nucleic Acid; Gene Rearrangement; Plastids; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
PubMed: 36939021
DOI: 10.1111/1755-0998.13787
- PloS One 2023
K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Often the use of long k-mers is preferred because they can be uniquely associated with a specific genomic region. Unfortunately, it is not possible to reliably extract long k-mers in high error rate reads with standard exact k-mer counting methods. We propose SAKE, a method to extract long k-mers from high error rate reads by utilizing strobemers and consensus k-mer generation through partial order alignment. Our experiments show that on simulated data with up to 6% error rate, SAKE can extract 97-mers with over 90% recall. Conversely, the recall of DSK, an exact k-mer counter, drops to less than 20%. Furthermore, the precision of SAKE remains similar to DSK. On real bacterial data, SAKE retrieves 97-mers with a recall of over 90% and slightly lower precision than DSK, while the recall of DSK already drops to 50%. We show that SAKE can extract more k-mers from uncorrected high error rate reads compared to exact k-mer counting. However, exact k-mer counters run on corrected reads can extract slightly more k-mers than SAKE run on uncorrected reads.
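To illustrate why exact k-mer counting loses long k-mers in error-prone reads, here is a minimal sketch of exact counting together with the recall and precision measures the abstract reports. It is neither SAKE nor DSK: the function names and the toy sequences are hypothetical, and reverse complements (canonical k-mers) are ignored for brevity.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every exact k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def recall_precision(found, truth):
    """Recall and precision of a set of extracted k-mers against a truth set."""
    found, truth = set(found), set(truth)
    hits = len(found & truth)
    recall = hits / len(truth) if truth else 0.0
    precision = hits / len(found) if found else 0.0
    return recall, precision

# Toy data (illustrative only): a reference and one read with a single substitution.
reference = "ACGTACGGTCAGTTCA"
noisy_read = "ACGTACGCTCAGTTCA"  # index 7: G -> C

for k in (3, 8):
    truth = count_kmers([reference], k)
    found = count_kmers([noisy_read], k)
    recall, precision = recall_precision(found, truth)
    print(f"k={k}: recall={recall:.2f} precision={precision:.2f}")
```

A single substitution corrupts every k-mer that spans it, so up to k of them are lost per error. That is why recall falls as k grows in this toy case.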
Topics: Algorithms; Sequence Analysis, DNA; Genomics; Genome; Computational Biology; High-Throughput Nucleotide Sequencing; Software
PubMed: 38019768
DOI: 10.1371/journal.pone.0294415
- Molecular Aspects of Medicine Apr 2024
Review
Massively parallel sequencing technologies have long been used in both basic research and clinical routine. The recent introduction of digital sequencing has made previously challenging applications possible by significantly improving sensitivity and specificity to now allow detection of rare sequence variants, even at single molecule level. Digital sequencing utilizes unique molecular identifiers (UMIs) to minimize sequencing-induced errors and quantification biases. Here, we discuss the principles of UMIs and how they are used in digital sequencing. We outline the properties of different UMI types and the consequences of various UMI approaches in relation to experimental protocols and bioinformatics. Finally, we describe how digital sequencing can be applied in specific research fields, focusing on cancer management where it can be used in screening of asymptomatic individuals, diagnosis, treatment prediction, prognostication, monitoring treatment efficacy and early detection of treatment resistance as well as relapse.
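The core idea of UMI-based error suppression can be sketched as grouping reads by their UMI and taking a per-position majority vote, so that each original molecule is counted once and isolated sequencing errors are outvoted. This is a simplified illustration, not a method described in the review: the function name and toy reads are hypothetical, UMIs are matched exactly (no UMI error correction), and reads sharing a UMI are assumed to be equal-length and to cover the same locus.

```python
from collections import Counter, defaultdict

def umi_consensus(tagged_reads):
    """Collapse reads that share a UMI into one consensus sequence per UMI.

    tagged_reads: iterable of (umi, sequence) pairs. Sequences sharing a UMI
    are assumed to have equal length and to originate from the same molecule.
    """
    families = defaultdict(list)
    for umi, seq in tagged_reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        # Majority vote per position; ties resolve to the first-seen base.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus

# Toy input: three reads from one molecule (one carries an error), one from another.
reads = [
    ("AAT", "ACGT"),
    ("AAT", "ACGT"),
    ("AAT", "ACTT"),  # sequencing error at index 2
    ("GCC", "ACGA"),
]
molecules = umi_consensus(reads)
print(len(molecules), molecules)
```

Counting consensus families rather than raw reads is what removes amplification-driven quantification bias in this simplified picture.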
Topics: Humans; High-Throughput Nucleotide Sequencing; Computational Biology; Sensitivity and Specificity
PubMed: 38367531
DOI: 10.1016/j.mam.2024.101253
- Annals of Oncology : Official Journal... Dec 2023
Topics: Humans; Genome, Human; High-Throughput Nucleotide Sequencing
PubMed: 37816462
DOI: 10.1016/j.annonc.2023.09.3118
- American Journal of Clinical Pathology Nov 2023
OBJECTIVES
To validate a large next-generation sequencing (NGS) panel for comprehensive genomic profiling and improve patient access to more effective precision oncology treatment strategies.
METHODS
OncoPanScan was designed by targeting 825 cancer-related genes to detect a broad range of genomic alterations. A practical validation strategy was used to evaluate the assay's analytical performance, involving 97 tumor specimens with 25 paired blood specimens, 10 engineered cell lines, and 121 artificial reference DNA samples.
RESULTS
Overall, 1107 libraries were prepared and the sequencing failure rate was 0.18%. Across alteration classes, sensitivity ranged from 0.938 to more than 0.999, specificity ranged from 0.889 to more than 0.999, positive predictive value ranged from 0.867 to more than 0.999, repeatability ranged from 0.908 to more than 0.999, and reproducibility ranged from 0.832 to more than 0.999. The limit of detection for variants was established based on variant frequency, while for tumor mutation burden and microsatellite instability, it was based on tumor content, resulting in a minimum requirement of 20% tumor content. Benchmarking variant calls against validated NGS assays revealed that variations in the dry-bench processes were the primary cause of discordances.
CONCLUSIONS
This study presents a detailed validation framework and empirical recommendations for large panel validation and elucidates the sources of discordant alteration calls by comparing with "gold standard measures."
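The sensitivity, specificity, and positive predictive value reported above follow the standard confusion-matrix definitions: TP / (TP + FN), TN / (TN + FP), and TP / (TP + FP), respectively. A minimal sketch of that arithmetic is given below; the function name is hypothetical and the counts are invented for illustration, not taken from the study.

```python
import math

def validation_metrics(tp, fp, tn, fn):
    """Standard analytical-performance metrics from confusion-matrix counts.

    Returns NaN for a metric whose denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
    }

# Hypothetical counts for one alteration class (not data from the paper).
metrics = validation_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```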
Topics: Humans; Neoplasms; Mutation; Benchmarking; Reproducibility of Results; Precision Medicine; Genomics; High-Throughput Nucleotide Sequencing
PubMed: 37477357
DOI: 10.1093/ajcp/aqad078
- Trends in Genetics : TIG Mar 2024
Review
In the past decade, tRNA sequencing (tRNA-seq) has attracted considerable attention as an important tool for developing novel approaches to quantify highly modified tRNA species and for propelling tRNA research aimed at understanding cellular physiology and disease and at developing tRNA-based therapeutics. Many methods are available to quantify tRNA abundance while accounting for modifications and tRNA charging/acylation. Advances in both library preparation methods and bioinformatic workflows have enabled developments in next-generation sequencing (NGS) workflows. Other approaches forgo NGS applications in favor of hybridization-based approaches. In this review we provide a brief comparative overview of various tRNA quantification approaches, focusing on the advantages and disadvantages of these methods, which together facilitate reliable tRNA quantification.
Topics: RNA, Transfer; High-Throughput Nucleotide Sequencing; Computational Biology; Transfer RNA Aminoacylation
PubMed: 38123442
DOI: 10.1016/j.tig.2023.11.001
- Emerging Topics in Life Sciences Dec 2023
Review
Tandem repeat DNA sequences constitute a significant proportion of the human genome. While previously considered to be functionally inert, these sequences are now broadly accepted as important contributors to genetic diversity. However, the polymorphic nature of these sequences can lead to expansion beyond a gene-specific threshold, causing disease. More than 50 pathogenic repeat expansions have been identified to date, many of which have been discovered in the last decade as a result of advances in sequencing technologies and associated bioinformatic tools. Commonly utilised diagnostic platforms including Sanger sequencing, capillary array electrophoresis, and Southern blot are generally low throughput and are often unable to accurately determine repeat size, composition, and epigenetic signature, which are important when characterising repeat expansions. The rapid advances in bioinformatic tools designed specifically to interrogate short-read sequencing and the development of long-read single-molecule sequencing are enabling a new generation of high-throughput testing for repeat expansion disorders. In this review, we discuss some of the challenges surrounding the identification and characterisation of disease-causing repeat expansions and the technological advances that are poised to translate the promise of genomic medicine to individuals and families affected by these disorders.
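As a rough illustration of what "determining repeat size" means when a read spans the whole repeat, the sketch below measures the longest uninterrupted run of a motif in a sequence and compares it with a threshold. It is a toy under stated assumptions, not a tool from the review: the function name, the example sequence, and the threshold value are hypothetical, and real callers must also handle sequencing errors, interruptions, and reads that do not span the repeat.

```python
import re

def longest_repeat_run(sequence, motif):
    """Number of motif copies in the longest uninterrupted tandem run."""
    if not motif:
        raise ValueError("motif must be non-empty")
    # The lookahead lets runs start at every position, so overlapping
    # candidate runs in different frames are all considered.
    pattern = re.compile(r"(?=((?:%s)+))" % re.escape(motif))
    longest = max((len(m.group(1)) for m in pattern.finditer(sequence)), default=0)
    return longest // len(motif)

# Hypothetical read and threshold, chosen only to exercise the function.
read = "TTGACAGCAGCAGCAGCAGTTCAGGA"
EXAMPLE_THRESHOLD = 4  # assumed value, not a clinical cut-off

copies = longest_repeat_run(read, "CAG")
print(copies, "expanded" if copies > EXAMPLE_THRESHOLD else "within range")
```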
Topics: Humans; Tandem Repeat Sequences; Sequence Analysis, DNA; Computational Biology; High-Throughput Nucleotide Sequencing
PubMed: 37888797
DOI: 10.1042/ETLS20230019