- Journal of Immunology (Baltimore, Md. :... Feb 2024
Topics: Algorithms; Software; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing
PubMed: 38315949
DOI: 10.4049/jimmunol.2390025
- Nature Reviews. Neurology Feb 2024 (Review)
The ability to sequence entire exomes and genomes has revolutionized molecular testing in rare movement disorders, and genomic sequencing is becoming an integral part of routine diagnostic workflows for these heterogeneous conditions. However, interpretation of the extensive genomic variant information that is being generated presents substantial challenges. In this Perspective, we outline multidimensional strategies for genetic diagnosis in patients with rare movement disorders. We examine bioinformatics tools and computational metrics that have been developed to facilitate accurate prioritization of disease-causing variants. Additionally, we highlight community-driven data-sharing and case-matchmaking platforms, which are designed to foster the discovery of new genotype-phenotype relationships. Finally, we consider how multiomic data integration might optimize diagnostic success by combining genomic, epigenetic, transcriptomic and/or proteomic profiling to enable a more holistic evaluation of variant effects. Together, the approaches that we discuss offer pathways to the improved understanding of the genetic basis of rare movement disorders.
Topics: Humans; Proteomics; Computational Biology; Genomics; High-Throughput Nucleotide Sequencing; Rare Diseases; Movement Disorders
PubMed: 38172289
DOI: 10.1038/s41582-023-00909-9
- Journal of Virological Methods Oct 2023
The ability of viral metagenomic Next-Generation Sequencing (mNGS) to unbiasedly detect nucleic acids in a clinical sample is a powerful tool for advanced diagnosis of viral infections. When clinical symptoms do not provide a clear differential diagnosis, extensive laboratory testing with virus-specific PCR and serology can be replaced by a single viral mNGS analysis. However, widespread diagnostic use of viral mNGS is thus far limited by long sample-to-result times, as most protocols rely on Illumina sequencing, which provides high and accurate sequencing output but is time-consuming and expensive. Here, we describe the development of an mNGS protocol based on the more cost-effective Nanopore Flongle sequencing with decreased turnaround time and lower, yet sufficient sequencing output to provide sensitive virus detection. Sample preparation (6 h) and sequencing (2 h) times are substantially reduced compared to Illumina mNGS and allow detection of DNA/RNA viruses at low input (up to 33-38 cycle threshold of specific qPCR). Although Flongles yield lower sequencing output, direct comparison with Illumina mNGS on diverse clinical samples showed similar results. Collectively, the novel Nanopore mNGS approach is specifically tailored for use in clinical diagnostics and provides a rapid and cost-effective mNGS strategy for individual testing of severe cases.
Topics: Humans; Metagenomics; Nanopores; Virus Diseases; Viruses; RNA Viruses; DNA Viruses; High-Throughput Nucleotide Sequencing; Sensitivity and Specificity
PubMed: 37516367
DOI: 10.1016/j.jviromet.2023.114784
- BMC Bioinformatics Mar 2024
BACKGROUND
Recent improvements in sequencing technologies have enabled detailed profiling of genomic features. These technologies mostly rely on short reads, which are merged and compared to a reference genome for variant identification. These operations must be done computationally due to the size and complexity of the data. The need for analysis software has resulted in many programs for the mapping, variant calling and annotation steps. Currently, most programs are either expensive enterprise software with proprietary code, which makes access and verification very difficult, or open-access programs that are mostly based on command-line operations without user interfaces and extensive documentation. Moreover, a high level of disagreement is observed among popular mapping and variant calling algorithms in multiple studies, which makes relying on a single algorithm unreliable. User-friendly open-source software tools that offer comparative analysis are an important need considering the growth of sequencing technologies.
RESULTS
Here, we propose Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis and their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. COSAP is developed as a workflow management system and designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/ . The source code of the frontend and backend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/ respectively. All services are packed as Docker containers as well. Pipelines that combine algorithms can be customized and new algorithms can be added with minimal coding through modular structure.
CONCLUSIONS
COSAP simplifies and speeds up the process of DNA sequencing analyses providing commonly used algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis as well as their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make comparisons much easier to assess the impact of alternative pipelines which is crucial in establishing reproducibility of sequencing analyses.
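The disagreement among variant callers mentioned in this abstract can be made concrete with a small comparison of call sets. The following is a minimal sketch, not COSAP's API: the caller names, the `(chrom, pos, ref, alt)` variant key, and the function names are all assumptions chosen for illustration. It computes pairwise Jaccard concordance and a simple "called by at least N callers" consensus.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Shared calls divided by all calls made by either caller."""
    union = a | b
    # Two empty call sets are treated as fully concordant.
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(
    calls: Dict[str, Set[Variant]]
) -> Dict[Tuple[str, str], float]:
    """Jaccard index for every pair of callers, keyed by sorted name pair."""
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


def consensus(calls: Dict[str, Set[Variant]], min_callers: int = 2) -> Set[Variant]:
    """Variants reported by at least `min_callers` of the callers."""
    counts = Counter(v for call_set in calls.values() for v in call_set)
    return {v for v, n in counts.items() if n >= min_callers}


# Toy call sets; the caller names are placeholders, not real tools.
v1, v2, v3, v4 = (
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 300, "C", "A"),
    ("chr2", 400, "T", "G"),
)
example_calls = {
    "caller_a": {v1, v2, v3},
    "caller_b": {v2, v3, v4},
    "caller_c": {v3},
}
example_concordance = pairwise_concordance(example_calls)
example_consensus = consensus(example_calls, min_callers=2)
```

A low Jaccard index between two callers on the same sample is one simple signal that results depend on the pipeline chosen, which is the reproducibility concern the abstract raises.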
Topics: Humans; DNA Copy Number Variations; Reproducibility of Results; Microsatellite Instability; High-Throughput Nucleotide Sequencing; Software
PubMed: 38532317
DOI: 10.1186/s12859-024-05756-z
- Nature Communications Jul 2023
Long single-molecular sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous in detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction treated and M.SssI-methyltransferase treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 Kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
Topics: Humans; 5-Methylcytosine; Consensus; DNA; Sequence Analysis, DNA; DNA Methylation; High-Throughput Nucleotide Sequencing
PubMed: 37422489
DOI: 10.1038/s41467-023-39784-9
- Nature Biotechnology May 2024
Characterization of somatic mutations at single-cell resolution is essential to study cancer evolution, clonal mosaicism and cell plasticity. Here, we describe SComatic, an algorithm designed for the detection of somatic mutations in single-cell transcriptomic and ATAC-seq (assay for transposase-accessible chromatin sequence) data sets directly without requiring matched bulk or single-cell DNA sequencing data. SComatic distinguishes somatic mutations from polymorphisms, RNA-editing events and artefacts using filters and statistical tests parameterized on non-neoplastic samples. Using >2.6 million single cells from 688 single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) data sets spanning cancer and non-neoplastic samples, we show that SComatic detects mutations in single cells accurately, even in differentiated cells from polyclonal tissues that are not amenable to mutation detection using existing methods. Validated against matched genome sequencing and scRNA-seq data, SComatic achieves F1 scores between 0.6 and 0.7 across diverse data sets, in comparison to 0.2-0.4 for the second-best performing method. In summary, SComatic permits de novo mutational signature analysis, and the study of clonal heterogeneity and mutational burdens at single-cell resolution.
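For readers unfamiliar with the F1 metric quoted above, the sketch below shows how precision, recall and F1 are typically derived when a set of called mutations is compared against a validated truth set. It is an illustration only and does not reproduce SComatic's benchmarking code; the variant key and all names are assumptions.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_calls(called: Set[Variant], truth: Set[Variant]) -> Scores:
    """Compare called mutations with a truth set and return precision, recall, F1."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed by the caller
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return Scores(precision, recall, f1)


# Toy data: three calls, four true mutations, two in common.
toy_called = {("chr1", 10, "A", "G"), ("chr1", 20, "C", "T"), ("chr1", 30, "G", "A")}
toy_truth = {
    ("chr1", 20, "C", "T"),
    ("chr1", 30, "G", "A"),
    ("chr1", 40, "T", "C"),
    ("chr1", 50, "A", "C"),
}
toy_scores = score_calls(toy_called, toy_truth)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it is a common single-number summary for mutation callers.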
Topics: Single-Cell Analysis; Humans; Mutation; High-Throughput Nucleotide Sequencing; Algorithms; Neoplasms
PubMed: 37414936
DOI: 10.1038/s41587-023-01863-z
- Genome Biology Oct 2023 (Review)
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Topics: Benchmarking; Genomics; Computational Biology; Genome; High-Throughput Nucleotide Sequencing
PubMed: 37798733
DOI: 10.1186/s13059-023-03061-1
- Mutation Research. Reviews in Mutation... 2023 (Review)
Mutations, the irreversible changes in an organism's DNA sequence, are present in tissues at a variant allele frequency (VAF) ranging from ∼10 per bp for a founder mutation to ∼10 for a histologically normal tissue sample containing several independent clones - compared to 1%-50% for a heterozygous tumor mutation or a polymorphism. The rarity of these events poses a challenge for accurate clinical diagnosis and prognosis, toxicology, and discovering new disease etiologies. Standard Next-Generation Sequencing (NGS) technologies report VAFs as low as 0.5% per nt, but reliably observing rarer precursor events requires additional sophistication to measure ultralow-frequency mutations. We detail the challenge; define terms used to characterize the results, which vary between laboratories and sometimes conflict between biologists and bioinformaticists; and describe recent innovations to improve standard NGS methodologies including: single-strand consensus sequence methods such as Safe-SeqS and SiMSen-Seq; tandem-strand consensus sequence methods such as o2n-Seq and SMM-Seq; and ultrasensitive parent-strand consensus sequence methods such as DuplexSeq, PacBio HiFi, SinoDuplex, OPUSeq, EcoSeq, BotSeqS, Hawk-Seq, NanoSeq, SaferSeq, and CODEC. Practical applications are also noted. Several methods quantify VAF down to 10 at a nt and mutation frequency (MF) in a target region down to 10 per nt. By expanding to > 1 Mb of sites never observed twice, thus forgoing VAF, other methods quantify MF < 10 per nt or < 15 errors per haploid genome. Clonal expansion cannot be directly distinguished from independent mutations by sequencing, so it is essential for a paper to report whether its MF counted only different mutations - the minimum independent-mutation frequency MF - or all mutations observed including recurrences - the larger maximum independent-mutation frequency MF which may reflect clonal expansion. Ultrasensitive methods reveal that, without their use, even mutations with VAF 0.5-1% are usually spurious.
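The distinction drawn above between VAF at a single site and the two ways of counting mutation frequency can be illustrated with a short calculation. This is a sketch under stated assumptions: the denominator is taken to be the total number of nucleotides sequenced, the mutation key is `(chrom, pos, ref, alt)`, and the function names are hypothetical rather than taken from the review.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical mutation key: (chromosome, position, ref allele, alt allele).
Mutation = Tuple[str, int, str, str]


def vaf(alt_reads: int, total_reads: int) -> float:
    """Variant allele frequency at one site: alt-supporting reads over total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def mutation_frequencies(
    observations: Iterable[Mutation], bases_sequenced: int
) -> Tuple[float, float]:
    """Return (minimum, maximum) independent-mutation frequency per nucleotide.

    Minimum: each distinct mutation is counted once, treating recurrences
    as clonal expansion of a single event.
    Maximum: every observation is counted, treating recurrences as
    independent events.
    """
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    counts = Counter(observations)
    mf_min = len(counts) / bases_sequenced
    mf_max = sum(counts.values()) / bases_sequenced
    return mf_min, mf_max


# Toy data: one mutation observed three times, another observed once.
toy_observations = [
    ("chr7", 1000, "C", "T"),
    ("chr7", 1000, "C", "T"),
    ("chr7", 1000, "C", "T"),
    ("chr9", 2500, "G", "A"),
]
toy_mf_min, toy_mf_max = mutation_frequencies(toy_observations, bases_sequenced=1_000_000)
toy_vaf = vaf(alt_reads=5, total_reads=1000)
```

In the toy data the two frequencies differ by a factor of two, which shows why a report needs to state which counting convention it used.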
Topics: Humans; Mutation; Neoplasms; Prognosis; High-Throughput Nucleotide Sequencing
PubMed: 37716438
DOI: 10.1016/j.mrrev.2023.108471
- Methods in Molecular Biology (Clifton,... 2024 (Review)
Modern high-throughput genomic testing using next-generation sequencing (NGS) has led to a significant increase in the successful diagnosis of rare genetic disorders. Recent advances in NGS tools and techniques have led to accurate and timely diagnosis of a large proportion of genetic diseases by finding sequence variations in clinical samples. One of the NGS techniques, exome sequencing (ES), is considered a powerful and easily approachable method for genetic disorders in terms of rapid and cost-effective diagnostic yields. In this chapter, we provide an overview of whole exome sequencing (ES) in the context of experimental and analytical methodologies. Approaches to ES include sequencing capture technique, quality control processes at various stages of sequencing analysis, an exome data filtering strategy that incorporates both primary and secondary filtering, and prioritization of candidate variants in diagnosing genetic diseases.
Topics: Humans; Exome; Exome Sequencing; Genetic Testing; High-Throughput Nucleotide Sequencing; Rare Diseases; Sequence Analysis, DNA
PubMed: 37803113
DOI: 10.1007/978-1-0716-3461-5_5
- Genome Research Jul 2023
The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is a common assay to identify chromatin accessible regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantified and tested for enrichment in a process referred to as "peak calling." Most unsupervised peak calling methods are based on simple statistical models and suffer from elevated false positive rates. Newly developed supervised deep learning methods can be successful, but they rely on high quality labeled data for training, which can be difficult to obtain. Moreover, though biological replicates are recognized to be important, there are no established approaches for using replicates in the deep learning tools, and the approaches available for traditional methods either cannot be applied to ATAC-seq, where control samples may be unavailable, or are post hoc and do not capitalize on potentially complex, but reproducible signal in the read enrichment data. Here, we propose a novel peak caller that uses unsupervised contrastive learning to extract shared signals from multiple replicates. Raw coverage data are encoded to obtain low-dimensional embeddings and optimized to minimize a contrastive loss over biological replicates. These embeddings are passed to another contrastive loss for learning and predicting peaks and decoded to denoised data under an autoencoder loss. We compared our replicative contrastive learner (RCL) method with other existing methods on ATAC-seq data, using annotations from ChromHMM genomic labels and transcription factor ChIP-seq as noisy truth. RCL consistently achieved the best performance.
Topics: Chromatin Immunoprecipitation Sequencing; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Chromatin; DNA
PubMed: 37217250
DOI: 10.1101/gr.277677.123