pyrosequencing - OpenMD.com Journal Search

DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing.

Nature Communications Jul 2023

Authors: Peng Ni, Fan Nie, Zeyu Zhong...

Long-read single-molecule sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous for detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction-treated and M.SssI-methyltransferase-treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× read coverage. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
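
The site-level comparison described for ccsmeth (per-read 5mCpG calls aggregated into per-site methylation frequencies, then correlated with bisulfite or nanopore frequencies) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ccsmeth's code: the input format of (chrom, pos, probability) tuples, the 0.5 probability threshold, the min_reads filter and all function names are hypothetical, and the correlation is a plain Pearson coefficient over CpG sites present in both sets.

```python
from collections import defaultdict
from math import sqrt


def site_frequencies(calls, threshold=0.5, min_reads=1):
    """Aggregate per-read CpG calls into per-site methylation frequencies.

    calls: iterable of (chrom, pos, prob_methylated) tuples, one per read per CpG
    (a hypothetical format). A read counts as methylated when prob >= threshold.
    Sites covered by fewer than min_reads reads are dropped.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, prob in calls:
        key = (chrom, pos)
        total[key] += 1
        if prob >= threshold:
            methylated[key] += 1
    return {k: methylated[k] / n for k, n in total.items() if n >= min_reads}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(var_x * var_y)


def site_level_correlation(predicted, reference):
    """Correlate two {(chrom, pos): frequency} maps over their shared CpG sites."""
    shared = sorted(set(predicted) & set(reference))
    return pearson([predicted[k] for k in shared], [reference[k] for k in shared])
```

For example, three reads at chr1:100 with probabilities 0.9, 0.8 and 0.1 give a site frequency of 2/3; raising min_reads is the usual lever for trading site count against per-site precision at low coverage.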

Topics: Humans; 5-Methylcytosine; Consensus; DNA; Sequence Analysis, DNA; DNA Methylation; High-Throughput Nucleotide Sequencing

PubMed: 37422489
DOI: 10.1038/s41467-023-39784-9

Error-corrected next generation sequencing - Promises and challenges for genotoxicity and cancer risk assessment.

Mutation Research. Reviews in Mutation... 2023

Review

Authors: Francesco Marchetti, Renato Cardoso, Connie L Chen...

Error-corrected Next Generation Sequencing (ecNGS) is rapidly emerging as a valuable, highly sensitive and accurate method for detecting and characterizing mutations in any cell type, tissue or organism from which DNA can be isolated. Recent mutagenicity and carcinogenicity studies have used ecNGS to quantify drug-/chemical-induced mutations and mutational spectra associated with cancer risk. ecNGS has potential applications in genotoxicity assessment as a new readout for traditional models, for mutagenesis studies in 3D organotypic cultures, and for detecting off-target effects of gene editing tools. Additionally, early data suggest that ecNGS can measure clonal expansion of mutations as a mechanism-agnostic early marker of carcinogenic potential and can evaluate mutational load directly in human biomonitoring studies. In this review, we discuss promising applications, challenges, limitations, and key data initiatives needed to enable regulatory testing and adoption of ecNGS - including for advancing safety assessment, augmenting weight-of-evidence for mutagenicity and carcinogenicity mechanisms, identifying early biomarkers of cancer risk, and managing human health risk from chemical exposures.
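
The core quantitative readout in ecNGS mutagenicity studies is a mutation frequency: independent mutations divided by the number of error-corrected base pairs interrogated, usually compared between treated and control samples. The sketch below assumes hypothetical per-sample totals (the ConsensusSummary class and fold_change function are illustrative, not from any ecNGS toolkit); real analyses also distinguish unique from clonally expanded mutations, examine mutational spectra, and report statistical uncertainty.

```python
from dataclasses import dataclass


@dataclass
class ConsensusSummary:
    """Hypothetical per-sample totals from an error-corrected (e.g. duplex) run."""
    mutations: int           # independent mutations passing consensus filters
    bases_interrogated: int  # error-corrected base pairs examined

    def frequency(self) -> float:
        """Mutation frequency in mutations per base pair."""
        if self.bases_interrogated <= 0:
            raise ValueError("no bases interrogated")
        return self.mutations / self.bases_interrogated


def fold_change(treated: ConsensusSummary, control: ConsensusSummary) -> float:
    """Ratio of treated to control mutation frequency."""
    control_freq = control.frequency()
    if control_freq == 0:
        raise ValueError("control frequency is zero")
    return treated.frequency() / control_freq
```

With illustrative numbers, 12 mutations in 1.0 × 10^9 bp (1.2 × 10^-8 per bp) for a control and 60 mutations in 1.2 × 10^9 bp (5.0 × 10^-8 per bp) for a treated sample give a fold change of about 4.2.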

Topics: Humans; High-Throughput Nucleotide Sequencing; Mutagenicity Tests; Mutation; Mutagens; Carcinogens; Carcinogenesis; Risk Assessment

PubMed: 37643677
DOI: 10.1016/j.mrrev.2023.108466

JTK: targeted diploid genome assembler.

Bioinformatics (Oxford, England) Jul 2023

Authors: Bansho Masutani, Yoshihiko Suzuki, Yuta Suzuki...

MOTIVATION

Diploid assembly, or determining the sequences of homologous chromosomes separately, is essential to elucidate genetic differences between haplotypes. One approach is to call and phase single nucleotide variants (SNVs) on a reference sequence. However, this approach becomes unstable on large segmental duplications (SDs) or structural variations (SVs) because the alignments of reads derived from these regions tend to be unreliable. Another approach is to use highly accurate PacBio HiFi reads to output a diploid assembly directly. Nonetheless, HiFi reads cannot phase across homozygous regions longer than the read length and require Oxford Nanopore Technologies (ONT) reads or Hi-C to produce a fully phased assembly. Is a single long-read sequencing technology sufficient to create an accurate diploid assembly?

RESULTS

Here, we present JTK, a megabase-scale diploid genome assembler. It first randomly samples kilobase-scale sequences (called 'chunks') from the long reads, phases the variants found on them, and produces two haplotypes. The novel idea of JTK is to use chunks to capture SNVs and SVs simultaneously. From 60-fold ONT reads of HG002 and a Japanese sample, it fully assembled both haplotypes with approximately 99.9% accuracy in the major histocompatibility complex (MHC) and leukocyte receptor complex (LRC) regions, which was impossible with the reference-based approach. In addition, in the LRC region of the Japanese sample, JTK produced an assembly with better contiguity than those built from high-coverage HiFi+Hi-C data. In the coming age of pan-genomics, JTK could complement reference-based phasing methods in assembling difficult-to-assemble but medically important regions.

AVAILABILITY AND IMPLEMENTATION

JTK is available at https://github.com/ban-m/jtk, and the datasets are available at https://doi.org/10.5281/zenodo.7790310 or JGAS000580 in DDBJ.
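
The JTK abstract describes phasing the variants found on sampled chunks to produce two haplotypes. The toy Python sketch below illustrates only the general idea of read-based phasing with a simple iterative two-way partition: each read is assigned to whichever haplotype it agrees with more, and the haplotype is then re-estimated by majority vote. It is not JTK's algorithm; the read representation (a dict from heterozygous-site index to allele 0 or 1), the seeding from the first read, and the function names are all assumptions.

```python
def _assign(reads, hap):
    """Assign each read to haplotype A (0) or its complement B (1) by allele agreement."""
    assignment = []
    for read in reads:
        # +1 for each allele matching haplotype A, -1 for each mismatch.
        score = sum(1 if hap[site] == allele else -1 for site, allele in read.items())
        assignment.append(0 if score >= 0 else 1)
    return assignment


def phase_reads(reads, n_sites, max_iter=10):
    """Toy two-haplotype phasing of reads over biallelic heterozygous sites.

    reads: list of dicts {site_index: allele}, allele in {0, 1}; uncovered sites
    are absent. Returns (hap_a, assignment); haplotype B is the complement of hap_a.
    """
    # Seed haplotype A from the first read; sites it does not cover default to 0.
    hap = [reads[0].get(site, 0) for site in range(n_sites)]
    for _ in range(max_iter):
        assignment = _assign(reads, hap)
        # Majority vote for haplotype A; reads on B vote with flipped alleles.
        votes = [0] * n_sites
        for read, side in zip(reads, assignment):
            for site, allele in read.items():
                a_allele = allele if side == 0 else 1 - allele
                votes[site] += 1 if a_allele == 1 else -1
        # Ties keep the previous allele.
        new_hap = [1 if v > 0 else 0 if v < 0 else hap[s] for s, v in enumerate(votes)]
        if new_hap == hap:
            break
        hap = new_hap
    return hap, _assign(reads, hap)
```

This greedy scheme depends on its seed and handles only SNV-like biallelic sites; capturing SVs as well, which the abstract highlights as JTK's key idea, needs a richer representation than a 0/1 allele per site.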

Topics: Diploidy; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Genome; Genomics; Haplotypes

PubMed: 37354526
DOI: 10.1093/bioinformatics/btad398

Single-nucleotide variant calling in single-cell sequencing data with Monopogen.

Nature Biotechnology May 2024

Authors: Jinzhuang Dou, Yukun Tan, Kian Hong Kock...

Single-cell omics technologies enable molecular characterization of diverse cell types and states, but how the resulting transcriptional and epigenetic profiles depend on the cell's genetic background remains understudied. We describe Monopogen, a computational tool to detect single-nucleotide variants (SNVs) from single-cell sequencing data. Monopogen leverages linkage disequilibrium from external reference panels to identify germline SNVs and detects putative somatic SNVs using allele cosegregation patterns at the cell population level. It can identify 100K to 3M germline SNVs with a genotyping accuracy of 95%, together with hundreds of putative somatic SNVs. Monopogen-derived genotypes enable global and local ancestry inference and identification of admixed samples. It identifies variants associated with cardiomyocyte metabolic levels and epigenomic programs. It also improves putative somatic SNV detection, enabling clonal lineage tracing in primary human clonal hematopoiesis. Monopogen brings together population genetics, cell lineage tracing and single-cell omics to uncover genetic determinants of cellular processes.
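
Because per-cell coverage is sparse, one simple starting point for germline genotyping, assuming reads are pooled across cells, is a binomial likelihood for each diploid genotype given reference and alternate allele counts. The Python sketch below shows that basic maximum-likelihood call under an assumed uniform 1% error rate; it deliberately omits the linkage-disequilibrium refinement with external reference panels that the Monopogen abstract describes, and the function names are hypothetical.

```python
from math import log

GENOTYPES = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}  # expected alt-allele fraction


def genotype_log_likelihoods(ref_count, alt_count, error=0.01):
    """Binomial log-likelihood of pooled allele counts under each diploid genotype.

    The binomial coefficient is omitted because it is identical for every genotype.
    """
    if not 0.0 < error < 0.5:
        raise ValueError("error rate must be in (0, 0.5)")
    result = {}
    for genotype, alt_fraction in GENOTYPES.items():
        # Probability that a read shows the alt allele, allowing for sequencing error.
        p_alt = alt_fraction * (1 - error) + (1 - alt_fraction) * error
        result[genotype] = alt_count * log(p_alt) + ref_count * log(1 - p_alt)
    return result


def call_genotype(ref_count, alt_count, error=0.01):
    """Maximum-likelihood genotype for one site."""
    ll = genotype_log_likelihoods(ref_count, alt_count, error)
    return max(ll, key=ll.get)
```

For instance, 11 reference and 9 alternate reads favor 0/1 by a wide margin; at sites with only a handful of pooled reads the three likelihoods become close, which is where population-level information such as LD helps.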

Topics: Single-Cell Analysis; Humans; Polymorphism, Single Nucleotide; High-Throughput Nucleotide Sequencing; Linkage Disequilibrium; Software; Computational Biology; Genotype

PubMed: 37592035
DOI: 10.1038/s41587-023-01873-x

COSAP: Comparative Sequencing Analysis Platform.

BMC Bioinformatics Mar 2024

Authors: Mehmet Arif Ergun, Omer Cinal, Berkant Bakışlı...

BACKGROUND

Recent improvements in sequencing technologies have enabled detailed profiling of genomic features. These technologies mostly rely on short reads, which are merged and compared to a reference genome for variant identification. Because of the size and complexity of the data, these operations must be performed computationally. The need for analysis software has resulted in many programs for the mapping, variant calling and annotation steps. Currently, most programs are either expensive enterprise software with proprietary code, which makes access and verification very difficult, or open-access programs that are mostly based on command-line operations without user interfaces or extensive documentation. Moreover, multiple studies have observed a high level of disagreement among popular mapping and variant calling algorithms, which makes relying on a single algorithm risky. Given the growth of sequencing technologies, user-friendly open-source software tools that offer comparative analysis are an important need.

RESULTS

Here, we propose the Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel and structural variant calling, copy number variation, microsatellite instability and fusion analysis, along with their annotations. COSAP comes with a fully functional, user-friendly web interface and a backend server, which allow fully independent deployment at both individual and institutional scales. COSAP is developed as a workflow management system and designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/ . The source code of the backend and frontend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/ respectively. All services are also packaged as Docker containers. Thanks to the modular structure, pipelines that combine algorithms can be customized and new algorithms can be added with minimal coding.

CONCLUSIONS

COSAP simplifies and speeds up DNA sequencing analyses by providing commonly used algorithms for SNV, indel and structural variant calling, copy number variation, microsatellite instability and fusion analysis, as well as their annotations. COSAP comes with a fully functional, user-friendly web interface and a backend server, which allow fully independent deployment at both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make it much easier to compare alternative pipelines and assess their impact, which is crucial in establishing the reproducibility of sequencing analyses.
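
Comparing variant callers, the motivation behind COSAP, often starts with the set overlap between their outputs. The Python sketch below is not part of COSAP: it reads VCF data lines, keys each alternate allele by (CHROM, POS, REF, ALT), and reports shared and caller-specific variants plus the Jaccard index. It assumes variants are already normalized; exact matching undercounts agreement on indels that callers represent differently, which is why left-alignment and normalization are usually applied first. Function names are hypothetical.

```python
def load_variants(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF data lines, one per ALT allele."""
    variants = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            if alt != ".":  # "." means no alternate allele
                variants.add((chrom, pos, ref, alt))
    return variants


def concordance(a, b):
    """Overlap summary between two variant sets."""
    union = a | b
    shared = a & b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

A multi-allelic record such as ALT "T,CA" contributes two keys, so the comparison is per allele rather than per VCF line.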

Topics: Humans; DNA Copy Number Variations; Reproducibility of Results; Microsatellite Instability; High-Throughput Nucleotide Sequencing; Software

PubMed: 38532317
DOI: 10.1186/s12859-024-05756-z

De novo detection of somatic mutations in high-throughput single-cell profiling data sets.

Nature Biotechnology May 2024

Authors: Francesc Muyas, Carolin M Sauer, Jose Espejo Valle-Inclán...

Characterization of somatic mutations at single-cell resolution is essential to study cancer evolution, clonal mosaicism and cell plasticity. Here, we describe SComatic, an algorithm designed to detect somatic mutations directly in single-cell transcriptomic and ATAC-seq (assay for transposase-accessible chromatin using sequencing) data sets, without requiring matched bulk or single-cell DNA sequencing data. SComatic distinguishes somatic mutations from polymorphisms, RNA-editing events and artefacts using filters and statistical tests parameterized on non-neoplastic samples. Using >2.6 million single cells from 688 single-cell RNA-seq (scRNA-seq) and single-cell ATAC-seq (scATAC-seq) data sets spanning cancer and non-neoplastic samples, we show that SComatic detects mutations in single cells accurately, even in differentiated cells from polyclonal tissues that are not amenable to mutation detection using existing methods. Validated against matched genome sequencing and scRNA-seq data, SComatic achieves F1 scores between 0.6 and 0.7 across diverse data sets, compared with 0.2-0.4 for the second-best performing method. In summary, SComatic permits de novo mutational signature analysis and the study of clonal heterogeneity and mutational burdens at single-cell resolution.
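
The abstract says SComatic filters candidates with statistical tests parameterized on non-neoplastic samples but does not spell out the model. Purely as an illustration under that assumption, the Python sketch below estimates a per-site background alternate-allele rate from non-neoplastic samples (with a pseudocount so the rate is never zero) and flags a site when the binomial tail probability of its observed alternate reads falls below a cutoff. This is not SComatic's test; the function names, the pseudocount and the alpha threshold are hypothetical.

```python
from math import exp, lgamma, log, log1p


def background_rate(alt_counts, depths, pseudocount=1.0):
    """Pooled alt-allele rate at one site across non-neoplastic samples.

    A pseudocount keeps the rate strictly between 0 and 1.
    """
    return (sum(alt_counts) + pseudocount) / (sum(depths) + 2 * pseudocount)


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def is_candidate(alt_count, depth, rate, alpha=1e-3):
    """Flag a site whose alt reads are unlikely under the background rate alone."""
    return binomial_sf(alt_count, depth, rate) < alpha
```

With a background of 3 alternate reads in 1,500 non-neoplastic reads, 8 alternate reads out of 100 is flagged while a single alternate read is not; a real caller would also model overdispersion and cell-type structure.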

Topics: Single-Cell Analysis; Humans; Mutation; High-Throughput Nucleotide Sequencing; Algorithms; Neoplasms

PubMed: 37414936
DOI: 10.1038/s41587-023-01863-z

Genomic variant benchmark: if you cannot measure it, you cannot improve it.

Genome Biology Oct 2023

Review

Authors: Sina Majidian, Daniel Paiva Agustinho, Chen-Shan Chin...

Genomic benchmark datasets are essential for driving the fields of genomics and bioinformatics. They provide a snapshot of the performance of sequencing technologies and analytical methods and highlight future challenges. However, they depend on the sequencing technology, the reference genome, and the available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and extensive manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
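
Benchmark sets of the kind this review discusses are commonly distributed as a truth variant set plus a set of confident regions, and callers are scored only inside those regions. The Python sketch below assumes exact matching of (chrom, pos, ref, alt) keys, BED intervals that are 0-based, half-open and non-overlapping, and 1-based VCF-style positions; dedicated tools such as hap.py or RTG Tools vcfeval perform haplotype-aware matching that this toy version does not. Function names are hypothetical.

```python
from bisect import bisect_right


def build_index(bed_intervals):
    """Group (chrom, start, end) BED intervals by chromosome, sorted by start."""
    index = {}
    for chrom, start, end in bed_intervals:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_confident(index, chrom, pos):
    """True if 1-based position pos lies in a 0-based half-open BED interval."""
    intervals = index.get(chrom, [])
    p0 = pos - 1
    # Rightmost interval whose start <= p0 (intervals assumed non-overlapping).
    i = bisect_right(intervals, (p0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= p0 < intervals[i][1]


def evaluate(truth, calls, index):
    """Precision, recall and F1 of calls vs truth, restricted to confident regions."""
    truth_c = {v for v in truth if in_confident(index, v[0], v[1])}
    calls_c = {v for v in calls if in_confident(index, v[0], v[1])}
    tp = len(truth_c & calls_c)
    fp = len(calls_c - truth_c)
    fn = len(truth_c - calls_c)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The 1-based to 0-based shift matters at interval edges: with a BED interval chr1 0-1000, VCF position 1000 is inside and position 1001 is outside.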

Topics: Benchmarking; Genomics; Computational Biology; Genome; High-Throughput Nucleotide Sequencing

PubMed: 37798733
DOI: 10.1186/s13059-023-03061-1

Unsupervised contrastive peak caller for ATAC-seq.

Genome Research Jul 2023

Authors: Ha T H Vu, Yudi Zhang, Geetu Tuteja...

The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is a common assay to identify chromatin accessible regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantified and tested for enrichment in a process referred to as "peak calling." Most unsupervised peak calling methods are based on simple statistical models and suffer from elevated false positive rates. Newly developed supervised deep learning methods can be successful, but they rely on high quality labeled data for training, which can be difficult to obtain. Moreover, though biological replicates are recognized to be important, there are no established approaches for using replicates in the deep learning tools, and the approaches available for traditional methods either cannot be applied to ATAC-seq, where control samples may be unavailable, or are post hoc and do not capitalize on potentially complex, but reproducible signal in the read enrichment data. Here, we propose a novel peak caller that uses unsupervised contrastive learning to extract shared signals from multiple replicates. Raw coverage data are encoded to obtain low-dimensional embeddings and optimized to minimize a contrastive loss over biological replicates. These embeddings are passed to another contrastive loss for learning and predicting peaks and decoded to denoised data under an autoencoder loss. We compared our replicative contrastive learner (RCL) method with other existing methods on ATAC-seq data, using annotations from ChromHMM genomic labels and transcription factor ChIP-seq as noisy truth. RCL consistently achieved the best performance.
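
The central ingredient of RCL, a contrastive loss that pulls together embeddings of the same region from different biological replicates, can be illustrated with a generic symmetric InfoNCE loss. The abstract does not give RCL's exact loss, encoder, or how positive and negative pairs are formed, so the Python sketch below assumes that row i of each replicate's embedding matrix encodes the same genomic region (positive pair) and that all other rows in the batch act as negatives; the temperature of 0.1 and the function names are hypothetical. It is written in plain Python for self-containment; a real model would compute this in a tensor framework with gradients.

```python
from math import exp, log, sqrt


def _unit(v):
    """Scale a vector to unit L2 norm."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _mean_neg_log_diag(logits):
    """Mean cross-entropy where row i's positive class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_norm = m + log(sum(exp(x - m) for x in row))
        total += log_norm - row[i]
    return total / len(logits)


def info_nce(z1, z2, temperature=0.1):
    """Symmetric InfoNCE loss between embeddings of the same regions in two replicates.

    z1[i] and z2[i] embed region i in replicate 1 and 2 (positive pair);
    every other pairing in the batch serves as a negative.
    """
    a = [_unit(v) for v in z1]
    b = [_unit(v) for v in z2]
    # Cosine similarity matrix scaled by temperature.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_neg_log_diag(logits) + _mean_neg_log_diag(transposed))
```

The loss is small when each region's embedding is most similar to the same region in the other replicate and grows when replicate embeddings are mismatched, which is the signal that encourages reproducible features.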

Topics: Chromatin Immunoprecipitation Sequencing; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Chromatin; DNA

PubMed: 37217250
DOI: 10.1101/gr.277677.123

Unraveling metagenomics through long-read sequencing: a comprehensive review.

Journal of Translational Medicine Jan 2024

Review

Authors: Chankyung Kim, Monnat Pongpanich, Thantrira Porntaveetus...

The study of microbial communities has advanced significantly, from the initial use of 16S rRNA sequencing to the adoption of shotgun metagenomics. However, a new era has emerged with the advent of long-read sequencing (LRS), which offers substantial improvements over its predecessor, short-read sequencing (SRS). LRS produces reads that are several kilobases long, enabling researchers to obtain more complete and contiguous genomic information, characterize structural variations, and study epigenetic modifications. The current leaders in LRS technologies are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), each offering a distinct set of advantages. This review covers the workflow of long-read metagenomic sequencing, including sample preparation (sample collection, DNA extraction, and library preparation), sequencing, processing (quality control, assembly, and binning), and analysis (taxonomic and functional annotation). Each section concisely outlines the key concept of the methodology, presenting the original concept as well as how it is challenged or modified in the context of LRS. Additionally, each section introduces a range of tools that are compatible with LRS and can be used to execute the LRS process. This review aims to present the metagenomics workflow, highlight the transformative impact of LRS, and provide researchers with a selection of tools suitable for this task.
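
Read-length N50 is one common quality-control summary for long-read data sets: the length L such that reads of length L or longer contain at least half of all sequenced bases. The Python sketch below computes it after an optional minimum-length filter; the min_length parameter and function names are hypothetical and do not correspond to any specific QC tool named in the review.

```python
def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def length_summary(lengths, min_length=0):
    """Read count, total bases and N50 after dropping reads shorter than min_length."""
    kept = [n for n in lengths if n >= min_length]
    return {"reads": len(kept), "bases": sum(kept), "n50": n50(kept)}
```

For read lengths 2, 3, ..., 10 (54 bases in total), the N50 is 8, because 10 + 9 + 8 = 27 is the first running total to reach half of the bases.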

Topics: RNA, Ribosomal, 16S; High-Throughput Nucleotide Sequencing; Metagenomics; Sequence Analysis, DNA; Genomics

PubMed: 38282030
DOI: 10.1186/s12967-024-04917-1

Utility of long-read sequencing for All of Us.

Nature Communications Jan 2024

Authors: M Mahmoud, Y Huang, K Garimella...

The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low coverage sequencing to increase sample numbers in large cohort analysis. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel and SV calling at scale for long-reads analysis. These results lead to widespread improvements across AoU.
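
The trade-off the abstract raises, sequencing more samples at lower coverage, can be reasoned about with the idealized Lander-Waterman model: mean depth c = N × L / G for N reads of length L over a genome of size G, with per-base depth approximately Poisson(c). The Python sketch below implements that model only; real long-read coverage is not uniform (for example, due to GC bias and mappability), so its estimates are optimistic. The read count, 15 kb read length and 3 Gb genome size in the example are illustrative assumptions.

```python
from math import exp, factorial


def mean_coverage(n_reads, read_length, genome_size):
    """Expected depth c = N * L / G (Lander-Waterman)."""
    return n_reads * read_length / genome_size


def fraction_at_least(k, coverage):
    """Poisson estimate of the fraction of bases covered by at least k reads."""
    return 1.0 - sum(exp(-coverage) * coverage ** i / factorial(i) for i in range(k))
```

For example, 1,000,000 reads of 15 kb over a 3 Gb genome give 5× mean coverage, at which the model predicts about 99.3% of bases are covered at least once, but a much smaller fraction reaches the depth typically needed for confident heterozygous calls.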

Topics: Humans; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Genome, Human; Population Health; INDEL Mutation

PubMed: 38281971
DOI: 10.1038/s41467-024-44804-3