- Heredity Feb 2023
High-throughput sequencing data enables the comprehensive study of genomes and the variation therein. Essential for the interpretation of this genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas "gold-standard" empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design; yet the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. Here, we compare the performance of six popular short-read simulators (ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim) and discuss important considerations for selecting suitable models for benchmarking.
Topics: Humans; Genomics; Software; Genome; High-Throughput Nucleotide Sequencing; Benchmarking
PubMed: 36496447
DOI: 10.1038/s41437-022-00577-3
- Nature Reviews. Genetics Mar 2023 (Review)
RNA is a key regulator of almost every cellular process, and the structures adopted by RNA molecules are thought to be central to their functions. The recent fast-paced evolution of high-throughput sequencing-based RNA structure mapping methods has enabled the rapid in vivo structural interrogation of entire cellular transcriptomes. Collectively, these studies are shedding new light on the long-underestimated complexity of the structural organization of the transcriptome: the RNA structurome. Moreover, recent analyses are challenging the view that the RNA structurome is a static entity by revealing how RNA molecules establish intricate networks of alternative intramolecular and intermolecular interactions and that these ensembles of RNA structures are dynamically regulated to finely tune RNA functions in living cells. This new understanding of how RNA can shape cell phenotypes has important implications for the development of RNA-targeted therapeutic strategies.
Topics: RNA; Nucleic Acid Conformation; Transcriptome; High-Throughput Nucleotide Sequencing; Sequence Analysis, RNA
PubMed: 36348050
DOI: 10.1038/s41576-022-00546-w
- BMC Bioinformatics Feb 2023
BACKGROUND
Bacterial and viral infections may cause or exacerbate various human diseases, and RNA sequencing is one method of choice for detecting microbes in tissue. The detection of specific microbes using RNA sequencing offers good sensitivity and specificity, but untargeted approaches suffer from high false positive rates and a lack of sensitivity for low-abundance organisms.
RESULTS
We introduce Pathonoia, an algorithm that detects viruses and bacteria in RNA sequencing data with high precision and recall. Pathonoia first applies an established k-mer based method for species identification and then aggregates this evidence over all reads in a sample. In addition, we provide an easy-to-use analysis framework that highlights potential microbe-host interactions by correlating the microbial to the host gene expression. Pathonoia outperforms state-of-the-art methods in microbial detection specificity, both on in silico and real datasets.
CONCLUSION
Two case studies in human liver and brain show how Pathonoia can support novel hypotheses on microbial infection exacerbating disease. The Python package for Pathonoia sample analysis and a guided analysis Jupyter notebook for bulk RNAseq datasets are available on GitHub.
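The aggregation step described under RESULTS (per-read k-mer evidence pooled over all reads in a sample) can be illustrated with a minimal sketch. This is not the Pathonoia implementation: the input format (one dictionary of taxon to k-mer hit count per read, as a k-mer classifier might report), the function name `aggregate_kmer_evidence`, and the `min_fraction` cutoff are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_kmer_evidence(
    read_hits: Iterable[Dict[str, int]],
    min_fraction: float = 0.01,
) -> Dict[str, float]:
    """Pool per-read k-mer hits into sample-level taxon fractions.

    read_hits: one dict per read, mapping a taxon name to the number of
        k-mers in that read assigned to the taxon (hypothetical format).
    min_fraction: hypothetical cutoff; taxa holding a smaller share of
        all assigned k-mers in the sample are dropped.
    """
    totals: Counter = Counter()
    for hits in read_hits:
        # Counter.update adds the counts instead of overwriting them.
        totals.update(hits)

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}

    return {
        taxon: count / grand_total
        for taxon, count in totals.items()
        if count / grand_total >= min_fraction
    }


if __name__ == "__main__":
    sample = [{"taxon_A": 30, "taxon_B": 1}, {"taxon_A": 29}, {"taxon_C": 40}]
    print(aggregate_kmer_evidence(sample, min_fraction=0.05))
```

Pooling counts before thresholding means that a taxon supported by a few stray k-mers in single reads is filtered out at the sample level, which is one plausible way to trade recall for precision in untargeted detection.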
Topics: Humans; RNA-Seq; Algorithms; Sequence Analysis, RNA; Base Sequence; Bacteria; Metagenomics; High-Throughput Nucleotide Sequencing
PubMed: 36803415
DOI: 10.1186/s12859-023-05144-z
- PloS One 2020
Microbial community profiles have been associated with a variety of traits, including methane emissions in livestock. These profiles can be difficult and expensive to obtain for thousands of samples (e.g. for accurate association of microbial profiles with traits); therefore, the objective of this work was to develop a low-cost, high-throughput approach to capture the diversity of the rumen microbiome. Restriction enzyme reduced representation sequencing (RE-RRS) using ApeKI or PstI, and two bioinformatic pipelines (reference-based and reference-free), were compared to bacterial 16S rRNA gene sequencing using repeated samples collected two weeks apart from 118 sheep that were phenotypically extreme (60 high and 58 low) for methane emitted per kg dry matter intake (n = 236). DNA was extracted from freeze-dried rumen samples using a phenol chloroform and bead-beating protocol prior to RE-RRS. The resulting sequences were used to investigate the repeatability of the rumen microbial community profiles, the effect of laboratory and analytical method, and the relationship with methane production. The results suggested that the best method was PstI RE-RRS analyzed with the reference-free approach, which accounted for 53.3±5.9% of reads, and had repeatabilities of 0.49±0.07 and 0.50±0.07 for the first two principal components (PC1 and PC2), phenotypic correlations with methane yield of 0.43±0.06 and 0.46±0.06 for PC1 and PC2, and explained 41±8% of the variation in methane yield. These results were significantly better than for bacterial 16S rRNA gene sequencing of the same samples (p<0.05) except for the correlation between PC2 and methane yield. A sensitivity study suggested that approximately 2000 samples could be sequenced in a single lane on an Illumina HiSeq 2500, meaning that the current work using 118 samples/lane and the proposed future 384 samples/lane are well within that threshold. With minor adaptations, our approach could be used to obtain microbial profiles from other metagenomic samples.
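The abstract reports repeatabilities for PC1 and PC2 from samples taken twice per animal but does not state the estimator. As a hedged illustration only, the sketch below computes repeatability as the one-way ANOVA intraclass correlation for a balanced design; the function name `repeatability` and the choice of estimator are assumptions, not the authors' stated method.

```python
from typing import Dict, List


def repeatability(measurements: Dict[str, List[float]]) -> float:
    """One-way ANOVA intraclass correlation for a balanced design.

    measurements: maps an animal ID to its repeated values (for example
        a PC1 score at each sampling time). Every animal must have the
        same number of repeats, and at least two.
    """
    groups = list(measurements.values())
    n = len(groups)          # number of animals
    k = len(groups[0])       # repeats per animal
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 animals with the same number (>= 2) of repeats")

    grand_mean = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]

    ss_between = k * sum((m - grand_mean) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variation in the data")
    return (ms_between - ms_within) / denominator


if __name__ == "__main__":
    # Toy values, not data from the study.
    print(repeatability({"sheep_1": [1.0, 1.2], "sheep_2": [3.1, 2.9], "sheep_3": [5.0, 5.3]}))
```

A value near 1 means that most of the variation lies between animals rather than between repeated samples of the same animal.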
Topics: Animals; Bacteria; Female; Gastrointestinal Microbiome; High-Throughput Nucleotide Sequencing; Male; Metagenome; Metagenomics; Microbiota; RNA, Ribosomal, 16S; Rumen; Sheep
PubMed: 32243481
DOI: 10.1371/journal.pone.0219882
- Annual Review of Biomedical Data Science Aug 2023 (Review)
The human microbiome is complex, variable from person to person, essential for health, and related to both the risk for disease and the efficacy of our treatments. There are robust techniques to describe microbiota with high-throughput sequencing, and there are hundreds of thousands of already-sequenced specimens in public archives. The promise remains to use the microbiome both as a prognostic factor and as a target for precision medicine. However, when used as an input in biomedical data science modeling, the microbiome presents unique challenges. Here, we review the most common techniques used to describe microbial communities, explore these unique challenges, and discuss the more successful approaches for biomedical data scientists seeking to use the microbiome as an input in their studies.
Topics: Humans; Microbiota; Precision Medicine; High-Throughput Nucleotide Sequencing
PubMed: 37159872
DOI: 10.1146/annurev-biodatasci-020722-043017
- Microbial Biotechnology Jan 2024 (Review)
The human microbiome plays a crucial role in maintaining health, with advances in high-throughput sequencing technology and reduced sequencing costs triggering a surge in microbiome research. Microbiome studies generally incorporate five key phases: design, sampling, sequencing, analysis, and reporting, with sequencing strategy being a crucial step offering numerous options. Present mainstream sequencing strategies include amplicon sequencing, Metagenomic Next-Generation Sequencing (mNGS), and Targeted Next-Generation Sequencing (tNGS). Two recently emerged innovative technologies, namely MobiMicrobe high-throughput microbial single-cell genome sequencing and 2bRAD-M simplified metagenomic sequencing, compensate for the limitations of the mainstream technologies, each with its own core strengths. This paper reviews the basic principles and processes of these three mainstream and two novel microbiological technologies, helping readers understand the benefits and drawbacks of each and guiding the selection of the most suitable method for their research endeavours.
Topics: Humans; Microbiota; Metagenome; High-Throughput Nucleotide Sequencing; Metagenomics; Technology
PubMed: 37929823
DOI: 10.1111/1751-7915.14364
- ACS Synthetic Biology Jul 2022
Recombinant DNA is a fundamental tool in biotechnology and medicine. These DNA sequences are often built, replicated, and delivered in the form of plasmids. Validation of these plasmid sequences is a critical and time-consuming step, which has been dominated for the last 35 years by Sanger sequencing. As plasmid sequences grow more complex with new DNA synthesis and cloning techniques, we need new approaches that address the corresponding validation challenges at scale. Here we prototype a high-throughput plasmid sequencing approach using DNA transposition and Oxford Nanopore sequencing. Our method, Circuit-seq, creates robust, full-length, and accurate plasmid assemblies without prior knowledge of the underlying sequence. We demonstrate the power of Circuit-seq across a wide range of plasmid sizes and complexities, generating full-length, contiguous plasmid maps. We then leverage our long-read data to characterize epigenetic marks and estimate plasmid contamination levels. Circuit-seq scales to large numbers of samples at a lower per-sample cost than commercial Sanger sequencing, accelerating a key step in synthetic biology, while low equipment costs make it practical for individual laboratories.
Topics: DNA; High-Throughput Nucleotide Sequencing; Nanopore Sequencing; Sequence Analysis, DNA; Synthetic Biology
PubMed: 35695379
DOI: 10.1021/acssynbio.2c00126
- PLoS Computational Biology Feb 2021
Topics: Computational Biology; High-Throughput Nucleotide Sequencing; Humans; Software; Terminology as Topic; Whole Genome Sequencing; Writing
PubMed: 33600404
DOI: 10.1371/journal.pcbi.1008645
- Genomics Sep 2021
Copy number variation (CNV) discovery in bacteria has received less attention than CNV detection in eukaryotes. However, challenges arising from bacterial drug resistance bring further interest to CNV and its role in drug resistance. General CNV detection methods do not consider bacterial features, and there is room to improve detection accuracy. Here, we present CNproScan, a CNV detection method focused on bacterial genomes. CNproScan implements a hybrid approach and other bacteria-focused features and depends only on NGS data. We benchmarked our method against previously published methods and found that it achieves a higher detection rate while providing other beneficial features, such as CNV classification. Compared with other methods, CNproScan can detect much shorter CNV events.
Topics: DNA Copy Number Variations; Eukaryota; Genome, Bacterial; High-Throughput Nucleotide Sequencing
PubMed: 34224809
DOI: 10.1016/j.ygeno.2021.06.040
- Bioinformatics (Oxford, England) Jan 2023
SUMMARY
With the rapid expansion of the capabilities of DNA sequencers throughout the different sequencing generations, the quantity of generated data has likewise increased. This evolution has also led to new bioinformatic methods, for which in silico data have become crucial when verifying the accuracy of a model or the robustness of a genomic analysis pipeline. Here, we present a multithreaded next-generation simulator for next-generation sequencing data (NGSNGS), which simulates reads faster than currently available methods and programs. NGSNGS can simulate reads with platform-specific characteristics based on nucleotide quality score profiles and can include a post-mortem damage model, which is relevant for simulating ancient DNA. The simulated sequences are sampled (with replacement) from a reference DNA genome, which can represent a haploid genome, polyploid assemblies or even population haplotypes and allows the user to simulate known variable sites directly. The program is implemented in a multithreading framework and is factors faster than currently available tools while extending their feature set and possible output formats.
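The core idea in the summary (reads sampled with replacement from a reference, with errors driven by per-position quality scores) can be sketched in a few lines. This is a simplified illustration, not NGSNGS code: the function names, the fixed per-position quality list, the uniform substitution model, and the omission of post-mortem damage, paired ends, and multithreading are all assumptions. The only standard fact used is the Phred relation p = 10^(-Q/10) and the Phred+33 ASCII encoding.

```python
import random
from typing import List, Tuple


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def simulate_reads(
    reference: str,
    n_reads: int,
    qualities: List[int],
    seed: int = 0,
) -> List[Tuple[str, str, str, int]]:
    """Sample reads with replacement and add quality-driven substitutions.

    qualities: one Phred score per read position (hypothetical profile);
        its length sets the read length.
    Returns tuples of (name, sequence, Phred+33 quality string, start).
    """
    read_len = len(qualities)
    if read_len == 0 or read_len > len(reference):
        raise ValueError("read length must be between 1 and the reference length")

    rng = random.Random(seed)
    qual_string = "".join(chr(q + 33) for q in qualities)
    reads = []
    for i in range(n_reads):
        # Start positions are drawn independently, so the same fragment
        # can be sampled more than once (sampling with replacement).
        start = rng.randrange(len(reference) - read_len + 1)
        bases = []
        for base, q in zip(reference[start:start + read_len], qualities):
            if rng.random() < phred_to_error_prob(q):
                # Substitute with a different nucleotide, chosen uniformly.
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
        reads.append((f"read_{i}", "".join(bases), qual_string, start))
    return reads


if __name__ == "__main__":
    for name, seq, qual, start in simulate_reads("ACGTACGTTAGCCGATAGCT", 3, [30] * 8):
        print(f"@{name} start={start}\n{seq}\n+\n{qual}")
```

Passing an explicit seed keeps a simulated dataset reproducible, which matters when the reads are used to benchmark an analysis pipeline.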
AVAILABILITY AND IMPLEMENTATION
The method and associated programs are released as open-source software; the code and user manual are available at https://github.com/RAHenriksen/NGSNGS.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Software; Genome; Genomics; High-Throughput Nucleotide Sequencing; DNA, Ancient; Sequence Analysis, DNA
PubMed: 36661298
DOI: 10.1093/bioinformatics/btad041