-
BMC Bioinformatics May 2022
BACKGROUND
FragGeneScan is currently the most accurate and popular tool for gene prediction in short and error-prone reads, but its execution speed is insufficient for use on larger data sets. The parallelization that should have addressed this is inefficient. Its alternative implementation, FragGeneScan+, is faster but introduced a number of bugs related to memory management, race conditions and even output accuracy.
RESULTS
This paper introduces FragGeneScanRs, a faster Rust implementation of the FragGeneScan gene prediction model. Its command line interface is backward compatible and adds extra features for more flexible usage. Its output is equivalent to the original FragGeneScan implementation.
CONCLUSIONS
FragGeneScanRs processes shotgun metagenomic reads up to 22 times faster than the current C implementation when using a single thread, and it scales better with multithreaded execution. The Rust code of FragGeneScanRs is freely available from GitHub under the GPL-3.0 license with instructions for installation, usage and other documentation (https://github.com/unipept/FragGeneScanRs).
Topics: Algorithms; Metagenome; Metagenomics; Software
PubMed: 35643462
DOI: 10.1186/s12859-022-04736-5 -
Current Issues in Molecular Biology 2017
Review
Surveys of environmental microbial communities using a metagenomic approach produce vast volumes of multidimensional data regarding the phylogenetic and functional composition of the microbiota. Faced with such complex data, a metagenomic researcher needs to select the means of data analysis carefully. Data visualization has become an indispensable part of exploratory data analysis and serves as a key to discovery. While the molecular-genetic analysis of even a single bacterium presents multiple layers of data to be properly displayed and perceived, studies of microbiota are significantly more challenging. Here we present a review of state-of-the-art methods for the visualization of metagenomic data in a multi-level manner: from methods applicable to an in-depth analysis of a single metagenome to techniques appropriate for large-scale studies containing hundreds of environmental samples.
Topics: Bacteria; Computer Graphics; Databases, Genetic; Metagenome; Metagenomics; Microbiota
PubMed: 28686567
DOI: 10.21775/cimb.024.037 -
Microbiome Feb 2019
BACKGROUND
Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required.
RESULTS
We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, using several thousand small data sets generated with CAMISIM.
CONCLUSIONS
CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM.
Topics: Algorithms; Animals; Computer Simulation; Gastrointestinal Microbiome; Humans; Metagenome; Metagenomics; Mice; Models, Biological; Sequence Analysis, DNA; Software
PubMed: 30736849
DOI: 10.1186/s40168-019-0633-6 -
Bioinformatics (Oxford, England) Oct 2022
SUMMARY
Shotgun metagenomic sequencing provides the capacity to understand microbial community structure and function at unprecedented resolution; however, the current analytical methods are constrained by a focus on taxonomic classifications that may obfuscate functional relationships. Here, we present expam, a tree-based, taxonomy-agnostic tool for the identification of biologically relevant clades from shotgun metagenomic sequencing.
AVAILABILITY AND IMPLEMENTATION
expam is an open-source Python application released under the GNU General Public Licence v3.0. expam installation instructions, source code and tutorials can be found at https://github.com/seansolari/expam.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metagenome; Metagenomics; Microbiota; Software
PubMed: 36029242
DOI: 10.1093/bioinformatics/btac591 -
Bioinformatics (Oxford, England) Dec 2022
MOTIVATION
Recovery of metagenome-assembled genomes (MAGs) from shotgun metagenomic data is an important task for the comprehensive analysis of microbial communities from variable sources. Single binning tools differ in their ability to leverage specific aspects of MAG reconstruction, while the use of ensemble binning refinement tools is often time-consuming, and computational demand increases with community complexity. We introduce MAGScoT, a fast, lightweight and accurate implementation for the reconstruction of highest-quality MAGs from the output of multiple genome-binning tools.
RESULTS
MAGScoT outperforms popular bin-refinement solutions in terms of quality and quantity of MAGs as well as computation time and resource consumption.
AVAILABILITY AND IMPLEMENTATION
MAGScoT is available via GitHub (https://github.com/ikmb/MAGScoT) and as an easy-to-use Docker container (https://hub.docker.com/repository/docker/ikmb/magscot).
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Algorithms; Metagenomics; Metagenome; Microbiota
PubMed: 36264141
DOI: 10.1093/bioinformatics/btac694 -
Medecine Sciences : M/S Nov 2016
Review
More than a century and a half after Louis Pasteur's work on microbes, fermentation, and diseases, biological science has made a giant step in its knowledge of bacteria. Thanks to an ultra-powerful "microscope", i.e. ultra-fast DNA sequencing, scientists have over the last decade been able to read and catalog the genetic code of the bacteria at the surface of our epithelia, i.e. the metagenome. More recently, live bacteria within adipose tissue, defining a tissue microbiota, as well as bacterial fragments such as DNA within the liver, the brain and the blood have been identified. Metagenomic analyses of large cohorts of patients have uncovered tight correlations between bacterial genes within our intestine and mouth and diseases such as metabolic diseases, diabetes, obesity, some liver diseases, kidney and heart failure as well as vascular diseases. Some causal mechanisms have been proposed in rodents and can lay the groundwork for novel therapeutic strategies that could interfere with both the microbes and the corresponding host targets.
Topics: Animals; Gastrointestinal Tract; Humans; Metabolic Diseases; Metagenome; Metagenomics; Microbiota
PubMed: 28008835
DOI: 10.1051/medsci/20163211010 -
Microbiology Spectrum Dec 2022
Antibiotic resistance genes (ARGs) pose a serious threat to public health and ecological security in the 21st century. However, the resistome only accounts for a tiny fraction of metagenomic content, which makes it difficult to investigate low-abundance ARGs in various environmental settings. Thus, a highly sensitive, accurate, and comprehensive method is needed to describe ARG profiles in complex metagenomic samples. In this study, we established a high-throughput sequencing method based on targeted amplification, which could simultaneously detect ARGs (n = 251), mobile genetic element genes (n = 8), and metal resistance genes (n = 19) in metagenomes. The performance of amplicon sequencing was compared with traditional metagenomic shotgun sequencing (MetaSeq). A total of 1421 primer pairs were designed, achieving extremely high coverage of target genes. The amplicon sequencing significantly improved the recovery of target ARGs (~9 × 10-fold), with higher sensitivity and diversity, lower cost, and a lower computational burden. Furthermore, targeted enrichment allows deep scanning of single nucleotide polymorphisms (SNPs), and elevated SNP detection was shown in this study. We further applied this approach to 48 environmental samples (37 feces, 20 soils, and 7 sewage) and 16 clinical samples. All samples tested in this study showed high diversity and recovery of targeted genes. Our results demonstrated that the approach could be applied to various metagenomic samples and serve as an efficient tool in the surveillance and evolution assessment of ARGs. Access to the resistome using the enrichment method validated in this study enabled the capture of low-abundance resistomes while being less costly and time-consuming, which can greatly advance our understanding of local and global resistome dynamics.
ARGs, an increasing global threat to human health, can be transferred into health-related microorganisms in the environment by horizontal gene transfer, posing a serious threat to public health. Advanced profiling methods are needed for monitoring and predicting the potential risks of ARGs in metagenomes. Our study described a customized amplicon sequencing assay that could enable a high-throughput, targeted, in-depth analysis of ARGs and detect a low-abundance portion of resistomes. This method could serve as an efficient tool to assess the variation and evolution of specific ARGs in clinical and natural environments.
Topics: Humans; Metagenome; Genes, Bacterial; Anti-Bacterial Agents; Drug Resistance, Microbial; Sewage; Metagenomics
PubMed: 36287061
DOI: 10.1128/spectrum.02297-22 -
BMC Bioinformatics Oct 2022
BACKGROUND
In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read.
RESULTS
Existing tools for the read classification problem are often alignment or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a "screen") of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read's similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy.
CONCLUSIONS
The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. We then compare these approaches to existing alignment, index and sketching-based tools for read classification, and demonstrate how such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching.
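The screen-then-classify idea described in this abstract can be illustrated with a minimal sketch. It assumes a plain bottom-s MinHash (one of several schemes the abstract lists) and is not the authors' reference implementation; the function names, the choice of BLAKE2b as the hash, and the scoring rule (count of shared hashes) are all hypothetical choices made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """Stable 64-bit hash of a k-mer (the built-in hash() is salted per process)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k, size):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes."""
    return set(sorted({kmer_hash(km) for km in kmers(seq, k)})[:size])


def build_screen(genomes, k, size):
    """Reduce each candidate source genome (name -> sequence) to its sketch."""
    return {name: bottom_sketch(seq, k, size) for name, seq in genomes.items()}


def classify(read, screen, k):
    """Score a read against every sketch; return (best source or None, all scores)."""
    read_hashes = {kmer_hash(km) for km in kmers(read, k)}
    scores = {name: len(read_hashes & sketch) for name, sketch in screen.items()}
    best = max(scores, key=scores.get)
    # Assumption: a read sharing no hash with any sketch is left unclassified.
    return (best if scores[best] > 0 else None), scores
```

The screen is built once from the candidate genomes, after which reads can be streamed through `classify` one at a time. With a small sketch size, short reads will often share no hash with any sketch and come back unclassified, which is one reason such schemes suit long reads better.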
Topics: Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Software; Metagenomics; Metagenome; Algorithms
PubMed: 36316646
DOI: 10.1186/s12859-022-05014-0 -
BMC Genomics Sep 2022
Observational Study
BACKGROUND
To identify operational taxonomic units (OTUs) signaling disease onset in an observational study, a powerful strategy is to select participants by matched sets and profile temporal metagenomes, followed by trajectory analysis. Existing trajectory analyses modeled individual OTUs or the microbial community without adjusting for the within-community correlation and matched-set-specific latent factors.
RESULTS
We proposed a joint model with matching and regularization (JMR) to detect OTU-specific trajectories predictive of host disease status. The between- and within-matched-sets heterogeneity in OTU relative abundance and disease risk was modeled by nested random effects. The inherent negative correlation in microbiota composition was adjusted by incorporating and regularizing the top-correlated taxa as longitudinal covariates, pre-selected by Bray-Curtis distance and elastic net regression. We designed a simulation pipeline to generate true biomarkers for disease onset and the pseudo biomarkers caused by compositionality. We demonstrated that JMR effectively controlled the false discovery and pseudo biomarkers in a simulation study generating temporal high-dimensional metagenomic counts with random intercept or slope. Application of the competing methods to the simulated data and the TEDDY cohort showed that JMR outperformed the other methods and identified important taxa in infants' fecal samples with dynamics preceding host disease status.
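For readers unfamiliar with the Bray-Curtis distance mentioned here, the following is a minimal sketch of the standard dissimilarity, together with a hypothetical helper that ranks taxa by their distance to a target taxon across samples. It is not the JMR pre-selection procedure from the mtradeR package; the helper name `closest_taxa` and the input layout (taxon name mapped to per-sample relative abundances) are assumptions made for illustration.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("at least one vector must have non-zero abundance")
    # Sum of absolute differences divided by the total abundance of both vectors.
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def closest_taxa(target, profiles, top):
    """Hypothetical helper: the `top` taxa whose profiles are nearest to `target`."""
    dists = {
        name: bray_curtis(profiles[target], vec)
        for name, vec in profiles.items()
        if name != target
    }
    return sorted(dists, key=dists.get)[:top]
```

The dissimilarity is 0 for identical profiles and 1 for profiles with no shared non-zero entries, so sorting in ascending order puts the most similar taxa first.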
CONCLUSION
Our method JMR is a robust framework that models taxon-specific trajectory and host disease status for matched participants without transformation of relative abundance, improving the power of detecting disease-associated microbial features in certain scenarios. JMR is available in R package mtradeR at https://github.com/qianli10000/mtradeR.
Topics: Cohort Studies; Feces; Humans; Metagenome; Metagenomics; Microbiota
PubMed: 36123651
DOI: 10.1186/s12864-022-08890-1 -
Bioinformatics (Oxford, England) Apr 2021
MOTIVATION
Metagenomics refers to the study of complex samples containing the genetic content of multiple individual organisms and, thus, has been used to elucidate the microbiome and resistome of a complex sample. The microbiome refers to all microbial organisms in a sample, and the resistome refers to all of the antimicrobial resistance (AMR) genes in pathogenic and non-pathogenic bacteria. Single-nucleotide polymorphisms (SNPs) can be effectively used to 'fingerprint' specific organisms and genes within the microbiome and resistome and trace their movement across various samples. However, to effectively use these SNPs for this traceability, a scalable and accurate metagenomics SNP caller is needed. Moreover, such an SNP caller should not be reliant on reference genomes since 95% of microbial species are unculturable, making the determination of a reference genome extremely challenging. In this article, we address this need.
RESULTS
We present LueVari, a reference-free SNP caller based on the read-colored de Bruijn graph, an extension of the traditional de Bruijn graph that allows repeated regions longer than the k-mer length and shorter than the read length to be identified unambiguously. LueVari is able to identify SNPs in both AMR genes and chromosomal DNA from shotgun metagenomics data with reliable sensitivity (between 91% and 99%) and precision (between 71% and 99%), whereas the performance of competing methods varies widely. Furthermore, we show that LueVari constructs sequences containing the variation that span up to 97.8% of genes in datasets, which can be helpful in detecting distinct AMR genes in large metagenomic datasets.
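As a rough illustration of the underlying data structure, the sketch below builds a small de Bruijn graph whose edges are labeled ("colored") with the ids of the reads that contain each k-mer, and then lists branching nodes, where a SNP typically opens a bubble. This is a simplified toy under stated assumptions, not LueVari's algorithm or its succinct implementation; both function names are hypothetical.

```python
from collections import defaultdict


def build_colored_dbg(reads, k):
    """Map each edge ((k-1)-mer prefix, (k-1)-mer suffix) to the ids of reads containing that k-mer."""
    edges = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])].add(read_id)
    return edges


def branching_nodes(edges):
    """Nodes with more than one successor: candidate starting points of a bubble."""
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)
    return {node: succ for node, succ in successors.items() if len(succ) > 1}
```

For two reads that differ at a single base, the shared prefix yields edges carrying both read ids, while the two alleles produce diverging edges that each carry one id. The colors record which reads support which path, which is the information a traditional de Bruijn graph discards.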
AVAILABILITY AND IMPLEMENTATION
Code and datasets are publicly available at https://github.com/baharpan/cosmo/tree/LueVari.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Algorithms; Metagenome; Metagenomics; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Software
PubMed: 32049324
DOI: 10.1093/bioinformatics/btaa081