metagenome - OpenMD.com Journal Search

CAMISIM: simulating metagenomes and microbial communities.

Microbiome Feb 2019

Authors: Adrian Fritz, Peter Hofmann, Stephan Majda...

BACKGROUND

Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required.

RESULTS

We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, using several thousand small data sets generated with CAMISIM.

CONCLUSIONS

CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM.

Topics: Algorithms; Animals; Computer Simulation; Gastrointestinal Microbiome; Humans; Metagenome; Metagenomics; Mice; Models, Biological; Sequence Analysis, DNA; Software

PubMed: 30736849
DOI: 10.1186/s40168-019-0633-6

Metagenomic binning with assembly graph embeddings.

Bioinformatics (Oxford, England) Sep 2022

Authors: Andre Lamurias, Mantas Sereika, Mads Albertsen...

MOTIVATION

Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in other fields to deal with complex graph data structures. The graph structure generated during the assembly process could be integrated with contig features to obtain better bins with deep learning.

RESULTS

We propose GraphMB, which uses graph neural networks to incorporate the assembly graph into the binning process. We test GraphMB on long-read datasets of different complexities and compare its performance with other binners in terms of the number of high-quality (HQ) genome bins obtained. With our approach, we were able to obtain unique bins on all real datasets, and more bins on most datasets. In particular, we obtained on average 17.5% more HQ bins when compared with state-of-the-art binners and 13.7% more when aggregating the results of our binner with the others. These results indicate that a deep learning model can integrate contig-specific and graph-structure information to improve metagenomic binning.

AVAILABILITY AND IMPLEMENTATION

GraphMB is available from https://github.com/MicrobialDarkMatter/GraphMB.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Topics: Sequence Analysis, DNA; Metagenomics; Metagenome; Genome, Microbial; Algorithms

PubMed: 35972375
DOI: 10.1093/bioinformatics/btac557

Focus on Metagenomics.

Journal of Biomolecular Techniques : JBT Apr 2017

Authors: Christopher E Mason, Scott Tighe

Topics: DNA; High-Throughput Nucleotide Sequencing; Humans; Metagenome; Metagenomics; RNA

PubMed: 28400709
DOI: 10.7171/jbt.17-2801-010

Adversarial and variational autoencoders improve metagenomic binning.

Communications Biology Oct 2023

Authors: Pau Piera Líndez, Joachim Johansen, Svetlana Kutuzova...

Assembly of reads from metagenomic samples is a hard problem, often resulting in highly fragmented genome assemblies. Metagenomic binning allows us to reconstruct genomes by re-grouping the sequences by their organism of origin, thus representing a crucial processing step when exploring the biological diversity of metagenomic samples. Here we present Adversarial Autoencoders for Metagenomics Binning (AAMB), an ensemble deep learning approach that integrates sequence co-abundances and tetranucleotide frequencies into a common denoised space that enables precise clustering of sequences into microbial genomes. When benchmarked, AAMB presented similar or better results compared with the state-of-the-art reference-free binner VAMB, reconstructing ~7% more near-complete (NC) genomes across simulated and real data. In addition, genomes reconstructed using AAMB had higher completeness and greater taxonomic diversity compared with VAMB. Finally, we implemented a pipeline integrating VAMB and AAMB that enabled improved binning, recovering 20% and 29% more simulated and real NC genomes, respectively, compared to VAMB, with moderate additional runtime.

Topics: Metagenome; Genome, Microbial; Metagenomics; Cluster Analysis; Benchmarking

PubMed: 37865678
DOI: 10.1038/s42003-023-05452-3
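
The tetranucleotide frequencies mentioned in the abstract above are one of the two sequence features the binner takes as input. As a rough illustration only, the following sketch counts overlapping 4-mers in a contig and normalizes them to frequencies. The function name and the example sequence are hypothetical and are not taken from AAMB or VAMB; binning tools often additionally collapse reverse-complementary k-mers, which this sketch omits.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Return normalized frequencies of all 256 tetranucleotides in seq.

    Hypothetical helper for illustration; not the AAMB/VAMB implementation.
    """
    seq = seq.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        # Skip windows containing N or other ambiguity codes.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    # Report every possible 4-mer so all contigs share the same feature order.
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}


# Made-up example contig; the window spanning "N" is ignored.
tnf = tetranucleotide_frequencies("ACGTACGTNACGT")
print(len(tnf), tnf["ACGT"])
```

Each contig then becomes a fixed-length vector that can be combined with its co-abundance profile before clustering.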

Recovering complete and draft population genomes from metagenome datasets.

Microbiome Mar 2016

Review

Authors: Naseer Sangwan, Fangfang Xia, Jack A Gilbert...

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of applying them to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.

Topics: Contig Mapping; Datasets as Topic; Genome, Microbial; Metagenome; Metagenomics; Sequence Analysis, DNA

PubMed: 26951112
DOI: 10.1186/s40168-016-0154-5
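
The review above notes that covarying contig coverage across multiple datasets improves binning. The idea can be sketched, under simplifying assumptions, as a correlation between per-sample coverage profiles: contigs from the same genome should rise and fall together across samples. The contig names and coverage values below are invented for illustration, and Pearson correlation is only one possible similarity measure, not necessarily the one used by any tool discussed in the review.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no covariation signal
    return cov / (sd_x * sd_y)


# Hypothetical mean read coverage of three contigs in four samples.
coverage = {
    "contig_1": [10.0, 40.0, 5.0, 20.0],
    "contig_2": [21.0, 79.0, 11.0, 41.0],  # tracks contig_1 closely
    "contig_3": [30.0, 2.0, 25.0, 8.0],    # opposite abundance pattern
}

r_same = pearson(coverage["contig_1"], coverage["contig_2"])
r_diff = pearson(coverage["contig_1"], coverage["contig_3"])
print(f"contig_1 vs contig_2: {r_same:.3f}")
print(f"contig_1 vs contig_3: {r_diff:.3f}")
```

In this toy data, the first pair would be candidates for the same bin and the second pair would not. More samples give longer profiles and therefore a more discriminating signal, at the computational cost the review mentions.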

Trait biases in microbial reference genomes.

Scientific Data Feb 2023

Authors: Sage Albright, Stilianos Louca

Common culturing techniques and priorities bias our discovery towards specific traits that may not be representative of microbial diversity in nature. So far, these biases have not been systematically examined. To address this gap, here we use 116,884 publicly available metagenome-assembled genomes (MAGs, completeness ≥80%) from 203 surveys worldwide as a culture-independent sample of bacterial and archaeal diversity, and compare these MAGs to the popular RefSeq genome database, which heavily relies on cultures. We compare the distribution of 12,454 KEGG gene orthologs (used as trait proxies) in the MAGs and RefSeq genomes, while controlling for environment type (ocean, soil, lake, bioreactor, human, and other animals). Using statistical modeling, we then determine the conditional probabilities that a species is represented in RefSeq depending on its genetic repertoire. We find that the majority of examined genes are significantly biased for or against in RefSeq. Our systematic estimates of gene prevalences across bacteria and archaea in nature and gene-specific biases in reference genomes constitute a resource for addressing these issues in the future.

Topics: Animals; Archaea; Bacteria; Genome, Microbial; Metagenome; Metagenomics

PubMed: 36759614
DOI: 10.1038/s41597-023-01994-7

Deep learning methods in metagenomics: a review.

Microbial Genomics Apr 2024

Review

Authors: Gaspar Roy, Edi Prifti, Eugeni Belda...

The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.

Topics: Humans; Deep Learning; Microbiota; Metagenome; Gastrointestinal Microbiome; Metagenomics

PubMed: 38630611
DOI: 10.1099/mgen.0.001231

Recent Advances in Function-based Metagenomic Screening.

Genomics, Proteomics & Bioinformatics Dec 2018

Review

Authors: Tanyaradzwa Rodgers Ngara, Houjin Zhang

Metagenomes from uncultured microorganisms are rich resources for novel enzyme genes. The methods used to screen metagenomic libraries fall into two categories, based on either the sequence or the function of the enzymes. The sequence-based approaches rely on the known sequences of the target gene families. In contrast, the function-based approaches do not involve the incorporation of metagenomic sequencing data and, therefore, may lead to the discovery of novel gene sequences with desired functions. In this review, we discuss the function-based screening strategies that have been used in the identification of enzymes from metagenomes. Because of its simplicity, agar plate screening is most commonly used in the identification of novel enzymes with diverse functions. Other screening methods with higher sensitivity are also employed, such as microtiter plate screening. Furthermore, several ultra-high-throughput methods were developed to deal with large metagenomic libraries. Among these are FACS-based screening, droplet-based screening, and in vivo reporter-based screening. The application of these novel screening strategies has increased the chance of discovering novel enzyme genes.

Topics: Animals; Bacteria; Enzymes; Gene Library; High-Throughput Screening Assays; Metagenome; Metagenomics; Plants

PubMed: 30597257
DOI: 10.1016/j.gpb.2018.01.002

Extraction of CRISPR-targeted sequences from the metagenome.

STAR Protocols Sep 2022

Authors: Ryota Sugimoto, Luca Nishimura, Phuong Thanh Nguyen...

Homology-based search is commonly used to uncover mobile genetic elements (MGEs) from metagenomes, but it heavily relies on reference genomes in the database. Here we introduce a protocol to extract CRISPR-targeted sequences from assembled human gut metagenomic sequences without using a reference database. We describe the assembly of metagenome contigs, the extraction of CRISPR direct repeats and spacers, the discovery of protospacers, and the extraction of protospacer-enriched regions using a graph-based approach. This protocol can extract numerous characterized and uncharacterized MGEs. For complete details on the use and execution of this protocol, please refer to Sugimoto et al. (2021).

Topics: Base Sequence; Clustered Regularly Interspaced Short Palindromic Repeats; Humans; Metagenome; Metagenomics

PubMed: 35780428
DOI: 10.1016/j.xpro.2022.101525

Validation of a Metagenomic Next-Generation Sequencing Assay for Lower Respiratory Pathogen Detection.

Microbiology Spectrum Feb 2023

Authors: Zhenli Diao, Huiying Lai, Dongsheng Han...

Lower respiratory infection (LRI) is the most fatal communicable disease, with only a few pathogens identified. Metagenomic next-generation sequencing (mNGS), as an unbiased, hypothesis-free, and culture-independent method, theoretically enables the detection of all pathogens in a single test. In this study, we developed and validated a DNA-based mNGS method for the diagnosis of LRIs from bronchoalveolar lavage fluid (BALF). We prepared simulated data sets and published raw data sets from patients to evaluate the performance of our in-house bioinformatics pipeline and compared it with the popular metagenomics pipeline Kraken2-Bracken. In addition, a series of biological microbial communities were used to comprehensively validate the performance of our mNGS assay. Sixty-nine clinical BALF samples were used for clinical validation to determine the accuracy. The in-house bioinformatics pipeline validation showed a recall of 88.03%, precision of 99.14%, and F1 score of 92.26% via single-genome simulated data. Mock microbial community and clinical metagenomic data showed that the in-house pipeline has a stricter cutoff value than Kraken2-Bracken, which could prevent false-positive detection by the bioinformatics pipeline. The validation for the whole mNGS pipeline revealed that overwhelming human DNA, long-term storage at 4°C, and repeated freezing-thawing reduced the analytical sensitivity of the assay. The mNGS assay showed a sensitivity of 95.18% and specificity of 91.30% for pathogen detection from BALF samples. This study comprehensively demonstrated the analytical performance of this laboratory-developed mNGS assay for pathogen detection from BALF, which contributed to the standardization of this technology. To our knowledge, this study is the first to comprehensively validate the mNGS assay for the diagnosis of LRIs from BALF. This study exhibited a ready-made example for clinical laboratories to prepare reference materials and develop comprehensive validation schemes for their in-house mNGS assays, which would accelerate the standardization of mNGS testing.

Topics: Humans; Metagenome; Respiratory Tract Infections; Microbiota; High-Throughput Nucleotide Sequencing; Metagenomics

PubMed: 36507666
DOI: 10.1128/spectrum.03812-22
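
The validation figures quoted in the abstract above (recall, precision, F1 score, sensitivity, specificity) all derive from a confusion matrix of true and false positives and negatives. The sketch below shows the standard definitions. The counts are invented for illustration and are not the study's data, and the abstract does not state whether its F1 score was computed from pooled counts or averaged over individual data sets.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard detection metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)    # fraction of reported detections that are real
    recall = ratio(tp, tp + fn)       # same quantity as sensitivity
    specificity = ratio(tn, tn + fp)  # fraction of true negatives correctly called
    f1 = ratio(2 * precision * recall, precision + recall)  # harmonic mean
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = classification_metrics(tp=90, fp=10, tn=80, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

A stricter detection cutoff, as described for the in-house pipeline, typically trades some recall for higher precision, which is the pattern in the reported figures.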