- Microbial Genomics Nov 2020
Accumulating evidence suggests that humans could be considered as holobionts in which the gut microbiota play essential functions. Initial metagenomic studies reported a pattern of shared genes in the gut microbiome of different individuals, leading to the definition of the minimal gut metagenome as the set of microbial genes necessary for homeostasis and present in all healthy individuals. This study analyses the minimal gut metagenome of the most comprehensive dataset available, including individuals from agriculturalist and industrialist societies, also embodying highly diverse ethnic and geographical backgrounds. The outcome, based on metagenomic predictions for community composition data, resulted in a minimal metagenome comprising 3412 genes, mapping to 1856 reactions and 128 metabolic pathways predicted to occur across all individuals. These results were substantiated by the analysis of two additional datasets describing the microbial community compositions of larger Western cohorts, as well as a substantial shotgun metagenomics dataset. Subsequent analyses showed the plausible metabolic complementarity provided by the minimal gut metagenome to the human genome.
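The definition above, a set of genes present in all healthy individuals, amounts to a set intersection over per-individual gene sets. The following is a minimal sketch of that idea only, not the study's pipeline: the function name `minimal_metagenome`, the individual labels, and the gene identifiers are all hypothetical placeholders.

```python
def minimal_metagenome(gene_sets):
    """Return the genes found in every individual's predicted gene set.

    gene_sets: iterable of collections of gene identifiers, one per individual.
    """
    sets = [set(genes) for genes in gene_sets]
    if not sets:
        return set()
    # A gene belongs to the minimal metagenome only if no individual lacks it.
    return set.intersection(*sets)


# Hypothetical predicted gene content for three individuals (illustrative only).
profiles = {
    "individual_A": {"gene_1", "gene_2", "gene_3", "gene_4"},
    "individual_B": {"gene_1", "gene_2", "gene_4", "gene_5"},
    "individual_C": {"gene_1", "gene_2", "gene_4"},
}

core_genes = minimal_metagenome(profiles.values())
print(sorted(core_genes))
```

Because the intersection can only shrink as individuals are added, a more diverse cohort yields a stricter and more conservative core set.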
Topics: Bacteria; Female; Gastrointestinal Microbiome; Humans; Male; Metagenome; Metagenomics; RNA, Ribosomal, 16S
PubMed: 33141656
DOI: 10.1099/mgen.0.000466
- BMC Genomics Aug 2022
BACKGROUND
Selection of optimal computational strategies for analyzing metagenomics data is a decisive step in determining the microbial composition of a sample, and this procedure is complex because of the numerous tools currently available. The aim of this research was to summarize the results of the crowdsourced sbv IMPROVER Microbiomics Challenge, designed to evaluate the performance of off-the-shelf metagenomics software, and to investigate the robustness of these results through an extended post-challenge analysis. In total, 21 off-the-shelf taxonomic metagenome profiling pipelines were benchmarked for their capacity to identify the microbiome composition at various taxon levels across 104 shotgun metagenomics datasets of bacterial genomes (representative of various microbiome samples) from public databases. Performance was determined by comparing predicted taxonomy profiles with the gold standard.
RESULTS
Most taxonomic profilers performed homogeneously well at the phylum level but generated intermediate and heterogeneous scores at the genus and species levels, respectively. kmer-based pipelines using Kraken with and without Bracken or using CLARK-S performed best overall, but they exhibited lower precision than the two marker-gene-based methods MetaPhlAn and mOTU. Filtering out the 1% least abundant species, which were not reliably predicted, helped increase the performance of most profilers by increasing precision but at the cost of recall. However, the use of adaptive filtering thresholds determined from the sample's Shannon index increased the performance of most kmer-based profilers while mitigating the tradeoff between precision and recall.
CONCLUSIONS
kmer-based metagenomic pipelines using Kraken/Bracken or CLARK-S performed most robustly across a large variety of microbiome datasets. Removing non-reliably predicted low-abundance species by using diversity-dependent adaptive filtering thresholds further enhanced the performance of these tools. This work demonstrates the applicability of computational pipelines for accurately determining taxonomic profiles in clinical and environmental contexts and exemplifies the power of crowdsourcing for unbiased evaluation.
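The filtering idea described in this abstract can be illustrated with a short sketch: compute a sample's Shannon index, drop low-abundance species, and score the result against a gold standard by precision and recall. Everything here is an assumption made for illustration. The abstract does not give the adaptive threshold formula, so `adaptive_threshold` uses a made-up rule (cutoff shrinks as diversity grows), the "1%" filter is interpreted as a relative-abundance cutoff, and the species names and abundances are invented.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over relative abundances."""
    total = sum(abundances.values())
    return -sum(
        (a / total) * math.log(a / total) for a in abundances.values() if a > 0
    )


def filter_low_abundance(abundances, threshold):
    """Keep species whose relative abundance is at least `threshold`."""
    total = sum(abundances.values())
    return {s: a for s, a in abundances.items() if a / total >= threshold}


def adaptive_threshold(abundances, base=0.01):
    """Hypothetical rule (not from the paper): lower the cutoff for diverse samples."""
    return base / (1.0 + shannon_index(abundances))


def precision_recall(predicted, truth):
    """Presence/absence precision and recall against a gold-standard species set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Invented profile: sp_d is a low-abundance false positive, sp_e was missed.
predicted = {"sp_a": 0.60, "sp_b": 0.35, "sp_c": 0.045, "sp_d": 0.005}
gold_standard = {"sp_a", "sp_b", "sp_c", "sp_e"}

unfiltered_scores = precision_recall(predicted, gold_standard)
fixed_scores = precision_recall(filter_low_abundance(predicted, 0.01), gold_standard)
cutoff = adaptive_threshold(predicted)
adaptive_scores = precision_recall(filter_low_abundance(predicted, cutoff), gold_standard)
print(unfiltered_scores, fixed_scores, cutoff, adaptive_scores)
```

In this toy profile the fixed 1% cutoff removes the false positive `sp_d`, raising precision from 0.75 to 1.0 while recall stays at 0.75. A fixed cutoff can just as easily remove true low-abundance species, which is the precision/recall tradeoff the abstract describes.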
Topics: Benchmarking; Crowdsourcing; Metagenome; Metagenomics; Software
PubMed: 36042406
DOI: 10.1186/s12864-022-08803-2
- Nature Communications Apr 2022
Metagenomic binning is the step in building metagenome-assembled genomes (MAGs) when sequences predicted to originate from the same genome are automatically grouped together. The most widely used methods for binning are reference-independent: they operate de novo and enable the recovery of genomes from previously unsampled clades. However, they do not leverage the knowledge in existing databases. Here, we introduce SemiBin, an open source tool that uses deep siamese neural networks to implement a semi-supervised approach, i.e. SemiBin exploits the information in reference genomes, while retaining the capability of reconstructing high-quality bins that are outside the reference dataset. Using simulated and real microbiome datasets from several different habitats from GMGCv1 (Global Microbial Gene Catalog), including the human gut, non-human guts, and environmental habitats (ocean and soil), we show that SemiBin outperforms existing state-of-the-art binning methods. In particular, compared to other methods, SemiBin returns more high-quality bins with larger taxonomic diversity, including more distinct genera and species.
Topics: Algorithms; Metagenome; Metagenomics; Microbiota; Neural Networks, Computer
PubMed: 35484115
DOI: 10.1038/s41467-022-29843-y
- Journal of Hazardous Materials Jul 2024
Groundwater (GW) quality monitoring is vital for sustainable water resource management. The present study introduced a metagenome-derived machine learning (ML) model aimed at enhancing the predictive understanding and diagnostic interpretation of GW pollution associated with petroleum. In this framework, taxonomic and metabolic profiles derived from GW metagenomes were combined for use as the input dataset. By employing strategies that optimized data integration, model selection, and parameter tuning, we achieved a significant increase in diagnostic accuracy for petroleum-polluted GW. Explanatory artificial intelligence techniques identified petroleum degradation pathways and Rhodocyclaceae as strong predictors of a pollution diagnosis. Metagenomic analysis corroborated the presence of gene operons encoding aminobenzoate and xylene biodegradation within the de novo assembled genome of Rhodocyclaceae. Our genome-centric metagenomic analysis thus clarified the ecological interactions associated with microbiomes in breaking down petroleum contaminants, validating the ML-based diagnostic results. This metagenome-derived ML framework not only enhances the predictive diagnosis of petroleum pollution but also offers interpretable insights into the interaction between microbiomes and petroleum. The proposed ML framework demonstrates great promise for use as a science-based strategy for the on-site monitoring and remediation of GW pollution.
Topics: Water Pollutants, Chemical; Metagenome; Groundwater; Petroleum; Artificial Intelligence; Environmental Monitoring; Machine Learning; Biodegradation, Environmental; Petroleum Pollution; Metagenomics; Microbiota
PubMed: 38735183
DOI: 10.1016/j.jhazmat.2024.134513
- Cardiovascular Research Feb 2021
Topics: Metagenome; Metagenomics; Microbiota
PubMed: 32569375
DOI: 10.1093/cvr/cvaa175
- BMC Bioinformatics Nov 2022
BACKGROUND
The assembly of metagenomes decomposes members of complex microbe communities and allows the characterization of these genomes without laborious cultivation or single-cell metagenomics. Metagenome assembly is a memory-intensive and time-consuming process. Multi-terabyte sequence datasets can become too large to be assembled on a single computer node, and there is no reliable method to predict the memory requirement because memory consumption patterns are data-specific. Currently, running out of memory (OOM) is one of the most prevalent causes of metagenome assembly failures.
RESULTS
In this study, we explored the possibility of using Persistent Memory (PMem) as a less expensive substitute for dynamic random access memory (DRAM) to reduce OOM and increase the scalability of metagenome assemblers. We evaluated the execution time and memory usage of three popular metagenome assemblers (MetaSPAdes, MEGAHIT, and MetaHipMer2) on datasets up to one terabase. We found that PMem can enable metagenome assemblers on terabyte-sized datasets by partially or fully substituting DRAM. Depending on the configured DRAM/PMem ratio, running metagenome assemblies with PMem can achieve a similar speed as DRAM, while in the worst case it showed a roughly two-fold slowdown. In addition, different assemblers displayed distinct memory/speed trade-offs in the same hardware/software environment.
CONCLUSIONS
We demonstrated that PMem is capable of expanding the capacity of DRAM to allow larger metagenome assembly with a potential tradeoff in speed. Because PMem can be used directly without any application-specific code modification, these findings are likely to be generalized to other memory-intensive bioinformatics applications.
Topics: Metagenome; Metagenomics; Microbiota; Software; Computational Biology
PubMed: 36451083
DOI: 10.1186/s12859-022-05052-8
- Journal of Biotechnology Oct 2009
Review
Environmental DNA is an extremely rich source of genes encoding enzymes with novel biocatalytic activities. To tap this source, function-based and sequence-based strategies have been established to isolate, clone, and express these novel metagenome-derived genes. Sequence-based strategies, which rely on PCR with consensus primers and genome walking, represent an efficient and inexpensive alternative to activity-based screening of recombinant strains harbouring fragments of environmental DNA. This review covers the diverse array of genome-walking techniques, which were originally developed for genomic DNA and currently are also used for PCR-based recovery of entire genes from the metagenome. These sequence-based gene mining methods appear to offer a powerful tool for retrieving from the metagenome novel genes encoding biocatalysts with potential applications in biotechnology.
Topics: Chromosome Walking; DNA; Environment; Genes; Metagenome; Metagenomics; Polymerase Chain Reaction
PubMed: 19712711
DOI: 10.1016/j.jbiotec.2009.08.013
- Journal of Biotechnology Jul 2016
Metagenomes constitute a major source for the identification of novel enzymes for industrial applications. However, current functional screening methods are hindered by the limited transcription efficiency of foreign metagenomic genes. To overcome this constraint, we introduced the 'Enforced Transcription' technique, which involves the random insertion of the bi-directional T7 promoter into a metagenomic fosmid library. Then the effect of enforced transcription was quantitatively assessed by screening for metagenomic lipolytic genes encoding enzymes whose catalytic activity forms halos on tributyrin agar plates. The metagenomic library containing the enforced transcription system yielded a significantly increased number of screening hits with lipolytic activity compared to the library without random T7 promoter insertions. Additional sequence analysis revealed that the hits from the enforced transcription library had greater genetic diversity than those from the original metagenome library. Enhancing heterologous expression using the T7 promoter should enable the identification of greater numbers of diverse novel biocatalysts from the metagenome than possible using conventional metagenome screening approaches.
Topics: Bacteriophage T7; Cloning, Molecular; Esterases; Lipase; Metagenome; Metagenomics; Promoter Regions, Genetic; Transcription, Genetic
PubMed: 27239964
DOI: 10.1016/j.jbiotec.2016.05.018
- Briefings in Bioinformatics Jul 2022
Viruses are ubiquitous in humans and various environments and mutate continually. Identifying viruses in an environment without cultivation is challenging; however, promoting the screening of novel viruses and expanding the knowledge of viral space is essential. Homology-based methods that identify viruses using known viral genomes rely on sequence alignments, making it difficult to capture remote homologs of the known viruses. To accurately capture viral signals from metagenomic samples, models are needed to understand the patterns encoded in the viral genomes. In this study, we developed a hierarchical BERT model named ViBE to detect eukaryotic viruses from metagenome sequencing data and classify them at the order level. We pre-trained ViBE using read-like sequences generated from the virus reference genomes and derived three fine-tuned models that classify paired-end reads to orders for eukaryotic deoxyribonucleic acid viruses and eukaryotic ribonucleic acid viruses. ViBE achieved higher recall than state-of-the-art alignment-based methods while maintaining comparable precision. ViBE outperformed state-of-the-art alignment-free methods for all test cases. The performance of ViBE was also verified using real sequencing datasets, including the vaginal virome.
Topics: Eukaryota; Humans; Metagenome; Metagenomics; Sequence Alignment; Viruses
PubMed: 35667011
DOI: 10.1093/bib/bbac204
- Journal of Biomolecular Techniques : JBT Apr 2017
Topics: DNA; High-Throughput Nucleotide Sequencing; Humans; Metagenome; Metagenomics; RNA
PubMed: 28400709
DOI: 10.7171/jbt.17-2801-010