Microbial Genomics Apr 2024
Review
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
Topics: Humans; Deep Learning; Microbiota; Metagenome; Gastrointestinal Microbiome; Metagenomics
PubMed: 38630611
DOI: 10.1099/mgen.0.001231
STAR Protocols Sep 2022
Homology-based search is commonly used to uncover mobile genetic elements (MGEs) from metagenomes, but it relies heavily on reference genomes in the database. Here we introduce a protocol to extract CRISPR-targeted sequences from assembled human gut metagenomic sequences without using a reference database. We describe the assembly of metagenome contigs, the extraction of CRISPR direct repeats and spacers, the discovery of protospacers, and the extraction of protospacer-enriched regions using a graph-based approach. This protocol can extract numerous characterized and uncharacterized MGEs. For complete details on the use and execution of this protocol, please refer to Sugimoto et al. (2021).
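The protospacer-discovery step can be illustrated with a minimal sketch. It assumes that spacers have already been extracted and that protospacers are located by exact string matching on both strands of each contig. This is not the protocol's own tooling: practical searches usually tolerate mismatches (for example, with a BLAST-style aligner). The function names and toy sequences below are hypothetical.

```python
# Minimal sketch (hypothetical helpers): find candidate protospacers by exact
# matching of CRISPR spacers, and their reverse complements, against contigs.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(spacers: dict[str, str],
                      contigs: dict[str, str]) -> list[tuple[str, str, int, str]]:
    """Return (spacer_id, contig_id, 0-based start, strand) for every exact hit."""
    hits = []
    for sp_id, spacer in spacers.items():
        spacer = spacer.upper()
        # "+" matches the spacer as given; "-" matches it on the opposite strand.
        queries = {"+": spacer, "-": reverse_complement(spacer)}
        for ctg_id, contig in contigs.items():
            contig = contig.upper()
            for strand, query in queries.items():
                start = contig.find(query)
                while start != -1:  # report every occurrence, including overlaps
                    hits.append((sp_id, ctg_id, start, strand))
                    start = contig.find(query, start + 1)
    return hits


if __name__ == "__main__":
    demo_spacers = {"sp1": "AAACCCGG"}
    demo_contigs = {"c1": "TTAAACCCGGTT", "c2": "GGCCGGGTTTAA"}
    print(find_protospacers(demo_spacers, demo_contigs))
    # [('sp1', 'c1', 2, '+'), ('sp1', 'c2', 2, '-')]
```

Contigs that accumulate many such hits would be the natural input to the later step of extracting protospacer-enriched regions.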
Topics: Base Sequence; Clustered Regularly Interspaced Short Palindromic Repeats; Humans; Metagenome; Metagenomics
PubMed: 35780428
DOI: 10.1016/j.xpro.2022.101525
Molecules (Basel, Switzerland) May 2021
Review
Microorganisms are highly regarded as a prominent source of natural products (NPs) that are of significant importance in many fields, such as medicine, farming, environmental safety and material production. However, only a small fraction of microorganisms can be cultivated under standard laboratory conditions, and most microorganisms in ecosystems remain unidentified, which restricts our knowledge of uncultured microbial metabolism. These uncultured microorganisms could nevertheless provide a large collection of novel natural products. Culture-independent metagenomics can address core questions about the potential for NP production by cloning and analysing microbial DNA derived directly from environmental samples. Recent advances in next-generation sequencing, genome assembly tools and genetic engineering have broadened the scope of metagenomics, offering insights into the lives of uncultured microorganisms. In this review, we cover methods of metagenomic library construction and heterologous expression for exploring and developing the environmental metabolome, focusing on function-based metagenomics, sequencing-based metagenomics and single-cell metagenomics of uncultured microorganisms.
Topics: Bacteria; Biological Products; Ecosystem; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics
PubMed: 34067778
DOI: 10.3390/molecules26102977
STAR Protocols Sep 2023
The analysis of metagenomic data obtained via high-throughput DNA sequencing is primarily carried out through binning, a dedicated process that clusters contigs presumed to belong to the same species. Here, we present a protocol for improving the quality of binning using BinSPreader. We describe the steps of a typical metagenome assembly and binning workflow. We then detail binning refinement, its variants, its output and possible caveats. This protocol optimizes the reconstruction of more complete genomes of the microorganisms that make up the metagenome. For complete details on the use and execution of this protocol, please refer to Tolstoganov et al.
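The idea of refining bins with the assembly graph can be sketched as label propagation, in which unbinned contigs inherit the majority bin of their already-binned neighbours. This is a deliberately simplified majority-vote scheme assumed for illustration only; it is not BinSPreader's actual algorithm, and `propagate_bins` and the toy graphs are hypothetical.

```python
from collections import Counter


def propagate_bins(adjacency: dict[str, set[str]],
                   bins: dict[str, str],
                   max_rounds: int = 10) -> dict[str, str]:
    """Assign unbinned contigs the most common bin among binned neighbours.

    adjacency: undirected contig graph (contig -> set of neighbouring contigs).
    bins: initial contig -> bin assignments; contigs absent here are unbinned.
    Existing assignments are never changed, and tied votes are left unbinned.
    """
    labels = dict(bins)  # copy so the caller's initial binning is untouched
    for _ in range(max_rounds):
        updates = {}
        for contig, neighbours in adjacency.items():
            if contig in labels:
                continue
            votes = Counter(labels[n] for n in neighbours if n in labels)
            if not votes:
                continue
            (best, best_count), *rest = votes.most_common()
            # Skip ambiguous contigs whose top two bins are tied.
            if rest and rest[0][1] == best_count:
                continue
            updates[contig] = best
        if not updates:  # converged: no contig changed this round
            break
        # Synchronous update: all new labels become visible in the next round.
        labels.update(updates)
    return labels
```

For an undirected graph a-b-c-d, plus a contig e linked to both a and d, with initial bins a→bin1 and d→bin2, contig b joins bin1, c joins bin2, and e stays unbinned because its neighbours tie. Leaving ties unresolved is a conservative choice that favours bin purity over completeness.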
Topics: Metagenome; Sequence Analysis, DNA; Metagenomics; Cluster Analysis; High-Throughput Nucleotide Sequencing
PubMed: 37405923
DOI: 10.1016/j.xpro.2023.102417
BMC Bioinformatics May 2020
BACKGROUND
Microorganisms are important occupants of many different environments. Identifying the composition of microbes and estimating their abundance promotes understanding of the interactions of microbes in environmental samples. To understand these environments more deeply, the composition of microorganisms in environmental samples has been studied using metagenomes, which are the collections of genomes of the microorganisms. Although many tools based on different algorithms have been developed for taxonomy analysis, the variability of the outputs that existing tools produce from the same input metagenome datasets is the main obstacle for many researchers in this field.
RESULTS
Here, we present a novel meta-analysis tool for metagenome taxonomy analysis, called TAMA, which intelligently integrates the outputs of three different taxonomy analysis tools. Using an integrated reference database, TAMA assigns taxonomy to input metagenome reads based on a meta-score that integrates the taxonomy assignment scores of the different taxonomy classification tools. TAMA outperformed existing tools when evaluated on various benchmark datasets. It was also successfully applied to obtain relative species abundance profiles and differences in microbial composition in two types of cheese metagenome and a human gut metagenome.
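A meta-score that integrates per-tool assignment scores can be sketched as a weighted sum over the candidate taxa reported for each read, with the read assigned to the highest-scoring taxon. The weights, tool names and score scale ([0, 1]) below are hypothetical assumptions; TAMA's published scoring formula is not reproduced here.

```python
from typing import Optional


def meta_score_assign(tool_scores: dict[str, dict[str, float]],
                      weights: dict[str, float]) -> Optional[tuple[str, float]]:
    """Combine per-tool taxon scores for one read into a weighted meta-score.

    tool_scores: tool name -> {taxon: score in [0, 1]} for a single read.
    weights: tool name -> weight (hypothetical values, not TAMA's).
    Returns (best_taxon, meta_score), or None if no tool classified the read.
    """
    combined: dict[str, float] = {}
    for tool, scores in tool_scores.items():
        w = weights.get(tool, 0.0)  # tools without a weight contribute nothing
        for taxon, score in scores.items():
            combined[taxon] = combined.get(taxon, 0.0) + w * score
    if not combined:
        return None
    best = max(combined, key=combined.get)
    return best, combined[best]


if __name__ == "__main__":
    # Hypothetical scores from three classifiers for a single read.
    read_scores = {
        "tool_a": {"Escherichia coli": 0.9, "Salmonella enterica": 0.1},
        "tool_b": {"Salmonella enterica": 0.8},
        "tool_c": {"Escherichia coli": 0.6},
    }
    tool_weights = {"tool_a": 0.5, "tool_b": 0.3, "tool_c": 0.2}
    print(meta_score_assign(read_scores, tool_weights))
```

Here Escherichia coli wins with a meta-score of about 0.57 against about 0.29 for Salmonella enterica, even though one tool preferred the latter; this is the kind of disagreement between tools that an integrated score is meant to resolve.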
CONCLUSION
TAMA can be easily installed and used for metagenome read classification and for predicting relative species abundance from multiple metagenome read samples of various types. TAMA can be used to uncover the composition of microorganisms in metagenome samples collected from various environments more accurately, especially when a single taxonomy analysis tool is unreliable. TAMA is an open-source tool and can be downloaded at https://github.com/jkimlab/TAMA.
Topics: Bacteria; Classification; Databases, Genetic; Datasets as Topic; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Models, Genetic; Phylogeny
PubMed: 32397982
DOI: 10.1186/s12859-020-3533-7
Briefings in Functional Genomics Jan 2023
Viruses are the most abundant infectious agents on Earth, and they infect living organisms such as bacteria, plants and animals, among others. They play an important role in the balance of different ecosystems by modulating microbial populations. In humans, they are responsible for some common diseases and may cause severe illnesses. Viral metagenomic studies have become essential and offer the possibility to understand and extend our knowledge of virus diversity and functionality. In these approaches, an essential step is the classification of viral sequences. In this work, 11 taxonomic classification tools were compared by analysing their performance, in terms of sensitivity and precision, in classifying reads at the species and family levels using the same (viral and nonviral) datasets and evaluation metrics, as well as their processing times and memory requirements. The results showed that factors such as richness (the number of viral species in a sample), the taxonomic level of classification and read length influence tool performance. High viral richness in samples decreased the performance of most tools. Additionally, classifications were better at higher taxonomic levels, such as family, than at lower taxonomic levels, such as species, and this difference was more evident for short reads. The results also indicated that BLAST and Kraken2 were the best tools for classifying all types of reads, whereas FastViromeExplorer and VirusFinder performed well only on long reads, and Centrifuge, DIAMOND and One Codex only on short reads. All tools correctly classified the nonviral datasets (human and bacterial) as nonviral.
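The two reported metrics can be made concrete with a small sketch. It assumes read-level definitions in which unclassified reads lower sensitivity but not precision; the study's exact definitions may differ. Collapsing species to families, using a hypothetical species-to-family map and toy reads, also shows why classification tends to score higher at coarser taxonomic levels.

```python
from typing import Optional


def sensitivity_precision(truth: dict[str, str],
                          predicted: dict[str, Optional[str]]) -> tuple[float, float]:
    """Read-level sensitivity and precision at one taxonomic rank.

    truth: read id -> true taxon at the chosen rank (e.g. species or family).
    predicted: read id -> assigned taxon at the same rank, or None if unclassified.
    Sensitivity = correct assignments / all reads with a known true label.
    Precision   = correct assignments / all reads the tool assigned a taxon.
    """
    classified = [r for r, taxon in predicted.items() if taxon is not None]
    correct = sum(1 for r in classified if truth.get(r) == predicted[r])
    sensitivity = correct / len(truth) if truth else 0.0
    precision = correct / len(classified) if classified else 0.0
    return sensitivity, precision


# Hypothetical toy data: the same reads evaluated at species and family level.
species_to_family = {"sp1": "famA", "sp2": "famA", "sp3": "famB"}
truth = {"r1": "sp1", "r2": "sp1", "r3": "sp2", "r4": "sp3"}
pred = {"r1": "sp1", "r2": "sp2", "r3": "sp2", "r4": None}


def to_family(labels: dict[str, Optional[str]]) -> dict[str, Optional[str]]:
    """Map species labels to family labels, keeping unclassified reads as None."""
    return {r: (species_to_family[t] if t is not None else None) for r, t in labels.items()}


print(sensitivity_precision(truth, pred))                        # (0.5, 0.6666666666666666)
print(sensitivity_precision(to_family(truth), to_family(pred)))  # (0.75, 1.0)
```

Read r2 is misassigned to a sibling species, which counts as an error at species level but becomes correct once both species collapse into the same family.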
Topics: Humans; Ecosystem; Viruses; Bacteria; Metagenome; Metagenomics; High-Throughput Nucleotide Sequencing
PubMed: 36335985
DOI: 10.1093/bfgp/elac036
Bioinformatics (Oxford, England) May 2020
MOTIVATION
Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as a pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies.
RESULTS
We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED's accuracy substantially exceeds that of the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications.
CONCLUSIONS
DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is retraining the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.
AVAILABILITY AND IMPLEMENTATION
DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Bacteria; Computer Simulation; Metagenome; Metagenomics; Sequence Analysis, DNA; Software
PubMed: 32096824
DOI: 10.1093/bioinformatics/btaa124
Current Issues in Molecular Biology 2019
Review
Methanotrophic microorganisms use methane as an electron donor and a carbon source. To date, the capacity to oxidize methane is restricted to microorganisms from three bacterial phyla and one archaeal phylum. Most of our knowledge of methanotrophic metabolism has been obtained from highly enriched or pure cultures grown in the laboratory. However, many methanotrophs currently evade cultivation, so metagenomics provides a complementary approach for gaining insight into these as-yet-unisolated microorganisms. Here we synthesize the studies that have used metagenomics to glean information about methanotrophs. We complement this summary with an analysis of methanotroph marker genes from 235 publicly available metagenomic datasets. We analyze the phylogenetic and environmental distribution of methanotrophs sampled by metagenomics. We also highlight the metabolic insights that methanotroph genomes assembled from metagenomes are providing. In summary, metagenomics has expanded the methanotroph branches of the tree of life and provided new insights into methanotroph metabolism, which collectively can guide new cultivation efforts. Lastly, given the importance of methanotrophs for biotechnological applications and their capacity to filter greenhouse gases from a variety of ecosystems, metagenomics will continue to be an important component of the arsenal of tools needed for understanding methanotroph diversity and metabolism in both engineered and natural systems.
Topics: Archaea; Biodiversity; Energy Metabolism; Metagenome; Metagenomics; Methane; Methanobacteriales; Microbiota; Phylogeny; Soil Microbiology
PubMed: 31166185
DOI: 10.21775/cimb.033.057
Microbiological Research Dec 2022
Review
Microbial cells attached to inert or living surfaces adopt a biofilm mode of growth, with a self-produced exopolysaccharide matrix containing polysaccharides, proteins and extracellular DNA that protects them from adverse external stimuli. Biofilms in hospitals and industries serve as a breeding ground for drug-resistant pathogens and for the enrichment of antibiotic resistance genes (ARGs), which are linked to pathogenicity, and they also impede industrial production processes. Biofilm formation, including virulence and pathogenicity, is regulated through quorum sensing (QS), a means of bacterial cell-to-cell communication that coordinates cooperative physiological processes. Hence, QS inhibition through quorum quenching (QQ) is a feasible approach to inhibiting biofilm formation. On the other hand, biofilms also play beneficial roles in promoting plant growth, biocontrol and wastewater treatment. Furthermore, polymicrobial biofilms can harbour novel compounds and species of industrial and pharmaceutical interest. Hence, surveillance of biofilm microbiome structure and functional attributes is crucial to determine the extent of the risk it poses and to harness its bioactive potential. One of the most preferred approaches to delineating the microbiome is culture-independent metagenomics. In this context, this review explores the biofilm microbiome in built and natural settings such as agriculture, household appliances, wastewater treatment plants, hospitals, microplastics and dental biofilm. We also discuss recent reports of novel QS and biofilm inhibitors discovered through conventional, metagenomic and machine learning approaches. Finally, we present novel biofilm-derived metagenome-assembled genomes (MAGs), genomes and taxa of medical and industrial interest.
Topics: Biofilms; Metagenome; Metagenomics; Microplastics; Pharmaceutical Preparations; Plastics; Quorum Sensing
PubMed: 36194989
DOI: 10.1016/j.micres.2022.127207
Methods in Molecular Biology (Clifton,... 2023
Microorganisms play a primary role in regulating biogeochemical cycles and are a valuable source of enzymes with biotechnological applications, such as carbohydrate-active enzymes (CAZymes). However, the inability to culture most microorganisms in natural ecosystems restricts access to potentially novel bacteria and beneficial CAZymes. While common culture-independent molecular methods such as metagenomics enable researchers to study microbial communities directly from environmental samples, recent progress in long-read sequencing technologies is advancing the field further. We outline the key methodological stages required and describe specific protocols currently used in long-read metagenomic projects dedicated to CAZyme discovery.
Topics: Metagenomics; Microbiota; Metagenome; Carbohydrates; High-Throughput Nucleotide Sequencing
PubMed: 37149537
DOI: 10.1007/978-1-0716-3151-5_19