Current Issues in Molecular Biology, 2017
Review
Surveys of environmental microbial communities using a metagenomic approach produce vast volumes of multidimensional data on the phylogenetic and functional composition of the microbiota. Faced with such complex data, a metagenomic researcher needs to choose appropriate means of data analysis. Data visualization has become an indispensable part of exploratory data analysis and serves as a key to discovery. While the molecular-genetic analysis of even a single bacterium presents multiple layers of data to be properly displayed and perceived, studies of microbiota are significantly more challenging. Here we present a review of state-of-the-art methods for visualizing metagenomic data at multiple levels: from methods applicable to an in-depth analysis of a single metagenome to techniques appropriate for large-scale studies containing hundreds of environmental samples.
Topics: Bacteria; Computer Graphics; Databases, Genetic; Metagenome; Metagenomics; Microbiota
PubMed: 28686567
DOI: 10.21775/cimb.024.037
Microbial Genomics, Apr 2024
Review
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
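Compositionality, one of the challenges named above, means that relative abundances only carry ratio information. A common preprocessing step before feeding abundance tables to downstream models is the centered log-ratio (CLR) transform. The sketch below is an illustration, not a method from the review; the function name `clr` and the default pseudocount of 0.5 (used to avoid log(0) in sparse tables) are assumptions.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    A pseudocount is added so that zero counts (frequent in sparse
    metagenomic tables) do not produce log(0).
    """
    shifted = [c + pseudocount for c in counts]
    logs = [math.log(x) for x in shifted]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

# Hypothetical counts for four taxa in one sample.
sample = [120, 30, 0, 850]
print([round(v, 3) for v in clr(sample)])
```

CLR values sum to zero and, without a pseudocount, are unchanged when a sample's counts are multiplied by a constant, which is why the transform removes the dependence on sequencing depth.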
Topics: Humans; Deep Learning; Microbiota; Metagenome; Gastrointestinal Microbiome; Metagenomics
PubMed: 38630611
DOI: 10.1099/mgen.0.001231
Annual Review of Biomedical Data Science, Jul 2021
Viruses are the most abundant biological entity on Earth, infect cellular organisms from all domains of life, and are central players in the global biosphere. Over the last century, the discovery and characterization of viruses have progressed steadily alongside much of modern biology. In terms of outright numbers of novel viruses discovered, however, the last few years have been by far the most transformative for the field. Advances in methods for identifying viral sequences in genomic and metagenomic datasets, coupled to the exponential growth of environmental sequencing, have greatly expanded the catalog of known viruses and fueled the tremendous growth of viral sequence databases. Development and implementation of new standards, along with careful study of the newly discovered viruses, have transformed and will continue to transform our understanding of microbial evolution, ecology, and biogeochemical cycles, leading to new biotechnological innovations across many diverse fields, including environmental, agricultural, and biomedical sciences.
Topics: Ecology; Genome, Viral; Metagenome; Metagenomics; Viruses
PubMed: 34465172
DOI: 10.1146/annurev-biodatasci-012221-095114
Methods in Molecular Biology (Clifton,...), 2021
Assembly of metagenomic sequence data into microbial genomes is of critical importance for disentangling community complexity and unraveling the functional capacity of microorganisms. The rapid development of sequencing technology and novel assembly algorithms has made it possible to reliably reconstruct hundreds to thousands of microbial genomes from raw sequencing reads through metagenomic assembly. In this chapter, we introduce a routinely used metagenomic assembly workflow that includes read quality filtering, assembly, contig/scaffold binning, and post-assembly checks of genome completeness and contamination. We also describe a case study that reconstructs near-complete microbial genomes from metagenomes using our workflow.
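The post-assembly check can be illustrated with a simplified marker-gene estimate: completeness is the fraction of expected single-copy markers found in a bin, and contamination is the share of extra copies. This is a sketch only; tools such as CheckM use lineage-specific, collocated marker sets, and the thresholds in `quality_tier` are an assumed approximation of the MIMAG tiers that ignores the rRNA/tRNA requirements. All function names are hypothetical.

```python
from collections import Counter

def completeness_contamination(marker_hits, marker_set):
    """Estimate bin completeness and contamination (%) from single-copy markers.

    marker_hits: marker-gene IDs detected in the bin (one entry per copy).
    marker_set:  markers expected exactly once per genome.
    """
    counts = Counter(h for h in marker_hits if h in marker_set)
    present = sum(1 for m in marker_set if counts[m] > 0)
    extra = sum(counts[m] - 1 for m in marker_set if counts[m] > 1)
    completeness = 100.0 * present / len(marker_set)
    contamination = 100.0 * extra / len(marker_set)
    return completeness, contamination

def quality_tier(completeness, contamination):
    """Simplified MIMAG-like tiers (rRNA/tRNA criteria omitted)."""
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    if contamination < 10:
        return "low"
    return "fails"  # >= 10% contamination: outside the draft-genome tiers

# Hypothetical bin: 19 of 20 expected markers found once each.
markers = {f"M{i}" for i in range(20)}
hits = [f"M{i}" for i in range(19)]
comp, cont = completeness_contamination(hits, markers)
print(comp, cont, quality_tier(comp, cont))
```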
Topics: Databases, Genetic; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Phylogeny; Research Design; Sequence Analysis, DNA; Software; Workflow
PubMed: 33961222
DOI: 10.1007/978-1-0716-1099-2_9
Genome Biology, Dec 2012
Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three-billion-read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB of memory per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net.
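The idea of profiling with uniquely-colored k-mers can be sketched as follows: each k-mer is "colored" by the set of reference genomes containing it, and only k-mers whose color is a single genome are used to count evidence for that genome. This is a toy, single-process illustration, not Ray Communities' implementation; it ignores reverse complements and sequencing errors, and all names are hypothetical.

```python
from collections import Counter, defaultdict

def kmers(seq, k):
    """All overlapping k-mers of a sequence (forward strand only)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def unique_kmer_index(genomes, k):
    """Map each k-mer to the genomes containing it, keeping single-genome k-mers."""
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            colors[km].add(name)
    return {km: next(iter(names)) for km, names in colors.items() if len(names) == 1}

def profile(reads, genomes, k):
    """Count, per genome, read k-mers that match a genome-unique k-mer."""
    index = unique_kmer_index(genomes, k)
    hits = Counter()
    for read in reads:
        for km in kmers(read, k):
            genome = index.get(km)
            if genome is not None:
                hits[genome] += 1
    return hits

# Hypothetical references sharing a prefix; shared k-mers are ignored.
refs = {"A": "ACGTACGTTT", "B": "ACGTACGGGG"}
print(profile(["ACGTT", "ACGGGG"], refs, k=4))
```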
Topics: Bacteria; Gene Ontology; Genome, Bacterial; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Software
PubMed: 23259615
DOI: 10.1186/gb-2012-13-12-r122
Bioinformatics (Oxford, England), Oct 2022
SUMMARY
Shotgun metagenomic sequencing provides the capacity to understand microbial community structure and function at unprecedented resolution; however, the current analytical methods are constrained by a focus on taxonomic classifications that may obfuscate functional relationships. Here, we present expam, a tree-based, taxonomy agnostic tool for the identification of biologically relevant clades from shotgun metagenomic sequencing.
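A generic way to make a classifier taxonomy agnostic is to attach each k-mer to the lowest common ancestor (LCA), in a reference tree, of all genomes that contain it, so that clades are tree nodes rather than named taxa. The sketch below illustrates that general technique only; it is not taken from expam's source, and the child-to-parent tree encoding and function names are assumptions.

```python
from collections import defaultdict

def ancestors(node, parent):
    """Path from a node up to the root (inclusive), via a child -> parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca(nodes, parent):
    """Lowest common ancestor of a non-empty collection of tree nodes."""
    nodes = list(nodes)
    common = ancestors(nodes[0], parent)  # ordered deepest -> root
    for other in nodes[1:]:
        on_path = set(ancestors(other, parent))
        common = [n for n in common if n in on_path]
    return common[0]  # deepest node shared by every path

def kmer_clades(genomes, parent, k):
    """Assign each reference k-mer to the LCA of the genomes that contain it."""
    holders = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            holders[seq[i:i + k]].add(name)
    return {km: lca(names, parent) for km, names in holders.items()}

# Hypothetical tree: ((g1, g2)n1, g3)root, with unnamed internal clade n1.
parent = {"g1": "n1", "g2": "n1", "n1": "root", "g3": "root"}
genomes = {"g1": "AAAAC", "g2": "AAAAG", "g3": "TTTTT"}
print(kmer_clades(genomes, parent, k=4))
```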
AVAILABILITY AND IMPLEMENTATION
expam is an open-source Python application released under the GNU General Public Licence v3.0. expam installation instructions, source code and tutorials can be found at https://github.com/seansolari/expam.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metagenome; Metagenomics; Microbiota; Software
PubMed: 36029242
DOI: 10.1093/bioinformatics/btac591
BMC Bioinformatics, Jun 2020
BACKGROUND
Metagenomics studies provide valuable insight into the composition and function of microbial populations from diverse environments; however, the data processing pipelines that rely on mapping reads to gene catalogs or genome databases for cultured strains yield results that underrepresent the genes and functional potential of uncultured microbes. Recent improvements in sequence assembly methods have eased the reliance on genome databases, thereby allowing the recovery of genomes from uncultured microbes. However, configuring these tools, linking them with advanced binning and annotation tools, and maintaining provenance of the processing continues to be challenging for researchers.
RESULTS
Here we present ATLAS, a software package for customizable data processing from raw sequence reads to functional and taxonomic annotations using state-of-the-art tools to assemble, annotate, quantify, and bin metagenome data. Abundance estimates at genome resolution are provided for each sample in a dataset. ATLAS is written in Python and the workflow implemented in Snakemake; it operates in a Linux environment, and is compatible with Python 3.5+ and Anaconda 3+ versions. The source code for ATLAS is freely available, distributed under a BSD-3 license.
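The abstract does not state how ATLAS computes its genome-resolution abundances. One common approach, assumed here purely for illustration, divides the reads mapped to each genome by that genome's length and then normalizes the resulting densities to fractions, so that larger genomes are not over-counted. The function and input names are hypothetical.

```python
def relative_abundance(mapped_reads, genome_lengths):
    """Length-normalized relative abundance per genome (fractions summing to 1).

    mapped_reads:   genome -> number of reads mapped to it
    genome_lengths: genome -> genome length in bp
    """
    density = {g: mapped_reads[g] / genome_lengths[g] for g in mapped_reads}
    total = sum(density.values())
    if total == 0:
        raise ValueError("no reads mapped to any genome")
    return {g: d / total for g, d in density.items()}

# Hypothetical sample: equal read counts, but genome B is twice as long.
reads = {"A": 1000, "B": 1000}
lengths = {"A": 2_000_000, "B": 4_000_000}
print(relative_abundance(reads, lengths))
```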
CONCLUSIONS
ATLAS provides a user-friendly, modular and customizable Snakemake workflow for metagenome data processing; it is easily installable with conda and maintained as open-source on GitHub at https://github.com/metagenome-atlas/atlas.
Topics: Metagenome; Metagenomics; Molecular Sequence Annotation; Software; Workflow
PubMed: 32571209
DOI: 10.1186/s12859-020-03585-4
Methods in Molecular Biology (Clifton,...), 2023
Review
Microbial strains are interpreted as lineages derived from a recent ancestor that have not experienced "too many" recombination events; such strains can be successfully retrieved with culture-independent techniques using metagenomic sequencing. Strain variability has increasingly been shown to involve additional phenotypic heterogeneity that affects host health, such as virulence, transmissibility, and antibiotic resistance. New statistical and computational methods have recently been developed to track strains in samples from shotgun metagenomic data, based either on reference genome sequences or on metagenome-assembled genomes (MAGs). In this paper, we review some recent statistical methods for strain identification based on frequency counts at a set of single nucleotide variants (SNVs) within a set of single-copy marker genes. These methods differ in whether reference genome sequences are needed, how SNVs are called, which deconvolution methods are used, and whether the methods can be applied to multiple samples. We conclude our review with areas that require further research.
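The simplest reference-based case of SNV deconvolution assumes two strains whose genotypes at each SNV are known (1 = alternative allele, 0 = reference). The observed alternative-allele frequency is then modeled as $f_i \approx p\,a_i + (1-p)\,b_i$, and least squares gives $p = \sum_i d_i (f_i - b_i) / \sum_i d_i^2$ with $d_i = a_i - b_i$. The sketch below implements only this toy model; the reviewed methods handle unknown genotypes, more strains, and multiple samples, and the function names are hypothetical.

```python
def allele_frequencies(ref_counts, alt_counts):
    """Alternative-allele frequency at each SNV from read counts."""
    return [a / (r + a) for r, a in zip(ref_counts, alt_counts)]

def estimate_mixture(alt_freqs, geno_a, geno_b):
    """Least-squares proportion p of strain A in a two-strain mixture.

    Model: alt_freqs[i] ~= p * geno_a[i] + (1 - p) * geno_b[i].
    """
    num = 0.0
    den = 0.0
    for f, a, b in zip(alt_freqs, geno_a, geno_b):
        d = a - b
        num += d * (f - b)
        den += d * d
    if den == 0:
        raise ValueError("strains identical at all SNVs; p is not identifiable")
    return min(1.0, max(0.0, num / den))  # clip to a valid proportion

# Hypothetical read counts at four SNVs for a 70:30 mixture of strains A and B.
freqs = allele_frequencies([30, 70, 30, 100], [70, 30, 70, 0])
p = estimate_mixture(freqs, geno_a=[1, 0, 1, 0], geno_b=[0, 1, 0, 0])
print(round(p, 3))  # → 0.7
```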
Topics: Microbiota; Metagenome; Sequence Analysis, DNA; Metagenomics
PubMed: 36929080
DOI: 10.1007/978-1-0716-2986-4_11
Current Issues in Molecular Biology, 2019
Review
Methanotrophic microorganisms utilize methane as an electron donor and a carbon source. To date, the capacity to oxidize methane is restricted to microorganisms from three bacterial phyla and one archaeal phylum. Most of our knowledge of methanotrophic metabolism has been obtained using highly enriched or pure cultures grown in the laboratory. However, many methanotrophs currently evade cultivation; thus, metagenomics provides a complementary approach for gaining insight into currently unisolated microorganisms. Here we synthesize studies that have used metagenomics to glean information about methanotrophs. We complement this summary with an analysis of methanotroph marker genes from 235 publicly available metagenomic datasets. We analyze the phylogenetic and environmental distribution of methanotrophs sampled by metagenomics. We also highlight the metabolic insights provided by methanotroph genomes assembled from metagenomes. In summary, metagenomics has increased methanotrophic foliage within the tree of life and provided new insights into methanotroph metabolism, which collectively can guide new cultivation efforts. Lastly, given the importance of methanotrophs for biotechnological applications and their capacity to filter greenhouse gases from a variety of ecosystems, metagenomics will continue to be an important component in the arsenal of tools needed for understanding methanotroph diversity and metabolism in both engineered and natural systems.
Topics: Archaea; Biodiversity; Energy Metabolism; Metagenome; Metagenomics; Methane; Methanobacteriales; Microbiota; Phylogeny; Soil Microbiology
PubMed: 31166185
DOI: 10.21775/cimb.033.057
F1000Research, 2018
Shotgun metagenomic sequencing is a powerful tool for the characterization of complex biological matrices, enabling analysis of prokaryotic and eukaryotic organisms and viruses in a single experiment, with the possibility of reconstructing the whole metagenome or a set of genes of interest. One of the main factors limiting the use of shotgun metagenomics in large-scale projects is the high cost associated with the approach. We set out to determine whether shallow shotgun metagenomics can characterize complex biological matrices while reducing costs. We used a staggered mock community to estimate the optimal threshold for species detection. We measured the variation of several summary statistics while simulating a decrease in sequencing depth by randomly subsampling a number of reads. The main statistics compared were diversity estimates, species abundance, and the ability to reconstruct the metagenome in terms of length and completeness. Our results show that diversity indices of complex prokaryotic, eukaryotic and viral communities can be accurately estimated with 500,000 reads or fewer, although particularly complex samples may require 1,000,000 reads. By contrast, any task involving reconstruction of the metagenome performed poorly, even with the largest simulated subsample (1,000,000 reads). The reconstructed assembly was shorter than the one obtained with the full dataset, and the proportion of conserved genes identified in the metagenome was drastically reduced compared to the full sample. Shallow shotgun metagenomics can be a useful tool to describe the structure of complex matrices, but it is not adequate to reconstruct the metagenome, even partially.
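The depth-reduction experiment described above can be sketched as random subsampling of reads without replacement, followed by recomputing summary statistics. This minimal illustration assumes each read already carries a species label (for example, from a read classifier) and uses observed richness and the Shannon index as the statistics; the labels, depths, and function names are hypothetical and not taken from the study.

```python
import math
import random
from collections import Counter

def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def subsample_stats(read_labels, depth, seed=0):
    """Subsample `depth` reads without replacement; return (richness, Shannon H')."""
    rng = random.Random(seed)  # fixed seed for reproducible subsamples
    sub = rng.sample(read_labels, depth)
    counts = Counter(sub)
    return len(counts), shannon(list(counts.values()))

# Hypothetical per-read species labels with uneven (staggered) abundances.
labels = ["sp1"] * 6000 + ["sp2"] * 3000 + ["sp3"] * 900 + ["sp4"] * 100
for depth in (100, 1000, 10000):
    richness, h = subsample_stats(labels, depth)
    print(depth, richness, round(h, 3))
```

At low depth, rare species (here `sp4`) are the first to drop out of the richness count, while the Shannon index, dominated by abundant species, tends to change less.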
Topics: Animals; Metagenome; Metagenomics; Sequence Analysis, DNA; Species Specificity
PubMed: 32185014
DOI: 10.12688/f1000research.16804.4