-
BMC Bioinformatics May 2021
BACKGROUND
Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation. The results of metagenomic benchmarks depend on the assumptions made about their underlying ecosystems. Conclusions from benchmark studies are therefore limited to the ecosystems they mimic. Ideally, simulations should be based on genomes that realistically resemble a particular metagenomic community.
RESULTS
We developed Tamock to facilitate the realistic simulation of metagenomic reads according to a metagenomic community, based on real sequence data. Benchmark samples can be created from all genomes and taxonomic domains present in NCBI RefSeq. Tamock automatically determines taxonomic profiles from shotgun sequence data, selects reference genomes accordingly and uses them to simulate metagenomic reads. We present an example use case for Tamock by assessing assembly and binning method performance for selected microbiomes.
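The step from a taxonomic profile to a simulated read set can be illustrated with a small sketch. This is not Tamock's code: the function name, the genome labels, and the largest-remainder rounding rule are assumptions chosen for illustration. It only shows how a fixed read budget might be divided among selected reference genomes in proportion to their profiled abundance.

```python
from math import floor


def allocate_reads(profile, total_reads):
    """Split total_reads across genomes in proportion to their abundance.

    profile maps a (hypothetical) genome label to its abundance.
    Largest-remainder rounding makes the counts sum exactly to total_reads.
    """
    total_abundance = sum(profile.values())
    if total_abundance <= 0:
        raise ValueError("profile must contain positive abundances")

    # Ideal (fractional) number of reads per genome.
    exact = {g: total_reads * a / total_abundance for g, a in profile.items()}
    counts = {g: floor(x) for g, x in exact.items()}

    # Give the reads lost to rounding to the largest fractional parts.
    leftover = total_reads - sum(counts.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for g in by_remainder[:leftover]:
        counts[g] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical profile: abundances do not need to be normalised.
    print(allocate_reads({"genome_A": 5, "genome_B": 3, "genome_C": 2}, 1001))
```

The per-genome counts would then be passed to a read simulator. Which simulator is used, and how abundance is defined (per cell or per base), is outside the scope of this sketch.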
CONCLUSIONS
Tamock facilitates automated simulation of habitat-specific benchmark metagenomic data based on real sequence data and is implemented as a user-friendly command-line application, providing extensive additional information along with the simulated benchmark data. Resulting benchmarks enable an assessment of computational methods, workflows, and parameters specifically for a metagenomic habitat or ecosystem of a metagenomic study.
AVAILABILITY
Source code, documentation and install instructions are freely available at GitHub ( https://github.com/gerners/tamock ).
Topics: Algorithms; Benchmarking; Metagenome; Metagenomics; Sequence Analysis, DNA; Software
PubMed: 33932979
DOI: 10.1186/s12859-021-04154-z -
Applied and Environmental Microbiology Aug 2020
Review
Many biological contaminants are disseminated through water, and their occurrence has potential detrimental impacts on public and environmental health. Conventional monitoring tools rely on cultivation and are not robust in addressing modern water quality concerns. This review proposes metagenomics as a means to provide a rapid, nontargeted assessment of biological contaminants in water. When further coupled with appropriate methods (e.g., quantitative PCR and flow cytometry) and bioinformatic tools, metagenomics can provide information concerning both the abundance and diversity of biological contaminants in reclaimed waters. Further correlation between the metagenomic-derived data of selected contaminants and the measurable parameters of water quality can also aid in devising strategies to alleviate undesirable water quality. Here, we review metagenomic approaches (i.e., both sequencing platforms and bioinformatic tools) and studies that demonstrated their use for reclaimed-water quality monitoring. We also provide recommendations on areas of improvement that will allow metagenomics to significantly impact how the water industry performs reclaimed-water quality monitoring in the future.
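As a rough illustration of why coupling metagenomics with a counting method matters, the sketch below scales relative abundances by a total cell count and computes a diversity index. The review does not give these formulas. The scaling rule (relative abundance times total count, as might be obtained from flow cytometry) and the use of the Shannon index are common conventions assumed here, and all names and numbers are hypothetical.

```python
from math import log


def absolute_abundance(relative, total_cells_per_ml):
    """Scale relative abundances (fractions summing to 1) by a total cell count."""
    return {taxon: frac * total_cells_per_ml for taxon, frac in relative.items()}


def shannon_index(relative):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    return -sum(p * log(p) for p in relative.values() if p > 0)


if __name__ == "__main__":
    # Hypothetical profile of a reclaimed-water sample.
    profile = {"taxon_1": 0.25, "taxon_2": 0.75}
    print(absolute_abundance(profile, 1000))
    print(shannon_index(profile))
```

Two samples with identical relative profiles can differ greatly in absolute load, which is why the sequencing-derived fractions alone do not describe contaminant abundance.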
Topics: Environmental Monitoring; Metagenome; Metagenomics; Waste Disposal, Fluid; Water Quality
PubMed: 32503906
DOI: 10.1128/AEM.00724-20 -
Nucleic Acids Research Jul 2022
Despite recent methodology and reference database improvements for taxonomic profiling tools, metagenomic assembly and genomic binning remain important pillars of metagenomic analysis workflows. When reference information is lacking, genomic binning is considered to be a state-of-the-art method in mixed culture metagenomic data analysis. In this light, our previously published tool BusyBee Web implements a composition-based binning method efficient enough to function as a rapid online utility. Handling assembled contigs and long nanopore-generated reads alike, the webserver provides a wide range of supplementary annotations and visualizations. Half a decade after the initial publication, we revisited existing functionality, added comprehensive visualizations, and increased the number of data analysis customization options for further experimentation. The webserver now allows for visualization-supported differential analysis of samples, which is computationally expensive and typically only performed in coverage-based binning methods. Further, users may now optionally check their uploaded samples for plasmid sequences using PLSDB as a reference database. Lastly, a new application programming interface with a supporting Python package was implemented to allow power users fully automated access to the resource and integration into existing workflows. The webserver is freely available at https://www.ccb.uni-saarland.de/busybee.
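Composition-based binning groups sequences by their oligonucleotide usage. The sketch below computes a normalised tetranucleotide frequency vector, a feature commonly used for this purpose. It is a generic illustration and not BusyBee Web's implementation: the abstract does not state which features or clustering method the tool uses, and the function name and default k are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return normalised k-mer frequencies in a fixed A/C/G/T lexicographic order.

    k-mers containing other symbols (e.g. N) are ignored. Reverse complements
    are not merged here, which real binning tools often do.
    """
    seq = seq.upper()
    alphabet_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in alphabet_kmers)
    if total == 0:
        return [0.0] * len(alphabet_kmers)
    return [counts[km] / total for km in alphabet_kmers]


if __name__ == "__main__":
    vec = kmer_profile("ACGTACGTTTGACCA")
    print(len(vec), sum(vec))
```

Contigs or long reads with similar vectors can then be clustered or embedded in two dimensions for visual inspection.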
Topics: Algorithms; Metagenome; Software; Metagenomics; Workflow; Sequence Analysis, DNA
PubMed: 35489067
DOI: 10.1093/nar/gkac298 -
Microbiome Jun 2021
BACKGROUND
Metagenomic sequencing has led to the identification and assembly of many new bacterial genome sequences. These bacteria often contain plasmids: usually small, circular double-stranded DNA molecules that may transfer across bacterial species and confer antibiotic resistance. These plasmids are generally less studied and understood than their bacterial hosts. Part of the reason for this is insufficient computational tools enabling the analysis of plasmids in metagenomic samples.
RESULTS
We developed SCAPP (Sequence Contents-Aware Plasmid Peeler), an algorithm and tool to assemble plasmid sequences from metagenomic sequencing. SCAPP builds on some key ideas from the Recycler algorithm while improving plasmid assemblies by integrating biological knowledge about plasmids. We compared the performance of SCAPP to Recycler and metaplasmidSPAdes on simulated metagenomes, real human gut microbiome samples, and a human gut plasmidome dataset that we generated. We also created plasmidome and metagenome data from the same cow rumen sample and used the parallel sequencing data to create a novel assessment procedure. Overall, SCAPP outperformed Recycler and metaplasmidSPAdes across this wide range of datasets.
CONCLUSIONS
SCAPP is an easy-to-use Python package that enables the assembly of full plasmid sequences from metagenomic samples. It outperformed existing metagenomic plasmid assemblers in most cases and assembled novel and clinically relevant plasmids in samples we generated such as a human gut plasmidome. SCAPP is open-source software available from: https://github.com/Shamir-Lab/SCAPP.
Topics: Algorithms; Humans; Metagenome; Metagenomics; Plasmids; Sequence Analysis, DNA; Software
PubMed: 34172093
DOI: 10.1186/s40168-021-01068-z -
Functional & Integrative Genomics Feb 2022
Review
This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper reinforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."
Topics: High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Software
PubMed: 34657989
DOI: 10.1007/s10142-021-00810-y -
BMC Bioinformatics Oct 2022
BACKGROUND
In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read.
RESULTS
Existing tools for the read classification problem are often alignment- or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a "screen") of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read's similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy.
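The screen-then-classify idea can be sketched with a bottom-s MinHash. This is a minimal illustration and not the authors' implementation (their reference code is linked in the Conclusions). The k-mer length, sketch size, hash function, and the scoring rule (count of screen hashes that also occur in the read) are all assumptions made for this example.

```python
import hashlib


def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=5, size=16):
    """Bottom-s sketch: the `size` smallest k-mer hashes of seq."""
    return set(sorted(kmer_hash(km) for km in kmers(seq, k))[:size])


def build_screen(genomes, k=5, size=16):
    """Reduce each candidate source genome to its sketch."""
    return {name: minhash_sketch(seq, k, size) for name, seq in genomes.items()}


def classify(read, screen, k=5):
    """Predict the genome whose sketch shares the most hashes with the read."""
    read_hashes = {kmer_hash(km) for km in kmers(read, k)}
    scores = {name: len(read_hashes & sketch) for name, sketch in screen.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    # Toy genomes, far shorter than real ones.
    screen = build_screen({"genome_a": "ACGTACGGTCA", "genome_b": "TTTTTGGGGGCC"})
    print(classify("GTACGGT", screen))
```

Only the sketches are kept in memory, so the footprint is set by the sketch size and not by genome length. The cost is that reads sharing few k-mers with any sketch, for example noisy long reads, may remain unclassified.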
CONCLUSIONS
The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. We then compare these approaches to existing alignment-, index-, and sketching-based tools for read classification, and demonstrate that such methods are a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching.
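For readers unfamiliar with minimizers, the sketch below selects one k-mer per window of w consecutive k-mers. It illustrates the general technique only. The lexicographic ordering, the leftmost tie-break, and the parameter values are assumptions, and practical tools usually order k-mers by a hash value instead.

```python
def minimizers(seq, k=3, w=4):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Each window holds w consecutive k-mers; the smallest k-mer in
    lexicographic order is kept, preferring the leftmost on ties.
    """
    kmer_list = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmer_list) - w + 1):
        window = kmer_list[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost k-mer.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


if __name__ == "__main__":
    print(minimizers("GATTACA", k=3, w=4))
```

Adjacent windows often share their minimizer, so far fewer k-mers are stored than the sequence contains, while every stretch of w consecutive k-mers is still represented.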
Topics: Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Software; Metagenomics; Metagenome; Algorithms
PubMed: 36316646
DOI: 10.1186/s12859-022-05014-0 -
Environmental Science & Technology Aug 2021
The advent of new data acquisition and handling techniques has opened the door to alternative and more comprehensive approaches to environmental monitoring that will improve our capacity to understand and manage environmental systems. Researchers have recently begun using machine learning (ML) techniques to analyze complex environmental systems and their associated data. Herein, we provide an overview of data analytics frameworks suitable for various Environmental Science and Engineering (ESE) research applications. We present current applications of ML algorithms within the ESE domain using three representative case studies: (1) Metagenomic data analysis for characterizing and tracking antimicrobial resistance in the environment; (2) Nontarget analysis for environmental pollutant profiling; and (3) Detection of anomalies in continuous data generated by engineered water systems. We conclude by proposing a path to advance incorporation of data analytics approaches in ESE research and application.
Topics: Data Science; Environmental Science; Machine Learning; Metagenome; Metagenomics
PubMed: 34338518
DOI: 10.1021/acs.est.1c01026 -
Molecular Ecology Resources Oct 2023
At the genome level, microorganisms are highly adaptable both in terms of allele and gene composition. Such heritable traits emerge in response to different environmental niches and can have a profound influence on microbial community dynamics. As a consequence, any individual genome or population will contain merely a fraction of the total genetic diversity of any operationally defined "species", whose ecological potential can thus be only fully understood by studying all of their genomes and the genes therein. This concept, known as the pangenome, is valuable for studying microbial ecology and evolution, as it partitions genomes into core (present in all the genomes from a species, and responsible for housekeeping and species-level niche adaptation among others) and accessory regions (present only in some, and responsible for intra-species differentiation). Here we present SuperPang, an algorithm producing pangenome assemblies from a set of input genomes of varying quality, including metagenome-assembled genomes (MAGs). SuperPang runs in linear time and its results are complete, non-redundant, preserve gene ordering and contain both coding and non-coding regions. Our approach provides a modular view of the pangenome, identifying operons and genomic islands, and allowing their prevalence to be tracked in different populations. We illustrate this by analysing intra-species diversity in Polynucleobacter, a bacterial genus ubiquitous in freshwater ecosystems, characterized by their streamlined genomes and their ecological versatility. We show how SuperPang facilitates the simultaneous analysis of allelic and gene content variation under different environmental pressures, allowing us to study the drivers of microbial diversification at unprecedented resolution.
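The core/accessory partition described above can be made concrete with a toy example over gene sets. This is not SuperPang, which builds assemblies from sequence data. The sketch only encodes the definition (core: present in every genome; accessory: present in some), and the function name, genome names, and gene labels are hypothetical.

```python
def partition_pangenome(genomes):
    """Split the pangenome into (core, accessory) gene sets.

    genomes maps a genome name to an iterable of gene identifiers.
    Core genes occur in every genome; accessory genes occur in at least
    one genome but not in all of them.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set(), set()
    pangenome = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, pangenome - core


if __name__ == "__main__":
    core, accessory = partition_pangenome({
        "isolate_1": ["dnaA", "gyrB", "islandX"],
        "isolate_2": ["dnaA", "gyrB"],
        "mag_3": ["dnaA", "gyrB", "operonY"],
    })
    print(sorted(core), sorted(accessory))
```

With incomplete inputs such as MAGs, a strict intersection would wrongly demote genes that are merely missing from one assembly, so a prevalence threshold below 100% is often used in practice.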
Topics: Phylogeny; Bacteria; Metagenome; Algorithms; Microbiota; Metagenomics
PubMed: 37382302
DOI: 10.1111/1755-0998.13826 -
Microbiology Spectrum Aug 2022
Pigs are among the most numerous and intensively farmed food-producing animals in the world. The gut microbiome plays an important role in the health and performance of swine and changes rapidly after weaning. Here, fecal samples were collected from pigs at 7 different time points from 7 to 140 days of age. These swine fecal metagenomes were used to assemble 1,150 dereplicated metagenome-assembled genomes (MAGs) that were at least 90% complete and had less than 5% contamination. These MAGs represented 472 archaeal and bacterial species, and the most widely distributed MAGs were the uncultured species sp002391315, sp004557565, and sp000434975. Weaning was associated with a decrease in the relative abundance of 69 MAGs (e.g., Escherichia coli) and an increase in the relative abundance of 140 MAGs (e.g., sp000435835, ). Genes encoding the production of the short-chain fatty acids acetate, butyrate, and propionate were identified in 68.5%, 18.8%, and 8.3% of the MAGs, respectively. Carbohydrate-active enzymes associated with the degradation of arabinose oligosaccharides and mixed-linkage glucans were predicted to be most prevalent among the MAGs. Antimicrobial resistance genes were detected in 327 MAGs, including 59 MAGs with tetracycline resistance genes commonly associated with pigs, such as tet(44), tet(Q), and tet(W). Overall, 82% of the MAGs were assigned to species that lack cultured representatives, indicating that a large portion of the swine gut microbiome is still poorly characterized. The results here also demonstrate the value of MAGs in adding genomic context to gut microbiomes. Many of the bacterial strains found in the mammalian gut are difficult to culture and isolate due to their various growth and nutrient requirements that are frequently unknown. Here, we assembled strain-level genomes from short metagenomic sequences, so-called metagenome-assembled genomes (MAGs), that were derived from fecal samples collected from pigs at multiple time points. 
The genomic context of a number of antimicrobial resistance genes commonly detected in swine was also determined. In addition, our study connected taxonomy with potential metabolic functions such as carbohydrate degradation and short-chain fatty acid production.
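The quality criteria quoted in the abstract (at least 90% complete, less than 5% contamination) amount to a simple filter. The sketch below applies them to a list of bins. The record type, field names, and example values are hypothetical, and in practice the completeness and contamination estimates would come from a separate quality-assessment tool.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    name: str
    completeness: float   # percent, estimated by an external tool
    contamination: float  # percent, estimated by an external tool


def high_quality(mags, min_completeness=90.0, max_contamination=5.0):
    """Keep MAGs with completeness >= 90% and contamination < 5%."""
    return [
        m for m in mags
        if m.completeness >= min_completeness and m.contamination < max_contamination
    ]


if __name__ == "__main__":
    bins = [
        Mag("bin_001", 97.2, 1.1),
        Mag("bin_002", 88.0, 0.5),  # fails completeness
        Mag("bin_003", 93.4, 5.0),  # fails contamination (not strictly below 5)
    ]
    print([m.name for m in high_quality(bins)])
```

The inequality directions follow the wording of the abstract: the completeness bound is inclusive and the contamination bound is strict.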
Topics: Animals; Archaea; Bacteria; Carbohydrates; Gastrointestinal Microbiome; Mammals; Metagenome; Metagenomics; Swine
PubMed: 35880887
DOI: 10.1128/spectrum.02380-22 -
Genome Biology Feb 2022
Recovering high-quality metagenome-assembled genomes (MAGs) from complex microbial ecosystems remains challenging. Recently, high-throughput chromosome conformation capture (Hi-C) has been applied to simultaneously study multiple genomes in natural microbial communities. We develop HiCBin, a novel open-source pipeline, to resolve high-quality MAGs utilizing Hi-C contact maps. HiCBin employs the HiCzin normalization method and the Leiden clustering algorithm and, for the first time, incorporates spurious contact detection into a binning pipeline. HiCBin is validated on one synthetic and two real metagenomic samples and is shown to outperform the existing Hi-C-based binning methods. HiCBin is available at https://github.com/dyxstat/HiCBin.
Topics: Algorithms; Cluster Analysis; Metagenome; Metagenomics; Microbiota
PubMed: 35227283
DOI: 10.1186/s13059-022-02626-w