metagenome - OpenMD.com Journal Search

Tamock: simulation of habitat-specific benchmark data in metagenomics.

BMC Bioinformatics May 2021

Authors: Samuel M Gerner, Alexandra B Graf, Thomas Rattei...

BACKGROUND

Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation. The results of metagenomic benchmarks depend on the assumptions made about the underlying ecosystems, so conclusions from benchmark studies are limited to the ecosystems they mimic. Ideally, simulations should therefore be based on genomes that realistically resemble particular metagenomic communities.

RESULTS

We developed Tamock to facilitate the realistic simulation of metagenomic reads according to a metagenomic community, based on real sequence data. Benchmark samples can be created from all genomes and taxonomic domains present in NCBI RefSeq. Tamock automatically determines taxonomic profiles from shotgun sequence data, selects reference genomes accordingly, and uses them to simulate metagenomic reads. We present an example use case for Tamock by assessing assembly and binning method performance for selected microbiomes.
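To make the profile-driven simulation step concrete, here is a minimal sketch of one way a read budget could be divided among selected reference genomes in proportion to a taxonomic profile. This is not Tamock's code: the abstract does not say how reads are distributed, and the function name `allocate_reads`, the genome labels, and the largest-remainder rounding are all assumptions made for illustration.

```python
from typing import Dict


def allocate_reads(profile: Dict[str, float], total_reads: int) -> Dict[str, int]:
    """Split total_reads across genomes in proportion to their abundance.

    `profile` maps a (hypothetical) genome label to a non-negative abundance.
    Rounding uses the largest-remainder method so the counts sum exactly
    to total_reads.
    """
    total_abundance = sum(profile.values())
    if total_abundance <= 0:
        raise ValueError("profile must have a positive total abundance")

    # Exact (fractional) share of the read budget for each genome.
    shares = {g: total_reads * a / total_abundance for g, a in profile.items()}
    # Start from the floor of each share.
    counts = {g: int(s) for g, s in shares.items()}

    # Hand the reads lost to flooring to the largest fractional parts.
    leftover = total_reads - sum(counts.values())
    by_fraction = sorted(shares, key=lambda g: shares[g] - counts[g], reverse=True)
    for genome in by_fraction[:leftover]:
        counts[genome] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical profile with three reference genomes.
    print(allocate_reads({"genome_A": 5, "genome_B": 3, "genome_C": 2}, 1000))
```

The resulting counts could then be passed to any read simulator; which simulator and which error model to use is outside the scope of this sketch.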

CONCLUSIONS

Tamock facilitates automated simulation of habitat-specific benchmark metagenomic data based on real sequence data and is implemented as a user-friendly command-line application, providing extensive additional information along with the simulated benchmark data. Resulting benchmarks enable an assessment of computational methods, workflows, and parameters specifically for a metagenomic habitat or ecosystem of a metagenomic study.

AVAILABILITY

Source code, documentation, and installation instructions are freely available on GitHub (https://github.com/gerners/tamock).

Topics: Algorithms; Benchmarking; Metagenome; Metagenomics; Sequence Analysis, DNA; Software

PubMed: 33932979
DOI: 10.1186/s12859-021-04154-z

Metagenomics as a Tool To Monitor Reclaimed-Water Quality.

Applied and Environmental Microbiology Aug 2020

Review

Authors: Pei-Ying Hong, David Mantilla-Calderon, Changzhi Wang...

Many biological contaminants are disseminated through water, and their occurrence has potential detrimental impacts on public and environmental health. Conventional monitoring tools rely on cultivation and are not robust in addressing modern water quality concerns. This review proposes metagenomics as a means to provide a rapid, nontargeted assessment of biological contaminants in water. When further coupled with appropriate methods (e.g., quantitative PCR and flow cytometry) and bioinformatic tools, metagenomics can provide information concerning both the abundance and diversity of biological contaminants in reclaimed waters. Further correlation between the metagenomic-derived data of selected contaminants and the measurable parameters of water quality can also aid in devising strategies to alleviate undesirable water quality. Here, we review metagenomic approaches (i.e., both sequencing platforms and bioinformatic tools) and studies that demonstrated their use for reclaimed-water quality monitoring. We also provide recommendations on areas of improvement that will allow metagenomics to significantly impact how the water industry performs reclaimed-water quality monitoring in the future.

Topics: Environmental Monitoring; Metagenome; Metagenomics; Waste Disposal, Fluid; Water Quality

PubMed: 32503906
DOI: 10.1128/AEM.00724-20

BusyBee Web: towards comprehensive and differential composition-based metagenomic binning.

Nucleic Acids Research Jul 2022

Authors: Georges P Schmartz, Pascal Hirsch, Jérémy Amand...

Despite recent methodology and reference database improvements for taxonomic profiling tools, metagenomic assembly and genomic binning remain important pillars of metagenomic analysis workflows. When reference information is lacking, genomic binning is considered a state-of-the-art method for analyzing mixed-culture metagenomic data. In this light, our previously published tool BusyBee Web implements a composition-based binning method efficient enough to function as a rapid online utility. Handling assembled contigs and long nanopore-generated reads alike, the webserver provides a wide range of supplementary annotations and visualizations. Half a decade after the initial publication, we revisited existing functionality, added comprehensive visualizations, and increased the number of data analysis customization options for further experimentation. The webserver now allows for visualization-supported differential analysis of samples, which is computationally expensive and typically only performed in coverage-based binning methods. Further, users may now optionally check their uploaded samples for plasmid sequences using PLSDB as a reference database. Lastly, a new application programming interface with a supporting Python package was implemented to give power users fully automated access to the resource and to allow integration into existing workflows. The webserver is freely available at https://www.ccb.uni-saarland.de/busybee.
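Composition-based binning generally relies on the observation that sequences from the same genome share similar oligonucleotide usage. The sketch below computes a normalized k-mer (by default tetranucleotide) frequency vector for a single contig, which is the kind of feature such methods typically cluster. It is a generic illustration, not BusyBee Web's implementation or API; the function name `kmer_profile` and the choice of k = 4 are assumptions.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int = 4) -> List[float]:
    """Return normalized k-mer frequencies in lexicographic ACGT order.

    Windows containing characters other than A, C, G, T (for example N)
    are ignored. An all-zero vector is returned if no valid k-mer exists.
    """
    seq = seq.upper()
    # All 4**k possible k-mers, in a fixed order so vectors are comparable.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made purely of A/C/G/T contribute to the total.
    total = sum(counts[m] for m in all_kmers)
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[m] / total for m in all_kmers]


if __name__ == "__main__":
    vector = kmer_profile("ACGTACGTTTGACCA")
    print(len(vector))  # 256 features for k = 4
```

Vectors like this one can be compared across contigs with any distance measure or fed to a dimensionality-reduction and clustering step; those downstream choices are not specified here.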

Topics: Algorithms; Metagenome; Software; Metagenomics; Workflow; Sequence Analysis, DNA

PubMed: 35489067
DOI: 10.1093/nar/gkac298

SCAPP: an algorithm for improved plasmid assembly in metagenomes.

Microbiome Jun 2021

Authors: David Pellow, Alvah Zorea, Maraike Probst...

BACKGROUND

Metagenomic sequencing has led to the identification and assembly of many new bacterial genome sequences. These bacteria often contain plasmids: usually small, circular double-stranded DNA molecules that may transfer across bacterial species and confer antibiotic resistance. These plasmids are generally less studied and understood than their bacterial hosts. One reason is the lack of computational tools for analyzing plasmids in metagenomic samples.

RESULTS

We developed SCAPP (Sequence Contents-Aware Plasmid Peeler), an algorithm and tool to assemble plasmid sequences from metagenomic sequencing. SCAPP builds on some key ideas from the Recycler algorithm while improving plasmid assemblies by integrating biological knowledge about plasmids. We compared the performance of SCAPP to Recycler and metaplasmidSPAdes on simulated metagenomes, real human gut microbiome samples, and a human gut plasmidome dataset that we generated. We also created plasmidome and metagenome data from the same cow rumen sample and used the parallel sequencing data to create a novel assessment procedure. Overall, SCAPP outperformed Recycler and metaplasmidSPAdes across this wide range of datasets.

CONCLUSIONS

SCAPP is an easy-to-use Python package that enables the assembly of full plasmid sequences from metagenomic samples. It outperformed existing metagenomic plasmid assemblers in most cases and assembled novel and clinically relevant plasmids in samples we generated, such as a human gut plasmidome. SCAPP is open-source software available from https://github.com/Shamir-Lab/SCAPP.

Topics: Algorithms; Humans; Metagenome; Metagenomics; Plasmids; Sequence Analysis, DNA; Software

PubMed: 34172093
DOI: 10.1186/s40168-021-01068-z

Music of metagenomics-a review of its applications, analysis pipeline, and associated tools.

Functional & Integrative Genomics Feb 2022

Review

Authors: Bilal Wajid, Faria Anwar, Imran Wajid...

This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper reinforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."

Topics: High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Software

PubMed: 34657989
DOI: 10.1007/s10142-021-00810-y

Sketching and sampling approaches for fast and accurate long read classification.

BMC Bioinformatics Oct 2022

Authors: Arun Das, Michael C Schatz

BACKGROUND

In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read.

RESULTS

Existing tools for the read classification problem are often alignment or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a "screen") of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read's similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy.

CONCLUSIONS

The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long-read sequencing experiment and in distinguishing between long reads from organisms of interest and potential contaminant reads. We then compare these approaches to existing alignment-, index-, and sketching-based tools for read classification, and demonstrate that such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching.
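The screen-based idea described in this abstract can be illustrated with a small bottom-s MinHash example: each candidate genome is reduced to its smallest k-mer hashes, and a read is assigned to the genome whose sketch shares the most hashes with the read. This is a simplified sketch written for this listing, not the authors' reference implementation; the function names, the parameters `k` and `size`, the use of BLAKE2b as the hash, and the toy genomes are all assumptions.

```python
import hashlib
import random
from typing import Dict, Optional, Set


def kmer_hashes(seq: str, k: int) -> Set[int]:
    """Hash every k-mer of seq to a 64-bit integer."""
    return {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }


def sketch(seq: str, k: int = 8, size: int = 256) -> Set[int]:
    """Bottom-s MinHash sketch: keep only the `size` smallest k-mer hashes."""
    return set(sorted(kmer_hashes(seq, k))[:size])


def build_screen(genomes: Dict[str, str], k: int = 8, size: int = 256) -> Dict[str, Set[int]]:
    """Reduce each genome to its sketch; together the sketches form the screen."""
    return {name: sketch(seq, k, size) for name, seq in genomes.items()}


def classify(read: str, screen: Dict[str, Set[int]], k: int = 8) -> Optional[str]:
    """Assign a read to the genome whose sketch shares the most hashes with it.

    Returns None when the read shares no hash with any sketch.
    """
    read_hashes = kmer_hashes(read, k)
    best = max(screen, key=lambda name: len(read_hashes & screen[name]), default=None)
    if best is None or not (read_hashes & screen[best]):
        return None
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical toy genomes of 200 bases each.
    genomes = {
        name: "".join(rng.choice("ACGT") for _ in range(200))
        for name in ("genome_A", "genome_B")
    }
    screen = build_screen(genomes)
    read = genomes["genome_A"][50:120]  # error-free read drawn from genome_A
    print(classify(read, screen))
```

On real long reads the error rate would remove many exact k-mer matches, so practical tools tune `k`, the sketch size, and the decision threshold; none of those settings are taken from the paper.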

Topics: Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Software; Metagenomics; Metagenome; Algorithms

PubMed: 36316646
DOI: 10.1186/s12859-022-05014-0

Data Analytics for Environmental Science and Engineering Research.

Environmental Science & Technology Aug 2021

Authors: Suraj Gupta, Diana Aga, Amy Pruden...

The advent of new data acquisition and handling techniques has opened the door to alternative and more comprehensive approaches to environmental monitoring that will improve our capacity to understand and manage environmental systems. Researchers have recently begun using machine learning (ML) techniques to analyze complex environmental systems and their associated data. Herein, we provide an overview of data analytics frameworks suitable for various Environmental Science and Engineering (ESE) research applications. We present current applications of ML algorithms within the ESE domain using three representative case studies: (1) Metagenomic data analysis for characterizing and tracking antimicrobial resistance in the environment; (2) Nontarget analysis for environmental pollutant profiling; and (3) Detection of anomalies in continuous data generated by engineered water systems. We conclude by proposing a path to advance incorporation of data analytics approaches in ESE research and application.

Topics: Data Science; Environmental Science; Machine Learning; Metagenome; Metagenomics

PubMed: 34338518
DOI: 10.1021/acs.est.1c01026

Exploring environmental intra-species diversity through non-redundant pangenome assemblies.

Molecular Ecology Resources Oct 2023

Authors: Fernando Puente-Sánchez, Matthias Hoetzinger, Moritz Buck...

At the genome level, microorganisms are highly adaptable both in terms of allele and gene composition. Such heritable traits emerge in response to different environmental niches and can have a profound influence on microbial community dynamics. As a consequence, any individual genome or population will contain merely a fraction of the total genetic diversity of any operationally defined "species", whose ecological potential can thus only be fully understood by studying all of its genomes and the genes therein. This concept, known as the pangenome, is valuable for studying microbial ecology and evolution, as it partitions genomes into core regions (present in all the genomes from a species, and responsible for housekeeping and species-level niche adaptation, among others) and accessory regions (present only in some, and responsible for intra-species differentiation). Here we present SuperPang, an algorithm producing pangenome assemblies from a set of input genomes of varying quality, including metagenome-assembled genomes (MAGs). SuperPang runs in linear time, and its results are complete and non-redundant, preserve gene ordering, and contain both coding and non-coding regions. Our approach provides a modular view of the pangenome, identifying operons and genomic islands and allowing their prevalence to be tracked in different populations. We illustrate this by analysing intra-species diversity in Polynucleobacter, a bacterial genus ubiquitous in freshwater ecosystems and characterized by streamlined genomes and ecological versatility. We show how SuperPang facilitates the simultaneous analysis of allelic and gene content variation under different environmental pressures, allowing us to study the drivers of microbial diversification at unprecedented resolution.
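The core versus accessory distinction defined in this abstract can be written down directly with set operations. The sketch below works on gene-family labels per genome, which is a deliberate simplification: SuperPang itself produces pangenome assemblies from input genomes rather than operating on gene sets, and the function name `partition_pangenome` and the example labels are hypothetical.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split gene families into core (in every genome) and accessory (in some).

    `genomes` maps a genome label to the set of gene families it contains.
    """
    if not genomes:
        return set(), set()
    gene_sets = list(genomes.values())
    # The pangenome is everything seen in at least one genome.
    pangenome = set().union(*gene_sets)
    # Core families are those shared by all genomes.
    core = set.intersection(*gene_sets)
    return core, pangenome - core


if __name__ == "__main__":
    # Hypothetical gene-family content for three genomes of one species.
    example = {
        "genome_1": {"dnaA", "gyrB", "opsin"},
        "genome_2": {"dnaA", "gyrB", "nifH"},
        "genome_3": {"dnaA", "gyrB"},
    }
    core, accessory = partition_pangenome(example)
    print(sorted(core), sorted(accessory))
```

With incomplete inputs such as MAGs, a strict intersection would misclassify core genes that happen to be missing from one assembly, so a prevalence threshold is often a more forgiving definition in practice.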

Topics: Phylogeny; Bacteria; Metagenome; Algorithms; Microbiota; Metagenomics

PubMed: 37382302
DOI: 10.1111/1755-0998.13826

Novel Insights into the Pig Gut Microbiome Using Metagenome-Assembled Genomes.

Microbiology Spectrum Aug 2022

Authors: Devin B Holman, Arun Kommadath, Jeffrey P Tingley...

Pigs are among the most numerous and intensively farmed food-producing animals in the world. The gut microbiome plays an important role in the health and performance of swine and changes rapidly after weaning. Here, fecal samples were collected from pigs at 7 different time points from 7 to 140 days of age. These swine fecal metagenomes were used to assemble 1,150 dereplicated metagenome-assembled genomes (MAGs) that were at least 90% complete and had less than 5% contamination. These MAGs represented 472 archaeal and bacterial species, and the most widely distributed MAGs were the uncultured species sp002391315, sp004557565, and sp000434975. Weaning was associated with a decrease in the relative abundance of 69 MAGs (e.g., Escherichia coli) and an increase in the relative abundance of 140 MAGs (e.g., sp000435835). Genes encoding the production of the short-chain fatty acids acetate, butyrate, and propionate were identified in 68.5%, 18.8%, and 8.3% of the MAGs, respectively. Carbohydrate-active enzymes associated with the degradation of arabinose oligosaccharides and mixed-linkage glucans were predicted to be most prevalent among the MAGs. Antimicrobial resistance genes were detected in 327 MAGs, including 59 MAGs with tetracycline resistance genes commonly associated with pigs, such as tet(44), tet(Q), and tet(W). Overall, 82% of the MAGs were assigned to species that lack cultured representatives, indicating that a large portion of the swine gut microbiome is still poorly characterized. The results here also demonstrate the value of MAGs in adding genomic context to gut microbiomes.

Many of the bacterial strains found in the mammalian gut are difficult to culture and isolate due to their various growth and nutrient requirements, which are frequently unknown. Here, we assembled strain-level genomes from short metagenomic sequences, so-called metagenome-assembled genomes (MAGs), that were derived from fecal samples collected from pigs at multiple time points. The genomic context of a number of antimicrobial resistance genes commonly detected in swine was also determined. In addition, our study connected taxonomy with potential metabolic functions such as carbohydrate degradation and short-chain fatty acid production.
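The quality criteria quoted in this abstract (at least 90% complete, less than 5% contamination) amount to a simple filter over per-bin quality estimates. The following sketch shows that filter under stated assumptions: the `Mag` record, the function name, and the example values are hypothetical, and the completeness and contamination figures would in practice come from whatever quality-assessment tool a study used.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Mag:
    """Quality summary for one metagenome-assembled genome (hypothetical record)."""
    name: str
    completeness: float   # percent, 0-100
    contamination: float  # percent


def filter_high_quality(
    mags: Iterable[Mag],
    min_completeness: float = 90.0,
    max_contamination: float = 5.0,
) -> List[Mag]:
    """Keep MAGs that are at least min_completeness % complete and
    strictly below max_contamination % contaminated."""
    return [
        m for m in mags
        if m.completeness >= min_completeness and m.contamination < max_contamination
    ]


if __name__ == "__main__":
    # Made-up example bins, not data from the study.
    bins = [
        Mag("bin_001", 97.2, 1.1),
        Mag("bin_002", 88.5, 0.4),   # fails completeness
        Mag("bin_003", 93.0, 5.0),   # fails contamination (not strictly below 5)
    ]
    print([m.name for m in filter_high_quality(bins)])
```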

Topics: Animals; Archaea; Bacteria; Carbohydrates; Gastrointestinal Microbiome; Mammals; Metagenome; Metagenomics; Swine

PubMed: 35880887
DOI: 10.1128/spectrum.02380-22

HiCBin: binning metagenomic contigs and recovering metagenome-assembled genomes using Hi-C contact maps.

Genome Biology Feb 2022

Authors: Yuxuan Du, Fengzhu Sun

Recovering high-quality metagenome-assembled genomes (MAGs) from complex microbial ecosystems remains challenging. Recently, high-throughput chromosome conformation capture (Hi-C) has been applied to simultaneously study multiple genomes in natural microbial communities. We develop HiCBin, a novel open-source pipeline, to resolve high-quality MAGs utilizing Hi-C contact maps. HiCBin employs the HiCzin normalization method and the Leiden clustering algorithm and, for the first time, incorporates spurious contact detection into a binning pipeline. HiCBin is validated on one synthetic and two real metagenomic samples and is shown to outperform the existing Hi-C-based binning methods. HiCBin is available at https://github.com/dyxstat/HiCBin.

Topics: Algorithms; Cluster Analysis; Metagenome; Metagenomics; Microbiota

PubMed: 35227283
DOI: 10.1186/s13059-022-02626-w