Scientific Data, Nov 2023
Urban lakes provide multiple benefits to society while influencing life quality. Moreover, lakes and their microbiomes are sentinels of anthropogenic impact and can be used for natural resource management and planning. Here, we release original metagenomic data from several well-characterized and anthropogenically impacted eutrophic lakes in the vicinity of Stockholm (Sweden). Our goal was to collect representative microbial community samples and use shotgun sequencing to provide a broad view on microbial diversity of productive urban lakes. Our dataset has an emphasis on Lake Mälaren as a major drinking water reservoir under anthropogenic impact. This dataset includes short-read sequence data and metagenome assemblies from each of 17 samples collected from eutrophic lakes near the greater Stockholm area. We used genome-resolved metagenomics and obtained 2378 metagenome assembled genomes that de-replicated into 514 species representative genomes. This dataset adds new datapoints to previously sequenced lakes and it includes the first sequenced set of metagenomes from Lake Mälaren. Our dataset serves as a baseline for future monitoring of drinking water reservoirs and urban lakes.
Topics: Bacteria; Drinking Water; Lakes; Metagenome; Metagenomics; Sweden
PubMed: 37978200
DOI: 10.1038/s41597-023-02722-x

BMC Bioinformatics, May 2021
BACKGROUND
Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation. The results of metagenomic benchmarks depend on the assumptions made about their underlying ecosystems, so conclusions from benchmark studies are limited to the ecosystems they mimic. Ideally, simulations should therefore be based on genomes that realistically resemble particular metagenomic communities.
RESULTS
We developed Tamock to facilitate the realistic simulation of metagenomic reads according to a metagenomic community, based on real sequence data. Benchmark samples can be created from all genomes and taxonomic domains present in NCBI RefSeq. Tamock automatically determines taxonomic profiles from shotgun sequence data, selects reference genomes accordingly and uses them to simulate metagenomic reads. We present an example use case for Tamock by assessing assembly and binning method performance for selected microbiomes.
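One step implied by the description above is turning a taxonomic profile into per-genome read counts for the simulator. The following is a minimal sketch of that idea only, assuming a profile given as relative abundances and a largest-remainder rounding rule; `allocate_reads` and its parameters are hypothetical names and are not taken from Tamock's code or command-line interface.

```python
def allocate_reads(profile, total_reads):
    """Split `total_reads` across taxa in proportion to their abundance.

    profile: dict mapping a taxon (or reference genome) name to a
    non-negative abundance; values need not sum to 1.
    Uses largest-remainder rounding so the counts sum to `total_reads`.
    This is an illustrative assumption, not Tamock's actual procedure.
    """
    total_abundance = sum(profile.values())
    if total_abundance <= 0:
        raise ValueError("profile must contain a positive total abundance")

    # Exact (fractional) share of reads for each taxon.
    exact = {taxon: total_reads * abundance / total_abundance
             for taxon, abundance in profile.items()}
    # Round down first, then hand out the remaining reads one by one
    # to the taxa with the largest fractional remainders.
    counts = {taxon: int(share) for taxon, share in exact.items()}
    leftover = total_reads - sum(counts.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for taxon in by_remainder[:leftover]:
        counts[taxon] += 1
    return counts
```

The resulting counts could then be passed to any read simulator, one selected reference genome at a time.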
CONCLUSIONS
Tamock facilitates automated simulation of habitat-specific benchmark metagenomic data based on real sequence data and is implemented as a user-friendly command-line application, providing extensive additional information along with the simulated benchmark data. Resulting benchmarks enable an assessment of computational methods, workflows, and parameters specifically for a metagenomic habitat or ecosystem of a metagenomic study.
AVAILABILITY
Source code, documentation and install instructions are freely available on GitHub (https://github.com/gerners/tamock).
Topics: Algorithms; Benchmarking; Metagenome; Metagenomics; Sequence Analysis, DNA; Software
PubMed: 33932979
DOI: 10.1186/s12859-021-04154-z

Applied and Environmental Microbiology, Aug 2020
Review
Many biological contaminants are disseminated through water, and their occurrence has potential detrimental impacts on public and environmental health. Conventional monitoring tools rely on cultivation and are not robust in addressing modern water quality concerns. This review proposes metagenomics as a means to provide a rapid, nontargeted assessment of biological contaminants in water. When further coupled with appropriate methods (e.g., quantitative PCR and flow cytometry) and bioinformatic tools, metagenomics can provide information concerning both the abundance and diversity of biological contaminants in reclaimed waters. Further correlation between the metagenomic-derived data of selected contaminants and the measurable parameters of water quality can also aid in devising strategies to alleviate undesirable water quality. Here, we review metagenomic approaches (i.e., both sequencing platforms and bioinformatic tools) and studies that demonstrated their use for reclaimed-water quality monitoring. We also provide recommendations on areas of improvement that will allow metagenomics to significantly impact how the water industry performs reclaimed-water quality monitoring in the future.
Topics: Environmental Monitoring; Metagenome; Metagenomics; Waste Disposal, Fluid; Water Quality
PubMed: 32503906
DOI: 10.1128/AEM.00724-20

Bioinformatics (Oxford, England), Sep 2021
SUMMARY
MMseqs2 taxonomy is a new tool to assign taxonomic labels to metagenomic contigs. It extracts all possible protein fragments from each contig, quickly retains those that can contribute to taxonomic annotation, assigns them with robust labels and determines the contig's taxonomic identity by weighted voting. Its fragment extraction step is suitable for the analysis of all domains of life. MMseqs2 taxonomy is 2-18× faster than state-of-the-art tools and also contains new modules for creating and manipulating taxonomic reference databases as well as reporting and visualizing taxonomic assignments.
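The weighted-voting step mentioned above can be illustrated with a small sketch. This is not the MMseqs2 implementation: the function name `assign_contig_taxon`, the `min_support` threshold, and the representation of a lineage as a root-to-leaf tuple are all assumptions made for illustration. Each fragment votes with its weight for every rank along its lineage, and the contig receives the most specific taxon that still holds the required share of the total weight.

```python
from collections import defaultdict


def assign_contig_taxon(fragment_hits, min_support=0.5):
    """Return the most specific taxon backed by >= min_support of the vote weight.

    fragment_hits: list of (lineage, weight) pairs, where lineage is a tuple
    of taxon names ordered from root to leaf and weight is a positive number.
    Hypothetical illustration only; thresholds and weights are assumptions.
    """
    total = sum(weight for _, weight in fragment_hits)
    if total <= 0:
        return None

    # A fragment supports every ancestor path of the taxon it was assigned to.
    support = defaultdict(float)
    for lineage, weight in fragment_hits:
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += weight

    # Keep paths with enough support; prefer the deepest, then the heaviest.
    candidates = [(len(path), weight, path)
                  for path, weight in support.items()
                  if weight / total >= min_support]
    if not candidates:
        return None
    _, _, best_path = max(candidates)
    return best_path[-1]
```

Raising `min_support` makes the assignment more conservative: disagreement among fragments pushes the label toward a higher rank instead of forcing a species-level call.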
AVAILABILITY AND IMPLEMENTATION
MMseqs2 taxonomy is part of the MMseqs2 free open-source software package available for Linux, macOS and Windows at https://mmseqs.com.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Software; Metagenome; Metagenomics; Databases, Factual
PubMed: 33734313
DOI: 10.1093/bioinformatics/btab184

FEMS Microbiology Reviews, Sep 2013
Review
Humans are home to complex microbial communities, whose aggregate genomes and their encoded metabolic activities are referred to as the human microbiome. Recently, researchers have begun to appreciate that different human body habitats and the activities of their resident microorganisms can be better understood in ecological terms, as a range of spatial scales encompassing single cells, guilds of microorganisms responsive to a similar substrate, microbial communities, body habitats, and host populations. However, the bulk of the work to date has focused on studies of culturable microorganisms in isolation or on DNA sequencing-based surveys of microbial diversity in small-to-moderate-sized cohorts of individuals. Here, we discuss recent work that highlights the potential for assessing the human microbiome at a range of spatial scales, and for developing novel techniques that bridge multiple levels: for example, through the combination of single-cell methods and metagenomic sequencing. These studies promise to not only provide a much-needed epidemiological and ecological context for mechanistic studies of culturable and genetically tractable microorganisms, but may also lead to the discovery of fundamental rules that govern the assembly and function of host-associated microbial communities.
Topics: Animals; Ecosystem; Humans; Metabolomics; Metagenome; Metagenomics; Microbiota; Single-Cell Analysis
PubMed: 23550823
DOI: 10.1111/1574-6976.12022

Nucleic Acids Research, Jul 2022
Despite recent methodology and reference database improvements for taxonomic profiling tools, metagenomic assembly and genomic binning remain important pillars of metagenomic analysis workflows. When reference information is lacking, genomic binning is considered to be a state-of-the-art method in mixed culture metagenomic data analysis. In this light, our previously published tool BusyBee Web implements a composition-based binning method efficient enough to function as a rapid online utility. Handling assembled contigs and long nanopore-generated reads alike, the webserver provides a wide range of supplementary annotations and visualizations. Half a decade after the initial publication, we revisited existing functionality, added comprehensive visualizations, and increased the number of data analysis customization options for further experimentation. The webserver now allows for visualization-supported differential analysis of samples, which is computationally expensive and typically only performed in coverage-based binning methods. Further, users may now optionally check their uploaded samples for plasmid sequences using PLSDB as a reference database. Lastly, a new application programming interface with a supporting Python package was implemented to allow power users fully automated access to the resource and integration into existing workflows. The webserver is freely available at https://www.ccb.uni-saarland.de/busybee.
Topics: Algorithms; Metagenome; Software; Metagenomics; Workflow; Sequence Analysis, DNA
PubMed: 35489067
DOI: 10.1093/nar/gkac298

BMC Bioinformatics, Oct 2022
BACKGROUND
In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read.
RESULTS
Existing tools for the read classification problem are often alignment or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a "screen") of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read's similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy.
CONCLUSIONS
The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. We then compare these approaches to existing alignment, index and sketching-based tools for read classification, and demonstrate how such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching.
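The screen-based workflow described in this abstract can be sketched with a plain bottom-k MinHash. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the function names, the choice of BLAKE2b as the hash, and the default k-mer length and sketch size are all hypothetical, and reverse complements and sequencing errors are ignored.

```python
import hashlib


def kmers(seq, k):
    """Return the set of k-mers in a sequence (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer; the hash choice here is arbitrary."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k=12, size=64):
    """Bottom-k MinHash sketch: keep the `size` smallest k-mer hashes."""
    hashes = sorted(hash_kmer(km) for km in kmers(seq, k))
    return set(hashes[:size])


def build_screen(genomes, k=12, size=64):
    """Build the reduced representation (the "screen") of all candidate genomes."""
    return {name: bottom_sketch(seq, k, size) for name, seq in genomes.items()}


def classify_read(read, screen, k=12):
    """Predict the source genome as the one whose sketch shares most hashes with the read."""
    read_hashes = {hash_kmer(km) for km in kmers(read, k)}
    if not read_hashes:
        return None
    scores = {name: len(read_hashes & sketch) for name, sketch in screen.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Only the sketches are kept in memory, which is the point of the approach: the screen is built once from the candidate genomes, and reads are then streamed through `classify_read` without any alignment or full index.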
Topics: Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Software; Metagenomics; Metagenome; Algorithms
PubMed: 36316646
DOI: 10.1186/s12859-022-05014-0

Microbiology Spectrum, Aug 2022
Pigs are among the most numerous and intensively farmed food-producing animals in the world. The gut microbiome plays an important role in the health and performance of swine and changes rapidly after weaning. Here, fecal samples were collected from pigs at 7 different time points from 7 to 140 days of age. These swine fecal metagenomes were used to assemble 1,150 dereplicated metagenome-assembled genomes (MAGs) that were at least 90% complete and had less than 5% contamination. These MAGs represented 472 archaeal and bacterial species, and the most widely distributed MAGs were the uncultured species sp002391315, sp004557565, and sp000434975. Weaning was associated with a decrease in the relative abundance of 69 MAGs (e.g., Escherichia coli) and an increase in the relative abundance of 140 MAGs (e.g., sp000435835, ). Genes encoding the production of the short-chain fatty acids acetate, butyrate, and propionate were identified in 68.5%, 18.8%, and 8.3% of the MAGs, respectively. Carbohydrate-active enzymes associated with the degradation of arabinose oligosaccharides and mixed-linkage glucans were predicted to be most prevalent among the MAGs. Antimicrobial resistance genes were detected in 327 MAGs, including 59 MAGs with tetracycline resistance genes commonly associated with pigs, such as tet(44), tet(Q), and tet(W). Overall, 82% of the MAGs were assigned to species that lack cultured representatives, indicating that a large portion of the swine gut microbiome is still poorly characterized. The results here also demonstrate the value of MAGs in adding genomic context to gut microbiomes.
Many of the bacterial strains found in the mammalian gut are difficult to culture and isolate due to their various growth and nutrient requirements, which are frequently unknown. Here, we assembled strain-level genomes from short metagenomic sequences, so-called metagenome-assembled genomes (MAGs), that were derived from fecal samples collected from pigs at multiple time points. The genomic context of a number of antimicrobial resistance genes commonly detected in swine was also determined. In addition, our study connected taxonomy with potential metabolic functions such as carbohydrate degradation and short-chain fatty acid production.
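The quality criteria stated in this abstract (at least 90% complete, less than 5% contamination) amount to a simple filter over per-MAG quality estimates. The sketch below assumes those estimates are already available as percentages from some external quality-assessment tool; the `MagQuality` record and function names are hypothetical and do not come from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MagQuality:
    """Quality estimates for one MAG; field names are illustrative."""
    name: str
    completeness: float   # percent, 0-100
    contamination: float  # percent


def passes_thresholds(mag, min_completeness=90.0, max_contamination=5.0):
    """True if the MAG is at least 90% complete with less than 5% contamination."""
    return (mag.completeness >= min_completeness
            and mag.contamination < max_contamination)


def filter_mags(mags, min_completeness=90.0, max_contamination=5.0):
    """Keep only the MAGs that meet both thresholds, preserving input order."""
    return [mag for mag in mags
            if passes_thresholds(mag, min_completeness, max_contamination)]
```

Note the asymmetry in the comparison operators, which follows the wording of the abstract: completeness uses "at least" (inclusive), contamination uses "less than" (exclusive).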
Topics: Animals; Archaea; Bacteria; Carbohydrates; Gastrointestinal Microbiome; Mammals; Metagenome; Metagenomics; Swine
PubMed: 35880887
DOI: 10.1128/spectrum.02380-22

Bioinformatics (Oxford, England), Jan 2022
MOTIVATION
With a large number of metagenomic datasets becoming available, eukaryotic metagenomics emerged as a new challenge. The proper classification of eukaryotic nuclear and organellar genomes is an essential step toward a better understanding of eukaryotic diversity.
RESULTS
We developed Tiara, a deep-learning-based approach for the identification of eukaryotic sequences in metagenomic datasets. Its two-step classification process enables the classification of nuclear and organellar eukaryotic fractions and subsequently divides organellar sequences into plastidial and mitochondrial. Using the test dataset, we have shown that Tiara performed similarly to EukRep for prokaryote classification and outperformed it for eukaryote classification, with lower calculation time. In tests on real data, Tiara performed better than EukRep in analyzing a small dataset representing a eukaryotic cell microbiome and a large dataset from the pelagic zone of oceans. Tiara is also the only available tool correctly classifying organellar sequences, which was confirmed by the recovery of nearly complete plastid and mitochondrial genomes from the test data and real metagenomic data.
AVAILABILITY AND IMPLEMENTATION
Tiara is implemented in Python 3.8, available at https://github.com/ibe-uw/tiara and tested on Unix-based systems. It is released under an open-source MIT license and documentation is available at https://ibe-uw.github.io/tiara. Version 1.0.1 of Tiara has been used for all benchmarks.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Software; Deep Learning; Eukaryota; Eukaryotic Cells; Metagenomics; Metagenome
PubMed: 34570171
DOI: 10.1093/bioinformatics/btab672

Bioinformatics (Oxford, England), Sep 2019
MOTIVATION
Metagenomics is the study of genetic material directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life, largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory, which are often not available to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false-positive rates because the value of k is traded off against computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences.
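The notion of unique k-mers used above can be made concrete with a short sketch. It covers only the alignment-free idea (k-mers found in exactly one reference serve as markers for that reference) and not MSC's model sequences or its alignment step; the function names, the `min_hits` threshold, and the decision to ignore reverse complements are assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(references, k):
    """For each reference, collect the k-mers that occur in no other reference."""
    owners = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    unique = {name: set() for name in references}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


def detect_taxa(reads, unique, k, min_hits=2):
    """Report references whose unique k-mers are hit at least `min_hits` times.

    Reverse complements and sequencing errors are ignored in this sketch.
    """
    hits = dict.fromkeys(unique, 0)
    for read in reads:
        read_kmers = kmers(read, k)
        for name, markers in unique.items():
            hits[name] += len(read_kmers & markers)
    return {name for name, count in hits.items() if count >= min_hits}
```

The trade-off named in the abstract shows up directly here: a small k leaves few k-mers unique to any one reference, while a large k yields more discriminating markers at the cost of a larger table to build and store.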
RESULTS
Microbiome researchers are generally interested in two objectives of a taxonomic classifier: (i) to detect prevalence, i.e. the taxa present in a sample, and (ii) to estimate their relative abundances. MSC is primarily designed to detect prevalence and experimental results show that MSC is indeed a more effective and efficient algorithm compared to the other state-of-the-art algorithms in terms of accuracy, memory and runtime. Moreover, MSC outputs an approximate estimate of the abundances.
AVAILABILITY AND IMPLEMENTATION
The implementations are freely available for non-commercial purposes. They can be downloaded from https://drive.google.com/open?id=1XirkAamkQ3ltWvI1W1igYQFusp9DHtVl.
Topics: Algorithms; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Sequence Analysis, DNA
PubMed: 30649204
DOI: 10.1093/bioinformatics/bty1071