Frontiers in Cellular and Infection... 2023
INTRODUCTION
The species diversity of microbiomes is a cutting-edge concept in metagenomic research. In this study, we propose a multifractal analysis for metagenomic research.
METHOD AND RESULTS
First, we visualized the chaotic game representation (CGR) of simulated and real metagenomes and found that the resulting plots are self-similar. We then defined and calculated the multifractal dimension of the CGR plots of the simulated and real metagenomes. Pearson correlation coefficients between the multifractal dimension and the traditional species diversity indices showed that the correlation with the species richness index and the Shannon diversity index was highest at q = 0, 1, whereas the correlation with the Simpson diversity index was highest at q = 5. Finally, we applied our method to real gut metagenomes from 100 infants sampled as newborns and at 4 and 12 months of age. The results show that the multifractal dimensions of infants' gut microbiomes can distinguish age differences.
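The two steps described above, plotting a CGR and estimating a multifractal dimension from it, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the corner assignment for A, C, G, and T, the grid depths `ks`, and the box-counting estimate of the generalized dimension from Rényi entropies are all assumptions chosen for the example.

```python
import math
from collections import Counter

# Assumed corner layout; several conventions exist in the literature.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return CGR points: each point is the midpoint between the
    previous point and the corner assigned to the current base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous bases such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def box_probabilities(points, k):
    """Fraction of points in each occupied cell of a 2^k x 2^k grid."""
    n = 2 ** k
    counts = Counter(
        (min(int(x * n), n - 1), min(int(y * n), n - 1)) for x, y in points
    )
    total = len(points)
    return [c / total for c in counts.values()]


def renyi_entropy(probs, q):
    """Renyi entropy of order q; q == 1 is the Shannon limit."""
    if q == 1:
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** q for p in probs)) / (1 - q)


def generalized_dimension(points, q, ks=(2, 3, 4, 5)):
    """Least-squares slope of the Renyi entropy against log(1/epsilon),
    where epsilon = 2^-k is the box side length."""
    xs = [k * math.log(2) for k in ks]
    ys = [renyi_entropy(box_probabilities(points, k), q) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

For a long sequence with uniformly random bases the points fill the unit square evenly, so the estimate should be close to 2 for every q; a spread of values across q is what indicates multifractal structure.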
CONCLUSION AND DISCUSSION
The CGRs of WGS metagenomes exhibit self-similarity, and the multifractal spectrum is an important characteristic of metagenomes. The traditional diversity indicators can be unified under the framework of multifractal analysis. These results coincide with similar results in macrobial ecology. The multifractal spectrum of infants' gut microbiomes is related to the development of the infants.
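One standard way to see how diversity indices can sit in a single framework is through the Rényi entropy, which underlies the usual definition of generalized dimensions. The formulation below is the textbook one and is offered as an assumption about the kind of unification meant; the abstract does not give the authors' exact definitions. Here p_i is the relative abundance of species i (or the measure of box i), S is the number of species, and ε is the box size.

```latex
H_q = \frac{1}{1-q}\,\ln \sum_{i=1}^{S} p_i^{\,q}, \qquad q \neq 1

H_0 = \ln S
  \quad \text{(log species richness)}

\lim_{q \to 1} H_q = -\sum_{i=1}^{S} p_i \ln p_i
  \quad \text{(Shannon index)}

H_2 = -\ln \sum_{i=1}^{S} p_i^{\,2}
  \quad \text{(negative log of the Simpson concentration)}

D_q = \lim_{\varepsilon \to 0} \frac{H_q(\varepsilon)}{\ln(1/\varepsilon)}
```

In this formulation the Simpson index corresponds analytically to q = 2. The q = 5 reported in the abstract is the order at which the empirical correlation was strongest, which is a separate statement.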
Topics: Humans; Infant; Infant, Newborn; Metagenome; Microbiota; Gastrointestinal Microbiome; Metagenomics; Ecology
PubMed: 36779183
DOI: 10.3389/fcimb.2023.1117421

Microbiome Mar 2021
BACKGROUND
Microbial eukaryotes are found alongside bacteria and archaea in natural microbial systems, including host-associated microbiomes. While microbial eukaryotes are critical to these communities, they are challenging to study with shotgun sequencing techniques and are therefore often excluded.
RESULTS
Here, we present EukDetect, a bioinformatics method to identify eukaryotes in shotgun metagenomic sequencing data. Our approach uses a database of 521,824 universal marker genes from 241 conserved gene families, which we curated from 3713 fungal, protist, non-vertebrate metazoan, and non-streptophyte archaeplastida genomes and transcriptomes. EukDetect has a broad taxonomic coverage of microbial eukaryotes, performs well on low-abundance and closely related species, and is resilient against bacterial contamination in eukaryotic genomes. Using EukDetect, we describe the spatial distribution of eukaryotes along the human gastrointestinal tract, showing that fungi and protists are present in the lumen and mucosa throughout the large intestine. We discover that there is a succession of eukaryotes that colonize the human gut during the first years of life, mirroring patterns of developmental succession observed in gut bacteria. By comparing DNA and RNA sequencing of paired samples from human stool, we find that many eukaryotes continue active transcription after passage through the gut, though some do not, suggesting they are dormant or nonviable. We analyze metagenomic data from the Baltic Sea and find that eukaryotes differ across locations and salinity gradients. Finally, we observe eukaryotes in Arabidopsis leaf samples, many of which are not identifiable from public protein databases.
CONCLUSIONS
EukDetect provides an automated and reliable way to characterize eukaryotes in shotgun sequencing datasets from diverse microbiomes. We demonstrate that it enables discoveries that would be missed or clouded by false positives with standard shotgun sequence analysis. EukDetect will greatly advance our understanding of how microbial eukaryotes contribute to microbiomes.
Topics: Animals; Eukaryota; Humans; Metagenome; Metagenomics; Sequence Analysis, DNA
PubMed: 33658077
DOI: 10.1186/s40168-021-01015-y

Briefings in Bioinformatics Nov 2012
Review
Metagenomic sequencing provides a unique opportunity to explore earth's limitless environments harboring scores of yet unknown and mostly unculturable microbes and other organisms. Functional analysis of the metagenomic data plays a central role in projects aiming to explore the most essential questions in microbiology, namely 'In a given environment, among the microbes present, what are they doing, and how are they doing it?' Toward this goal, several large-scale metagenomic projects have recently been conducted or are currently underway. Functional analysis of metagenomic data mainly suffers from the vast amount of data generated in these projects. The sheer amount of data requires much computational time and storage space. These problems are compounded by other factors potentially affecting the functional analysis, including sample preparation, sequencing method and average genome size of the metagenomic samples. In addition, the read lengths generated during sequencing influence sequence assembly, gene prediction and subsequently the functional analysis. The level of confidence for functional predictions increases with increasing read length. Usually, the most reliable functional annotations for metagenomic sequences are achieved using homology-based approaches against publicly available reference sequence databases. Here, we present an overview of the current state of functional analysis of metagenomic sequence data, bottlenecks frequently encountered and possible solutions in light of currently available resources and tools. Finally, we provide some examples of applications from recent metagenomic studies which have been successfully conducted in spite of the known difficulties.
Topics: Algorithms; Metagenome; Metagenomics; Sequence Analysis, DNA
PubMed: 22772835
DOI: 10.1093/bib/bbs033

Genomics, Proteomics & Bioinformatics Oct 2015
Review
The development of next-generation sequencing (NGS) platforms spawned an enormous volume of data. This explosion in data has unearthed new scalability challenges for existing bioinformatics tools. The analysis of metagenomic sequences using bioinformatics pipelines is complicated by the substantial complexity of these data. In this article, we review several commonly used online tools for metagenomics data analysis with respect to their quality and detail of analysis using simulated metagenomics data. There are at least a dozen such software tools presently available in the public domain. Among them, MG-RAST, IMG/M, and METAVIR are the most well-known tools according to the number of citations by peer-reviewed scientific media up to mid-2015. Here, we describe 12 online tools with respect to their web link, annotation pipelines, clustering methods, online user support, and availability of data storage. We also rated each tool to identify the most promising ones and evaluated the five best tools using a synthetic metagenome. The article comprehensively deals with the contemporary problems and the prospects of metagenomics from a bioinformatics viewpoint.
Topics: Cluster Analysis; Computational Biology; High-Throughput Nucleotide Sequencing; Humans; Information Storage and Retrieval; Internet; Metagenome; Metagenomics; Software
PubMed: 26602607
DOI: 10.1016/j.gpb.2015.10.003

Virology Dec 2012
Review
There are 100 times more virions than eukaryotic cells in a healthy human body. The characterization of human-associated viral communities in a non-pathological state and the detection of viral pathogens in cases of infection are essential for medical care and epidemic surveillance. Viral metagenomics, the sequence-based analysis of the complete collection of viral genomes directly isolated from an organism or an ecosystem, bypasses the "single-organism-level" point of view of clinical diagnostics and thus the need to isolate and culture the targeted organism. The first part of this review is dedicated to a presentation of past research in viral metagenomics with an emphasis on human-associated viral communities (eukaryotic viruses and bacteriophages). In the second part, we review more precisely the computational challenges posed by the analysis of viral metagenomes, and we illustrate the problem of sequences that do not have homologs in public databases and the possible approaches to characterize them.
Topics: Biomedical Research; Biota; Computational Biology; Environmental Microbiology; Humans; Metagenome; Metagenomics; Viruses
PubMed: 23062738
DOI: 10.1016/j.virol.2012.09.025

Journal of Molecular Biology Jul 2023
An increasingly common output arising from the analysis of shotgun metagenomic datasets is the generation of metagenome-assembled genomes (MAGs), with tens of thousands of MAGs now described in the literature. However, the discovery and comparison of these MAG collections is hampered by the lack of uniformity in their generation, annotation and storage. To address this, we have developed MGnify Genomes, a growing collection of biome-specific non-redundant microbial genome catalogues generated using MAGs and publicly available isolate genomes. Genomes within a biome-specific catalogue are organised into species clusters. For species that contain multiple conspecific genomes, the highest quality genome is selected as the representative, always prioritising an isolate genome over a MAG. The species representative sequences and annotations can be visualised on the MGnify website and the full catalogue and associated analysis outputs can be downloaded from MGnify servers. A suite of online search tools is provided allowing users to compare their own sequences, ranging from a gene to sets of genomes, against the catalogues. Seven biomes are available currently, comprising over 300,000 genomes that represent 11,048 non-redundant species, and include 36 taxonomic classes not currently represented by cultured genomes. MGnify Genomes is available at https://www.ebi.ac.uk/metagenomics/browse/genomes/.
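The representative-selection rule described above (highest quality wins, but an isolate genome always beats a MAG) can be sketched as a two-part sort key. The `Genome` record, its field names, and the single numeric `quality` score are hypothetical; the abstract does not say how quality is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    accession: str     # hypothetical identifier
    is_isolate: bool   # True for an isolate genome, False for a MAG
    quality: float     # hypothetical score; higher is better


def pick_representative(cluster):
    """Pick the species representative from one species cluster.

    Isolates are always preferred over MAGs; within the same type,
    the genome with the highest quality score is chosen.
    """
    if not cluster:
        raise ValueError("species cluster is empty")
    # Tuples compare element by element, and True sorts above False,
    # so isolate status dominates the quality score.
    return max(cluster, key=lambda g: (g.is_isolate, g.quality))
```

With this key, a MAG scoring 99 still loses to an isolate scoring 90, which is the behaviour the prioritisation rule implies.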
Topics: Genome, Microbial; Metagenome; Metagenomics
PubMed: 36806692
DOI: 10.1016/j.jmb.2023.168016

Microbiome Jun 2021
BACKGROUND
Metagenomic sequencing has led to the identification and assembly of many new bacterial genome sequences. These bacteria often contain plasmids: usually small, circular double-stranded DNA molecules that may transfer across bacterial species and confer antibiotic resistance. These plasmids are generally less studied and understood than their bacterial hosts. Part of the reason is a lack of computational tools for analyzing plasmids in metagenomic samples.
RESULTS
We developed SCAPP (Sequence Contents-Aware Plasmid Peeler), an algorithm and tool to assemble plasmid sequences from metagenomic sequencing. SCAPP builds on some key ideas from the Recycler algorithm while improving plasmid assemblies by integrating biological knowledge about plasmids. We compared the performance of SCAPP to Recycler and metaplasmidSPAdes on simulated metagenomes, real human gut microbiome samples, and a human gut plasmidome dataset that we generated. We also created plasmidome and metagenome data from the same cow rumen sample and used the parallel sequencing data to create a novel assessment procedure. Overall, SCAPP outperformed Recycler and metaplasmidSPAdes across this wide range of datasets.
CONCLUSIONS
SCAPP is an easy-to-use Python package that enables the assembly of full plasmid sequences from metagenomic samples. It outperformed existing metagenomic plasmid assemblers in most cases and assembled novel and clinically relevant plasmids in samples we generated, such as a human gut plasmidome. SCAPP is open-source software available from: https://github.com/Shamir-Lab/SCAPP.
Topics: Algorithms; Humans; Metagenome; Metagenomics; Plasmids; Sequence Analysis, DNA; Software
PubMed: 34172093
DOI: 10.1186/s40168-021-01068-z

Nature Oct 2012
An innovative method for probing the genomes of the vast community of microorganisms that inhabit the human gut provides an alternative approach to identifying risk factors for type 2 diabetes.
Topics: Diabetes Mellitus, Type 2; Genome-Wide Association Study; Humans; Intestines; Metagenome; Metagenomics
PubMed: 23038462
DOI: 10.1038/490044a

Genes Aug 2020
Most current approaches to metagenomic classification employ short next-generation sequencing (NGS) reads that are present in metagenomic samples to identify unique genomic regions. NGS reads, however, might not be long enough to differentiate similar genomes. This suggests a potential for using longer reads to improve classification performance. Presently, longer reads tend to have a higher rate of sequencing errors. Thus, given the pros and cons, it remains unclear which type of read is better for metagenomic classification. We compared two taxonomic classification protocols: a traditional assembly-free protocol and a novel assembly-based protocol. The novel assembly-based protocol consists of assembling short reads into longer reads, which are subsequently classified by a traditional taxonomic classifier. We discovered that most classifiers made fewer predictions with longer reads and that they achieved higher classification performance on synthetic metagenomic data. Generally, we observed a significant increase in precision with similar recall rates. On real data, we observed similar characteristics, suggesting that the classifiers may likewise achieve higher precision with similar recall on longer reads. We have shown a noticeable difference in performance between assembly-based and assembly-free taxonomic classification. This finding strongly suggests that classifying species in metagenomic environments can be achieved with higher overall performance simply by assembling short reads. Further, it also suggests that long-read technologies might be better for species classification.
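The reported pattern, fewer predictions with higher precision and similar recall, follows directly from how the two metrics are usually defined. The sketch below assumes read-level scoring in which precision is computed over the reads a classifier chose to label and recall over all reads with a known taxon; the paper may define them differently, and the function name and data layout are illustrative only.

```python
def precision_recall(predicted, truth):
    """Compute (precision, recall) for a taxonomic classifier.

    predicted: dict mapping read ID -> predicted taxon, or None when
               the classifier made no call for that read.
    truth:     dict mapping read ID -> true taxon.
    """
    # Only reads with an actual call count as predictions.
    made = {read: taxon for read, taxon in predicted.items() if taxon is not None}
    correct = sum(1 for read, taxon in made.items() if truth.get(read) == taxon)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall
```

If a classifier drops its wrong calls but keeps its correct ones, `correct` is unchanged while the number of calls made shrinks, so precision rises and recall stays the same.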
Topics: Computational Biology; DNA Barcoding, Taxonomic; Metagenome; Metagenomics; Reproducibility of Results; Workflow
PubMed: 32824429
DOI: 10.3390/genes11080946

Scientific Data Nov 2023
Urban lakes provide multiple benefits to society while influencing quality of life. Moreover, lakes and their microbiomes are sentinels of anthropogenic impact and can be used for natural resource management and planning. Here, we release original metagenomic data from several well-characterized and anthropogenically impacted eutrophic lakes in the vicinity of Stockholm (Sweden). Our goal was to collect representative microbial community samples and use shotgun sequencing to provide a broad view of the microbial diversity of productive urban lakes. Our dataset has an emphasis on Lake Mälaren as a major drinking water reservoir under anthropogenic impact. This dataset includes short-read sequence data and metagenome assemblies from each of 17 samples collected from eutrophic lakes near the greater Stockholm area. We used genome-resolved metagenomics and obtained 2378 metagenome-assembled genomes that de-replicated into 514 species-representative genomes. This dataset adds new data points to previously sequenced lakes and includes the first sequenced set of metagenomes from Lake Mälaren. Our dataset serves as a baseline for future monitoring of drinking water reservoirs and urban lakes.
Topics: Bacteria; Drinking Water; Lakes; Metagenome; Metagenomics; Sweden
PubMed: 37978200
DOI: 10.1038/s41597-023-02722-x