-
F1000Research 2021
Metagenomic sequencing allows large-scale identification and genomic characterization. Binning is the process of recovering genomes from complex mixtures of sequence fragments (metagenome contigs) of unknown bacterial and archaeal species. Assessing the quality of genomes recovered from metagenomes requires complex pipelines involving many independent steps, which are often difficult to reproduce and maintain. A comprehensive, automated and easy-to-use computational workflow for the quality assessment of draft prokaryotic genomes, based on container technology, would greatly improve the reproducibility and reusability of published results. We present metashot/prok-quality, a container-enabled Nextflow pipeline for quality assessment and genome dereplication. The metashot/prok-quality tool produces genome quality reports that are compliant with the Minimum Information about a Metagenome-Assembled Genome (MIMAG) standard, and can run out-of-the-box on any platform that supports Nextflow, Docker or Singularity, including computing clusters or batch infrastructures in the cloud. metashot/prok-quality is part of the metashot collection of analysis pipelines. The workflow and documentation are available on GitHub under the GPL3 licence.
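The MIMAG quality tiers mentioned above can be illustrated with a minimal sketch. It is not code from metashot/prok-quality; the `GenomeStats` type and `mimag_tier` function are hypothetical names, and the thresholds are those of the MIMAG standard as commonly stated (high quality: more than 90% completeness, less than 5% contamination, 5S, 16S and 23S rRNA genes present and at least 18 tRNAs; medium quality: at least 50% completeness and less than 10% contamination; low quality: less than 50% completeness and less than 10% contamination).

```python
from dataclasses import dataclass


@dataclass
class GenomeStats:
    """Hypothetical per-genome summary; field names are illustrative only."""
    completeness: float   # percent, estimated by a marker-gene tool
    contamination: float  # percent
    has_5s: bool
    has_16s: bool
    has_23s: bool
    trna_count: int       # number of distinct tRNAs detected


def mimag_tier(g: GenomeStats) -> str:
    """Assign a MIMAG-style quality tier (assumed thresholds, see lead-in)."""
    rrna_complete = g.has_5s and g.has_16s and g.has_23s
    if (g.completeness > 90 and g.contamination < 5
            and rrna_complete and g.trna_count >= 18):
        return "high-quality draft"
    if g.completeness >= 50 and g.contamination < 10:
        return "medium-quality draft"
    if g.contamination < 10:
        return "low-quality draft"
    # Genomes with 10% contamination or more fall outside the three tiers.
    return "unclassified"


if __name__ == "__main__":
    mag = GenomeStats(95.0, 2.0, True, True, True, 20)
    print(mimag_tier(mag))
```

A genome that passes the completeness and contamination cut-offs but lacks one rRNA gene drops to the medium tier, which is why many otherwise near-complete short-read MAGs are not reported as high quality.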
Topics: Archaea; Metagenome; Metagenomics; Prokaryotic Cells; Reproducibility of Results
PubMed: 35136576
DOI: 10.12688/f1000research.54418.1 -
Methods in Molecular Biology (Clifton,... 2024
Viral metagenomics is one of the most widely used approaches to study viral population genomics. With the recent development of bioinformatic tools, the number of molecular biological methods, programs, and software packages for analyzing viral metagenome data has greatly increased. Here, we describe the basic analysis workflow along with bioinformatic tools that can be used to analyze viral metagenome data. Although this chapter assumes that the viral metagenome data are prepared from freshwater samples and subjected to dsDNA sequencing, the protocol can be modified and applied to other types of metagenome data collected from a variety of sources.
Topics: Metagenome; Genome, Viral; Metagenomics; Fresh Water; Viruses
PubMed: 38060116
DOI: 10.1007/978-1-0716-3515-5_3 -
Microbiology Spectrum Dec 2022
Antibiotic resistance genes (ARGs) pose a serious threat to public health and ecological security in the 21st century. However, the resistome only accounts for a tiny fraction of metagenomic content, which makes it difficult to investigate low-abundance ARGs in various environmental settings. Thus, a highly sensitive, accurate, and comprehensive method is needed to describe ARG profiles in complex metagenomic samples. In this study, we established a high-throughput sequencing method based on targeted amplification, which could simultaneously detect ARGs (n = 251), mobile genetic element genes (n = 8), and metal resistance genes (n = 19) in metagenomes. The performance of amplicon sequencing was compared with traditional metagenomic shotgun sequencing (MetaSeq). A total of 1421 primer pairs were designed, achieving extremely high coverage of target genes. The amplicon sequencing significantly improved the recovery of target ARGs (~9 × 10-fold), with higher sensitivity and diversity, lower cost, and a lighter computational burden. Furthermore, targeted enrichment allows deep scanning of single nucleotide polymorphisms (SNPs), and elevated SNP detection was shown in this study. We further applied this approach to 48 environmental samples (37 feces, 20 soils, and 7 sewage) and 16 clinical samples. All samples tested in this study showed high diversity and recovery of targeted genes. Our results demonstrated that the approach could be applied to various metagenomic samples and serve as an efficient tool in the surveillance and evolution assessment of ARGs. Access to the resistome using the enrichment method validated in this study enabled the capture of low-abundance resistomes while being less costly and time-consuming, which can greatly advance our understanding of local and global resistome dynamics.
ARGs, an increasing global threat to human health, can be transferred into health-related microorganisms in the environment by horizontal gene transfer. Advanced profiling methods are needed for monitoring and predicting the potential risks of ARGs in metagenomes. Our study described a customized amplicon sequencing assay that could enable a high-throughput, targeted, in-depth analysis of ARGs and detect the low-abundance portion of resistomes. This method could serve as an efficient tool to assess the variation and evolution of specific ARGs in clinical and natural environments.
Topics: Humans; Metagenome; Genes, Bacterial; Anti-Bacterial Agents; Drug Resistance, Microbial; Sewage; Metagenomics
PubMed: 36287061
DOI: 10.1128/spectrum.02297-22 -
Briefings in Bioinformatics May 2023
Recovering high-quality metagenome-assembled genomes (HQ-MAGs) is critical for exploring microbial compositions and microbe-phenotype associations. However, the many sequencing platforms and computational tools available for this purpose may confuse researchers and thus call for extensive evaluation. Here, we systematically evaluated a total of 40 combinations of popular computational tools and sequencing platforms (i.e. strategies), involving eight assemblers, eight metagenomic binners and four sequencing technologies, including short-read, long-read and metaHiC sequencing. We identified the best tools for the individual tasks (e.g. assembly and binning) and combinations (e.g. generating more HQ-MAGs) depending on the availability of the sequencing data. We found that the combination of hybrid assemblies and metaHiC-based binning performed best, followed by the hybrid and long-read assemblies. More importantly, both long-read and metaHiC sequencing link more mobile elements and antibiotic resistance genes to bacterial hosts and improve the quality of public human gut reference genomes, with 32% (34/105) of HQ-MAGs being either of better quality than those in the Unified Human Gastrointestinal Genome catalog version 2 or novel.
Topics: Humans; Metagenomics; Sequence Analysis, DNA; Metagenome; Bacteria; Gastrointestinal Tract
PubMed: 37114640
DOI: 10.1093/bib/bbad162 -
Frontiers in Cellular and Infection... 2023
INTRODUCTION
The species diversity of microbiomes is a cutting-edge concept in metagenomic research. In this study, we propose a multifractal analysis for metagenomic research.
METHOD AND RESULTS
First, we visualized the chaos game representation (CGR) of simulated and real metagenomes and found that the resulting plots exhibit self-similarity. We then defined and calculated the multifractal dimension of the CGR plots of the simulated and real metagenomes. By analyzing the Pearson correlation coefficients between the multifractal dimension and the traditional species diversity indices, we found that the correlations with the species richness index and the Shannon diversity index reached their maximum values at q = 0 and q = 1, and the correlation with the Simpson diversity index reached its maximum value at q = 5. Finally, we applied our method to real metagenomes of the gut microbiota of 100 infants sampled as newborns and at 4 and 12 months of age. The results show that the multifractal dimensions of the infants' gut microbiomes can distinguish age differences.
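To make the two building blocks above concrete, here is a minimal sketch of a chaos game representation and a box-counting estimate of the generalized (Rényi) dimension D_q. It is not the authors' implementation: the corner assignment for A, C, G and T is one common convention, the function names are hypothetical, and the estimate is taken at a single grid scale, whereas in practice D_q is usually obtained from a regression over several scales. For reference, D_0 depends only on the number of occupied boxes and D_1 on the Shannon entropy of the box occupancies.

```python
import math
from collections import Counter

# Assumed corner convention for the unit square; other layouts are also used.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return CGR coordinates: each base moves halfway toward its corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def box_measure(points, k):
    """Fraction of points falling in each occupied cell of a 2^k x 2^k grid."""
    n = 2 ** k
    counts = Counter(
        (min(int(x * n), n - 1), min(int(y * n), n - 1)) for x, y in points
    )
    total = len(points)
    return [c / total for c in counts.values()]


def generalized_dimension(points, k, q):
    """Single-scale estimate of D_q at box size eps = 2^-k."""
    if not points:
        raise ValueError("no CGR points to analyse")
    p = box_measure(points, k)
    log_eps = -k * math.log(2)
    if q == 1:
        # Limit q -> 1: information dimension from Shannon entropy.
        return sum(pi * math.log(pi) for pi in p) / log_eps
    return math.log(sum(pi ** q for pi in p)) / ((q - 1) * log_eps)


if __name__ == "__main__":
    pts = cgr_points("ACGTTGCAACGGTTAC")
    for q in (0, 1, 2):
        print(q, generalized_dimension(pts, 2, q))
```

Skipping ambiguous bases rather than raising an error is a design choice made for this sketch, since metagenomic reads often contain N.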
CONCLUSION AND DISCUSSION
There is self-similarity among the CGRs of WGS of metagenomes, and the multifractal spectrum is an important characteristic of metagenomes. The traditional diversity indicators can be unified under the framework of multifractal analysis. These results coincide with similar results in macrobial ecology. The multifractal spectrum of infants' gut microbiomes is related to the development of the infants.
Topics: Humans; Infant; Infant, Newborn; Metagenome; Microbiota; Gastrointestinal Microbiome; Metagenomics; Ecology
PubMed: 36779183
DOI: 10.3389/fcimb.2023.1117421 -
Functional relevance of microbiome signatures: The correlation era requires tools for consolidation.
The Journal of Allergy and Clinical... Apr 2017
Review
Compelling research over the past decade identified a fundamental role of the intestinal microbiome in human health. Compositional and functional changes of this microbial ecosystem are correlated with a variety of human pathologies. Metagenomic resolution and bioinformatic tools have improved considerably, allowing even strain-level analysis. However, the search for microbial risk patterns in human cohorts is often confounded by environmental factors (eg, medication) and host status (eg, disease relapse), calling into question the prognostic and therapeutic value of the currently available information. In addition to a better stratification of human phenotypes, the implementation of standardized protocols for sampling and analysis is needed to improve the reproducibility and comparability of microbiome signatures at a meaningful taxonomic resolution. At the level of mechanistic understanding, the molecular integration of pleiotropic signals coming from this complex and dynamically changing ecosystem is one of the biggest challenges in this field. The first successful attempts to apply reverse genetics based on the available metagenomic information yielded identification of small molecules and metabolites with functional relevance for microbe-host interactions. Further expansion of the isolation of bacteria from the "unculturable biomass" will help characterize microbiome signatures in model systems, finally aiming at the development of clinically relevant synthetic consortia with safe and functionally well-defined strains. In conclusion and beyond reasonable enthusiasm, the mechanistic understanding and clinical relevance of microbiome alterations in disease susceptibility are still in their infancy, but the integration of all the above-mentioned strategies will help overcome the correlation era in microbiome research and lead to a rational evaluation of clinical strategies relevant for targeted microbial intervention.
Topics: Animals; Gastrointestinal Microbiome; Humans; Metagenome; Metagenomics; Models, Biological
PubMed: 28390576
DOI: 10.1016/j.jaci.2017.02.010 -
Methods (San Diego, Calif.) Jun 2016
Review
The study of metagenomics has benefited greatly from low-cost and high-throughput sequencing technologies, yet the tremendous amount of data generated makes analyses such as de novo assembly consume excessive computational resources. In late 2014 we released MEGAHIT v0.1 (together with a brief note of Li et al. (2015) [1]), the first NGS metagenome assembler that can assemble genome sequences from metagenomic datasets of hundreds of giga base-pairs (bp) in a time- and memory-efficient manner on a single server. The core of MEGAHIT is an efficient parallel algorithm for constructing succinct de Bruijn Graphs (SdBG), implemented on a graphical processing unit (GPU). The software has been well received by the assembly community, and there is interest in how to adapt the algorithms to integrate popular assembly practices so as to improve the assembly quality, as well as how to speed up the software using better CPU-based algorithms (instead of GPU). In this paper we first describe the details of the core algorithms in MEGAHIT v0.1, and then we show the new modules that upgrade MEGAHIT to version v1.0, which gives better assembly quality, runs faster and uses less memory. For the Iowa Prairie Soil dataset (252 Gbp after quality trimming), the assembly quality of MEGAHIT v1.0 improves significantly on that of v0.1, namely, a 36% increase in assembly size and 23% in N50. More interestingly, MEGAHIT v1.0 is no slower than before (even running with the extra modules). This is primarily due to a new CPU-based algorithm for SdBG construction that is faster and requires less memory. Using CPU only, MEGAHIT v1.0 can assemble the Iowa Prairie Soil sample in about 43 h, reducing the running time of v0.1 by at least 25% and memory usage by up to 50%. MEGAHIT v1.0, exhibiting a smaller memory footprint, can process even larger datasets. The Kansas Prairie Soil sample (484 Gbp), the largest publicly available dataset, can now be assembled using no more than 500 GB of memory in 7.5 days. The assemblies of these datasets (and other large metagenomic datasets), as well as the software, are available at the website https://hku-bal.github.io/megabox.
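The abstract reports assembly quality partly through N50. As a brief illustration of that metric (a sketch with a hypothetical function name, not code from MEGAHIT), N50 is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly size:

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("N50 is undefined for an empty assembly")


if __name__ == "__main__":
    # Total is 54 bp; 10 + 9 + 8 = 27 reaches half, so N50 is 8.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends on the total assembly size, it is best compared between assemblies of the same dataset, as in the v0.1 versus v1.0 comparison above.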
Topics: Algorithms; Datasets as Topic; Metagenome; Metagenomics; Sequence Analysis; Software; Soil
PubMed: 27012178
DOI: 10.1016/j.ymeth.2016.02.020 -
Nature Biotechnology May 2021
Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions.
Topics: Genome, Viral; Metagenome; Metagenomics; Molecular Sequence Annotation; Software
PubMed: 33349699
DOI: 10.1038/s41587-020-00774-7 -
Genomics, Proteomics & Bioinformatics Oct 2015
Review
The development of next-generation sequencing (NGS) platforms spawned an enormous volume of data. This explosion in data has unearthed new scalability challenges for existing bioinformatics tools. The analysis of metagenomic sequences using bioinformatics pipelines is complicated by the substantial complexity of these data. In this article, we review several commonly used online tools for metagenomics data analysis with respect to their quality and detail of analysis using simulated metagenomics data. There are at least a dozen such software tools presently available in the public domain. Among them, MGRAST, IMG/M, and METAVIR are the most well-known tools according to the number of citations by peer-reviewed scientific media up to mid-2015. Here, we describe 12 online tools with respect to their web link, annotation pipelines, clustering methods, online user support, and availability of data storage. We also rated each tool to identify the most promising and preferable ones, and evaluated the five best tools using a synthetic metagenome. The article comprehensively deals with the contemporary problems and the prospects of metagenomics from a bioinformatics viewpoint.
Topics: Cluster Analysis; Computational Biology; High-Throughput Nucleotide Sequencing; Humans; Information Storage and Retrieval; Internet; Metagenome; Metagenomics; Software
PubMed: 26602607
DOI: 10.1016/j.gpb.2015.10.003 -
Bioinformatics (Oxford, England) Apr 2016
In recent years we have witnessed the rapid development of new metagenome assembly methods. Although there are many benchmark utilities designed for single-genome assemblies, there is no well-recognized evaluation and comparison tool for metagenomic-specific analogues. In this article, we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such features of metagenome datasets as (i) unknown species content, by detecting and downloading reference sequences; (ii) huge diversity, by giving comprehensive reports for multiple genomes; and (iii) the presence of closely related species, by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets.
AVAILABILITY AND IMPLEMENTATION
http://bioinf.spbau.ru/metaquast
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Algorithms; Genomic Structural Variation; Metagenome; Metagenomics; Software
PubMed: 26614127
DOI: 10.1093/bioinformatics/btv697