- Methods in Molecular Biology (Clifton,... 2024
Metagenome-assembled genomes, or MAGs, are genomes retrieved from metagenome datasets. In the vast majority of cases, MAGs are genomes from prokaryotic species that have not been isolated or cultivated in the lab. They, therefore, provide us with information on these species that are impossible to obtain otherwise, at least until new cultivation methods are devised. Thanks to improvements and cost reductions of DNA sequencing technologies and growing interest in microbial ecology, the rise in number of MAGs in genome repositories has been exponential. This chapter covers the basics of MAG retrieval and processing and provides a practical step-by-step guide using a real dataset and state-of-the-art tools for MAG analysis and comparison.
Topics: Metagenome; Metagenomics; Software; Computational Biology; Databases, Genetic; Sequence Analysis, DNA; Genome, Bacterial
PubMed: 38819559
DOI: 10.1007/978-1-0716-3838-5_6
- Microbiology Spectrum Aug 2023
Petabases of environmental metagenomic data are publicly available, presenting an opportunity to characterize complex environments and discover novel lineages of life. Metagenome coassembly, in which many metagenomic samples from an environment are simultaneously analyzed to infer the underlying genomes' sequences, is an essential tool for achieving this goal. We applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 terabases (Tbp) of metagenome data from a tropical soil in the Luquillo Experimental Forest (LEF), Puerto Rico. The resulting coassembly yielded 39 high-quality (>90% complete, <5% contaminated, with predicted 23S, 16S, and 5S rRNA genes and ≥18 tRNAs) metagenome-assembled genomes (MAGs), including two from the candidate phylum . Another 268 medium-quality (≥50% complete, <10% contaminated) MAGs were extracted, including the candidate phyla , , and . In total, 307 medium- or higher-quality MAGs were assigned to 23 phyla, compared to 294 MAGs assigned to nine phyla in the same samples individually assembled. The low-quality (<50% complete, <10% contaminated) MAGs from the coassembly revealed a 49% complete rare biosphere microbe from the candidate phylum FCPU426 among other low-abundance microbes, an 81% complete fungal genome from the phylum Ascomycota, and 30 partial eukaryotic MAGs with ≥10% completeness, possibly representing protist lineages. A total of 22,254 viruses, many of them low abundance, were identified. Estimation of metagenome coverage and diversity indicates that we may have characterized ≥87.5% of the sequence diversity in this humid tropical soil and indicates the value of future terabase-scale sequencing and coassembly of complex environments.
Petabases of reads are being produced by environmental metagenome sequencing. An essential step in analyzing these data is metagenome assembly, the computational reconstruction of genome sequences from microbial communities. "Coassembly" of metagenomic sequence data, in which multiple samples are assembled together, enables more complete detection of microbial genomes in an environment than "multiassembly," in which samples are assembled individually. To demonstrate the potential for coassembling terabases of metagenome data to drive biological discovery, we applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 Tbp of reads from a humid tropical soil environment. The resulting coassembly, its functional annotation, and analysis are presented here. The coassembly yielded more, and phylogenetically more diverse, microbial, eukaryotic, and viral genomes than the multiassembly of the same data. Our resource may facilitate the discovery of novel microbial biology in tropical soils and demonstrates the value of terabase-scale metagenome sequencing.
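The quality tiers quoted in this abstract can be read as a simple decision rule. The following is a minimal sketch of that rule, not the authors' pipeline: the `Mag` record, its field names, and the `quality_tier` function are hypothetical, and the thresholds are copied from the abstract's parenthetical definitions.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    """Hypothetical summary record for one metagenome-assembled genome."""
    completeness: float       # estimated completeness, in percent
    contamination: float      # estimated contamination, in percent
    has_23s: bool = False     # predicted 23S rRNA gene present
    has_16s: bool = False     # predicted 16S rRNA gene present
    has_5s: bool = False      # predicted 5S rRNA gene present
    trna_count: int = 0       # number of predicted tRNAs


def quality_tier(mag: Mag) -> str:
    """Assign a tier using the thresholds stated in the abstract."""
    has_all_rrnas = mag.has_23s and mag.has_16s and mag.has_5s
    if (mag.completeness > 90 and mag.contamination < 5
            and has_all_rrnas and mag.trna_count >= 18):
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"
    if mag.completeness < 50 and mag.contamination < 10:
        return "low"
    # Bins with >= 10% contamination fall outside the three tiers given.
    return "unclassified"
```

Under this reading, a bin that is more than 90% complete but lacks predicted rRNA genes is reported as medium quality rather than high quality.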
Topics: Soil; Microbiota; Bacteria; Metagenome; Genome, Viral; Metagenomics
PubMed: 37310219
DOI: 10.1128/spectrum.00200-23
- Nature Biotechnology Nov 2023
Metagenomic assembly enables new organism discovery from microbial communities, but it can only capture few abundant organisms from most metagenomes. Here we present MetaPhlAn 4, which integrates information from metagenome assemblies and microbial isolate genomes for more comprehensive metagenomic taxonomic profiling. From a curated collection of 1.01 M prokaryotic reference and metagenome-assembled genomes, we define unique marker genes for 26,970 species-level genome bins, 4,992 of them taxonomically unidentified at the species level. MetaPhlAn 4 explains ~20% more reads in most international human gut microbiomes and >40% in less-characterized environments such as the rumen microbiome and proves more accurate than available alternatives on synthetic evaluations while also reliably quantifying organisms with no cultured isolates. Application of the method to >24,500 metagenomes highlights previously undetected species to be strong biomarkers for host conditions and lifestyles in human and mouse microbiomes and shows that even previously uncharacterized species can be genetically profiled at the resolution of single microbial strains.
Topics: Humans; Animals; Mice; Metagenome; Microbiota; Gastrointestinal Microbiome; Metagenomics; Phylogeny
PubMed: 36823356
DOI: 10.1038/s41587-023-01688-w
- Angewandte Chemie (International Ed. in... May 2024
Review
In the ever-growing demand for sustainable ways to produce high-value small molecules, biocatalysis has come to the forefront of greener routes to these chemicals. As such, the need to constantly find and optimise suitable biocatalysts for specific transformations has never been greater. Metagenome mining has been shown to rapidly expand the toolkit of promiscuous enzymes needed for new transformations, without requiring protein engineering steps. If protein engineering is needed, the metagenomic candidate can often provide a better starting point for engineering than a previously discovered enzyme on the open database or from literature, for instance. In this review, we highlight where metagenomics has made substantial impact on the area of biocatalysis in recent years. We review the discovery of enzymes in previously unexplored or 'hidden' sequence space, leading to the characterisation of enzymes with enhanced properties that originate from natural selection pressures in native environments.
Topics: Metagenomics; Biocatalysis; Enzymes; Protein Engineering
PubMed: 38494442
DOI: 10.1002/anie.202402316
- The ISME Journal Jan 2024
Review
Nearly all organisms are hosts to multiple viruses that collectively appear to be the most abundant biological entities in the biosphere. With recent advances in metagenomics and metatranscriptomics, the known diversity of viruses substantially expanded. Comparative analysis of these viruses using advanced computational methods culminated in the reconstruction of the evolution of major groups of viruses and enabled the construction of a virus megataxonomy, which has been formally adopted by the International Committee on Taxonomy of Viruses. This comprehensive taxonomy consists of six virus realms, which are aspired to be monophyletic and assembled based on the conservation of hallmark proteins involved in capsid structure formation or genome replication. The viruses in different major taxa substantially differ in host range and accordingly in ecological niches. In this review article, we outline the latest developments in virus megataxonomy and the recent discoveries that will likely lead to reassessment of some major taxa, in particular, split of three of the current six realms into two or more independent realms. We then discuss the correspondence between virus taxonomy and the distribution of viruses among hosts and ecological niches, as well as the abundance of viruses versus cells in different habitats. The distribution of viruses across environments appears to be primarily determined by the host ranges, i.e. the virome is shaped by the composition of the biome in a given habitat, which itself is affected by abiotic factors.
Topics: Viruses; Metagenomics; Ecology; Phylogeny; Genome, Viral
PubMed: 38365236
DOI: 10.1093/ismejo/wrad042
- Nature Methods Jun 2024
Review
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are not only due to improvements in sequencing accuracy, but also happening across rapidly changing analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Topics: Metagenomics; Metagenome; High-Throughput Nucleotide Sequencing; Microbiota; Humans; Sequence Analysis, DNA; Computational Biology
PubMed: 38689099
DOI: 10.1038/s41592-024-02262-1
- Statistical normalization methods in microbiome data with application to microbiome cancer research. Gut Microbes Dec 2023
Review
Mounting evidence has shown that the gut microbiome is associated with various cancers, including gastrointestinal (GI) tract and non-GI tract cancers. But microbiome data have unique characteristics and pose major challenges for standard statistical methods, which can produce invalid or misleading results. Thus, analyzing microbiome data requires not only appropriate statistical methods but also normalization of the data prior to statistical analysis. Here, we first describe the unique characteristics of microbiome data and the challenges in analyzing them (Section 2). Then, we provide an overall review of the available normalization methods for 16S rRNA and shotgun metagenomic data, along with examples of their applications in microbiome cancer research (Section 3). In Section 4, we comprehensively investigate how the normalization methods for 16S rRNA and shotgun metagenomic data are evaluated. Finally, we summarize and conclude with remarks on statistical normalization methods (Section 5). Altogether, this review aims to provide a broad and comprehensive view of, and remarks on, the promises and challenges of statistical normalization methods for microbiome data, with microbiome cancer research examples.
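To make "normalized prior to statistical analysis" concrete, here is a minimal sketch of one widely used approach, total-sum scaling, which converts raw counts to relative abundances so that samples with different sequencing depths become comparable. The abstract does not name this method; it is chosen here only as an illustration, and the function name and taxon labels are hypothetical.

```python
def total_sum_scaling(counts: dict[str, int]) -> dict[str, float]:
    """Convert one sample's raw taxon counts into relative abundances."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads; cannot normalize")
    # Each taxon's share of the sample's total reads; shares sum to 1.
    return {taxon: count / total for taxon, count in counts.items()}


# Two samples with the same composition but different sequencing depth
# yield the same relative abundances after scaling.
shallow = total_sum_scaling({"taxon_a": 30, "taxon_b": 70})
deep = total_sum_scaling({"taxon_a": 300, "taxon_b": 700})
```

Relative abundances are compositional (they sum to one), which is one of the data characteristics that can make standard statistical methods misleading.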
Topics: RNA, Ribosomal, 16S; Gastrointestinal Microbiome; Microbiota; Metagenome; Research Design; Neoplasms
PubMed: 37622724
DOI: 10.1080/19490976.2023.2244139
- The ISME Journal Oct 2023
It is generally assumed that viruses outnumber cells on Earth by at least tenfold. Virus-to-microbe ratios (VMR) are largely based on counts of fluorescently labelled virus-like particles. However, these exclude intracellular viruses and potentially include false positives (DNA-containing vesicles, gene-transfer agents, non-specifically stained inert particles). Here, we develop a metagenome-based VMR estimate (mVMR) that accounts for DNA viruses across all stages of their replication cycles (virion, intracellular lytic and lysogenic) by using normalised RPKM (reads per kilobase of gene sequence per million of mapped metagenome reads) counts of the major capsid protein (MCP) genes and cellular universal single-copy genes (USCGs) as proxies for virus and cell counts, respectively. After benchmarking this strategy using mock metagenomes with increasing VMR, we inferred mVMR across different biomes. To properly estimate mVMR in aquatic ecosystems, we generated metagenomes from co-occurring cellular and viral fractions (>50 kDa-200 µm size-range) in freshwater, seawater and solar saltern ponds (10 metagenomes, 2 control metaviromes). Viruses outnumbered cells in freshwater by ~13 fold and in plankton from marine and saline waters by ~2-4 fold. However, across an additional set of 121 diverse non-aquatic metagenomes including microbial mats, microbialites, soils, freshwater and marine sediments and metazoan-associated microbiomes, viruses, on average, outnumbered cells by barely two-fold. Although viruses likely are the most diverse biological entities on Earth, their global numbers might be closer to those of cells than previously estimated.
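The RPKM definition given in this abstract translates directly into a formula, and a ratio of virus to cell proxies follows from it. The sketch below is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how RPKM values are normalised or aggregated, so summing MCP RPKMs for the virus proxy and averaging across USCG markers for the cell proxy are assumptions, as are the function names and marker labels.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene sequence per million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


def mvmr_sketch(mcp_rpkms: list[float], uscg_rpkm_by_marker: dict[str, float]) -> float:
    """Ratio of a virus proxy to a cell proxy (aggregation is an assumption).

    Virus proxy: summed RPKM over all major capsid protein genes.
    Cell proxy: mean RPKM across universal single-copy marker genes,
    since each cell is expected to carry one copy of each marker.
    """
    virus_proxy = sum(mcp_rpkms)
    cell_proxy = sum(uscg_rpkm_by_marker.values()) / len(uscg_rpkm_by_marker)
    return virus_proxy / cell_proxy


# 500 reads on a 1,000 bp gene in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 1_000, 10_000_000)
example_ratio = mvmr_sketch([30.0, 10.0], {"marker_1": 20.0, "marker_2": 20.0})
```

Because both proxies are length- and depth-normalised, the ratio is independent of sequencing depth and of differences in gene length between capsid and marker genes.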
Topics: Animals; Ecosystem; Metagenome; Viruses; DNA Viruses; Seawater
PubMed: 37169871
DOI: 10.1038/s41396-023-01431-y
- Frontiers in Cellular and Infection... 2023
Review
BACKGROUND
Infectious disease is a large burden on public health globally. Metagenomic next-generation sequencing (mNGS) has become popular as a new tool for pathogen diagnosis, with numerous advantages over conventional methods. Research on mNGS has been increasing year over year. However, no bibliometric analysis has systematically presented the full spectrum of this research field. Therefore, we reviewed all the publications associated with this topic and performed this study to analyze the comprehensive status and future hotspots of mNGS for infectious disease diagnosis.
METHODS
The literature was searched in the Web of Science Core Collection and screened without year or language restrictions, and the characteristics of the studies were also identified. The outcomes included publication years, study types, journals, countries, authorship, institutions, frontiers, and hotspots with trends. Statistical analysis and visualization were conducted using VOSviewer (version 1.6.16) and CiteSpace (version 6.1.R3).
RESULTS
In total, 325 studies were included in the analysis after screening. Studies were published between 2009 and 2022 with a significantly increasing number from 1 to 118. Most of the studies were original articles and case reports. and were the most commonly cited and co-cited journals. Institutions and researchers from China contributed the most to this field, followed by those from the USA. The hotspots and frontiers of these studies are pneumonia, tuberculosis, and central nervous system infections.
CONCLUSION
This study shows that mNGS is a growing research topic in the diagnosis of infectious diseases and provides insights into the researchers, institutions, hotspots, and frontiers of the field, which can serve as a reference for related researchers and future research.
Topics: Humans; Bibliometrics; High-Throughput Nucleotide Sequencing; China; Metagenome; Communicable Diseases
PubMed: 37600953
DOI: 10.3389/fcimb.2023.1112229
- Current Opinion in Infectious Diseases Oct 2023
Review
PURPOSE OF REVIEW
Plasma cell-free metagenomic next-generation sequencing (cf-mNGS) is increasingly employed for the diagnosis of infection, but a consensus for optimal use has not been established. This minireview focuses on the commercially available Karius Test and is aimed at local leaders seeking to understand the complexities of cf-mNGS to make informed test utilization policies and better interpret results.
RECENT FINDINGS
Recent retrospective studies have reported how the Karius Test was applied at their institutions and identified areas of potential patient benefit. In addition, substantive studies have reported how this test performs in specific indications, particularly invasive fungal disease, endovascular infection and lower respiratory infection.
SUMMARY
Successfully integrating plasma cf-mNGS requires careful assessment of performance in the specific applications and patient populations in which it is used. Individual institutions must independently evaluate implementation strategies and determine where diagnostic yields outweigh the potential pitfalls.
Topics: Humans; High-Throughput Nucleotide Sequencing; Invasive Fungal Infections; Metagenomics; Respiratory Tract Infections; Sensitivity and Specificity; Retrospective Studies
PubMed: 37493238
DOI: 10.1097/QCO.0000000000000942