Genome Research Mar 2020
Review
Genomes are an integral component of the biological information about an organism; thus, the more complete the genome, the more informative it is. Historically, bacterial and archaeal genomes were reconstructed from pure (monoclonal) cultures, and the first reported sequences were manually curated to completion. However, the bottleneck imposed by the requirement for isolates precluded genomic insights for the vast majority of microbial life. Shotgun sequencing of microbial communities, referred to initially as community genomics and subsequently as genome-resolved metagenomics, can circumvent this limitation by obtaining metagenome-assembled genomes (MAGs); but gaps, local assembly errors, chimeras, and contamination by fragments from other genomes limit the value of these genomes. Here, we discuss genome curation to improve and, in some cases, achieve complete (circularized, no gaps) MAGs (CMAGs). To date, few CMAGs have been generated, although notably some are from very complex systems such as soil and sediment. Through analysis of about 7000 published complete bacterial isolate genomes, we verify the value of cumulative GC skew in combination with other metrics to establish bacterial genome sequence accuracy. The analysis of cumulative GC skew identified potential misassemblies in some reference genomes of isolated bacteria and the repeat sequences that likely gave rise to them. We discuss methods that could be implemented in bioinformatic approaches for curation to ensure that metabolic and evolutionary analyses can be based on very high-quality genomes.
Topics: Data Curation; Genome, Archaeal; Genome, Bacterial; Metagenome; Metagenomics
PubMed: 32188701
DOI: 10.1101/gr.258640.119
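The cumulative GC skew check mentioned in this abstract can be sketched as follows. The sketch computes the per-window skew (G - C)/(G + C) and its running sum. As a rough heuristic, a correctly assembled circular bacterial chromosome usually gives a curve with one minimum (near the replication origin) and one maximum (near the terminus), so additional sharp turns can flag a possible misassembly. The window size and function names are illustrative assumptions, not the authors' pipeline.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list:
    """Running sum of per-window GC skew, (G - C) / (G + C)."""
    seq = seq.upper()
    cumulative = []
    total = 0.0
    # Walk non-overlapping windows along the sequence.
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C contributes zero skew.
        total += (g - c) / (g + c) if g + c else 0.0
        cumulative.append(total)
    return cumulative


def skew_extremes(cumulative: list) -> tuple:
    """Window indices of the minimum and maximum cumulative skew."""
    idx = range(len(cumulative))
    return (min(idx, key=cumulative.__getitem__),
            max(idx, key=cumulative.__getitem__))


# Toy sequence: a C-rich half followed by a G-rich half.
curve = cumulative_gc_skew("CCCCAT" * 10 + "GGGGAT" * 10, window=6)
print(skew_extremes(curve))  # (9, 19)
```

On a real assembly, the positions of these extremes (and any extra inflection points between them) would be compared with the expected origin and terminus, alongside the other metrics the abstract mentions.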
Scientific Data Jun 2022
With the rapid development of high-throughput sequencing technology, the amount of metagenomic data (including both 16S and whole-genome sequencing data) in public repositories is increasing exponentially. However, because these data are large and decentralized, it is still difficult for users to mine, compare, and analyze them. The animal metagenome database (AnimalMetagenome DB) integrates metagenomic sequencing data with host information, making it easier for users to find data of interest. The AnimalMetagenome DB is designed to contain all public metagenomic data from animals, and the data are divided into domestic and wild animal categories. Users can browse, search, and download animal metagenomic data of interest based on metadata attributes such as animal species, sample site, study purpose, and DNA extraction method. AnimalMetagenome DB version 1.0 includes metadata for 82,097 metagenomes from 4 domestic animals (pigs, bovines, horses, and sheep) and 540 wild animals. These metagenomes span 15 years of experiments, 73 countries, and 1,044 studies, and comprise 63,214 amplicon sequencing datasets and 10,672 whole-genome sequencing datasets. All data in the database are hosted and available on figshare: https://doi.org/10.6084/m9.figshare.19728619.
Topics: Animals; Cattle; Databases, Factual; High-Throughput Nucleotide Sequencing; Horses; Metadata; Metagenome; Metagenomics; Sheep; Swine
PubMed: 35710683
DOI: 10.1038/s41597-022-01444-w
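Attribute-based search of the kind described in this abstract amounts to filtering metadata records on field values. The sketch below is purely illustrative: the record fields, the example run identifiers, and the `search` helper are hypothetical and do not describe the actual AnimalMetagenome DB schema or its figshare files.

```python
from dataclasses import dataclass


@dataclass
class MetagenomeRecord:
    # Hypothetical fields modeled on the metadata attributes named in the abstract.
    run_id: str
    host_species: str
    host_category: str    # "domestic" or "wild"
    sample_site: str
    sequencing_type: str  # "amplicon" or "WGS"


def search(records, **criteria):
    """Return the records whose fields equal every keyword criterion."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in criteria.items())]


records = [
    MetagenomeRecord("RUN_A", "pig", "domestic", "feces", "WGS"),
    MetagenomeRecord("RUN_B", "pig", "domestic", "feces", "amplicon"),
    MetagenomeRecord("RUN_C", "giant panda", "wild", "feces", "amplicon"),
]
print([r.run_id for r in search(records, host_species="pig", sequencing_type="WGS")])
```

Combining criteria with `all(...)` gives AND semantics; an OR-style search would need a different helper.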
Cardiovascular Research Feb 2021
Topics: Metagenome; Metagenomics; Microbiota
PubMed: 32569375
DOI: 10.1093/cvr/cvaa175
Nature Biotechnology Nov 2023
Metagenomic assembly enables new organism discovery from microbial communities, but it can capture only a few abundant organisms from most metagenomes. Here we present MetaPhlAn 4, which integrates information from metagenome assemblies and microbial isolate genomes for more comprehensive metagenomic taxonomic profiling. From a curated collection of 1.01 million prokaryotic reference and metagenome-assembled genomes, we define unique marker genes for 26,970 species-level genome bins, 4,992 of which are taxonomically unidentified at the species level. MetaPhlAn 4 explains ~20% more reads in most international human gut microbiomes and >40% more in less-characterized environments such as the rumen microbiome, and it proves more accurate than available alternatives in synthetic evaluations while also reliably quantifying organisms with no cultured isolates. Applying the method to >24,500 metagenomes identifies previously undetected species as strong biomarkers for host conditions and lifestyles in human and mouse microbiomes and shows that even previously uncharacterized species can be genetically profiled at the resolution of single microbial strains.
Topics: Humans; Animals; Mice; Metagenome; Microbiota; Gastrointestinal Microbiome; Metagenomics; Phylogeny
PubMed: 36823356
DOI: 10.1038/s41587-023-01688-w
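Marker-based profilers such as MetaPhlAn estimate each species' abundance from the read coverage of that species' unique marker genes. The sketch below is a simplified illustration of that idea under stated assumptions, not MetaPhlAn 4's implementation: it converts mapped-read counts to per-marker coverage, takes a trimmed mean per species so that a few outlier markers carry less weight, and normalizes the results to relative abundances. The input dictionaries, the fixed read length, and the 10% trim fraction are hypothetical choices.

```python
def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    if not values:
        return 0.0  # a species with no markers gets zero coverage
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] or ordered
    return sum(kept) / len(kept)


def profile(marker_hits, marker_lengths, read_length=100, trim=0.1):
    """
    marker_hits: {species: {marker: mapped_read_count}}, including zero counts.
    marker_lengths: {marker: length_bp}.
    Returns {species: relative_abundance}, summing to 1 when any reads map.
    """
    coverage = {}
    for species, hits in marker_hits.items():
        # Per-marker depth of coverage = bases mapped / marker length.
        per_marker = [n * read_length / marker_lengths[m] for m, n in hits.items()]
        coverage[species] = trimmed_mean(per_marker, trim)
    total = sum(coverage.values())
    return {s: c / total for s, c in coverage.items()} if total else {}


print(profile({"A": {"m1": 10, "m2": 10}, "B": {"m3": 30}},
              {"m1": 1000, "m2": 1000, "m3": 1000}))
```

Using a trimmed mean rather than a plain sum means a species' estimate does not grow with its number of markers, which would otherwise favor species with many markers.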
Microbiology Spectrum Aug 2023
Petabases of environmental metagenomic data are publicly available, presenting an opportunity to characterize complex environments and discover novel lineages of life. Metagenome coassembly, in which many metagenomic samples from an environment are analyzed simultaneously to infer the underlying genomes' sequences, is an essential tool for achieving this goal. We applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 terabases (Tbp) of metagenome data from a tropical soil in the Luquillo Experimental Forest (LEF), Puerto Rico. The resulting coassembly yielded 39 high-quality (>90% complete, <5% contaminated, with predicted 23S, 16S, and 5S rRNA genes and ≥18 tRNAs) metagenome-assembled genomes (MAGs), including two from a candidate phylum. Another 268 medium-quality (≥50% complete, <10% contaminated) MAGs were extracted, including members of three candidate phyla. In total, the 307 medium- or higher-quality MAGs were assigned to 23 phyla, compared with 294 MAGs assigned to nine phyla when the same samples were assembled individually. The low-quality (<50% complete, <10% contaminated) MAGs from the coassembly included, among other low-abundance microbes, a 49%-complete genome of a rare-biosphere microbe from the candidate phylum FCPU426; an 81%-complete fungal genome from the phylum Ascomycota; and 30 partial eukaryotic MAGs with ≥10% completeness, possibly representing protist lineages. A total of 22,254 viruses, many of them low abundance, were identified. Estimates of metagenome coverage and diversity indicate that we may have characterized ≥87.5% of the sequence diversity in this humid tropical soil and point to the value of future terabase-scale sequencing and coassembly of complex environments.
Petabases of reads are being produced by environmental metagenome sequencing. An essential step in analyzing these data is metagenome assembly, the computational reconstruction of genome sequences from microbial communities.
"Coassembly" of metagenomic sequence data, in which multiple samples are assembled together, enables more complete detection of microbial genomes in an environment than "multiassembly," in which samples are assembled individually. To demonstrate the potential for coassembling terabases of metagenome data to drive biological discovery, we applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 Tbp of reads from a humid tropical soil environment. The resulting coassembly, its functional annotation, and analysis are presented here. The coassembly yielded more, and phylogenetically more diverse, microbial, eukaryotic, and viral genomes than the multiassembly of the same data. Our resource may facilitate the discovery of novel microbial biology in tropical soils and demonstrates the value of terabase-scale metagenome sequencing.
Topics: Soil; Microbiota; Bacteria; Metagenome; Genome, Viral; Metagenomics
PubMed: 37310219
DOI: 10.1128/spectrum.00200-23
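The MAG quality tiers quoted in this abstract can be written as a small classification rule. The thresholds below are the ones stated in the abstract; the `MagStats` container, the function name, and the "unclassified" label for MAGs with 10% or more contamination (which none of the quoted tiers covers) are assumptions made for illustration. Completeness, contamination, and rRNA/tRNA counts are taken as given inputs from a separate estimation step not shown here.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    completeness: float   # percent
    contamination: float  # percent
    has_23s: bool
    has_16s: bool
    has_5s: bool
    trna_count: int


def mag_quality(m: MagStats) -> str:
    """Assign a quality tier using the thresholds quoted in the abstract."""
    has_all_rrna = m.has_23s and m.has_16s and m.has_5s
    if (m.completeness > 90 and m.contamination < 5
            and has_all_rrna and m.trna_count >= 18):
        return "high"
    if m.contamination >= 10:
        # None of the quoted tiers covers contamination of 10% or more.
        return "unclassified"
    if m.completeness >= 50:
        return "medium"
    return "low"


print(mag_quality(MagStats(95.0, 2.0, True, True, False, 20)))  # medium
```

Because the high-quality tier also requires all three rRNA genes and at least 18 tRNAs, a nearly complete, low-contamination MAG that lacks, for example, a 5S gene is reported as medium quality.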
mSphere Nov 2020
Continued influx of metagenome-derived proteins with misannotated taxonomy into conventional databases, including RefSeq, threatens to eliminate the value of taxonomy identifiers. To prevent this, urgent efforts should be undertaken by submitters of metagenomic data sets as well as by database managers.
Topics: Algorithms; Databases, Genetic; Metagenome; Metagenomics; Proteins
PubMed: 33148820
DOI: 10.1128/mSphere.00854-20
Cell Aug 2016
Review
Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by abundance statistics that do not estimate a meaningful parameter of the microbial community and by biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized.
Topics: Classification; Computational Biology; Humans; Metagenome; Metagenomics; Microbiota; Models, Statistical
PubMed: 27565341
DOI: 10.1016/j.cell.2016.08.007
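The point about abundance statistics can be made concrete with a small example. Raw read counts shift with sequencing depth, and relative abundances shift with the composition of the rest of the community. One normalization discussed in the comparative-metagenomics literature expresses a gene's abundance per genome by dividing its length-normalized read count by the average length-normalized count of universal single-copy genes. The sketch below assumes that normalization purely for illustration; it is not presented as this review's prescribed method, and the function names and inputs are hypothetical.

```python
def reads_per_kb(read_count: float, length_bp: int) -> float:
    """Read count normalized to a 1 kb feature length."""
    return read_count / (length_bp / 1000)


def copies_per_genome(gene_reads, gene_length_bp, single_copy_genes):
    """
    Estimate the average copies of a gene per genome in the sample.

    single_copy_genes: list of (read_count, length_bp) for genes assumed to
    occur exactly once in every genome; their mean reads-per-kb serves as a
    proxy for the number of genomes sequenced.
    """
    if not single_copy_genes:
        raise ValueError("at least one single-copy gene is required")
    genome_proxy = (sum(reads_per_kb(r, l) for r, l in single_copy_genes)
                    / len(single_copy_genes))
    return reads_per_kb(gene_reads, gene_length_bp) / genome_proxy


# Doubling sequencing depth doubles every count but leaves the estimate unchanged.
shallow = copies_per_genome(25, 1000, [(100, 1000), (200, 2000)])
deep = copies_per_genome(50, 1000, [(200, 1000), (400, 2000)])
print(shallow, deep)  # 0.25 0.25
```

Because both numerator and denominator scale with depth, the ratio is comparable across samples sequenced to different depths, which raw counts are not.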
Nature Jan 2022
Microbial genes encode the majority of the functional repertoire of life on Earth. However, despite increasing efforts in metagenomic sequencing of various habitats, little is known about the distribution of genes across the global biosphere, with implications for human and planetary health. Here we constructed a non-redundant gene catalogue of 303 million species-level genes (clustered at 95% nucleotide identity) from 13,174 publicly available metagenomes across 14 major habitats and used it to show that most genes are specific to a single habitat. The small fraction of genes found in multiple habitats is enriched in antibiotic-resistance genes and markers for mobile genetic elements. By further clustering these species-level genes into 32 million protein families, we observed that a small fraction of these families contains the majority of the genes (0.6% of families account for 50% of the genes). The majority of species-level genes and protein families are rare. Furthermore, species-level genes, and in particular the rare ones, show low rates of positive (adaptive) selection, supporting a model in which most genetic variability observed within each protein family is neutral or nearly neutral.
Topics: Anti-Bacterial Agents; Drug Resistance, Microbial; Ecosystem; Humans; Metagenome; Metagenomics
PubMed: 34912116
DOI: 10.1038/s41586-021-04233-4
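The statistic that 0.6% of families account for 50% of the genes is a cumulative-share calculation: sort families from largest to smallest and count how many are needed before their combined size reaches half of all genes. The sketch below assumes family sizes are already available as a list of gene counts; the clustering steps themselves (95% nucleotide identity for species-level genes, then grouping into protein families) are not reproduced, and the function name is hypothetical.

```python
def fraction_of_families_for_share(family_sizes, share=0.5):
    """
    Smallest fraction of families (taken largest first) whose combined
    gene count reaches `share` of all genes.
    """
    if not family_sizes:
        raise ValueError("family_sizes must not be empty")
    sizes = sorted(family_sizes, reverse=True)
    target = share * sum(sizes)
    running = 0
    for i, size in enumerate(sizes, start=1):
        running += size
        if running >= target:
            return i / len(sizes)
    return 1.0


# One large family and nine singletons: the large family alone holds over half the genes.
print(fraction_of_families_for_share([10] + [1] * 9))  # 0.1
```

A strongly skewed size distribution drives this fraction toward zero, while equally sized families give exactly `share`.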
Journal of Food Protection Mar 2022
Review
Advancements in next-generation sequencing technology have dramatically reduced the cost and increased the ease of microbial whole genome sequencing. This approach is revolutionizing the identification and analysis of foodborne microbial pathogens, facilitating expedited detection and mitigation of foodborne outbreaks, improving public health outcomes, and limiting costly recalls. However, next-generation sequencing is still anchored in the traditional laboratory practice of selecting and culturing a single isolate. Metagenomic-based approaches, including metabarcoding and shotgun and long-read metagenomics, are part of the next disruptive revolution in food safety diagnostics and offer the potential to directly identify entire microbial communities in a single food, ingredient, or environmental sample. In this review, metagenomic-based approaches are introduced and placed within the context of conventional detection and diagnostic techniques, and essential considerations for undertaking metagenomic assays and data analysis are described. Recent applications of metagenomics to food safety are discussed alongside current limitations, knowledge gaps, and new opportunities arising from the use of this technology.
Topics: Food Safety; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Whole Genome Sequencing
PubMed: 34706052
DOI: 10.4315/JFP-21-301
Trends in Microbiology Nov 2022
Review
Viruses are key members of Earth's microbiomes, shaping microbial community composition and metabolism. Here, we describe recent advances in 'soil viromics', that is, virus-focused metagenome and metatranscriptome analyses that offer unprecedented windows into the soil virosphere. Given the emerging picture of high soil viral activity, diversity, and dynamics over short spatiotemporal scales, we then outline key eco-evolutionary processes that we hypothesize are the major diversity drivers for soil viruses. We argue that a community effort is needed to establish a 'global soil virosphere atlas' that can be used to address the roles of viruses in soil microbiomes and terrestrial biogeochemical cycles across spatiotemporal scales.
Topics: Metagenome; Metagenomics; Soil; Soil Microbiology; Viruses
PubMed: 35644779
DOI: 10.1016/j.tim.2022.05.003