-
Genome Research Mar 2020 (Review)
Genomes are an integral component of the biological information about an organism; thus, the more complete the genome, the more informative it is. Historically, bacterial and archaeal genomes were reconstructed from pure (monoclonal) cultures, and the first reported sequences were manually curated to completion. However, the bottleneck imposed by the requirement for isolates precluded genomic insights for the vast majority of microbial life. Shotgun sequencing of microbial communities, referred to initially as community genomics and subsequently as genome-resolved metagenomics, can circumvent this limitation by obtaining metagenome-assembled genomes (MAGs); but gaps, local assembly errors, chimeras, and contamination by fragments from other genomes limit the value of these genomes. Here, we discuss genome curation to improve and, in some cases, achieve complete (circularized, no gaps) MAGs (CMAGs). To date, few CMAGs have been generated, although notably some are from very complex systems such as soil and sediment. Through analysis of about 7000 published complete bacterial isolate genomes, we verify the value of cumulative GC skew in combination with other metrics to establish bacterial genome sequence accuracy. The analysis of cumulative GC skew identified potential misassemblies in some reference genomes of isolated bacteria and the repeat sequences that likely gave rise to them. We discuss methods that could be implemented in bioinformatic approaches for curation to ensure that metabolic and evolutionary analyses can be based on very high-quality genomes.
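The cumulative GC skew metric mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the common definition of GC skew as (G - C) / (G + C) per window, summed along the sequence, and the function names and window size are hypothetical choices made for the example.

```python
from itertools import accumulate


def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for each non-overlapping window of seq."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C contribute zero skew.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(seq, window=1000):
    """Running sum of the windowed GC skew along the sequence."""
    return list(accumulate(gc_skew(seq, window)))


# Toy sequence (assumption: real genomes are far longer and noisier).
toy = "G" * 20 + "C" * 20
curve = cumulative_gc_skew(toy, window=10)
peak_window = curve.index(max(curve))
```

On a well-assembled circular bacterial genome, the cumulative curve is generally expected to show one clear minimum and one clear maximum, so abrupt extra turning points are worth inspecting as possible misassemblies.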
Topics: Data Curation; Genome, Archaeal; Genome, Bacterial; Metagenome; Metagenomics
PubMed: 32188701
DOI: 10.1101/gr.258640.119
-
Philosophical Transactions of the Royal... Dec 2020 (Review)
Today massive amounts of sequenced metagenomic and metatranscriptomic data from different ecological niches and environmental locations are available. Scientific progress depends critically on methods that allow extracting useful information from the various types of sequence data. Here, we will first discuss types of information contained in the various flavours of biological sequence data, and how this information can be interpreted to increase our scientific knowledge and understanding. We argue that a mechanistic understanding of biological systems analysed from different perspectives is required to consistently interpret experimental observations, and that this understanding is greatly facilitated by the generation and analysis of dynamic mathematical models. We conclude that, in order to construct mathematical models and to test mechanistic hypotheses, time-series data are of critical importance. We review diverse techniques to analyse time-series data and discuss various approaches by which time-series of biological sequence data have been successfully used to derive and test mechanistic hypotheses. Analysing the bottlenecks of current strategies in the extraction of knowledge and understanding from data, we conclude that combined experimental and theoretical efforts should be implemented as early as possible during the planning phase of individual experiments and scientific research projects. This article is part of the theme issue 'Integrative research perspectives on marine conservation'.
Topics: Conservation of Natural Resources; Ecosystem; Gene Expression Profiling; Metagenome; Metagenomics; Models, Biological; Transcriptome
PubMed: 33131436
DOI: 10.1098/rstb.2019.0448
-
Journal of Food Protection Mar 2022 (Review)
ABSTRACT
Advancements in next-generation sequencing technology have dramatically reduced the cost and increased the ease of microbial whole genome sequencing. This approach is revolutionizing the identification and analysis of foodborne microbial pathogens, facilitating expedited detection and mitigation of foodborne outbreaks, improving public health outcomes, and limiting costly recalls. However, next-generation sequencing is still anchored in the traditional laboratory practice of the selection and culture of a single isolate. Metagenomic-based approaches, including metabarcoding and shotgun and long-read metagenomics, are part of the next disruptive revolution in food safety diagnostics and offer the potential to directly identify entire microbial communities in a single food, ingredient, or environmental sample. In this review, metagenomic-based approaches are introduced and placed within the context of conventional detection and diagnostic techniques, and essential considerations for undertaking metagenomic assays and data analysis are described. Recent applications of the use of metagenomics for food safety are discussed alongside current limitations and knowledge gaps and new opportunities arising from the use of this technology.
Topics: Food Safety; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Whole Genome Sequencing
PubMed: 34706052
DOI: 10.4315/JFP-21-301
-
Microbiology Spectrum Feb 2023
Lower respiratory infection (LRI) is the most fatal communicable disease, with only a few pathogens identified. Metagenomic next-generation sequencing (mNGS), as an unbiased, hypothesis-free, and culture-independent method, theoretically enables the detection of all pathogens in a single test. In this study, we developed and validated a DNA-based mNGS method for the diagnosis of LRIs from bronchoalveolar lavage fluid (BALF). We prepared simulated data sets and published raw data sets from patients to evaluate the performance of our in-house bioinformatics pipeline and compared it with the popular metagenomics pipeline Kraken2-Bracken. In addition, a series of biological microbial communities were used to comprehensively validate the performance of our mNGS assay. Sixty-nine clinical BALF samples were used for clinical validation to determine the accuracy. The in-house bioinformatics pipeline validation showed a recall of 88.03%, precision of 99.14%, and F1 score of 92.26% via single-genome simulated data. Mock microbial community and clinical metagenomic data showed that the in-house pipeline has a stricter cutoff value than Kraken2-Bracken, which could prevent false-positive detection by the bioinformatics pipeline. The validation for the whole mNGS pipeline revealed that overwhelming human DNA, long-term storage at 4°C, and repeated freezing-thawing reduced the analytical sensitivity of the assay. The mNGS assay showed a sensitivity of 95.18% and specificity of 91.30% for pathogen detection from BALF samples. This study comprehensively demonstrated the analytical performance of this laboratory-developed mNGS assay for pathogen detection from BALF, which contributed to the standardization of this technology. To our knowledge, this study is the first to comprehensively validate the mNGS assay for the diagnosis of LRIs from BALF. 
This study exhibited a ready-made example for clinical laboratories to prepare reference materials and develop comprehensive validation schemes for their in-house mNGS assays, which would accelerate the standardization of mNGS testing.
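The relation between the precision, recall, and F1 figures quoted above can be checked with a short calculation. This is a sketch under the standard definition of F1 as the harmonic mean of precision and recall; the function name is hypothetical, and the abstract does not say how its values were aggregated.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Aggregate values quoted in the abstract, expressed as fractions.
aggregate_f1 = f1_score(precision=0.9914, recall=0.8803)
print(f"F1 from aggregate precision and recall: {aggregate_f1:.2%}")
```

Plugging in the aggregate figures gives roughly 93.3% rather than the reported 92.26%. That gap would be consistent with the published F1 being averaged over individual simulated genomes, but this reading is an assumption and is not stated in the abstract.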
Topics: Humans; Metagenome; Respiratory Tract Infections; Microbiota; High-Throughput Nucleotide Sequencing; Metagenomics
PubMed: 36507666
DOI: 10.1128/spectrum.03812-22
-
Cardiovascular Research Feb 2021
Topics: Metagenome; Metagenomics; Microbiota
PubMed: 32569375
DOI: 10.1093/cvr/cvaa175
-
Scientific Data Jun 2022
With the rapid development of high-throughput sequencing technology, the amount of metagenomic data (including both 16S and whole-genome sequencing data) in public repositories is increasing exponentially. However, owing to the large and decentralized nature of the data, it is still difficult for users to mine, compare, and analyze the data. The animal metagenome database (AnimalMetagenome DB) integrates metagenomic sequencing data with host information, making it easier for users to find data of interest. The AnimalMetagenome DB is designed to contain all public metagenomic data from animals, and the data are divided into domestic and wild animal categories. Users can browse, search, and download animal metagenomic data of interest based on different attributes of the metadata such as animal species, sample site, study purpose, and DNA extraction method. The AnimalMetagenome DB version 1.0 includes metadata for 82,097 metagenomes from 4 domestic animals (pigs, bovines, horses, and sheep) and 540 wild animals. These metagenomes cover 15 years of experiments, 73 countries, 1,044 studies, 63,214 amplicon sequencing data sets, and 10,672 whole-genome sequencing data sets. All data in the database are hosted on figshare and available at https://doi.org/10.6084/m9.figshare.19728619.
Topics: Animals; Cattle; Databases, Factual; High-Throughput Nucleotide Sequencing; Horses; Metadata; Metagenome; Metagenomics; Sheep; Swine
PubMed: 35710683
DOI: 10.1038/s41597-022-01444-w
-
Molecules (Basel, Switzerland) May 2021 (Review)
Microorganisms are highly regarded as a prominent source of natural products (NPs) that are important in many fields, such as medicine, farming, environmental safety, and material production. However, only a tiny fraction of microorganisms can be cultivated under standard laboratory conditions, and the bulk of microorganisms in ecosystems remain unidentified, which restricts our knowledge of uncultured microbial metabolism. These uncultured organisms could nevertheless provide a large collection of novel natural products. Culture-independent metagenomics can address core questions about the potential for NP production through the cloning and analysis of microbial DNA derived directly from environmental samples. Recent advances in next-generation sequencing and in genetic engineering tools for genome assembly have broadened the scope of metagenomics to offer perspectives on the life of uncultured microorganisms. In this review, we cover methods of metagenomic library construction and heterologous expression for the exploration and development of the environmental metabolome, and we focus on function-based, sequencing-based, and single-cell metagenomics of uncultured microorganisms.
Topics: Bacteria; Biological Products; Ecosystem; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics
PubMed: 34067778
DOI: 10.3390/molecules26102977
-
Nature Jan 2022
Microbial genes encode the majority of the functional repertoire of life on earth. However, despite increasing efforts in metagenomic sequencing of various habitats, little is known about the distribution of genes across the global biosphere, with implications for human and planetary health. Here we constructed a non-redundant gene catalogue of 303 million species-level genes (clustered at 95% nucleotide identity) from 13,174 publicly available metagenomes across 14 major habitats and use it to show that most genes are specific to a single habitat. The small fraction of genes found in multiple habitats is enriched in antibiotic-resistance genes and markers for mobile genetic elements. By further clustering these species-level genes into 32 million protein families, we observed that a small fraction of these families contain the majority of the genes (0.6% of families account for 50% of the genes). The majority of species-level genes and protein families are rare. Furthermore, species-level genes, and in particular the rare ones, show low rates of positive (adaptive) selection, supporting a model in which most genetic variability observed within each protein family is neutral or nearly neutral.
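The kind of statistic quoted above (a small share of families holding half of the genes) can be computed from a list of family sizes. The sketch below is illustrative only: the function name and the toy sizes are assumptions, and it is not the pipeline used to build the catalogue.

```python
def fraction_of_families_covering(family_sizes, target=0.5):
    """Smallest share of families (largest first) holding `target` of all genes."""
    if not family_sizes:
        raise ValueError("family_sizes must not be empty")
    sizes = sorted(family_sizes, reverse=True)
    total = sum(sizes)
    running = 0
    for n_families, size in enumerate(sizes, start=1):
        running += size
        if running >= target * total:
            return n_families / len(sizes)
    return 1.0


# Toy data (assumption): 10 families holding 100 genes in total.
toy_sizes = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
share = fraction_of_families_covering(toy_sizes, target=0.5)
```

In this toy case the single largest family already holds half of the genes, so the function returns one tenth of the families. A strongly skewed size distribution of this kind is what a figure such as "0.6% of families account for 50% of the genes" describes.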
Topics: Anti-Bacterial Agents; Drug Resistance, Microbial; Ecosystem; Humans; Metagenome; Metagenomics
PubMed: 34912116
DOI: 10.1038/s41586-021-04233-4
-
Journal of Biotechnology Nov 2017 (Review)
Metagenomics has proven to be one of the most important research fields for microbial ecology during the last decade. From 16S rRNA marker gene analysis for the characterization of community compositions to whole-metagenome shotgun sequencing, which additionally allows for functional analysis, metagenomics has been applied in a wide spectrum of research areas. The cost reduction, paired with the increase in the amount of data due to the advent of next-generation sequencing, led to a rapidly growing demand for bioinformatic software in metagenomics. By now, a large number of tools that can be used to analyze metagenomic datasets have been developed. The Bielefeld-Gießen center for microbial bioinformatics, as part of the German Network for Bioinformatics Infrastructure, bundles and imparts expert knowledge in the analysis of metagenomic datasets, especially in research on microbial communities involved in anaerobic digestion residing in biogas reactors. In this review, we give an overview of the field of metagenomics and introduce important bioinformatic tools and possible workflows, accompanied by application examples of biogas surveys successfully conducted at the Center for Biotechnology of Bielefeld University.
Topics: Anaerobiosis; Biofuels; Computational Biology; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics
PubMed: 28823476
DOI: 10.1016/j.jbiotec.2017.08.012
-
Bioinformatics (Oxford, England) Sep 2022
MOTIVATION
Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in other fields to deal with complex graph data structures. The graph structure generated during the assembly process could be integrated with contig features to obtain better bins with deep learning.
RESULTS
We propose GraphMB, which uses graph neural networks to incorporate the assembly graph into the binning process. We test GraphMB on long-read datasets of different complexities, and compare the performance with other binners in terms of the number of High Quality (HQ) genome bins obtained. With our approach, we were able to obtain unique bins on all real datasets, and obtain more bins on most datasets. In particular, we obtained on average 17.5% more HQ bins when compared with state-of-the-art binners and 13.7% when aggregating the results of our binner with the others. These results indicate that a deep learning model can integrate contig-specific and graph-structure information to improve metagenomic binning.
AVAILABILITY AND IMPLEMENTATION
GraphMB is available from https://github.com/MicrobialDarkMatter/GraphMB.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Sequence Analysis, DNA; Metagenomics; Metagenome; Genome, Microbial; Algorithms
PubMed: 35972375
DOI: 10.1093/bioinformatics/btac557