-
Scientific Data Jun 2022
With the rapid development of high-throughput sequencing technology, the amount of metagenomic data (including both 16S and whole-genome sequencing data) in public repositories is increasing exponentially. However, owing to the large and decentralized nature of the data, it is still difficult for users to mine, compare, and analyze the data. The animal metagenome database (AnimalMetagenome DB) integrates metagenomic sequencing data with host information, making it easier for users to find data of interest. The AnimalMetagenome DB is designed to contain all public metagenomic data from animals, and the data are divided into domestic and wild animal categories. Users can browse, search, and download animal metagenomic data of interest based on different attributes of the metadata such as animal species, sample site, study purpose, and DNA extraction method. The AnimalMetagenome DB version 1.0 includes metadata for 82,097 metagenomes from 4 domestic animals (pigs, bovines, horses, and sheep) and 540 wild animals. These metagenomes cover 15 years of experiments, 73 countries, 1,044 studies, 63,214 amplicon sequencing data, and 10,672 whole genome sequencing data. All data in the database are hosted on and available from figshare (https://doi.org/10.6084/m9.figshare.19728619).
Topics: Animals; Cattle; Databases, Factual; High-Throughput Nucleotide Sequencing; Horses; Metadata; Metagenome; Metagenomics; Sheep; Swine
PubMed: 35710683
DOI: 10.1038/s41597-022-01444-w -
BMC Genomics Jan 2022
BACKGROUND
Advances in DNA sequencing technologies have transformed our capacity to perform life science research, decipher the dynamics of complex soil microbial communities and exploit them for plant disease management. However, soil is a complex conglomerate, which makes functional metagenomics studies very challenging.
RESULTS
Metagenomes assembled from long-read (PacBio, PB), short-read (Illumina, IL), and a mixture of PB and IL (PI) sequencing of soil DNA samples were compared. Ortholog analyses and functional annotation revealed that the PI approach significantly increased the contig length of the metagenomic sequences compared to IL and enlarged the gene pool compared to PB. The PI approach also offered comparable or higher species abundance than either PB or IL alone, and showed significant advantages for studying natural product biosynthetic genes in the soil microbiomes.
CONCLUSION
Our results provide an effective strategy for combining long and short-read DNA sequencing data to explore and distill the maximum information out of soil metagenomics.
Topics: High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Sequence Analysis, DNA; Soil
PubMed: 34996356
DOI: 10.1186/s12864-021-08260-3 -
Virus Research Jul 2017
Review
Viruses are the most abundant biological entities on Earth, exceeding bacteria in most ecosystems. Especially in oceans, viruses are thought to be the major planktonic predators shaping microorganism communities and controlling ocean biological capacity. Plankton lysis by viruses plays an important role in ocean nutrient and energy cycles. Viral metagenomics has emerged as a powerful tool to uncover viral diversity in aquatic ecosystems through the use of Next Generation Sequencing. However, many of the commonly used viral sample preparation steps have important biases that must be considered to avoid misinterpreting the results. In addition to biases caused by the purification of virus particles, viral DNA/RNA amplification and the preparation of genomic libraries could also introduce biases, and a detailed knowledge of such protocols is required. In this review, the main steps in the viral metagenomic workflow are described, paying special attention to the potential biases introduced by each one.
Topics: Genetic Variation; Genome, Viral; Geography; Metagenome; Metagenomics; Viruses; Water Microbiology
PubMed: 27889617
DOI: 10.1016/j.virusres.2016.11.021 -
Methods in Molecular Biology (Clifton,... 2023
Microorganisms play a primary role in regulating biogeochemical cycles and are a valuable source of enzymes that have biotechnological applications, such as carbohydrate-active enzymes (CAZymes). However, the inability to culture the majority of microorganisms that exist in natural ecosystems restricts access to potentially novel bacteria and beneficial CAZymes. While commonplace molecular-based culture-independent methods such as metagenomics enable researchers to study microbial communities directly from environmental samples, recent progress in long-read sequencing technologies is advancing the field. We outline the key methodological stages that are required and describe specific protocols that are currently used for long-read metagenomic projects dedicated to CAZyme discovery.
Topics: Metagenomics; Microbiota; Metagenome; Carbohydrates; High-Throughput Nucleotide Sequencing
PubMed: 37149537
DOI: 10.1007/978-1-0716-3151-5_19 -
Bioinformatics (Oxford, England) Jan 2023
SUMMARY
The Metagenomic Intra-Species Diversity Analysis System (MIDAS) is a scalable metagenomic pipeline that identifies single nucleotide variants (SNVs) and gene copy number variants in microbial populations. Here, we present MIDAS2, which addresses the computational challenges presented by increasingly large reference genome databases, while adding functionality for building custom databases and leveraging paired-end reads to improve SNV accuracy. This fast and scalable reengineering of the MIDAS pipeline enables thousands of metagenomic samples to be efficiently genotyped.
AVAILABILITY AND IMPLEMENTATION
The source code is available at https://github.com/czbiohub/MIDAS2.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metagenome; Software; Metagenomics; Genotype; Databases, Factual
PubMed: 36321886
DOI: 10.1093/bioinformatics/btac713 -
Bioinformatics (Oxford, England) Sep 2022
MOTIVATION
Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in other fields to deal with complex graph data structures. The graph structure generated during the assembly process could be integrated with contig features to obtain better bins with deep learning.
RESULTS
We propose GraphMB, which uses graph neural networks to incorporate the assembly graph into the binning process. We test GraphMB on long-read datasets of different complexities, and compare the performance with other binners in terms of the number of High Quality (HQ) genome bins obtained. With our approach, we were able to obtain unique bins on all real datasets, and obtain more bins on most datasets. In particular, we obtained on average 17.5% more HQ bins when compared with state-of-the-art binners and 13.7% when aggregating the results of our binner with the others. These results indicate that a deep learning model can integrate contig-specific and graph-structure information to improve metagenomic binning.
AVAILABILITY AND IMPLEMENTATION
GraphMB is available from https://github.com/MicrobialDarkMatter/GraphMB.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Sequence Analysis, DNA; Metagenomics; Metagenome; Genome, Microbial; Algorithms
PubMed: 35972375
DOI: 10.1093/bioinformatics/btac557 -
Journal of Microbiology and... Sep 2020
The identification of human bacterial pathogens is critical for environmental microbial risk assessment. However, current methods for identifying pathogens in environmental samples are limited in their ability to detect highly diverse bacterial communities and accurately differentiate pathogens from commensal bacteria. In the present study, we suggest an improved approach that combines identification results obtained from multiple databases, including the multilocus sequence typing (MLST) database, the virulence factor database (VFDB), and the pathosystems resource integration center (PATRIC) database, to resolve current challenges. By integrating the identification results from multiple databases, potential bacterial pathogens in metagenomes were identified and classified into eight different groups. Based on the distribution of genes in each group, we proposed an equation to calculate the metagenomic pathogen identification index (MPII) of each metagenome based on the weighted abundance of identified sequences in each database. We found that the accuracy of pathogen identification was improved by using combinations of multiple databases compared to that of individual databases. When the approach was applied to environmental metagenomes, metagenomes associated with activated sludge were estimated to have a higher MPII than those from other environments (drinking water, ocean water, ocean sediment, and freshwater sediment). The calculated MPII values were statistically distinguishable among different environments (p < 0.05). These results demonstrate that the suggested approach allows for more accurate identification of the pathogens associated with metagenomes.
Topics: Bacteria; Databases, Genetic; Environmental Microbiology; Humans; Metagenome; Metagenomics; Microbiota; Systems Integration
PubMed: 32627750
DOI: 10.4014/jmb.2005.05033 -
Bioinformatics (Oxford, England) May 2021
MOTIVATION
The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes.
RESULTS
We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity.
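The two evaluation measures named above can be illustrated with a minimal sketch. It assumes each read has a known species label and an assigned cluster (or none); the function name `cluster_metrics` and the toy input are hypothetical and are not taken from the OGRE code base.

```python
from collections import defaultdict


def cluster_metrics(read_species, read_cluster):
    """Return (species count per cluster, fraction of reads clustered).

    read_species: dict mapping read id -> true species label (assumed known).
    read_cluster: dict mapping read id -> cluster id, or None if unclustered.
    """
    species_per_cluster = defaultdict(set)
    clustered = 0
    for read_id, species in read_species.items():
        cluster = read_cluster.get(read_id)
        if cluster is None:
            continue  # read was not placed in any cluster
        clustered += 1
        species_per_cluster[cluster].add(species)
    # A cluster containing a single species is "pure" in the sense used above.
    species_counts = {c: len(s) for c, s in species_per_cluster.items()}
    fraction = clustered / len(read_species) if read_species else 0.0
    return species_counts, fraction


# Toy, made-up input: five reads from two species, one read left unclustered.
reads = {"r1": "A", "r2": "A", "r3": "B", "r4": "B", "r5": "A"}
clusters = {"r1": 0, "r2": 0, "r3": 1, "r4": 0, "r5": None}
counts, fraction = cluster_metrics(reads, clusters)
```

In this toy case cluster 0 mixes two species, cluster 1 contains one, and four of the five reads are clustered.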
CONCLUSION
OGRE is the only method that can successfully cluster reads in species-specific clusters for large metagenomic datasets without running into computation time or memory issues.
AVAILABILITY AND IMPLEMENTATION
Code is made available on GitHub (https://github.com/Marleen1/OGRE).
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Algorithms; Cluster Analysis; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Sequence Analysis, DNA; Software
PubMed: 32871010
DOI: 10.1093/bioinformatics/btaa760 -
Nature Microbiology Nov 2017
Challenges in cultivating microorganisms have limited the phylogenetic diversity of currently available microbial genomes. This is being addressed by advances in sequencing throughput and computational techniques that allow for the cultivation-independent recovery of genomes from metagenomes. Here, we report the reconstruction of 7,903 bacterial and archaeal genomes from >1,500 public metagenomes. All genomes are estimated to be ≥50% complete and nearly half are ≥90% complete with ≤5% contamination. These genomes increase the phylogenetic diversity of bacterial and archaeal genome trees by >30% and provide the first representatives of 17 bacterial and three archaeal candidate phyla. We also recovered 245 genomes from the Patescibacteria superphylum (also known as the Candidate Phyla Radiation) and find that the relative diversity of this group varies substantially with different protein marker sets. The scale and quality of this data set demonstrate that recovering genomes from metagenomes provides an expedient path forward to exploring microbial dark matter.
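The completeness and contamination thresholds quoted above can be applied as a simple filter. The sketch below is illustrative only: the function name and tier labels are hypothetical, and it assumes completeness and contamination are given as percentages. The abstract states no contamination limit for the ≥50% tier, so none is applied here.

```python
def quality_tier(completeness, contamination):
    """Assign a genome to a tier using the thresholds quoted in the abstract.

    Both arguments are percentages. Tier labels are hypothetical.
    """
    if completeness >= 90.0 and contamination <= 5.0:
        return "near-complete"
    if completeness >= 50.0:
        return "at-least-half-complete"
    return "below-threshold"


# Made-up example values, not data from the study.
examples = [(95.0, 2.0), (95.0, 8.0), (62.5, 3.1), (40.0, 1.0)]
tiers = [quality_tier(comp, cont) for comp, cont in examples]
```

A genome that is 95% complete but 8% contaminated falls into the lower tier because it misses the ≤5% contamination condition.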
Topics: Archaea; Bacteria; Genome, Archaeal; Genome, Bacterial; Metagenome; Metagenomics; Phylogeny; Sequence Analysis, DNA
PubMed: 28894102
DOI: 10.1038/s41564-017-0012-7 -
Journal of Food Protection Mar 2022
Review
ABSTRACT
Advancements in next-generation sequencing technology have dramatically reduced the cost and increased the ease of microbial whole genome sequencing. This approach is revolutionizing the identification and analysis of foodborne microbial pathogens, facilitating expedited detection and mitigation of foodborne outbreaks, improving public health outcomes, and limiting costly recalls. However, next-generation sequencing is still anchored in the traditional laboratory practice of the selection and culture of a single isolate. Metagenomic-based approaches, including metabarcoding and shotgun and long-read metagenomics, are part of the next disruptive revolution in food safety diagnostics and offer the potential to directly identify entire microbial communities in a single food, ingredient, or environmental sample. In this review, metagenomic-based approaches are introduced and placed within the context of conventional detection and diagnostic techniques, and essential considerations for undertaking metagenomic assays and data analysis are described. Recent applications of the use of metagenomics for food safety are discussed alongside current limitations and knowledge gaps and new opportunities arising from the use of this technology.
Topics: Food Safety; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Whole Genome Sequencing
PubMed: 34706052
DOI: 10.4315/JFP-21-301