-
STAR Protocols Sep 2023
The analysis of metagenomic data obtained via high-throughput DNA sequencing is primarily carried out by a dedicated binning process that clusters contigs presumed to belong to the same species. Here, we present a protocol for improving the quality of binning using BinSPreader. We describe the steps of a typical metagenome assembly and binning workflow. We then detail binning refinement, its variants, its output, and possible caveats. This protocol optimizes the reconstruction of more complete genomes of the microorganisms that make up the metagenome. For complete details on the use and execution of this protocol, please refer to Tolstoganov et al.
Topics: Metagenome; Sequence Analysis, DNA; Metagenomics; Cluster Analysis; High-Throughput Nucleotide Sequencing
PubMed: 37405923
DOI: 10.1016/j.xpro.2023.102417 -
BMC Bioinformatics May 2020
BACKGROUND
Microorganisms are important occupants of many different environments. Identifying the composition of microbes and estimating their abundance promote understanding of interactions of microbes in environmental samples. To understand their environments more deeply, the composition of microorganisms in environmental samples has been studied using metagenomes, which are the collections of genomes of the microorganisms. Although many tools have been developed for taxonomy analysis based on different algorithms, variability of analysis outputs of existing tools from the same input metagenome datasets is the main obstacle for many researchers in this field.
RESULTS
Here, we present a novel meta-analysis tool for metagenome taxonomy analysis, called TAMA, by intelligently integrating outputs from three different taxonomy analysis tools. Using an integrated reference database, TAMA performs taxonomy assignment for input metagenome reads based on a meta-score by integrating scores of taxonomy assignment from different taxonomy classification tools. TAMA outperformed existing tools when evaluated using various benchmark datasets. It was also successfully applied to obtain relative species abundance profiles and difference in composition of microorganisms in two types of cheese metagenome and human gut metagenome.
CONCLUSION
TAMA can be easily installed and used for metagenome read classification and the prediction of relative species abundance from multiple numbers and types of metagenome read samples. TAMA can be used to more accurately uncover the composition of microorganisms in metagenome samples collected from various environments, especially when the use of a single taxonomy analysis tool is unreliable. TAMA is an open source tool, and can be downloaded at https://github.com/jkimlab/TAMA.
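The idea of combining per-tool scores into a single meta-score can be illustrated with a minimal sketch. This is not TAMA's actual scoring formula, which the abstract does not give. The function name `assign_by_meta_score`, the averaging rule, and the `threshold` parameter are all hypothetical and assume each tool reports one taxon per read with a score between 0 and 1.

```python
from collections import defaultdict


def assign_by_meta_score(tool_calls, threshold=0.5):
    """Combine per-tool calls for one read into a single assignment.

    tool_calls: list of (taxon, score) pairs, one per tool; a taxon of
    None means that tool left the read unclassified.
    Hypothetical rule: the meta-score of a taxon is the sum of the scores
    supporting it divided by the number of tools consulted.
    """
    totals = defaultdict(float)
    for taxon, score in tool_calls:
        if taxon is not None:
            totals[taxon] += score
    if not totals:
        return None, 0.0
    best = max(totals, key=totals.get)
    meta = totals[best] / len(tool_calls)
    # Leave the read unassigned when the combined support is too weak.
    return (best, meta) if meta >= threshold else (None, meta)


calls = [("Escherichia coli", 0.9), ("Escherichia coli", 0.8), ("Salmonella enterica", 0.6)]
taxon, meta = assign_by_meta_score(calls)
print(taxon, round(meta, 3))
```

Dividing by the total number of tools, rather than by the number that agree, penalizes taxa supported by only one tool. That is one plausible way to damp the disagreement between tools that the abstract describes.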
Topics: Bacteria; Classification; Databases, Genetic; Datasets as Topic; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Models, Genetic; Phylogeny
PubMed: 32397982
DOI: 10.1186/s12859-020-3533-7 -
Briefings in Functional Genomics Jan 2023
Viruses are the most abundant infectious agents on earth, and they infect living organisms such as bacteria, plants and animals, among others. They play an important role in the balance of different ecosystems by modulating microbial populations. In humans, they are responsible for some common diseases and may cause severe illnesses. Viral metagenomic studies have become essential and offer the possibility to understand and extend the knowledge of virus diversity and functionality. For these approaches, an essential step is the classification of viral sequences. In this work, 11 taxonomic classification tools were compared by analysing their performance, in terms of sensitivity and precision, in classifying reads at the species and family levels using the same (viral and nonviral) datasets and evaluation metrics, as well as their processing times and memory requirements. The results showed that factors such as richness (the number of viral species in a sample), the taxonomic level of the classification, and read length influence tool performance. High viral richness decreased the performance of most tools. Additionally, classification was better at higher taxonomic levels, such as family, than at lower levels, such as species, and this difference was more evident for short reads. The results also indicated that BLAST and Kraken2 were the best tools for classifying all types of reads, while FastViromeExplorer and VirusFinder performed well only on long reads, and Centrifuge, DIAMOND, and One Codex only on short reads. Regarding nonviral datasets (human and bacterial), all tools correctly classified them as nonviral.
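As a rough illustration of the two metrics used in this comparison, the sketch below computes read-level sensitivity and precision for a single tool. The abstract does not state the exact definitions, so the ones used here are assumptions: sensitivity is the fraction of all reads assigned to the correct taxon, and precision is the fraction of classified reads assigned to the correct taxon. The function name and the toy data are hypothetical.

```python
def sensitivity_precision(truth, predicted):
    """Read-level sensitivity and precision for one classifier.

    truth: dict mapping read ID -> true taxon.
    predicted: dict mapping read ID -> predicted taxon; a missing key or
    a value of None means the read was left unclassified.
    """
    correct = sum(1 for read, taxon in truth.items() if predicted.get(read) == taxon)
    classified = sum(1 for read in truth if predicted.get(read) is not None)
    sensitivity = correct / len(truth) if truth else 0.0
    precision = correct / classified if classified else 0.0
    return sensitivity, precision


truth = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}
predicted = {"r1": "A", "r2": "B", "r3": "B"}  # r4 is unclassified
sens, prec = sensitivity_precision(truth, predicted)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Under these assumed definitions, a tool that leaves many reads unclassified can keep high precision while losing sensitivity, which is why the two values are reported together.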
Topics: Humans; Ecosystem; Viruses; Bacteria; Metagenome; Metagenomics; High-Throughput Nucleotide Sequencing
PubMed: 36335985
DOI: 10.1093/bfgp/elac036 -
Bioinformatics (Oxford, England) May 2020
MOTIVATION
Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies.
RESULTS
We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications.
CONCLUSIONS
DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is retraining the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.
AVAILABILITY AND IMPLEMENTATION
DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Bacteria; Computer Simulation; Metagenome; Metagenomics; Sequence Analysis, DNA; Software
PubMed: 32096824
DOI: 10.1093/bioinformatics/btaa124 -
Microbiological Research Dec 2022
Review
Microbial cells attached to inert or living surfaces adopt a biofilm mode, with a self-produced matrix containing polysaccharides, proteins, and extracellular DNA, for protection from adverse external stimuli. Biofilms in hospitals and industries serve as a breeding ground for drug-resistant pathogens and for the enrichment of antibiotic resistance genes (ARGs), which are linked to pathogenicity, and they also impede industrial production processes. Biofilm formation, including virulence and pathogenicity, is regulated through quorum sensing (QS), a means of bacterial cell-to-cell communication for cooperative physiological processes. Hence, QS inhibition through quorum quenching (QQ) is a feasible approach to inhibit biofilm formation. In contrast, biofilms have beneficial roles in promoting plant growth, biocontrol, and wastewater treatment. Furthermore, polymicrobial biofilms can harbour novel compounds and species of industrial and pharmaceutical interest. Hence, surveillance of biofilm microbiome structure and functional attributes is crucial to determine the extent of the risk it poses and to harness its bioactive potential. One of the most preferred approaches to delineate the microbiome is culture-independent metagenomics. In this context, this review article explores the biofilm microbiome in built and natural settings such as agriculture, household appliances, wastewater treatment plants, hospitals, microplastics, and dental biofilm. We also discuss recent reports on discoveries of novel QS and biofilm inhibitors through conventional, metagenomics, and machine learning approaches. Finally, we present biofilm-derived novel metagenome-assembled genomes (MAGs), genomes, and taxa of medical and industrial interest.
Topics: Biofilms; Metagenome; Metagenomics; Microplastics; Pharmaceutical Preparations; Plastics; Quorum Sensing
PubMed: 36194989
DOI: 10.1016/j.micres.2022.127207 -
Methods in Molecular Biology (Clifton,... 2023
Microorganisms play a primary role in regulating biogeochemical cycles and are a valuable source of enzymes that have biotechnological applications, such as carbohydrate-active enzymes (CAZymes). However, the inability to culture the majority of microorganisms that exist in natural ecosystems restricts access to potentially novel bacteria and beneficial CAZymes. While commonplace molecular-based culture-independent methods such as metagenomics enable researchers to study microbial communities directly from environmental samples, recent progress in long-read sequencing technologies is advancing the field. We outline the key methodological stages that are required and describe specific protocols that are currently used for long-read metagenomic projects dedicated to CAZyme discovery.
Topics: Metagenomics; Microbiota; Metagenome; Carbohydrates; High-Throughput Nucleotide Sequencing
PubMed: 37149537
DOI: 10.1007/978-1-0716-3151-5_19 -
Bioinformatics (Oxford, England) Jan 2023
SUMMARY
The Metagenomic Intra-Species Diversity Analysis System (MIDAS) is a scalable metagenomic pipeline that identifies single nucleotide variants (SNVs) and gene copy number variants in microbial populations. Here, we present MIDAS2, which addresses the computational challenges presented by increasingly large reference genome databases, while adding functionality for building custom databases and leveraging paired-end reads to improve SNV accuracy. This fast and scalable reengineering of the MIDAS pipeline enables thousands of metagenomic samples to be efficiently genotyped.
AVAILABILITY AND IMPLEMENTATION
The source code is available at https://github.com/czbiohub/MIDAS2.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metagenome; Software; Metagenomics; Genotype; Databases, Factual
PubMed: 36321886
DOI: 10.1093/bioinformatics/btac713 -
Viruses Jun 2023
Blood transfusion safety is an essential element of public health. Current blood screening strategies rely on targeted techniques that could miss unknown or unexpected pathogens. Recent studies have demonstrated the presence of a viral community (virobiota/virome) in the blood of healthy individuals. Here, we characterized the blood virome in patients frequently exposed to blood transfusion by using Illumina metagenomic sequencing. The virome of these patients was compared to viruses present in healthy blood donors. Samples from 155 beta-thalassemia patients, 149 hemodialysis patients, and 100 healthy blood donors were pooled, with five samples per pool. Members of the Anelloviridae and Flaviviridae families were most frequently observed. Interestingly, samples of healthy blood donors harbored traces of potentially pathogenic viruses, including adeno-, rota-, and Merkel cell polyomavirus. Viruses of the Anelloviridae family were most abundant in the blood of hemodialysis patients, which also displayed a higher anellovirus richness. Pegiviruses (Flaviviridae) were only observed in patient populations. An overall trend of higher eukaryotic read abundance in both patient groups was observed. This might be associated with increased exposure through blood transfusion. Overall, the findings in this study demonstrated the presence of various viruses in the blood of Iranian multiple-transfused patients and healthy blood donors.
Topics: Humans; Iran; Virome; Viruses; Anelloviridae; Metagenome; Metagenomics
PubMed: 37515113
DOI: 10.3390/v15071425 -
Journal of Microbiology (Seoul, Korea) Mar 2021
Review
Microorganisms play a vital role in living systems in numerous ways. In the soil or ocean environment, microbes are involved in diverse processes, such as the carbon and nitrogen cycles, nutrient recycling, and energy acquisition. The relation between microbial dysbiosis and disease development has been extensively studied. In particular, microbial communities in the human gut are associated with the pathophysiology of several chronic diseases such as inflammatory bowel disease and diabetes. Therefore, analyzing the distribution of microorganisms and their associations with the environment is a key step in understanding nature. With the advent of next-generation sequencing technology, a vast amount of metagenomic data on unculturable microbes, in addition to culturable microbes, has been produced. To reconstruct microbial genomes, several assembly algorithms have been developed by incorporating metagenomic features, such as uneven depth. Since it is difficult to reconstruct complete microbial genomes from metagenomic reads, contig binning approaches have been suggested to collect contigs that originate from the same genome. To estimate the microbial composition in the environment, various methods have been developed to classify individual reads or contigs and profile bacterial proportions. Since microbial communities affect their hosts and environments through metabolites, metabolic profiles have been estimated from metagenomic or metatranscriptomic data. Here, we provide a comprehensive review of computational methods that can be applied to investigate microbiomes using metagenomic and metatranscriptomic sequencing data. The limitations of metagenomic studies and the key approaches to overcome such problems are discussed.
Topics: Animals; Bacteria; Genome, Microbial; Humans; Metagenome; Metagenomics; Microbiota
PubMed: 33565054
DOI: 10.1007/s12275-021-0632-8 -
MSystems Aug 2022
Metagenome-assembled genomes (MAGs) represent individual genomes recovered from metagenomic data. MAGs are extremely useful for analyzing uncultured microbial genomic diversity, as well as for characterizing the associated functional and metabolic potential in natural environments. Recent computational developments have considerably improved MAG reconstruction but have also highlighted several limitations, such as the failure to bin sequence regions with repeats or distinct nucleotide composition. Different assembly and binning strategies are often used; however, it remains unclear which assembly strategy, in combination with which binning approach, offers the best performance for MAG recovery. Several workflows have been proposed to reconstruct MAGs, but users are usually limited to single-metagenome assembly or need to manually define sets of metagenomes to coassemble prior to genome binning. Here, we present MAGNETO, an automated workflow dedicated to MAG reconstruction, which includes a fully automated coassembly step informed by optimal clustering of metagenomic distances and implements complementary genome binning strategies for improving MAG recovery. MAGNETO is implemented as a Snakemake workflow and is available at: https://gitlab.univ-nantes.fr/bird_pipeline_registry/magneto. Genome-resolved metagenomics has led to the discovery of previously untapped biodiversity within the microbial world. As the development of computational methods for the recovery of genomes from metagenomes continues, existing strategies need to be evaluated and compared to eventually lead to standardized computational workflows. In this study, we compared commonly used assembly and binning strategies and assessed their performance using both simulated and real metagenomic data sets. We propose a novel approach to automate coassembly, avoiding the requirement for a priori knowledge to combine metagenomic information.
The comparison against a previous coassembly approach demonstrates a strong impact of this step on genome binning results, but also the benefits of informing coassembly for improving the quality of recovered genomes. MAGNETO integrates complementary assembly-binning strategies to optimize genome reconstruction and provides a complete reads-to-genomes workflow for the growing microbiome research community.
Topics: Workflow; Metagenomics; Metagenome; Microbiota; Genome, Microbial
PubMed: 35703559
DOI: 10.1128/msystems.00432-22