-
Bioinformatics (Oxford, England) Jan 2022
MOTIVATION
Metagenomes offer a glimpse into the total genomic diversity contained within a sample. Currently, however, there is no straightforward way to obtain a non-redundant list of all putative homologs of a set of reference sequences present in a metagenome.
RESULTS
To address this problem, we developed a novel clustering approach called 'metagenomic clustering by reference library' (MCRL), where a reference library containing a set of reference genes is clustered with respect to an assembled metagenome. According to our proposed approach, reference genes homologous to similar sets of metagenomic sequences, termed 'signatures', are iteratively clustered in a greedy fashion, retaining at each step the reference genes yielding the lowest E values, and terminating when signatures of remaining reference genes have a minimal overlap. The outcome of this computation is a non-redundant list of reference genes homologous to minimally overlapping sets of contigs, representing potential candidates for gene families present in the metagenome. Unlike metagenomic clustering methods, there is no need for contigs to overlap to be associated with a cluster, enabling MCRL to draw on more information encoded in the metagenome when computing tentative gene families. We demonstrate how MCRL can be used to extract candidate viral gene families from an oral metagenome and an oral virome that otherwise could not be determined using standard approaches. We evaluate the sensitivity, accuracy and robustness of our proposed method for the viral case study and compare it with existing analysis approaches.
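The greedy, signature-based clustering described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `hits` input layout, the use of Jaccard similarity as the overlap measure, and the `max_overlap` threshold are all assumptions made for the example.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two signatures (sets of contig IDs)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_signature_clustering(
    hits: Dict[str, Dict[str, float]],
    max_overlap: float = 0.2,  # hypothetical cutoff for "minimal overlap"
) -> Dict[str, List[str]]:
    """Map each retained reference gene to the genes it absorbed.

    hits[gene][contig] is the E value of the gene's match to that contig;
    the set of contigs hit by a gene is its signature.
    """
    # Rank genes by their best (lowest) E value so strong matches are kept first.
    ranked = sorted(
        (g for g in hits if hits[g]),
        key=lambda g: (min(hits[g].values()), g),
    )
    clusters: Dict[str, List[str]] = {}
    for gene in ranked:
        signature = set(hits[gene])
        # Find the retained representative whose signature overlaps most.
        best_rep, best_score = None, 0.0
        for rep in clusters:
            score = jaccard(signature, set(hits[rep]))
            if score > best_score:
                best_rep, best_score = rep, score
        if best_rep is not None and best_score > max_overlap:
            clusters[best_rep].append(gene)  # redundant: absorb into cluster
        else:
            clusters[gene] = [gene]  # new non-redundant representative
    return clusters


example_hits = {
    "geneA": {"c1": 1e-50, "c2": 1e-40},
    "geneB": {"c1": 1e-30, "c2": 1e-20},
    "geneC": {"c3": 1e-10},
}
result = greedy_signature_clustering(example_hits)
```

In this toy input, `geneA` and `geneB` hit the same contigs, so only the gene with the lower E value is kept as a representative, while `geneC` forms its own candidate family.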
AVAILABILITY AND IMPLEMENTATION
https://github.com/a-tadmor/MCRL.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metagenome; Sequence Analysis, DNA; Metagenomics; Cluster Analysis; Viruses
PubMed: 34636854
DOI: 10.1093/bioinformatics/btab703 -
Genomics, Proteomics & Bioinformatics Dec 2018
Review
Metagenomes from uncultured microorganisms are rich resources for novel enzyme genes. The methods used to screen the metagenomic libraries fall into two categories, which are based on sequence or function of the enzymes. The sequence-based approaches rely on the known sequences of the target gene families. In contrast, the function-based approaches do not involve the incorporation of metagenomic sequencing data and, therefore, may lead to the discovery of novel gene sequences with desired functions. In this review, we discuss the function-based screening strategies that have been used in the identification of enzymes from metagenomes. Because of its simplicity, agar plate screening is most commonly used in the identification of novel enzymes with diverse functions. Other screening methods with higher sensitivity are also employed, such as microtiter plate screening. Furthermore, several ultra-high-throughput methods were developed to deal with large metagenomic libraries. Among these are the FACS-based screening, droplet-based screening, and the in vivo reporter-based screening methods. The application of these novel screening strategies has increased the chance for the discovery of novel enzyme genes.
Topics: Animals; Bacteria; Enzymes; Gene Library; High-Throughput Screening Assays; Metagenome; Metagenomics; Plants
PubMed: 30597257
DOI: 10.1016/j.gpb.2018.01.002 -
Microbiome Mar 2021
BACKGROUND
Microbial eukaryotes are found alongside bacteria and archaea in natural microbial systems, including host-associated microbiomes. While microbial eukaryotes are critical to these communities, they are challenging to study with shotgun sequencing techniques and are therefore often excluded.
RESULTS
Here, we present EukDetect, a bioinformatics method to identify eukaryotes in shotgun metagenomic sequencing data. Our approach uses a database of 521,824 universal marker genes from 241 conserved gene families, which we curated from 3713 fungal, protist, non-vertebrate metazoan, and non-streptophyte archaeplastida genomes and transcriptomes. EukDetect has a broad taxonomic coverage of microbial eukaryotes, performs well on low-abundance and closely related species, and is resilient against bacterial contamination in eukaryotic genomes. Using EukDetect, we describe the spatial distribution of eukaryotes along the human gastrointestinal tract, showing that fungi and protists are present in the lumen and mucosa throughout the large intestine. We discover that there is a succession of eukaryotes that colonize the human gut during the first years of life, mirroring patterns of developmental succession observed in gut bacteria. By comparing DNA and RNA sequencing of paired samples from human stool, we find that many eukaryotes continue active transcription after passage through the gut, though some do not, suggesting they are dormant or nonviable. We analyze metagenomic data from the Baltic Sea and find that eukaryotes differ across locations and salinity gradients. Finally, we observe eukaryotes in Arabidopsis leaf samples, many of which are not identifiable from public protein databases.
CONCLUSIONS
EukDetect provides an automated and reliable way to characterize eukaryotes in shotgun sequencing datasets from diverse microbiomes. We demonstrate that it enables discoveries that would be missed or clouded by false positives with standard shotgun sequence analysis. EukDetect will greatly advance our understanding of how microbial eukaryotes contribute to microbiomes.
Topics: Animals; Eukaryota; Humans; Metagenome; Metagenomics; Sequence Analysis, DNA
PubMed: 33658077
DOI: 10.1186/s40168-021-01015-y -
Clinical Microbiology and Infection :... Jul 2012
Review
The development of extensive sequencing methods has allowed metagenomic studies on the human gut microbiome to be carried out. This has tremendously increased our knowledge on gut microbiota composition and activity, allowing microbiota aberrations related to different diseases to be identified. These aberrations constitute targets for the development of probiotics directed to correct them. Probiotics are extensively used to modulate gut microbiota. Nevertheless, metagenomic studies on the effects of probiotics are still very scarce. In the near future, the use of metagenomics promises to expand our understanding of probiotic action.
Topics: Biota; Gastrointestinal Tract; Humans; Metagenome; Metagenomics; Probiotics
PubMed: 22647045
DOI: 10.1111/j.1469-0691.2012.03873.x -
Methods in Molecular Biology (Clifton,... 2023
Bacteriophages, also called phages, are viruses of bacteria. They are the most common and diverse biological entities on this planet. For metagenomic investigation, their diversity is also their biggest obstacle. Direct metagenomic sequencing of environmental phage communities often yields short genomic fragments, limiting the investigation to a few individual aspects of phage biology and diversity. The presented protocol for generating a host-associated metagenome reduces the phage diversity to a concise and accessible size. Metagenome sequencing often leads to complete genomes, and the availability of a suitable host system ensures further experimental investigation.
Topics: Metagenome; Bacteriophages; Metagenomics; Bacteria; Genomics; Genome, Viral
PubMed: 36306088
DOI: 10.1007/978-1-0716-2795-2_14 -
Methods in Molecular Biology (Clifton,... 2022
Microbial communities are key components of all ecosystems, but characterization of their complete genomic structure remains challenging. Typical analyses tend to overlook the complexity of the mixes in terms of species, strains, and extrachromosomal DNA molecules. Recently, approaches have been developed that bin DNA contigs into individual genomes and episomes according to their 3D contact frequencies. Those contacts are quantified by chromosome conformation capture experiments (3C, Hi-C), also known as proximity-ligation approaches, applied to metagenomic samples. Here, we present a simple computational pipeline that recovers high-quality Metagenome-Assembled Genomes (MAGs) starting from metagenomic 3C or Hi-C datasets and a metagenome assembly.
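The idea of binning contigs by contact frequency can be illustrated with a deliberately simplified sketch. It is not the pipeline described in the chapter: it assumes raw contact counts per contig pair, a hypothetical `min_contacts` cutoff, and plain connected components (via union-find) in place of whatever clustering and normalization the actual pipeline applies.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bin_contigs(
    contacts: Dict[Tuple[str, str], int],
    min_contacts: int = 5,  # hypothetical cutoff separating signal from noise
) -> List[List[str]]:
    """Group contigs linked, directly or transitively, by frequent contacts."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen contigs, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in contacts.items():
        root_a, root_b = find(a), find(b)  # registers both contigs
        if count >= min_contacts and root_a != root_b:
            parent[root_b] = root_a  # merge the two bins

    bins: Dict[str, List[str]] = defaultdict(list)
    for contig in parent:
        bins[find(contig)].append(contig)
    return sorted(sorted(members) for members in bins.values())


example_contacts = {
    ("c1", "c2"): 40,
    ("c2", "c3"): 12,
    ("c3", "c4"): 1,   # below the cutoff, so it does not join the bins
    ("c4", "c5"): 30,
}
example_bins = bin_contigs(example_contacts)
```

Real Hi-C data would normally need normalization for contig length and coverage before any threshold is meaningful; the raw-count cutoff here only keeps the example short.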
Topics: High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Microbiota; Sequence Analysis, DNA
PubMed: 34415535
DOI: 10.1007/978-1-0716-1390-0_8 -
Methods in Molecular Biology (Clifton,... 2022
Most microbial groups have not been cultivated yet, and the only way to approach the enormous diversity of rhodopsins that they contain in a sensible timeframe is through the analysis of their genomes. High-throughput sequencing technologies have allowed the release of community genomics (metagenomics) data from many habitats in the photic zones of the ocean and lakes. The harvest is already impressive, ranging from the first bacterial rhodopsin (proteorhodopsin) to the recent discovery of heliorhodopsin by functional metagenomics. However, the search continues using bioinformatic or biochemical routes.
Topics: Metagenome; Metagenomics; Phylogeny; Rhodopsins, Microbial
PubMed: 35857224
DOI: 10.1007/978-1-0716-2329-9_4 -
PloS One 2017
An understanding of microbial community structure is an important issue in the field of molecular ecology. The traditional molecular method involves amplification of small subunit ribosomal RNA (SSU rRNA) genes by polymerase chain reaction (PCR). However, PCR-based amplicon approaches are affected by primer bias and chimeras. With the development of high-throughput sequencing technology, unbiased SSU rRNA gene sequences can be mined from shotgun sequencing-based metagenomic or metatranscriptomic datasets to obtain a reflection of the microbial community structure in specific types of environment and to evaluate SSU primers. However, the use of short reads obtained through next-generation sequencing for primer evaluation has not been well resolved. The software MIPE (MIcrobiota metagenome Primer Explorer) was developed to adapt numerous short reads from metagenomes and metatranscriptomes. Using metagenomic or metatranscriptomic datasets as input, MIPE extracts and aligns rRNA to reveal detailed information on microbial composition and evaluate SSU rRNA primers. A mock dataset, a real Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) test dataset, two PrimerProspector test datasets and a real metatranscriptomic dataset were used to validate MIPE. The software calls Mothur (v1.33.3) and the SILVA database (v119) for the alignment and classification of rRNA genes from a metagenome or metatranscriptome. MIPE can effectively extract shotgun rRNA reads from a metagenome or metatranscriptome and is capable of classifying these sequences and exhibiting sensitivity to different SSU rRNA PCR primers. Therefore, MIPE can be used to guide primer design for specific environmental samples.
Topics: Algorithms; Computational Biology; DNA Primers; Metagenome; Metagenomics; Microbiota; Polymerase Chain Reaction; RNA, Ribosomal; Reproducibility of Results; Software
PubMed: 28350876
DOI: 10.1371/journal.pone.0174609 -
PLoS Biology Apr 2023
The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.
Topics: Archaea; Metagenome; Viruses; Bacteria; Metagenomics; Machine Learning; Genome, Viral
PubMed: 37083735
DOI: 10.1371/journal.pbio.3002083 -
Journal of Microbiological Methods Aug 2018
Next Generation Sequencing (NGS) technologies are revolutionizing the field of biology and metagenomic-based research. Since the volume of metagenomic data is typically very large, de novo metagenomic assembly can be effectively used to reduce the total amount of data and enhance the quality of downstream analysis, such as annotation and binning. Although many assemblers are freely available, selecting one suitable for a specific goal can be highly challenging. In this study, the performance of 11 well-known assemblers was evaluated in the assembly of three different metagenomes. The results obtained show that metaSPAdes is the best assembler and MEGAHIT is a good choice for a conservative assembly strategy. In addition, this research provides useful information regarding the pros and cons of each assembler and the effect of read length on assembly, thereby helping scholars to select the optimal assembler based on their objectives.
Topics: Computational Biology; High-Throughput Nucleotide Sequencing; Metagenome; Metagenomics; Sequence Analysis, DNA; Software
PubMed: 29953874
DOI: 10.1016/j.mimet.2018.06.007