Applied and Environmental Microbiology Jun 2023
Surveillance for early disease detection is crucial to reduce the threat of plant diseases to food security. Metagenomic sequencing and taxonomic classification have recently been used to detect and identify plant pathogens. However, the genome of an emerging pathogen may not be similar enough to any public genome to permit reference-based tools to identify infected samples. Also, in the case of point-of-care diagnosis in the field, database access may be limited. Therefore, here we explore reference-free detection of plant pathogens using metagenomic sequencing and machine learning (ML). We used long-read metagenomes from healthy and infected plants as our model system and constructed k-mer frequency tables to test eight different ML models. The accuracies in classifying individual reads as coming from a healthy or infected metagenome were compared. Of all models, random forest (RF) had the best combination of short run-time and high accuracy (over 0.90) using tomato metagenomes. We further evaluated the RF model with a different tomato sample infected with the same pathogen or a different pathogen and a grapevine sample infected with a grapevine pathogen and achieved similar performances. ML models can thus learn features to successfully perform reference-free detection of plant diseases whereby a model trained with one pathogen-host system can also be used to detect different pathogens on different hosts. Potential and challenges of applying ML to metagenomics in plant disease detection are discussed.
Climate change may lead to the emergence of novel plant diseases caused by yet unknown pathogens. Surveillance for emerging plant diseases is crucial to reduce their threat to food security. However, conventional genomics-based methods require knowledge of existing plant pathogens and cannot be applied to detecting newly emerged pathogens. In this work, we explored reference-free, metagenomic sequencing-based disease detection using machine learning. By sequencing the genomes of all microbial species extracted from an infected plant sample, we were able to train machine learning models to accurately classify individual sequencing reads as coming from a healthy or an infected plant sample. This method has the potential to be integrated into a generic pipeline for a metagenomics-based plant disease surveillance approach but also has limitations that still need to be overcome.
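The k-mer frequency tables mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the choice of k = 3, and the handling of ambiguous bases are all assumptions made for illustration, since the abstract does not state these details.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(read: str, k: int = 3) -> list[float]:
    """Return normalized k-mer frequencies for one read (k is an assumed value)."""
    # Fixed, lexicographically ordered vocabulary so every read maps to the same columns.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    read = read.upper()
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    # Only k-mers made of A, C, G, T are counted; k-mers with ambiguous bases (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


def build_table(reads: list[str], k: int = 3) -> list[list[float]]:
    """Build one feature row per read; rows share the same column order."""
    return [kmer_frequencies(read, k) for read in reads]
```

Each row of such a table, paired with a healthy/infected label for the metagenome the read came from, could then be passed to a standard classifier, for example a random forest implementation such as scikit-learn's `RandomForestClassifier`.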
Topics: Metagenome; Metagenomics; Machine Learning; Chromosome Mapping; Plant Diseases; High-Throughput Nucleotide Sequencing
PubMed: 37184398
DOI: 10.1128/aem.00260-23
Microbiology Spectrum Apr 2022
The reproductive tract metagenome plays a significant role in the various reproductive system functions, including reproductive cycles, health, and fertility. One of the major challenges in bovine vaginal metagenome studies is host DNA contamination, which limits the sequencing capacity for metagenomic content and reduces the accuracy of untargeted shotgun metagenomic profiling. This is the first study comparing the effectiveness of different host depletion and DNA extraction methods for bovine vaginal metagenomic samples. The host depletion methods evaluated were slow centrifugation (Soft-spin), NEBNext Microbiome DNA Enrichment kit (NEBNext), and propidium monoazide (PMA) treatment, while the extraction methods were DNeasy Blood and Tissue extraction (DNeasy) and QIAamp DNA Microbiome extraction (QIAamp). Soft-spin and QIAamp were the most effective host depletion and extraction methods, respectively, in reducing the amount of cattle genomic content in bovine vaginal samples. The reduced host-to-microbe ratio in the extracted DNA increased the sequencing depth for microbial reads in untargeted shotgun sequencing. Bovine vaginal samples extracted with QIAamp presented taxonomical profiles which closely resembled the mock microbial composition, especially for the recovery of Gram-positive bacteria. Additionally, samples extracted with QIAamp presented extensive functional profiles with deep coverage. Overall, a combination of Soft-spin and QIAamp provided the most robust representation of the vaginal microbial community in cattle while minimizing host DNA contamination.
In addition to the host tissue collected during the sampling process, bovine vaginal samples are saturated with large amounts of extracellular DNA and secreted proteins that are essential for physiological purposes, including the reproductive cycle and immune defense. Because the high host-to-microbe genome ratio hampers the sequencing efficacy for metagenome samples and the recovery of the actual metagenomic profiles, bovine vaginal samples cannot benefit from the full potential of shotgun sequencing. This is the first investigation on the most effective host depletion and extraction methods for bovine vaginal metagenomic samples. This study demonstrated an effective combination of host depletion and extraction methods, which harvested higher percentages of 16S rRNA genes and microbial reads and subsequently led to a taxonomical profile that resembled the actual community and a functional profile with deeper coverage. A representative metagenomic profile is essential for investigating the role of the bovine vaginal metagenome for both reproductive function and susceptibility to infections.
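The effect of the host-to-microbe ratio on microbial sequencing depth can be made concrete with a small arithmetic sketch in Python. The function name and all numbers are hypothetical and do not come from the study. The sketch also assumes that the ratio is expressed in sequencing reads (host reads divided by microbial reads) and that every read is either host or microbial.

```python
def microbial_reads(total_reads: int, host_to_microbe_ratio: float) -> float:
    """Expected number of microbial reads for a given sequencing budget.

    With ratio = host_reads / microbial_reads, the microbial share of the
    run is 1 / (1 + ratio).
    """
    if host_to_microbe_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return total_reads / (1.0 + host_to_microbe_ratio)


# Hypothetical run of 10 million reads before and after host depletion.
before = microbial_reads(10_000_000, 99)  # 99 host reads per microbial read
after = microbial_reads(10_000_000, 9)    # 9 host reads per microbial read
```

Under these assumed ratios, the same sequencing budget yields ten times as many microbial reads after depletion, which is the sense in which a reduced ratio increases microbial sequencing depth.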
Topics: Animals; Cattle; DNA; Female; Metagenome; Metagenomics; RNA, Ribosomal, 16S; Sequence Analysis, DNA
PubMed: 35404108
DOI: 10.1128/spectrum.00412-21
The ISME Journal Jan 2021
Growth rates are central to understanding microbial interactions and community dynamics. Metagenomic growth estimators have been developed, specifically codon usage bias (CUB) for maximum growth rates and "peak-to-trough ratio" (PTR) for in situ rates. Both were originally tested with pure cultures, but natural populations are more heterogeneous, especially in individual cell histories pertinent to PTR. To test these methods, we compared predictors with observed growth rates of freshly collected marine prokaryotes in unamended seawater. We prefiltered and diluted samples to remove grazers and greatly reduce virus infection, so net growth approximated gross growth. We sampled over 44 h for abundances and metagenomes, generating 101 metagenome-assembled genomes (MAGs), including Actinobacteria, Verrucomicrobia, SAR406, MGII archaea, etc. We tracked each MAG population by cell-abundance-normalized read recruitment, finding growth rates of 0 to 5.99 per day, the first reported rates for several groups, and used these rates as benchmarks. PTR, calculated by three methods, rarely correlated with growth (r ~ -0.26 to 0.08), except for rapidly growing γ-Proteobacteria (r ~ 0.63 to 0.92), while CUB correlated moderately well with observed maximum growth rates (r = 0.57). This suggests that current PTR approaches poorly predict actual growth of most marine bacterial populations, but maximum growth rates can be approximated from genomic characteristics.
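The benchmark comparison described above can be sketched in Python in two steps: a net growth rate computed from population abundances at two time points, and a Pearson correlation between observed rates and a predictor such as PTR. This is an illustrative sketch only. It assumes exponential growth between the two samples, and the function names are hypothetical rather than taken from the study.

```python
import math


def net_growth_rate_per_day(n_start: float, n_end: float, hours: float) -> float:
    """Specific growth rate (per day), assuming exponential growth between samples."""
    return math.log(n_end / n_start) / (hours / 24.0)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, a population that doubles in 24 h has a rate of ln 2, about 0.69 per day. Given one observed rate and one predictor value per MAG, `pearson_r` gives a correlation of the kind reported in the abstract.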
Topics: Archaea; Bacteria; Benchmarking; Metagenome; Metagenomics
PubMed: 32939027
DOI: 10.1038/s41396-020-00773-1
Nucleic Acids Research Aug 2022
Genome binning has been essential for characterization of bacteria, archaea, and even eukaryotes from metagenomes. Yet, few approaches exist for viruses. We developed vRhyme, a fast and precise software tool for construction of viral metagenome-assembled genomes (vMAGs). vRhyme utilizes single- or multi-sample coverage effect size comparisons between scaffolds and employs supervised machine learning to identify nucleotide feature similarities, which are compiled into iterations of weighted networks and refined bins. To refine bins, vRhyme utilizes unique features of viral genomes, namely a protein redundancy scoring mechanism based on the observation that viruses seldom encode redundant genes. Using simulated viromes, we displayed superior performance of vRhyme compared to available binning tools in constructing more complete and uncontaminated vMAGs. When applied to 10,601 viral scaffolds from human skin, vRhyme advanced our understanding of resident viruses, highlighted by identification of a Herelleviridae vMAG composed of 22 scaffolds, and another vMAG encoding a nitrate reductase metabolic gene, representing near-complete genomes post-binning. vRhyme will enable a convention of binning uncultivated viral genomes and has the potential to transform metagenome-based viral ecology.
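The idea behind protein redundancy scoring, that a genuine viral genome should rarely carry the same gene on more than one scaffold, can be illustrated with a short Python sketch. This is not vRhyme's implementation or API. The function name, the input layout (scaffold name mapped to protein cluster labels), and the counting rule are all assumptions made for illustration.

```python
from collections import Counter


def redundant_cluster_count(bin_scaffolds: dict[str, list[str]]) -> int:
    """Count protein clusters that appear on more than one scaffold of a bin.

    A higher count hints that the bin may mix scaffolds from different genomes.
    """
    scaffolds_per_cluster = Counter()
    for clusters in bin_scaffolds.values():
        # Use a set so repeats within a single scaffold are counted once.
        for cluster in set(clusters):
            scaffolds_per_cluster[cluster] += 1
    return sum(1 for n in scaffolds_per_cluster.values() if n > 1)


# Hypothetical bin: the terminase cluster occurs on both scaffolds.
candidate_bin = {
    "scaffold_1": ["capsid", "terminase"],
    "scaffold_2": ["portal", "terminase"],
}
score = redundant_cluster_count(candidate_bin)
```

In this hypothetical bin one cluster is shared between scaffolds, so the count is 1. A bin whose scaffolds share no clusters would score 0 and would be consistent with a single non-redundant genome.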
Topics: Genome, Viral; High-Throughput Nucleotide Sequencing; Humans; Metagenome; Metagenomics; Sequence Analysis, DNA; Software
PubMed: 35544285
DOI: 10.1093/nar/gkac341
NPJ Biofilms and Microbiomes Apr 2021
Investigation of the microbial ecology of terrestrial, aquatic and atmospheric ecosystems requires specific sampling and analytical technologies, owing to vastly different biomass densities typically encountered. In particular, the ultra-low biomass nature of air presents an inherent analytical challenge that is confounded by temporal fluctuations in community structure. Our ultra-low biomass pipeline advances the field of bioaerosol research by significantly reducing sampling times from days/weeks/months to minutes/hours, while maintaining the ability to perform species-level identification through direct metagenomic sequencing. The study further addresses all experimental factors contributing to analysis outcome, such as amassment, storage and extraction, as well as factors that impact on nucleic acid analysis. Quantity and quality of nucleic acid extracts from each optimisation step are evaluated using fluorometry, qPCR and sequencing. Both metagenomics and marker gene amplification-based (16S and ITS) sequencing are assessed with regard to their taxonomic resolution and inter-comparability. The pipeline is robust across a wide range of climatic settings, ranging from arctic to desert to tropical environments. Ultimately, the pipeline can be adapted to environmental settings, such as dust and surfaces, which also require ultra-low biomass analytics.
Topics: Air Microbiology; Biomass; Ecosystem; Environmental Microbiology; Environmental Monitoring; Metagenome; Metagenomics; Microbiota; Soil Microbiology; Water Microbiology
PubMed: 33863892
DOI: 10.1038/s41522-021-00209-4
GigaScience Jan 2024
BACKGROUND
Linked-read sequencing technologies generate short reads of high base quality that contain extrapolative information on long-range DNA connectedness. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to one specific sequencing platform.
FINDINGS
To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform-agnostic processing of linked-read sequencing data from both human genomes and metagenomes. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK has the ability to process multiple samples automatically and provides users with the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK to linked reads from simulation, mock community, and real datasets for both human genomes and metagenomes. We showcased LRTK's ability to generate comparative performance results from preceding benchmark studies and to report these results in publication-ready HTML document plots.
CONCLUSIONS
LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the current gap in the field caused by platform-centric genome-specific linked-read data analysis tools.
Topics: Humans; Genome, Human; Metagenome; Software; Metagenomics; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Computational Biology
PubMed: 38869148
DOI: 10.1093/gigascience/giae028
Genes Oct 2022
The recent increase in publicly available metagenomic datasets with geospatial metadata has made it possible to determine location-specific microbial fingerprints from around the world. Such fingerprints can be useful for comparing microbial niches for environmental research, as well as for applications within forensic science and public health. To determine the regional specificity for environmental metagenomes, we examined 4305 shotgun-sequenced samples from the MetaSUB Consortium dataset, the most extensive public collection of urban microbiomes, spanning 60 different cities, 30 countries, and 6 continents. We were able to identify city-specific microbial fingerprints using supervised machine learning (SML) on the taxonomic classifications, and we also compared the performance of ten SML classifiers. We then further evaluated the five algorithms with the highest accuracy, with city and continental accuracies of 85-89% and 90-94%, respectively. Thereafter, we used these results to develop Cassandra, a random-forest-based classifier that identifies bioindicator species to aid in fingerprinting and can infer higher-order microbial interactions at each site. We further tested the Cassandra algorithm on the Tara Oceans dataset, the largest collection of marine-based microbial genomes, where it classified the oceanic sample locations with 83% accuracy. These results and code show the utility of SML methods and Cassandra to identify bioindicator species across both oceanic and urban environments, which can help guide ongoing efforts in biotracing, environmental monitoring, and microbial forensics (MF).
Topics: Metagenomics; Metagenome; Microbiota; Supervised Machine Learning; Cities
PubMed: 36292799
DOI: 10.3390/genes13101914
Current Opinion in Microbiology Dec 2022
Review
While they are the most abundant biological entities on the planet, the role of bacteriophages (phages) in the microbiome remains enigmatic and understudied. With a rise in the number of metagenomics studies and the publication of highly efficient phage mining programmes, we now have extensive data on the genomic and taxonomic diversity of (mainly) DNA bacteriophages in a wide range of environments. In addition, the higher throughput and quality of sequencing is allowing for strain-level reconstructions of phage genomes from metagenomes. These factors will ultimately help us to understand the role these phages play as part of specific microbial communities, enabling the tracking of individual virus genomes through space and time. Using lessons learned from the latest metagenomic studies, we focus on two explicit aspects of the role bacteriophages play within the microbiome: their ecological role in structuring bacterial populations and their contribution to microbiome functioning by encoding auxiliary metabolism genes.
Topics: Humans; Bacteriophages; Metagenomics; Metagenome; Genome, Viral; Bacteria
PubMed: 36347213
DOI: 10.1016/j.mib.2022.102229
Clinical Microbiology and Infection :... Sep 2022
Review
BACKGROUND
The diagnosis of bacterial infections continues to rely on culture, a slow process in which antibiotic susceptibility profiles of potential pathogens are made available to clinicians 48 hours after sampling, at best. Recently, clinical metagenomics, the metagenomic sequencing of samples with the purpose of identifying microorganisms and determining their susceptibility to antimicrobials, has emerged as a potential diagnostic tool that could prove faster than culture. Clinical metagenomics indeed has the potential to detect antibiotic resistance genes (ARGs) and mutations associated with resistance. Nevertheless, many challenges have yet to be overcome in order to make rapid phenotypic inference of antibiotic susceptibility from metagenomic data a reality.
OBJECTIVES
The objective of this narrative review is to discuss the challenges underlying the phenotypic inference of antibiotic susceptibility from metagenomic data.
SOURCES
We conducted a narrative review using published articles available in the National Center for Biotechnology Information PubMed database.
CONTENT
We review the current ARG databases with a specific emphasis on those which now provide associations with phenotypic data. Next, we discuss the bioinformatic tools designed to identify ARGs in metagenomes. We then report on the performance of phenotypic inference from genomic data and the issue of predicting the expression of ARGs. Finally, we address the challenge of linking an ARG to its host.
IMPLICATIONS
Significant improvements have recently been made in associating ARG and phenotype, and the inference of susceptibility from genomic data has been demonstrated in pathogenic bacteria such as Staphylococci and Enterobacterales. Resistance involving gene expression is more challenging, however, and inferring susceptibility from species such as Pseudomonas aeruginosa remains difficult. Future research directions include the consideration of gene expression via RNA sequencing and machine learning.
Topics: Anti-Bacterial Agents; Drug Resistance, Microbial; Genes, Bacterial; Metagenome; Metagenomics
PubMed: 35551982
DOI: 10.1016/j.cmi.2022.04.017
Chinese Medical Journal Oct 2022
Topics: Humans; Virome; Bacteriophages; Feces; Metagenome; Metagenomics
PubMed: 36583859
DOI: 10.1097/CM9.0000000000002382