Microbial Genomics, Nov 2021
Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well-annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenome-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on macOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
Topics: Databases, Nucleic Acid; Genome, Bacterial; Metagenome; Metagenomics; Software
PubMed: 34739369
DOI: 10.1099/mgen.0.000685
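The Bakta abstract credits part of its speed to alignment-free sequence identification: a predicted protein whose exact sequence is already in the database can be annotated by a lookup instead of an alignment. A minimal Python sketch of that idea follows, assuming an MD5 digest of the amino-acid sequence as the lookup key and a plain in-memory dictionary in place of Bakta's actual database; the function names, record fields, and toy sequences are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def seq_digest(aa_seq: str) -> str:
    """Hash a protein sequence after normalising case and dropping a trailing stop ('*')."""
    normalised = aa_seq.strip().upper().rstrip("*")
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def build_index(reference: Dict[str, dict]) -> Dict[str, dict]:
    """Map digest -> annotation record for every reference protein (hypothetical schema)."""
    return {seq_digest(seq): record for seq, record in reference.items()}


def lookup(aa_seq: str, index: Dict[str, dict]) -> Optional[dict]:
    """Return the record of an identical reference protein, or None; no alignment is run."""
    return index.get(seq_digest(aa_seq))


if __name__ == "__main__":
    # Toy reference set; sequences and cross-reference IDs are made up for illustration.
    index = build_index({"MKTAYIAKQR": {"product": "example protein A", "xref": "EXAMPLE:0001"}})
    print(lookup("MKTAYIAKQR*", index))  # identical sequence -> record
    print(lookup("MKTAYIAKQK", index))   # one substitution -> None
```

In this sketch only exact matches hit the index, so any sequence that misses it (for example, one with a single substitution) would still need a slower, alignment-based search.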
Nature, Oct 2023
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
Topics: Cluster Analysis; Metagenome; Metagenomics; Proteins; Databases, Protein; Protein Conformation; Microbiology
PubMed: 37821698
DOI: 10.1038/s41586-023-06583-7
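The Nature abstract outlines a pipeline of length filtering, graph-based clustering of similarity relationships, and retention of clusters with more than 100 members. The Python sketch below mirrors those steps under simplifying assumptions: the pairwise similarity edges are taken as given (computing them is out of scope), and connected components found with a union-find structure stand in for the massively parallel graph clustering used in the study, which this sketch does not reproduce. All names and toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def filter_by_length(proteins: Dict[str, str], min_len_exclusive: int = 35) -> Dict[str, str]:
    """Keep proteins strictly longer than the threshold (35 aa in the abstract)."""
    return {pid: seq for pid, seq in proteins.items() if len(seq) > min_len_exclusive}


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, items: Iterable[str]) -> None:
        self.parent = {x: x for x in items}

    def find(self, x: str) -> str:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(ids: List[str], edges: Iterable[Tuple[str, str]],
            min_members_exclusive: int = 100) -> List[List[str]]:
    """Connected components over similarity edges; keep clusters with more members than the threshold."""
    uf = UnionFind(ids)
    for a, b in edges:
        if a in uf.parent and b in uf.parent:  # ignore edges touching filtered-out proteins
            uf.union(a, b)
    groups: Dict[str, List[str]] = {}
    for x in ids:
        groups.setdefault(uf.find(x), []).append(x)
    return [sorted(g) for g in groups.values() if len(g) > min_members_exclusive]
```

Connected components are the simplest form of graph clustering and can merge distinct families through a single spurious edge, so a more discriminating clustering method would usually be preferred at scale.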
DNA Research : An International Journal..., Dec 2023
Various microorganisms exist in the environment, and each has its own optimal growth temperature (OGT). The relationship between genomic information and the OGT of each species has long been studied, and one such study revealed that the OGT of prokaryotes can be accurately predicted from the fraction of seven amino acids (IVYWREL) among all amino-acid sequences encoded in a genome. Extending this discovery, we developed a 'Metagenomic Thermometer' as a means of predicting environmental temperature from metagenomic sequences. Temperature prediction for diverse environments using publicly available metagenomic data revealed that the Metagenomic Thermometer can predict the temperatures of environments that show small temperature changes and little influx of microorganisms from other environments. The accuracy of the Metagenomic Thermometer was further confirmed by a demonstration experiment using an artificial hot water canal. The Metagenomic Thermometer was also applied to human gut metagenomic samples, yielding a reasonably accurate value for human body temperature. The result further suggests that deep body temperature determines the dominant lineage of the gut community. The Metagenomic Thermometer provides new insight into temperature-driven community assembly based on amino-acid composition rather than microbial taxa.
Topics: Humans; Thermometers; Metagenome; Metagenomics; Genomics
PubMed: 37940329
DOI: 10.1093/dnares/dsad024
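The Metagenomic Thermometer builds on the observation that the pooled fraction of the seven amino acids I, V, Y, W, R, E and L tracks growth temperature. The Python sketch below computes that fraction over a set of predicted protein sequences and then applies a linear mapping; the slope and intercept are hypothetical placeholders, since the abstract does not state the regression the authors fitted, and pooling all predicted proteins of a sample is an assumption of this sketch.

```python
from typing import Iterable

IVYWREL = frozenset("IVYWREL")


def ivywrel_fraction(proteins: Iterable[str]) -> float:
    """Fraction of IVYWREL residues among all residues of the pooled protein sequences."""
    total = hits = 0
    for seq in proteins:
        residues = seq.strip().upper().rstrip("*")  # drop a trailing stop symbol if present
        total += len(residues)
        hits += sum(1 for aa in residues if aa in IVYWREL)
    if total == 0:
        raise ValueError("no amino-acid residues supplied")
    return hits / total


def predict_temperature(fraction: float, slope: float, intercept: float) -> float:
    """Linear temperature estimate; slope and intercept are caller-supplied, hypothetical values."""
    return slope * fraction + intercept
```

Keeping the coefficients as parameters makes the sketch usable with whatever calibration is fitted to reference genomes of known OGT.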
Current Opinion in Virology, Apr 2022
Review
Viruses are diverse biological entities that influence all life. Even with limited genome sizes, viruses can manipulate, drive, steal from, and kill their hosts. The field of virus genomics, using sequencing data to understand viral capabilities, has seen significant innovations in recent years. However, with advancements in metagenomic sequencing and related technologies, the bottleneck to discovering and employing the virosphere has become the analysis of genomes rather than their generation. With metagenomics rapidly expanding available data, vital components of virus genomes and features are being overlooked, with the issue compounded by lagging databases and bioinformatics methods. Despite the field moving in a positive direction, there are noteworthy points to keep in mind, from how software-based virus genome predictions are interpreted to what information is overlooked by current standards. In this review, we discuss conventions and ideologies that likely need to be revised while continuing forward in the study of virus genomics.
Topics: Genome, Viral; Metagenome; Metagenomics; Software; Viruses
PubMed: 35051682
DOI: 10.1016/j.coviro.2022.101200
Annual Review of Virology, Sep 2022
Review
Over the past 20 years, our knowledge of virus diversity and abundance in subsurface environments has expanded dramatically through application of quantitative metagenomic approaches. In most subsurface environments, viral diversity and abundance rival those observed in surface environments. Most of these viruses are uncharacterized in terms of their hosts and replication cycles. Analysis of accessory metabolic genes encoded by subsurface viruses indicates that they evolved to replicate within the unique features of their environments. The key question remains: What role do these viruses play in the ecology and evolution of the environments in which they replicate? Undoubtedly, as more virologists examine the role of viruses in subsurface environments, new insights will emerge.
Topics: Ecology; Metagenome; Metagenomics; Viruses
PubMed: 36173700
DOI: 10.1146/annurev-virology-093020-015957
The Lancet. Microbe, Nov 2022
Review
Measurement and manipulation of the microbiome is generally considered to have great potential for understanding the causes of complex diseases in humans, developing new therapies, and finding preventive measures. Many studies have found significant associations between the microbiome and various diseases; however, Koch's classical postulates remind us about the importance of causative reasoning when considering the relationship between microbes and a disease manifestation. Although causal discovery in observational microbiome data faces many challenges, methodological advances in causal structure learning have improved the potential of data-driven prediction of causal effects in large-scale biological systems. In this Personal View, we show the capability of existing methods for inferring causal effects from metagenomic data, and we highlight ways in which the introduction of causal structures that are more flexible than existing structures offers new opportunities for causal reasoning. Our observations suggest that microbiome research can further benefit from tools developed in the past 5 years in causal discovery and learn from their applications elsewhere.
Topics: Humans; Microbiota; Metagenomics; Causality; Metagenome
PubMed: 36152674
DOI: 10.1016/S2666-5247(22)00186-0
mSphere, Nov 2020
Continued influx of metagenome-derived proteins with misannotated taxonomy into conventional databases, including RefSeq, threatens to eliminate the value of taxonomy identifiers. To prevent this, urgent efforts should be undertaken by submitters of metagenomic data sets as well as by database managers.
Topics: Algorithms; Databases, Genetic; Metagenome; Metagenomics; Proteins
PubMed: 33148820
DOI: 10.1128/mSphere.00854-20
Clinical Microbiology and Infection :..., Jul 2012
Topics: Biota; Host-Pathogen Interactions; Humans; Metagenome; Metagenomics; Probiotics
PubMed: 22647037
DOI: 10.1111/j.1469-0691.2012.03915.x
Genome Research, Mar 2020
Review
Genomes are an integral component of the biological information about an organism; thus, the more complete the genome, the more informative it is. Historically, bacterial and archaeal genomes were reconstructed from pure (monoclonal) cultures, and the first reported sequences were manually curated to completion. However, the bottleneck imposed by the requirement for isolates precluded genomic insights for the vast majority of microbial life. Shotgun sequencing of microbial communities, referred to initially as community genomics and subsequently as genome-resolved metagenomics, can circumvent this limitation by obtaining metagenome-assembled genomes (MAGs); but gaps, local assembly errors, chimeras, and contamination by fragments from other genomes limit the value of these genomes. Here, we discuss genome curation to improve and, in some cases, achieve complete (circularized, no gaps) MAGs (CMAGs). To date, few CMAGs have been generated, although notably some are from very complex systems such as soil and sediment. Through analysis of about 7000 published complete bacterial isolate genomes, we verify the value of cumulative GC skew in combination with other metrics to establish bacterial genome sequence accuracy. The analysis of cumulative GC skew identified potential misassemblies in some reference genomes of isolated bacteria and the repeat sequences that likely gave rise to them. We discuss methods that could be implemented in bioinformatic approaches for curation to ensure that metabolic and evolutionary analyses can be based on very high-quality genomes.
Topics: Data Curation; Genome, Archaeal; Genome, Bacterial; Metagenome; Metagenomics
PubMed: 32188701
DOI: 10.1101/gr.258640.119
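The Genome Research abstract uses cumulative GC skew to flag possible misassemblies. GC skew in a window is (G - C)/(G + C); summing it along the sequence gives a curve whose minimum and maximum usually sit near the replication origin and terminus of a correctly assembled circular bacterial chromosome, so abrupt departures from that simple shape are worth inspecting. Below is a minimal Python sketch, assuming non-overlapping windows of a caller-chosen size (the window size used in the study is not given here); the function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def gc_skew(seq: str, window: int = 1000) -> List[float]:
    """GC skew (G - C) / (G + C) in consecutive non-overlapping windows; a trailing partial window is dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        g, c = w.count("G"), w.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)  # windows without G or C contribute 0
    return skews


def cumulative_gc_skew(seq: str, window: int = 1000) -> List[float]:
    """Running sum of per-window GC skew along the sequence."""
    return list(accumulate(gc_skew(seq, window)))


def extrema_windows(cum: List[float]) -> Tuple[int, int]:
    """Window indices of the minimum and maximum (candidate origin and terminus positions)."""
    return cum.index(min(cum)), cum.index(max(cum))
```

Larger windows smooth out local noise at the cost of positional resolution when locating the extrema.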
Clinical Microbiology and Infection :..., Jul 2012
Review
Most of the bacterial species that form part of the biosphere have never been cultivated. In this situation, a comprehensive study of bacterial communities requires the utilization of non-culture-based methods, collectively named metagenomics. In this paper we review the use of different metagenomic techniques for understanding the effect of antibiotics on microbial communities, for synthesizing new antimicrobial compounds, and for analysing the distribution of antibiotic resistance genes in different ecosystems. These techniques include functional metagenomics, which serves to find new antibiotics or new antibiotic resistance genes, and descriptive metagenomics, which serves to analyse changes in the composition of the microbiota and to track the presence and abundance of already known antibiotic resistance genes in different ecosystems.
Topics: Anti-Bacterial Agents; Bacteria; Biota; Drug Resistance, Bacterial; Gastrointestinal Tract; Humans; Metagenome; Metagenomics
PubMed: 22647044
DOI: 10.1111/j.1469-0691.2012.03868.x