Microbiology Spectrum Aug 2023
Staphylococcus aureus is an opportunistic pathogen and a leading cause of morbidity and mortality worldwide. Genomic-based surveillance has greatly improved our ability to track the emergence and spread of high-risk clones, but the full potential of genomic data is only reached when used in conjunction with detailed metadata. Here, we demonstrate the utility of an integrated approach by leveraging a curated collection of clinical and epidemiological metadata of S. aureus in the San Matteo Hospital (Italy) through a semisupervised clustering strategy. We sequenced 226 sepsis S. aureus samples, recovered over a period of 9 years. By using existing antibiotic profiling data, we selected strains that capture the full diversity of the population. Genome analysis revealed 49 sequence types, 16 of which are novel. Comparative genomic analyses of hospital- and community-acquired infection ruled out the existence of genomic features differentiating them, while evolutionary analyses of genes and traits of interest highlighted different dynamics of acquisition and loss between antibiotic resistance and virulence genes. Finally, highly resistant clones belonging to clonal complexes (CC) 8 and 22 were found to be responsible for abundant infections and deaths, while the highly virulent CC30 was responsible for rare but deadly episodes of infections.

Genome sequencing is an important tool in clinical microbiology, as it allows in-depth characterization of isolates of interest and can propel genome-based surveillance studies. Such studies can benefit from methods of sample selection to capture the genomic diversity present in a data set. Here, we present an approach based on clustering of antibiotic resistance profiles that allows optimal sample selection for bacterial genomic surveillance. We apply the method to a 9-year collection of Staphylococcus aureus from a large hospital in northern Italy. Our method allows us to sequence the genomes of a large variety of strains of this important pathogen, which we then leverage to characterize the epidemiology in the hospital and to perform evolutionary analyses on genes and traits of interest. These analyses highlight different dynamics of acquisition and loss between antibiotic resistance and virulence genes.
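The idea of selecting samples by clustering antibiotic resistance profiles can be illustrated with a small sketch. This is not the semisupervised strategy used in the study, whose details are not given in the abstract; it swaps in a simple greedy clustering on Hamming distance. The isolate identifiers, the binary resistant/susceptible encoding, and the `max_distance` threshold are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of antibiotics for which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_profiles(profiles, max_distance=1):
    """Greedy clustering: each isolate joins the first cluster whose seed
    profile is within max_distance, otherwise it seeds a new cluster."""
    clusters = []  # list of (seed_profile, [isolate_ids])
    for isolate_id, profile in sorted(profiles.items()):
        for seed, members in clusters:
            if hamming(seed, profile) <= max_distance:
                members.append(isolate_id)
                break
        else:
            clusters.append((profile, [isolate_id]))
    return clusters


def select_representatives(clusters):
    """Pick one isolate per cluster (here simply the seed isolate)."""
    return [members[0] for _, members in clusters]


# Hypothetical profiles: 1 = resistant, 0 = susceptible, one column per antibiotic.
profiles = {
    "iso01": (1, 1, 0, 0),
    "iso02": (1, 1, 0, 1),
    "iso03": (0, 0, 1, 1),
    "iso04": (0, 0, 1, 1),
    "iso05": (1, 0, 0, 0),
}

clusters = cluster_profiles(profiles)
selected = select_representatives(clusters)
print(selected)
```

Sequencing one representative per cluster, rather than a random subset, is one way to spread a fixed sequencing budget across distinct resistance phenotypes.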
Topics: Humans; Staphylococcus aureus; Metadata; Staphylococcal Infections; Genome, Bacterial; Anti-Bacterial Agents; Hospitals; Methicillin-Resistant Staphylococcus aureus; Microbial Sensitivity Tests
PubMed: 37458594
DOI: 10.1128/spectrum.01010-23

Bioinformatics (Oxford, England) Nov 2019
SUMMARY
Analysis and comparison of genomic and transcriptomic datasets have become standard procedures in biological research. However, for non-model organisms no efficient tools exist to visually work with multiple genomes and their metadata, and to annotate such data in a collaborative way. Here we present GeneNoteBook: a web based collaborative notebook for comparative genomics. GeneNoteBook allows experimental and computational researchers to query, browse, visualize and curate bioinformatic analysis results for multiple genomes. GeneNoteBook is particularly suitable for the analysis of non-model organisms, as it allows for comparing newly sequenced genomes to those of model organisms.
AVAILABILITY AND IMPLEMENTATION
GeneNoteBook is implemented as a node.js web application and depends on MongoDB and NCBI BLAST. Source code is available at https://github.com/genenotebook/genenotebook. Additionally, GeneNoteBook can be installed through Bioconda and as a Docker image. Full installation instructions and online documentation are available at https://genenotebook.github.io.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Genome; Genomics; Metadata; Software
PubMed: 31199463
DOI: 10.1093/bioinformatics/btz491

Scientific Data Jun 2023
We present a draft Minimum Information About Geospatial Information System (MIAGIS) standard for facilitating public deposition of geospatial information system (GIS) datasets that follows the FAIR (Findable, Accessible, Interoperable and Reusable) principles. The draft MIAGIS standard includes a deposition directory structure and a minimum JavaScript Object Notation (JSON)-formatted metadata file that is designed to capture critical metadata describing GIS layers and maps as well as their sources of data and methods of generation. The associated miagis Python package facilitates the creation of this MIAGIS metadata file and directly supports metadata extraction from both Esri JSON and GeoJSON GIS data formats, plus options for extraction from user-specified JSON formats. We also demonstrate their use in crafting two example depositions of ArcGIS-generated maps. We hope this draft MIAGIS standard, along with the supporting miagis Python package, will assist in establishing a GIS standards group that will develop the draft into a full standard for the wider GIS community as well as a future public repository for GIS datasets.
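To make the notion of extracting layer metadata from GeoJSON concrete, here is a minimal sketch using only the Python standard library. It does not use the miagis package and does not follow the MIAGIS schema; the function name `summarize_geojson` and the output keys (`layer_name`, `feature_count`, `geometry_types`, `fields`) are hypothetical and chosen purely for illustration.

```python
import json


def summarize_geojson(text, layer_name):
    """Build a small, hypothetical metadata summary for a GeoJSON FeatureCollection."""
    data = json.loads(text)
    features = data.get("features", [])
    # Geometry types present in the layer (features may have a null geometry).
    geometry_types = sorted(
        {f["geometry"]["type"] for f in features if f.get("geometry")}
    )
    # Union of attribute (property) names across all features.
    fields = sorted({key for f in features for key in (f.get("properties") or {})})
    return {
        "layer_name": layer_name,
        "format": "GeoJSON",
        "feature_count": len(features),
        "geometry_types": geometry_types,
        "fields": fields,
    }


# Illustrative input; coordinates and properties are made up.
example = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-84.5, 38.0]},
         "properties": {"site": "A", "depth_m": 2.5}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-84.6, 38.1]},
         "properties": {"site": "B"}},
    ],
})

summary = summarize_geojson(example, "sampling_sites")
print(json.dumps(summary, indent=2))
```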
Topics: Metadata; Information Systems
PubMed: 37328607
DOI: 10.1038/s41597-023-02281-1

Database : the Journal of Biological... 2016
BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only of the resource itself, but also of its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address) in the last year, thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation.
Database URL: https://www.biosharing.org.
Topics: Biological Science Disciplines; Computational Biology; Crowdsourcing; Database Management Systems; Databases, Factual; Humans; Internet; Metadata; Registries; User-Computer Interface
PubMed: 27189610
DOI: 10.1093/database/baw075

Nucleic Acids Research Jan 2019
The BioSamples database at EMBL-EBI provides a central hub for sample metadata storage and linkage to other EMBL-EBI resources. BioSamples has recently undergone major changes, both in terms of data content and supporting infrastructure. The data content has more than doubled from around 2 million samples in 2014 to just over 5 million samples in 2018. Fast, reciprocal data exchange was fully established between sister Biosample databases and other INSDC partners, enabling a worldwide common representation and centralization of sample metadata. The BioSamples platform has been upgraded to accommodate anticipated increases in the number of submissions via GA4GH driver projects such as the Human Cell Atlas and the EGA, as well as from mirroring of NCBI dbGaP data. The BioSamples database is now the authoritative repository for all INSDC sample metadata, an ELIXIR Deposition Database for Biomolecular Data and the EMBL-EBI sample metadata hub. To support faster turnaround for sample submission, and to increase scalability and resilience, we have upgraded the BioSamples database backend storage, APIs and user interface. Finally, the website has been redesigned to allow search and retrieval of records based on specific filters, such as 'disease' or 'organism'. These changes are targeted at answering current use cases as well as providing functionalities for future emerging and anticipated developments. Availability: The BioSamples database is freely available at http://www.ebi.ac.uk/biosamples. Content is distributed under the EMBL-EBI Terms of Use available at https://www.ebi.ac.uk/about/terms-of-use.
Topics: Biological Specimen Banks; Computational Biology; Databases, Genetic; Databases, Nucleic Acid; Genomics; Humans; Information Storage and Retrieval; Internet; Metadata; User-Computer Interface
PubMed: 30407529
DOI: 10.1093/nar/gky1061

Database : the Journal of Biological... May 2022
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec.
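Because SSSOM is table-based, a mapping set can be read with ordinary tabular tooling and no ontology parser. The sketch below assumes a TSV whose optional metadata header is carried in lines starting with `#` and whose columns include `subject_id`, `predicate_id`, `object_id`, `mapping_justification` and `confidence`. The `EXA`/`EXB` identifiers, the rows, and the 0.9 confidence cutoff are invented for illustration, and the sketch does not use the reference SSSOM tooling.

```python
import csv

# Hypothetical mapping set; identifiers and confidence values are made up.
SSSOM_TSV = (
    "# mapping_set_id: https://example.org/mappings/demo\n"
    "# license: https://creativecommons.org/publicdomain/zero/1.0/\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "EXA:0001\tskos:exactMatch\tEXB:1001\tsemapv:ManualMappingCuration\t0.95\n"
    "EXA:0002\tskos:broadMatch\tEXB:1002\tsemapv:LexicalMatching\t0.70\n"
    "EXA:0003\tskos:exactMatch\tEXB:1003\tsemapv:LexicalMatching\t0.60\n"
)


def read_mappings(text):
    """Parse the tabular part of the file, skipping '#'-prefixed metadata lines."""
    rows = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(rows, delimiter="\t"))


def high_confidence_exact(mappings, threshold=0.9):
    """Keep only mappings asserted as exact matches at or above the threshold."""
    return [
        m for m in mappings
        if m["predicate_id"] == "skos:exactMatch"
        and float(m["confidence"]) >= threshold
    ]


mappings = read_mappings(SSSOM_TSV)
exact = high_confidence_exact(mappings)
for m in exact:
    print(m["subject_id"], "->", m["object_id"], m["mapping_justification"])
```

Keeping the predicate and the justification on every row is what lets a downstream pipeline distinguish an equivalence from a merely related term, the ambiguity the abstract describes.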
Topics: Data Management; Databases, Factual; Metadata; Semantic Web; Workflow
PubMed: 35616100
DOI: 10.1093/database/baac035

Journal of Integrative Bioinformatics Oct 2021
This special issue of the Journal of Integrative Bioinformatics contains updated specifications of COMBINE standards in systems and synthetic biology. The 2021 special issue presents four updates of standards: Synthetic Biology Open Language Visual Version 2.3, Synthetic Biology Open Language Visual Version 3.0, Simulation Experiment Description Markup Language Level 1 Version 4, and OMEX Metadata specification Version 1.2. This document can also be consulted to identify the latest specifications of all COMBINE standards.
Topics: Computational Biology; Computer Simulation; Metadata; Programming Languages; Software; Synthetic Biology
PubMed: 34674411
DOI: 10.1515/jib-2021-0026

Journal of Biomedical Semantics Mar 2022
BACKGROUND
Health data from different specialties or domains generally have diverse formats and meanings, which can cause semantic communication barriers when these data are exchanged among heterogeneous systems. This study therefore aimed to develop a national health concept data model (HCDM) and a corresponding system to facilitate healthcare data standardization and centralized metadata management.
METHODS
Based on 55 data sets (4640 data items) from 7 health business domains in China, a bottom-up approach was employed to build the structure and metadata for HCDM by referencing HL7 RIM. According to ISO/IEC 11179, a top-down approach was used to develop and standardize the data elements.
RESULTS
HCDM adopted a three-level architecture of class, attribute and data type, and consisted of 6 classes and 15 sub-classes. Each class had a set of descriptive attributes, and every attribute was assigned a data type. A total of 100 initial data elements (DEs) were extracted from HCDM, and 144 general DEs were derived from the corresponding initial DEs. Domain DEs were derived by specializing general DEs using 12 controlled vocabularies, which were developed from HL7 vocabularies and actual health demands. A model-based system was successfully established to evaluate and manage the NHDD.
CONCLUSIONS
HCDM provided a unified metadata reference for multi-source data standardization and management. This approach of defining health data elements was a feasible solution in healthcare information standardization to enable healthcare interoperability in China.
Topics: Delivery of Health Care; Metadata; Semantics; Vocabulary, Controlled
PubMed: 35303946
DOI: 10.1186/s13326-022-00265-5

Nucleic Acids Research Jan 2021
Metagenomics has become a standard strategy to comprehend the functional potential of microbial communities, including the human microbiome. Currently, the number of metagenomes in public repositories is increasing exponentially. The Sequence Read Archive (SRA) and MG-RAST are the two main repositories for metagenomic data. These databases allow scientists to reanalyze samples and explore new hypotheses. However, mining samples from them can be a limiting factor, since the metadata available in these repositories is often misannotated, misleading, and decentralized, creating an overly complex environment for sample reanalysis. The main goal of the HumanMetagenomeDB is to simplify the identification and use of public human metagenomes of interest. HumanMetagenomeDB version 1.0 contains metadata of 69 822 metagenomes. We standardized 203 attributes, based on standardized ontologies, describing host characteristics (e.g. sex, age and body mass index), diagnosis information (e.g. cancer, Crohn's disease and Parkinson's disease), location (e.g. country, longitude and latitude), sampling site (e.g. gut, lung and skin) and sequencing attributes (e.g. sequencing platform, average length and sequence quality). Further, HumanMetagenomeDB version 1.0 metagenomes encompass 58 countries, 9 main sample sites (i.e. body parts), 58 diagnoses and multiple ages, ranging from newborn to 91 years old. The HumanMetagenomeDB is publicly available at https://webapp.ufz.de/hmgdb/.
Topics: Data Curation; Databases, Genetic; Humans; Metadata; Metagenome; Metagenomics; Reference Standards; User-Computer Interface
PubMed: 33221926
DOI: 10.1093/nar/gkaa1031

Bioinformatics (Oxford, England) Sep 2022
MOTIVATION
Microbiome datasets are often constrained by sequencing limitations. GenBank, maintained by the National Center for Biotechnology Information (NCBI), is the largest collection of publicly available DNA sequences. The metadata of GenBank records are a largely understudied resource and may be uniquely leveraged to access the sum of prior studies focused on microbiome composition. Here, we developed a computational pipeline to analyze GenBank metadata, containing data on hosts, microorganisms and their place of origin. This work provides the first opportunity to leverage the totality of GenBank to shed light on compositional data practices that shape how microbiome datasets are formed as well as examine host-microbiome relationships.
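As a rough illustration of what mining host, microorganism and place-of-origin metadata from GenBank records can involve, the sketch below pulls the `/organism`, `/host` and `/country` qualifiers out of a flat-file `source` feature with a regular expression. It is not the authors' pipeline; the record fragment is invented, and the helper name `extract_source_metadata` is hypothetical.

```python
import re

# Matches flat-file qualifiers of the form /key="value" (values may wrap lines).
QUALIFIER = re.compile(r'/(\w+)="([^"]*)"')


def extract_source_metadata(record_text, wanted=("organism", "host", "country")):
    """Return the first occurrence of each wanted qualifier, whitespace-normalized."""
    found = {}
    for key, value in QUALIFIER.findall(record_text):
        if key in wanted and key not in found:
            found[key] = " ".join(value.split())
    return found


# Invented fragment of a GenBank flat-file record, for illustration only.
record = '''
FEATURES             Location/Qualifiers
     source          1..1200
                     /organism="Staphylococcus aureus"
                     /mol_type="genomic DNA"
                     /host="Homo sapiens"
                     /country="Italy"
'''

metadata = extract_source_metadata(record)
print(metadata)
```

A production pipeline would more likely rely on a dedicated GenBank parser; the regular expression here only serves to show which fields carry the host and origin information.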
RESULTS
The collected dataset contains multiple kingdoms of microorganisms, consisting of bacteria, viruses, archaea, protozoa, fungi, and invertebrate parasites, and hosts of multiple taxonomical classes, including mammals, birds and fish. A human data subset of this dataset provides insights into gaps in current microbiome data collection, which is biased towards clinically relevant pathogens. Clustering and phylogenetic analyses reveal the potential to use these data to model host taxonomy and evolution, revealing groupings formed by host diet, environment and coevolution.
AVAILABILITY AND IMPLEMENTATION
GenBank Host-Microbiome Pipeline is available at https://github.com/bcbi/genbank_holobiome. The GenBank loader is available at https://github.com/bcbi/genbank_loader.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Animals; Humans; Databases, Nucleic Acid; Software; Microbiota; Viruses; Metadata; Mammals
PubMed: 35801940
DOI: 10.1093/bioinformatics/btac487