-
Bioinformatics (Oxford, England) Sep 2022Environmental DNA (eDNA), as a rapidly expanding research field, stands to benefit from shared resources including sampling protocols, study designs, discovered...
MOTIVATION
Environmental DNA (eDNA), as a rapidly expanding research field, stands to benefit from shared resources including sampling protocols, study designs, discovered sequences, and taxonomic assignments to sequences. High-quality community shareable eDNA resources rely heavily on comprehensive metadata documentation that captures the complex workflows covering field sampling, molecular biology lab work, and bioinformatic analyses. There are limited sources that provide documentation of database development on comprehensive metadata for eDNA and these workflows and no open-source software.
RESULTS
We present medna-metadata, an open-source, modular system that aligns with Findable, Accessible, Interoperable, and Reusable guiding principles that support scholarly data reuse and the database and application development of a standardized metadata collection structure that encapsulates critical aspects of field data collection, wet lab processing, and bioinformatic analysis. Medna-metadata is showcased with metabarcoding data from the Gulf of Maine (Polinski et al., 2019).
AVAILABILITY AND IMPLEMENTATION
The source code of the medna-metadata web application is hosted on GitHub (https://github.com/Maine-eDNA/medna-metadata). Medna-metadata is a docker-compose installable package. Documentation can be found at https://medna-metadata.readthedocs.io/en/latest/?badge=latest. The application is implemented in Python, PostgreSQL and PostGIS, RabbitMQ, and NGINX, with all major browsers supported. A demo can be found at https://demo.metadata.maine-edna.org/.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metadata; DNA, Environmental; Data Management; Software; Databases, Factual
PubMed: 35960154
DOI: 10.1093/bioinformatics/btac556 -
Scientific Data Sep 2023The Two Weeks in the World research project has resulted in a dataset of 3087 clinically relevant bacterial genomes with pertaining metadata, collected from 59...
The Two Weeks in the World research project has resulted in a dataset of 3087 clinically relevant bacterial genomes with pertaining metadata, collected from 59 diagnostic units in 35 countries around the world during 2020. A relational database is available with metadata and summary data from selected bioinformatic analysis, such as species prediction and identification of acquired resistance genes.
Topics: Bacteria; Computational Biology; Databases, Factual; Genome, Bacterial; Metadata
PubMed: 37717051
DOI: 10.1038/s41597-023-02502-7 -
Scientific Data Jan 2021Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has...
Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has risen dramatically in recent years, and tracking this data for reuse is particularly important for large-scale ecological and evolutionary studies of individual taxa and communities of both microbes and eukaryotes. AncientMetagenomeDir (archived at https://doi.org/10.5281/zenodo.3980833 ) is a collection of annotated metagenomic sample lists derived from published studies that provide basic, standardised metadata and accession numbers to allow rapid data retrieval from online repositories. These tables are community-curated and span multiple sub-disciplines to ensure adequate breadth and consensus in metadata definitions, as well as longevity of the database. Internal guidelines and automated checks facilitate compatibility with established sequence-read archives and term-ontologies, and ensure consistency and interoperability for future meta-analyses. This collection will also assist in standardising metadata reporting for future ancient metagenomic studies.
Topics: Databases, Genetic; Humans; Metadata; Metagenome; Metagenomics; Publications
PubMed: 33500403
DOI: 10.1038/s41597-021-00816-y -
Scientific Data Oct 2022Metadata describe information about data source, type of creation, structure, status and semantics and are prerequisite for preservation and reuse of medical data. To...
Metadata describe information about data source, type of creation, structure, status and semantics and are prerequisite for preservation and reuse of medical data. To overcome the hurdle of disparate data sources and repositories with heterogeneous data formats a metadata crosswalk was initiated, based on existing standards. FAIR Principles were included, as well as data format specifications. The metadata crosswalk is the foundation of data provision between a Medical Data Integration Center (MeDIC) and researchers, providing a selection of metadata information for research design and requests. Based on the crosswalk, metadata items were prioritized and categorized to demonstrate that not one single predefined standard meets all requirements of a MeDIC and only a maximum data set of metadata is suitable for use. The development of a convergence format including the maximum data set is the anticipated solution for an automated transformation of metadata in a MeDIC.
Topics: Metadata; Information Storage and Retrieval; Semantics; Reference Standards
PubMed: 36307424
DOI: 10.1038/s41597-022-01792-7 -
Nucleic Acids Research Jan 2024MetaboLights is a global database for metabolomics studies including the raw experimental data and the associated metadata. The database is cross-species and...
MetaboLights is a global database for metabolomics studies including the raw experimental data and the associated metadata. The database is cross-species and cross-technique and covers metabolite structures and their reference spectra as well as their biological roles and locations where available. MetaboLights is the recommended metabolomics repository for a number of leading journals and ELIXIR, the European infrastructure for life science information. In this article, we describe the continued growth and diversity of submissions and the significant developments in recent years. In particular, we highlight MetaboLights Labs, our new Galaxy Project instance with repository-scale standardized workflows, and how data public on MetaboLights are being reused by the community. Metabolomics resources and data are available under the EMBL-EBI's Terms of Use at https://www.ebi.ac.uk/metabolights and under Apache 2.0 at https://github.com/EBI-Metabolights.
Topics: Metabolomics; Metadata; Databases, Genetic; Internet
PubMed: 37971328
DOI: 10.1093/nar/gkad1045 -
Studies in Health Technology and... Sep 2021Ensuring scientific reproducibility and compliance with documentation guidelines of funding bodies and journals is a topic of greatly increasing importance in biomedical...
INTRODUCTION
Ensuring scientific reproducibility and compliance with documentation guidelines of funding bodies and journals is a topic of greatly increasing importance in biomedical research. Failure to comply, or unawareness of documentation standards can have adverse effects on the translation of research into patient treatments, as well as economic implications. In the context of the German Research Foundation-funded collaborative research center (CRC) 1002, an IT-infrastructure sub-project was designed. Its goal has been to establish standardized metadata documentation and information exchange benefitting the participating research groups with minimal additional documentation efforts.
METHODS
Implementation of the self-developed menoci-based research data platform (RDP) was driven by close communication and collaboration with researchers as early adopters and experts. Requirements analysis and concept development involved in person observation of experimental procedures, interviews and collaboration with researchers and experts, as well as the investigation of available and applicable metadata standards and tools. The Drupal-based RDP features distinct modules for the different documented data and workflow types, and both the development and the types of collected metadata were continuously reviewed and evaluated with the early adopters.
RESULTS
The menoci-based RDP allows for standardized documentation, sharing and cross-referencing of different data types, workflows, and scientific publications. Different modules have been implemented for specific data types and workflows, allowing for the enrichment of entries with specific metadata and linking to further relevant entries in different modules.
DISCUSSION
Taking the workflows and datasets of the frequently involved experimental service projects as a starting point for (meta-)data types to overcome irreproducibility of research data, results in increased benefits for researchers with minimized efforts. While the menoci-based RDP with its data models and metadata schema was originally developed in a cardiological context, it has been implemented and extended to other consortia at GÃűttingen Campus and beyond in different life science research areas.
Topics: Biomedical Research; Documentation; Humans; Metadata; Reproducibility of Results; Workflow
PubMed: 34545820
DOI: 10.3233/SHTI210542 -
AMIA ... Annual Symposium Proceedings.... 2017In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those...
In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.
Topics: Biological Ontologies; Biomedical Research; Data Accuracy; Data Analysis; Metadata; Methods
PubMed: 29854196
DOI: No ID Found -
Sensors (Basel, Switzerland) Apr 2022The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural... (Review)
Review
The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural networks (CNNs) per tested instrument. The paper starts with a review of tasks related to musical instrument identification. It focuses on tasks performed, input type, algorithms employed, and metrics used. The paper starts with the background presentation, i.e., metadata description and a review of related works. This is followed by showing the dataset prepared for the experiment and its division into subsets: training, validation, and evaluation. Then, the analyzed architecture of the neural network model is presented. Based on the described model, training is performed, and several quality metrics are determined for the training and validation sets. The results of the evaluation of the trained network on a separate set are shown. Detailed values for precision, recall, and the number of true and false positive and negative detections are presented. The model efficiency is high, with the metric values ranging from 0.86 for the guitar to 0.99 for drums. Finally, a discussion and a summary of the results obtained follows.
Topics: Algorithms; Benchmarking; Deep Learning; Metadata; Neural Networks, Computer
PubMed: 35459018
DOI: 10.3390/s22083033 -
Clinical Infectious Diseases : An... Oct 2021Open-source DNA sequence databases have long been touted as beneficial to public health, including the facilitation of earlier detection and response to infectious...
Open-source DNA sequence databases have long been touted as beneficial to public health, including the facilitation of earlier detection and response to infectious disease outbreaks. Of critical importance to harnessing these benefits is the metadata that describe general and other domain-specific attributes (eg, collection location, isolate type) of a sample. Unlike the sequence data, metadata are often incomplete and lack adherence to an international standard. Here, we describe the problem posed by such variable and incomplete metadata in terms of interpretative labor costs (the time and energy necessary to make sense of the signal in the genetic data) and the impact such metadata have on foodborne outbreak detection and response. Improving the quality of sequence-associated metadata would allow for earlier detection of emerging food safety hazards and allow faster response to foodborne outbreaks.
Topics: Disease Outbreaks; Food Safety; Foodborne Diseases; Humans; Metadata; Public Health; Public Health Surveillance
PubMed: 34240118
DOI: 10.1093/cid/ciab615 -
Studies in Health Technology and... Sep 2023To provide clinical data in distributed research architectures, a fundamental challenge involves defining and distributing suitable metadata within Metadata...
To provide clinical data in distributed research architectures, a fundamental challenge involves defining and distributing suitable metadata within Metadata Repositories. Especially for structured data, data elements need to be bound against suitable terminologies; otherwise, other systems will only be able to interpret the data with complex and error-prone manual involvement. As current Metadata Repository implementations lack support for querying externally defined terminologies in FHIR terminology servers, we propose an intermediate solution that uses appropriate annotations on metadata elements to allow run-time Terminology Services mediated queries of that metadata. This allows a very clear separation of concerns between the two related systems, greatly simplifying terminological maintenance. The system performed well in a prototypical deployment.
Topics: Metadata
PubMed: 37697859
DOI: 10.3233/SHTI230721