-
Journal of Medical Internet Research Jan 2022
Review
BACKGROUND
Metadata are created to describe the corresponding data in a detailed and unambiguous way and are used for various applications in different research areas, for example, data identification and classification. However, a clear definition of metadata is crucial for further use. Unfortunately, extensive experience with the processing and management of metadata has shown that the term "metadata" and its use are not always unambiguous.
OBJECTIVE
This study aimed to understand the definition of metadata and the challenges resulting from metadata reuse.
METHODS
A systematic literature search was performed in this study following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for reporting on systematic reviews. Five research questions were identified to streamline the review process, addressing metadata characteristics, metadata standards, use cases, and problems encountered. This review was preceded by a harmonization process to achieve a general understanding of the terms used.
RESULTS
The harmonization process resulted in a clear set of definitions for metadata processing focusing on data integration. The subsequent literature review was conducted by 10 reviewers with different backgrounds, using the harmonized definitions. This study included 81 peer-reviewed papers from the last decade after applying various filtering steps to identify the most relevant papers. The 5 research questions could be answered, resulting in a broad overview of the standards, use cases, problems, and corresponding solutions for the application of metadata in different research areas.
CONCLUSIONS
Metadata can be a powerful tool for identifying, describing, and processing information, but its meaningful creation is costly and challenging. This review process uncovered many standards, use cases, problems, and solutions for dealing with metadata. The presented harmonized definitions and the new schema have the potential to improve the classification and generation of metadata by creating a shared understanding of metadata and its context.
Topics: Humans; Metadata; Publications; Reference Standards
PubMed: 35014967
DOI: 10.2196/25440
-
Trends in Cancer Apr 2021
Review
Genomic data sharing accelerates research. Data are most valuable when they are accompanied by detailed metadata. To date, metadata are often human-annotated descriptions of samples and their handling. We discuss how machine learning-derived elements complement such descriptions to enhance the research ecosystem around genomic data.
Topics: Genomics; Humans; Machine Learning; Metadata; Neoplasms
PubMed: 33229213
DOI: 10.1016/j.trecan.2020.10.011
-
Nature Methods Dec 2021
Topics: Humans; Image Processing, Computer-Assisted; Medical Informatics; Metadata; Microscopy
PubMed: 34862498
DOI: 10.1038/s41592-021-01347-5
-
Scientific Data Sep 2022
Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed, for two reasons. First, they provide the necessary quantitative and verifiable measures of the degree to which the metadata descriptors meet these community requirements, which is itself a requirement of the FAIR Principles. Second, they encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvement of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in Life Science and discuss the challenges. Our work, targeted at developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community standards that are modular and interoperable from the outset.
Topics: Biological Science Disciplines; Humans; Metadata; Reproducibility of Results
PubMed: 36180441
DOI: 10.1038/s41597-022-01707-6
-
Bioinformatics (Oxford, England) Jan 2023
MOTIVATION
Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction.
RESULTS
We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently fetches metadata and links to raw data in JSON format. ffq's modularity and simplicity make it extensible to any genomic database exposing its data for programmatic access.
AVAILABILITY AND IMPLEMENTATION
ffq is free and open source, and the code can be found here: https://github.com/pachterlab/ffq.
Topics: Software; Metadata; Databases, Nucleic Acid
PubMed: 36610997
DOI: 10.1093/bioinformatics/btac667
-
Studies in Health Technology and... May 2022
Observational research benefits from a rich methodological foundation of registry development and operation published in international and national guidelines. Metadata management is an essential part of registry implementation based on concepts of data elements and value sets. The metadata from six German registries revealed vastly divergent interpretations of the concept of data elements. The different perspectives of research questions, data acquisition, and data storage were all represented in the registries' catalogs of data elements. Consequently, the whole life cycle of a registry needs to be accompanied by a catalog of data elements, which has to be continuously adapted to the changing perspectives. A standard for the representation of those metadata is still missing. The FAIR Guiding Principles introduce important methodological requirements, but the tools for fulfilling them with respect to the management of metadata are still in their infancy.
Topics: Information Storage and Retrieval; Metadata; Registries
PubMed: 35612051
DOI: 10.3233/SHTI220432
-
GigaScience Sep 2021
BACKGROUND
Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs.
FINDINGS
Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples.
CONCLUSION
Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets through automated annotation, making them more searchable.
Topics: Algorithms; Bias; Metadata; Molecular Sequence Annotation; RNA; Sequence Analysis, RNA
PubMed: 34553213
DOI: 10.1093/gigascience/giab064
-
Journal of Digital Imaging Aug 2018
Review
Imaging is increasingly being used in dermatology for documentation, diagnosis, and management of cutaneous disease. The lack of standards for dermatologic imaging is an impediment to clinical uptake. Standardization can occur in image acquisition, terminology, interoperability, and metadata. This paper presents the International Skin Imaging Collaboration position on standardization of metadata for dermatologic imaging. Metadata is essential to ensure that dermatologic images are properly managed and interpreted. There are two standards-based approaches to recording and storing metadata in dermatologic imaging. The first uses standard consumer image file formats, and the second is the file format and metadata model developed for the Digital Imaging and Communications in Medicine (DICOM) standard. DICOM would appear to provide an advantage over using consumer image file formats for metadata as it includes all the patient, study, and technical metadata necessary to use images clinically, whereas consumer image file formats only include technical metadata and need to be used in conjunction with another actor (for example, an electronic medical record) to supply the patient and study metadata. The use of DICOM may have some ancillary benefits in dermatologic imaging, including leveraging DICOM network and workflow services, interoperability of images and metadata, leveraging existing enterprise imaging infrastructure, greater patient safety, and better compliance with legislative requirements for image retention.
Topics: Dermatology; Dermoscopy; Diagnostic Imaging; Humans; Internationality; Metadata; Radiology Information Systems; Reproducibility of Results; Skin Diseases; United States
PubMed: 29344752
DOI: 10.1007/s10278-017-0045-8
-
Journal of Integrative Bioinformatics Oct 2021
A standardized approach to annotating computational biomedical models and their associated files can facilitate model reuse and reproducibility among research groups, enhance search and retrieval of models and data, and enable semantic comparisons between models. Motivated by these potential benefits and guided by consensus across the COmputational Modeling in BIology NEtwork (COMBINE) community, we have developed a specification for encoding annotations in Open Modeling and EXchange (OMEX)-formatted archives. This document details version 1.2 of the specification, which builds on version 1.0 published last year in this journal. In particular, this version includes a set of initial model-level annotations (whereas version 1.0 exclusively described annotations at a smaller scale). Additionally, this version uses best practices for namespaces and introduces omex-library.org as a common root for all annotations. Distributing modeling projects within an OMEX archive is a best practice established by COMBINE, and the OMEX metadata specification presented here provides a harmonized, community-driven approach for annotating a variety of standardized model representations. This specification acts as a technical guideline for developing software tools that can support this standard, and thereby encourages broad advances in model reuse, discovery, and semantic analyses.
Topics: Computational Biology; Metadata; Reproducibility of Results; Semantics; Software
PubMed: 34668356
DOI: 10.1515/jib-2021-0020
-
Progress in Biophysics and Molecular... Jan 2022
Advancements in neuroscience research have led to steadily accelerating data production and sharing. The online community repository of neural reconstructions NeuroMorpho.Org grew from fewer than 1000 digitally traced neurons in 2006 to more than 140,000 cells today, including glia that now constitute 10.1% of the content. Every reconstruction consists of a detailed 3D representation of branch geometry and connectivity in a standardized format, from which a collection of morphometric features is extracted and stored. Moreover, each entry in the database is accompanied by rich metadata annotation describing the animal subject, anatomy, and experimental details. The rapid expansion of this resource in the past decade was accompanied by a parallel rise in the complexity of the available information, creating both opportunities and challenges for knowledge mining. Here, we introduce a new summary reporting functionality, allowing NeuroMorpho.Org users to efficiently download digests of metadata and morphometrics from multiple groups of similar cells for further analysis. We demonstrate the capabilities of the tool for both glia and neurons and present an illustrative statistical analysis of the resulting data.
Topics: Animals; Databases, Factual; Metadata; Neurons; Neurosciences
PubMed: 34022302
DOI: 10.1016/j.pbiomolbio.2021.05.005