-
GigaScience May 2018
BACKGROUND
Here, we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific applications. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within. SCIF makes it easy to expose metadata, multiple environments, installation steps, files, and entry points to render scientific applications consistent, modular, and discoverable. A SCIF can be installed on a traditional host or in a container technology such as Docker or Singularity. We start by reviewing the background and rationale for the SCIF, followed by an overview of the specification and the different levels of internal modules ("apps") that the organizational format affords. Finally, we demonstrate that SCIF is useful by implementing and discussing several use cases that improve user interaction and understanding of scientific applications. SCIF is released along with a client and integration in the Singularity 2.4 software to quickly install and interact with SCIF. When used inside of a reproducible container, a SCIF is a recipe for reproducibility and introspection of the functions and users that it serves.
RESULTS
We use SCIF to evaluate container software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of applications, we developed tools along with an open source, version-controlled, tested, and programmatically accessible web infrastructure. SCIF and associated resources are available at https://sci-f.github.io. The ease of using SCIF, especially in the context of containers, offers promise for scientists' work to be self-documenting and programmatically parseable for maximum reproducibility. SCIF opens up an abstraction from underlying programming languages and packaging logic to work with scientific applications, opening up new opportunities for scientific software development.
Topics: Information Storage and Retrieval; Metadata; Programming Languages; Science; Software; Workflow
PubMed: 29718213
DOI: 10.1093/gigascience/giy023
-
Handbook of Experimental Pharmacology 2020
While research data has become integral to the scholarly endeavour, a number of challenges hinder its development, management and dissemination. This chapter follows the life cycle of research data, considering aspects ranging from storage and preservation to sharing and legal factors. While it provides a wide overview of the current ecosystem, it also pinpoints the elements of modern research sharing practices, such as metadata creation, the FAIR principles, identifiers, Creative Commons licencing and the various repository options. Furthermore, the chapter discusses the mandates and regulations that influence data sharing and the possible technological means of overcoming their complexity, such as blockchain systems.
Topics: Data Collection; Ecosystem; Information Dissemination; Information Storage and Retrieval; Metadata
PubMed: 31792682
DOI: 10.1007/164_2019_288
-
Advances in Biochemical... 2022
This chapter highlights the concept of research data management in the context of data publication and data infrastructures. One focus of this contribution lies on metadata and the FAIR data principles associated with data sharing and with data infrastructures such as data repositories. The challenges that researchers and research communities face on the way towards open science are discussed, and the first steps towards FAIR data infrastructures are illustrated.
Topics: Information Dissemination; Metadata
PubMed: 35091812
DOI: 10.1007/10_2021_193
-
Scientific Data Sep 2022
Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed, for two reasons. The first is to provide the necessary quantitative and verifiable measures of the degree to which the metadata descriptors meet these community requirements, itself a requirement of the FAIR Principles. The second is to encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvement of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in Life Science and discuss the challenges. Our work, targeted at developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community standards that are modular and interoperable from the outset.
Topics: Biological Science Disciplines; Humans; Metadata; Reproducibility of Results
PubMed: 36180441
DOI: 10.1038/s41597-022-01707-6
-
Nature Methods Dec 2021
Topics: Cell Nucleus; Humans; Image Processing, Computer-Assisted; Information Dissemination; Medical Informatics; Metadata; Microscopy; Software
PubMed: 34862504
DOI: 10.1038/s41592-021-01342-w
-
Trends in Cancer Apr 2021
Review
Genomic data sharing accelerates research. Data are most valuable when they are accompanied by detailed metadata. To date, metadata are often human-annotated descriptions of samples and their handling. We discuss how machine learning-derived elements complement such descriptions to enhance the research ecosystem around genomic data.
Topics: Genomics; Humans; Machine Learning; Metadata; Neoplasms
PubMed: 33229213
DOI: 10.1016/j.trecan.2020.10.011
-
Journal of Medical Internet Research Jan 2022
Review
BACKGROUND
Metadata are created to describe the corresponding data in a detailed and unambiguous way and are used for various applications in different research areas, for example, data identification and classification. However, a clear definition of metadata is crucial for further use. Unfortunately, extensive experience with the processing and management of metadata has shown that the term "metadata" and its use are not always unambiguous.
OBJECTIVE
This study aimed to understand the definition of metadata and the challenges resulting from metadata reuse.
METHODS
A systematic literature search was performed in this study following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for reporting on systematic reviews. Five research questions were identified to streamline the review process, addressing metadata characteristics, metadata standards, use cases, and problems encountered. This review was preceded by a harmonization process to achieve a general understanding of the terms used.
RESULTS
The harmonization process resulted in a clear set of definitions for metadata processing focusing on data integration. The following literature review was conducted by 10 reviewers with different backgrounds and using the harmonized definitions. This study included 81 peer-reviewed papers from the last decade after applying various filtering steps to identify the most relevant papers. The 5 research questions could be answered, resulting in a broad overview of the standards, use cases, problems, and corresponding solutions for the application of metadata in different research areas.
CONCLUSIONS
Metadata can be a powerful tool for identifying, describing, and processing information, but its meaningful creation is costly and challenging. This review process uncovered many standards, use cases, problems, and solutions for dealing with metadata. The presented harmonized definitions and the new schema have the potential to improve the classification and generation of metadata by creating a shared understanding of metadata and its context.
Topics: Humans; Metadata; Publications; Reference Standards
PubMed: 35014967
DOI: 10.2196/25440
-
Nature Methods Dec 2021
Topics: Humans; Image Processing, Computer-Assisted; Medical Informatics; Metadata; Microscopy
PubMed: 34862498
DOI: 10.1038/s41592-021-01347-5
-
Scientific Data Apr 2023
We present a database resulting from high throughput experimentation, primarily on metal oxide solid state materials. The central relational database, the Materials Provenance Store (MPS), manages the metadata and experimental provenance from acquisition of raw materials, through synthesis, to a broad range of materials characterization techniques. Given the primary research goal of discovering solar fuels materials, many of the characterization experiments involve electrochemistry, along with optical, structural, and compositional characterizations. The MPS is populated with all information required for executing common data queries, which typically do not involve direct query of raw data. The result is a database file that can be distributed to users so that they can independently execute queries and subsequently download the data of interest. We propose this strategy as an approach to managing the highly heterogeneous and distributed data that arise from materials science experiments, as demonstrated by the management of over 30 million experiments run on over 12 million samples in the present MPS release.
Topics: Semantics; Databases, Factual; Metadata
PubMed: 37024515
DOI: 10.1038/s41597-023-02107-0
-
Nature Genetics Oct 2020
Review
Access to medical data is central for conducting research on genomics. However, to tap these metadata (observable traits and phenotypes, diagnoses and medication, and labels), researchers must grapple with the complex and sensitive nature of the information. In this Perspective, we argue that, at this exciting time for genomics and artificial intelligence, several critical aspects of data generation, infrastructure and management are pillars of a modern data ecosystem. Many risks to privacy and many obstacles to medical research can be eliminated or mitigated by new secure data analytics. Finally, we discuss the potential consequences of medical data exiting the institutions and being managed by individuals. These shifts in data ownership have the potential for profound disruption and opportunity across many fields.
Topics: Artificial Intelligence; Genomics; Humans; Information Storage and Retrieval; Metadata; Privacy; Software
PubMed: 32929286
DOI: 10.1038/s41588-020-0698-y