metadata - OpenMD.com Journal Search

Proceedings. IEEE Computer Society... Jun 2021

Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While...

Summary PubMed Full Text PDF

Authors: Mandy Lu, Qingyu Zhao, Jiequan Zhang...

Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence on features from extraneous variables or multiple distributions. Such extra variables, referred to as metadata here, may create bias or confounding effects (e.g., race when classifying gender from face images). We introduce the Metadata Normalization (MDN) layer, a new batch-level operation which can be used end-to-end within the training framework, to correct the influence of metadata on feature distributions. MDN adopts a regression analysis technique traditionally used for preprocessing to remove (regress out) the metadata effects on model features during training. We utilize a metric based on distance correlation to quantify the distribution bias from the metadata and demonstrate that our method successfully removes metadata effects on four diverse settings: one synthetic, one 2D image, one video, and one 3D medical image dataset.

PubMed: 34776724
DOI: 10.1109/cvpr46437.2021.01077

The Scientific Filesystem.

GigaScience May 2018

Here, we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific...

Summary PubMed Full Text PDF

Authors: Vanessa Sochat

BACKGROUND

Here, we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific applications. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within. SCIF makes it easy to expose metadata, multiple environments, installation steps, files, and entry points to render scientific applications consistent, modular, and discoverable. A SCIF can be installed on a traditional host or in a container technology such as Docker or Singularity. We start by reviewing the background and rationale for the SCIF, followed by an overview of the specification and the different levels of internal modules ("apps") that the organizational format affords. Finally, we demonstrate that SCIF is useful by implementing and discussing several use cases that improve user interaction and understanding of scientific applications. SCIF is released along with a client and integration in the Singularity 2.4 software to quickly install and interact with SCIF. When used inside of a reproducible container, a SCIF is a recipe for reproducibility and introspection of the functions and users that it serves.

RESULTS

We use SCIF to evaluate container software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of applications, we developed tools along with an open source, version-controlled, tested, and programmatically accessible web infrastructure. SCIF and associated resources are available at https://sci-f.github.io. The ease of using SCIF, especially in the context of containers, offers promise for scientists' work to be self-documenting and programatically parseable for maximum reproducibility. SCIF opens up an abstraction from underlying programming languages and packaging logic to work with scientific applications, opening up new opportunities for scientific software development.

Topics: Information Storage and Retrieval; Metadata; Programming Languages; Science; Software; Workflow

PubMed: 29718213
DOI: 10.1093/gigascience/giy023

Machine actionable metadata models.

Scientific Data Sep 2022

Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the...

Summary PubMed Full Text PDF

Authors: Dominique Batista, Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone...

Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed. Firstly, to provide the necessary quantitative and verifiable measures of the degree to which the metadata descriptors meet these community requirements, a requirement of the FAIR Principles. Secondly, to encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvements of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in Life Science and discuss the challenges. Our work, targeted to developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community-standards which are modular and interoperable from the onset.

Topics: Biological Science Disciplines; Humans; Metadata; Reproducibility of Results

PubMed: 36180441
DOI: 10.1038/s41597-022-01707-6

Expanding and Remixing the Metadata Landscape.

Trends in Cancer Apr 2021

Genomic data sharing accelerates research. Data are most valuable when they are accompanied by detailed metadata. To date, metadata are often human-annotated... (Review)

Summary PubMed Full Text PDF

Review

Authors: Ariel A Hippen, Casey S Greene

Genomic data sharing accelerates research. Data are most valuable when they are accompanied by detailed metadata. To date, metadata are often human-annotated descriptions of samples and their handling. We discuss how machine learning-derived elements complement such descriptions to enhance the research ecosystem around genomic data.

Topics: Genomics; Humans; Machine Learning; Metadata; Neoplasms

PubMed: 33229213
DOI: 10.1016/j.trecan.2020.10.011

Understanding the Nature of Metadata: Systematic Review.

Journal of Medical Internet Research Jan 2022

Metadata are created to describe the corresponding data in a detailed and unambiguous way and is used for various applications in different research areas, for example,... (Review)

Summary PubMed Full Text PDF

Review

Authors: Hannes Ulrich, Ann-Kristin Kock-Schoppenhauer, Noemi Deppenwiese...

BACKGROUND

Metadata are created to describe the corresponding data in a detailed and unambiguous way and is used for various applications in different research areas, for example, data identification and classification. However, a clear definition of metadata is crucial for further use. Unfortunately, extensive experience with the processing and management of metadata has shown that the term "metadata" and its use is not always unambiguous.

OBJECTIVE

This study aimed to understand the definition of metadata and the challenges resulting from metadata reuse.

METHODS

A systematic literature search was performed in this study following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for reporting on systematic reviews. Five research questions were identified to streamline the review process, addressing metadata characteristics, metadata standards, use cases, and problems encountered. This review was preceded by a harmonization process to achieve a general understanding of the terms used.

RESULTS

The harmonization process resulted in a clear set of definitions for metadata processing focusing on data integration. The following literature review was conducted by 10 reviewers with different backgrounds and using the harmonized definitions. This study included 81 peer-reviewed papers from the last decade after applying various filtering steps to identify the most relevant papers. The 5 research questions could be answered, resulting in a broad overview of the standards, use cases, problems, and corresponding solutions for the application of metadata in different research areas.

CONCLUSIONS

Metadata can be a powerful tool for identifying, describing, and processing information, but its meaningful creation is costly and challenging. This review process uncovered many standards, use cases, problems, and solutions for dealing with metadata. The presented harmonized definitions and the new schema have the potential to improve the classification and generation of metadata by creating a shared understanding of metadata and its context.

Topics: Humans; Metadata; Publications; Reference Standards

PubMed: 35014967
DOI: 10.2196/25440

The Materials Provenance Store.

Scientific Data Apr 2023

We present a database resulting from high throughput experimentation, primarily on metal oxide solid state materials. The central relational database, the Materials...

Summary PubMed Full Text PDF

Authors: Michael J Statt, Brian A Rohr, Dan Guevarra...

We present a database resulting from high throughput experimentation, primarily on metal oxide solid state materials. The central relational database, the Materials Provenance Store (MPS), manages the metadata and experimental provenance from acquisition of raw materials, through synthesis, to a broad range of materials characterization techniques. Given the primary research goal of materials discovery of solar fuels materials, many of the characterization experiments involve electrochemistry, along with optical, structural, and compositional characterizations. The MPS is populated with all information required for executing common data queries, which typically do not involve direct query of raw data. The result is a database file that can be distributed to users so that they can independently execute queries and subsequently download the data of interest. We propose this strategy as an approach to manage the highly heterogeneous and distributed data that arises from materials science experiments, as demonstrated by the management of over 30 million experiments run on over 12 million samples in the present MPS release.

Topics: Semantics; Databases, Factual; Metadata

PubMed: 37024515
DOI: 10.1038/s41597-023-02107-0

From ArrayExpress to BioStudies.

Nucleic Acids Research Jan 2021

ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI, established in 2002, initially as an archive for...

Summary PubMed Full Text PDF

Authors: Ugis Sarkans, Anja Füllgrabe, Ahmed Ali...

ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI, established in 2002, initially as an archive for publication-related microarray data and was later extended to accept sequencing-based data. Over the last decade an increasing share of biological experiments involve multiple technologies assaying different biological modalities, such as epigenetics, and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which typically is associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as European Nucleotide Archive (ENA), as well as hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.

Topics: Animals; Cell Line; DNA Methylation; Databases, Genetic; Epigenesis, Genetic; Gene Expression Profiling; Genomics; High-Throughput Nucleotide Sequencing; Humans; Internet; Metadata; Oligonucleotide Array Sequence Analysis; Organ Specificity; Plants; Single-Cell Analysis; Software

PubMed: 33211879
DOI: 10.1093/nar/gkaa1062

Enabling reproducible re-analysis of single-cell data.

Genome Biology Jul 2021

Summary PubMed Full Text PDF

Authors: Michael A Skinnider, Jordan W Squair, Grégoire Courtine...

Topics: Cell Lineage; Computational Biology; Datasets as Topic; High-Throughput Nucleotide Sequencing; Humans; Metadata; Reproducibility of Results; Sequence Analysis, RNA; Single-Cell Analysis; Transcriptome

PubMed: 34311752
DOI: 10.1186/s13059-021-02422-y

Standard metadata for 3D microscopy.

Scientific Data Jul 2022

Recent advances in fluorescence microscopy techniques and tissue clearing, labeling, and staining provide unprecedented opportunities to investigate brain structure and...

Summary PubMed Full Text PDF

Authors: Alexander J Ropelewski, Megan A Rizzo, Jason R Swedlow...

Recent advances in fluorescence microscopy techniques and tissue clearing, labeling, and staining provide unprecedented opportunities to investigate brain structure and function. These experiments' images make it possible to catalog brain cell types and define their location, morphology, and connectivity in a native context, leading to a better understanding of normal development and disease etiology. Consistent annotation of metadata is needed to provide the context necessary to understand, reuse, and integrate these data. This report describes an effort to establish metadata standards for three-dimensional (3D) microscopy datasets for use by the Brain Research through Advancing Innovative Neurotechnologies® (BRAIN) Initiative and the neuroscience research community. These standards were built on existing efforts and developed with input from the brain microscopy community to promote adoption. The resulting 3D Microscopy Metadata Standards (3D-MMS) includes 91 fields organized into seven categories: Contributors, Funders, Publication, Instrument, Dataset, Specimen, and Image. Adoption of these metadata standards will ensure that investigators receive credit for their work, promote data reuse, facilitate downstream analysis of shared data, and encourage collaboration.

Topics: Brain; Datasets as Topic; Humans; Metadata; Microscopy

PubMed: 35896564
DOI: 10.1038/s41597-022-01562-5

OMEX metadata specification (version 1.2).

Journal of Integrative Bioinformatics Oct 2021

A standardized approach to annotating computational biomedical models and their associated files can facilitate model reuse and reproducibility among research groups,...

Summary PubMed Full Text PDF

Authors: John H Gennari, Matthias König, Goksel Misirli...

A standardized approach to annotating computational biomedical models and their associated files can facilitate model reuse and reproducibility among research groups, enhance search and retrieval of models and data, and enable semantic comparisons between models. Motivated by these potential benefits and guided by consensus across the COmputational Modeling in BIology NEtwork (COMBINE) community, we have developed a specification for encoding annotations in Open Modeling and EXchange (OMEX)-formatted archives. This document details version 1.2 of the specification, which builds on version 1.0 published last year in this journal. In particular, this version includes a set of initial model-level annotations (whereas v 1.0 described exclusively annotations at a smaller scale). Additionally, this version uses best practices for namespaces, and introduces omex-library.org as a common root for all annotations. Distributing modeling projects within an OMEX archive is a best practice established by COMBINE, and the OMEX metadata specification presented here provides a harmonized, community-driven approach for annotating a variety of standardized model representations. This specification acts as a technical guideline for developing software tools that can support this standard, and thereby encourages broad advances in model reuse, discovery, and semantic analyses.

Topics: Computational Biology; Metadata; Reproducibility of Results; Semantics; Software

PubMed: 34668356
DOI: 10.1515/jib-2021-0020