-
The Journal of Nursing Education Mar 2016
Topics: Education, Nursing; Metadata; Nursing; Nursing Research
PubMed: 26926211
DOI: 10.3928/01484834-20160216-01
-
Journal of the American College of... Mar 2016
Topics: Data Mining; Datasets as Topic; Diagnostic Imaging; Machine Learning; Meta-Analysis as Topic; Metadata; Radiology; Radiology Information Systems
PubMed: 26944035
DOI: 10.1016/j.jacr.2015.12.013
-
Journal of Integrative Bioinformatics Oct 2021
A standardized approach to annotating computational biomedical models and their associated files can facilitate model reuse and reproducibility among research groups, enhance search and retrieval of models and data, and enable semantic comparisons between models. Motivated by these potential benefits and guided by consensus across the COmputational Modeling in BIology NEtwork (COMBINE) community, we have developed a specification for encoding annotations in Open Modeling and EXchange (OMEX)-formatted archives. This document details version 1.2 of the specification, which builds on version 1.0 published last year in this journal. In particular, this version includes a set of initial model-level annotations (whereas v 1.0 described exclusively annotations at a smaller scale). Additionally, this version uses best practices for namespaces, and introduces omex-library.org as a common root for all annotations. Distributing modeling projects within an OMEX archive is a best practice established by COMBINE, and the OMEX metadata specification presented here provides a harmonized, community-driven approach for annotating a variety of standardized model representations. This specification acts as a technical guideline for developing software tools that can support this standard, and thereby encourages broad advances in model reuse, discovery, and semantic analyses.
Topics: Computational Biology; Metadata; Reproducibility of Results; Semantics; Software
PubMed: 34668356
DOI: 10.1515/jib-2021-0020
-
GigaScience Dec 2022
Scientists employing omics in life science studies face challenges such as the modeling of multiassay studies, recording of all relevant parameters, and managing many samples with their metadata. They must manage many large files that are the results of the assays or subsequent computation. Users with diverse backgrounds, ranging from computational scientists to wet-lab scientists, have dissimilar needs when it comes to data access, with programmatic interfaces being favored by the former and graphical ones by the latter. We introduce SODAR, the system for omics data access and retrieval. SODAR is a software package that addresses these challenges by providing a web-based graphical user interface for managing multiassay studies and describing them using the ISA (Investigation, Study, Assay) data model and the ISA-Tab file format. Data storage is handled using the iRODS data management system, which handles large quantities of files and substantial amounts of data. SODAR also offers programmable APIs and command-line access for metadata and file storage. SODAR supports complex omics integration studies and can be easily installed. The software is written in Python 3 and freely available at https://github.com/bihealth/sodar-server under the MIT license.
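The Investigation, Study, Assay hierarchy named in the abstract can be pictured as nested containers. The sketch below is a minimal illustration only: the class fields, sample keys, and file names are hypothetical, and it reflects neither SODAR's API nor the ISA-Tab file layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assay:
    """One measurement run performed on the samples of a study."""
    name: str
    measurement_type: str
    # Hypothetical: names of result files kept in the file store.
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """A set of samples plus the assays performed on them."""
    identifier: str
    # Sample name mapped to free-form metadata (keys are hypothetical).
    samples: Dict[str, Dict[str, str]] = field(default_factory=dict)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """Top-level container grouping related studies."""
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        """Collect every data file across all studies and assays."""
        return [
            path
            for study in self.studies
            for assay in study.assays
            for path in assay.data_files
        ]


# Toy multiassay project: one study, two assays on the same sample.
study = Study(
    identifier="S1",
    samples={"sample-1": {"organism": "Homo sapiens"}},
    assays=[
        Assay("rna", "transcription profiling", ["sample-1.fastq.gz"]),
        Assay("prot", "protein expression profiling", ["sample-1.raw"]),
    ],
)
investigation = Investigation("I1", "Example multiassay project", [study])
print(investigation.all_data_files())
```

Keeping assays under a study, rather than under individual samples, mirrors the idea that several assays share one pool of sample metadata.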
Topics: Multiomics; Metadata; Software; Information Storage and Retrieval; Data Management
PubMed: 37498129
DOI: 10.1093/gigascience/giad052
-
GigaScience Sep 2021
BACKGROUND
Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs.
FINDINGS
Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples.
CONCLUSION
Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets through automated annotation, making them more searchable.
Topics: Algorithms; Bias; Metadata; Molecular Sequence Annotation; RNA; Sequence Analysis, RNA
PubMed: 34553213
DOI: 10.1093/gigascience/giab064
-
Bioinformatics (Oxford, England) Jan 2023
MOTIVATION
Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction.
RESULTS
We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently fetches metadata and links to raw data in JSON format. ffq's modularity and simplicity make it extensible to any genomic database exposing its data for programmatic access.
AVAILABILITY AND IMPLEMENTATION
ffq is free and open source, and the code can be found here: https://github.com/pachterlab/ffq.
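Because ffq returns its results as JSON, a script can hand an accession to the tool and parse the reply. The sketch below is an assumption-laden illustration: it presumes that ffq is installed and on the PATH, that it accepts an accession as a positional argument, and that it writes JSON to standard output when no output file is requested. The helper names `build_command` and `fetch_metadata` are hypothetical, and nothing is assumed about the fields inside the returned JSON.

```python
import json
import subprocess
from typing import Any, Callable, List


def build_command(accession: str) -> List[str]:
    """Build the command line for a single accession (assumed invocation form)."""
    return ["ffq", accession]


def fetch_metadata(accession: str, run: Callable[..., Any] = subprocess.run) -> Any:
    """Run ffq for one accession and parse its standard output as JSON.

    The `run` parameter exists so that the process call can be swapped out,
    for example when ffq is not installed on the machine.
    """
    result = run(
        build_command(accession),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return json.loads(result.stdout)
```

A call such as `fetch_metadata(accession)` with an accession of interest would return whatever structure the tool emits; inspecting the top-level keys first is a sensible way to learn that structure before relying on any particular field.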
Topics: Software; Metadata; Databases, Nucleic Acid
PubMed: 36610997
DOI: 10.1093/bioinformatics/btac667
-
Journal of Medical Internet Research Mar 2023
Review
BACKGROUND
Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research.
OBJECTIVE
The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption.
METHODS
Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures.
RESULTS
We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. The important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV.
CONCLUSIONS
The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.
Topics: Humans; Biomedical Research; Metadata; PubMed; Reproducibility of Results; Software
PubMed: 36972116
DOI: 10.2196/42289
-
Progress in Biophysics and Molecular... Jan 2022
Advancements in neuroscience research have led to steadily accelerating data production and sharing. The online community repository of neural reconstructions NeuroMorpho.Org grew from fewer than 1000 digitally traced neurons in 2006 to more than 140,000 cells today, including glia that now constitute 10.1% of the content. Every reconstruction consists of a detailed 3D representation of branch geometry and connectivity in a standardized format, from which a collection of morphometric features is extracted and stored. Moreover, each entry in the database is accompanied by rich metadata annotation describing the animal subject, anatomy, and experimental details. The rapid expansion of this resource in the past decade was accompanied by a parallel rise in the complexity of the available information, creating both opportunities and challenges for knowledge mining. Here, we introduce a new summary reporting functionality, allowing NeuroMorpho.Org users to efficiently download digests of metadata and morphometrics from multiple groups of similar cells for further analysis. We demonstrate the capabilities of the tool for both glia and neurons and present an illustrative statistical analysis of the resulting data.
Topics: Animals; Databases, Factual; Metadata; Neurons; Neurosciences
PubMed: 34022302
DOI: 10.1016/j.pbiomolbio.2021.05.005
-
GigaScience Dec 2022
BACKGROUND
Contamination detection is an important step that should be carefully considered in early stages when designing and performing microbiome studies to avoid biased outcomes. Detecting and removing true contaminants is challenging, especially in low-biomass samples or in studies lacking proper controls. Interactive visualizations and analysis platforms are crucial to better guide this step and to help identify and detect noisy patterns that could potentially be contamination. Additionally, external evidence, like aggregation of several contamination detection methods and the use of common contaminants reported in the literature, could help to discover and mitigate contamination.
RESULTS
We propose GRIMER, a tool that performs automated analyses and generates a portable and interactive dashboard integrating annotation, taxonomy, and metadata. It unifies several sources of evidence to help detect contamination. GRIMER is independent of quantification methods and directly analyzes contingency tables to create an interactive and offline report. Reports can be created in seconds and are accessible for nonspecialists, providing an intuitive set of charts to explore data distribution among observations and samples and its connections with external sources. Further, we compiled and used an extensive list of possible external contaminant taxa and common contaminants with 210 genera and 627 species reported in 22 published articles.
CONCLUSION
GRIMER enables visual data exploration and analysis, supporting contamination detection in microbiome studies. The tool and data presented are open source and available at https://gitlab.com/dacs-hpi/grimer.
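The idea of pairing a contingency table with an external list of reported contaminants can be illustrated with a few lines of Python. This is a toy sketch and not GRIMER's method: the function name `flag_listed_taxa`, the taxon names, the counts, and the prevalence measure (fraction of samples with a non-zero count) are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple


def flag_listed_taxa(
    table: Dict[str, List[int]], listed: Set[str]
) -> List[Tuple[str, float]]:
    """Return (taxon, prevalence) for table taxa that appear in `listed`.

    `table` maps each taxon to its counts across samples. Prevalence is the
    fraction of samples in which the taxon has a non-zero count. Results are
    sorted from most to least prevalent.
    """
    flagged = []
    for taxon, counts in table.items():
        if taxon in listed and counts:
            prevalence = sum(1 for c in counts if c > 0) / len(counts)
            flagged.append((taxon, prevalence))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Toy contingency table: three placeholder taxa observed in four samples.
toy_table = {
    "GenusA": [5, 0, 3, 1],
    "GenusB": [0, 0, 7, 0],
    "GenusC": [9, 9, 9, 9],
}
# Hypothetical external list of taxa reported as contaminants.
toy_listed = {"GenusA", "GenusB"}

for taxon, prevalence in flag_listed_taxa(toy_table, toy_listed):
    print(f"{taxon}: present in {prevalence:.0%} of samples")
```

A match against such a list is only a hint. A listed taxon can still be a genuine community member, which is why the abstract frames external lists as one source of evidence among several.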
Topics: Microbiota; Biomass; Metadata
PubMed: 36994872
DOI: 10.1093/gigascience/giad017
-
Studies in Health Technology and... May 2023
Metadata standards are well-established for many types of electrophysiological methods but are still lacking for microneurographic recordings of peripheral sensory nerve fibers in humans. Finding a solution for daily work in the laboratory is a complex process. We have designed templates based on odML and odML-tables to structure and capture metadata and provided an extension to the existing GUI to enable database searching.
Topics: Humans; Metadata; Palliative Care
PubMed: 37203689
DOI: 10.3233/SHTI230144