Database: The Journal of Biological Databases and Curation, Jan 2019
Many valuable resources developed by world-wide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable, and sometimes very limited in their capabilities. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; the values of 10 attributes are semantically enriched using the most suitable available ontologies. The user of GenoSurf provides the search terms as input, sets the desired level of ontological enrichment and obtains as output the identity of matching data files at the various sources. Search is facilitated by drop-down lists of matching values; aggregate counts describing the resulting files are updated in real time as search terms are progressively added. In addition to the consolidated attributes, users can perform keyword-based searches on the original (raw) metadata, which are also imported; GenoSurf supports the interplay of attribute-based and keyword-based search through well-defined interfaces. Currently, GenoSurf integrates about 40 million metadata entries from several major data sources, including three providers of clinical and experimental data (TCGA, ENCODE and Roadmap Epigenomics) and two sources of annotation data (GENCODE and RefSeq); it can be used as a standalone resource for targeting the genomic datasets at their original sources (identified by their accession IDs and URLs), or as part of an integrated query answering system for performing complex queries over genomic regions and metadata.
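The interplay of attribute-based and keyword-based search described above can be illustrated with a minimal sketch. This is not the GenoSurf API; the field names ("assay", "tissue") and records are invented for illustration only.

```python
# Minimal sketch (not the GenoSurf API): combining attribute-based search on
# consolidated metadata with keyword-based search on the original raw metadata.
def search(records, attributes=None, keywords=None):
    """Return records matching all attribute constraints and all keywords."""
    attributes = attributes or {}
    keywords = keywords or []
    hits = []
    for rec in records:
        if any(rec.get(k) != v for k, v in attributes.items()):
            continue  # attribute-based filter on consolidated metadata
        raw = " ".join(str(v) for v in rec.get("raw", {}).values()).lower()
        if any(kw.lower() not in raw for kw in keywords):
            continue  # keyword-based filter on the original (raw) metadata
        hits.append(rec)
    return hits

records = [
    {"assay": "ChIP-seq", "tissue": "liver",
     "raw": {"lab": "ENCODE consortium", "target": "CTCF"}},
    {"assay": "RNA-seq", "tissue": "liver",
     "raw": {"lab": "Roadmap Epigenomics"}},
]

matches = search(records, attributes={"tissue": "liver"}, keywords=["ctcf"])
```

In a real system the attribute filter would run against consolidated, ontology-enriched columns while the keyword filter would run against a full-text index of the imported raw metadata; the sketch only mirrors that division of labour.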
Topics: Databases, Genetic; Female; Humans; Knowledge Bases; Metadata; Semantics; Software; User-Computer Interface
PubMed: 31820804
DOI: 10.1093/database/baz132

Journal of Proteome Research, Oct 2020
Metadata is essential in proteomics data repositories and is crucial to interpret and reanalyze the deposited data sets. For every proteomics data set, we should capture at least three levels of metadata: (i) the data set description, (ii) the sample-to-data-file relationships, and (iii) standard data file formats (e.g., mzIdentML, mzML, or mzTab). While the data set description and standard data file formats are supported by all ProteomeXchange partners, the information linking samples to data files is mostly missing. Recently, members of the European Bioinformatics Community for Mass Spectrometry (EuBIC) have created an open-source project called Sample to Data file format for Proteomics (https://github.com/bigbio/proteomics-metadata-standard/) to enable the standardization of sample metadata of public proteomics data sets. Here, the project is presented to the proteomics community, and we call for contributors, including researchers, journals, and consortia, to provide feedback about the format. We believe this work will improve reproducibility and facilitate the development of new tools dedicated to proteomics data analysis.
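A sample-to-data-file mapping of the kind this project standardizes is essentially a tab-separated table linking each sample to its characteristics and its raw files. The sketch below is only in the spirit of such SDRF-style tables; the column names are illustrative examples, not the normative specification.

```python
import csv
import io

# Illustrative sample-to-data-file table; column names are examples only,
# not the project's normative specification.
rows = [
    {"source name": "sample 1",
     "characteristics[organism]": "Homo sapiens",
     "comment[data file]": "run01.raw"},
    {"source name": "sample 2",
     "characteristics[organism]": "Homo sapiens",
     "comment[data file]": "run02.raw"},
]

# Serialize as a tab-separated table, the common carrier for such metadata.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter="\t")
writer.writeheader()
writer.writerows(rows)

# Reading the table back recovers the sample-to-file mapping.
parsed = list(csv.DictReader(io.StringIO(buf.getvalue()), delimiter="\t"))
```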
Topics: Mass Spectrometry; Metadata; Proteomics; Reproducibility of Results; Software
PubMed: 32786688
DOI: 10.1021/acs.jproteome.0c00376

F1000Research, 2021
Many types of data from genomic analyses can be represented as genomic tracks, features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, as well as RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information. We propose to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable) to produce searchable metadata for genomic tracks. Findability and Accessibility of metadata can then be ensured by a track search service that integrates globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories. Interoperability and Reusability need to be ensured by the specification and implementation of a basic set of recommendations for metadata. We have tested this concept by developing such a specification in a JSON Schema, called FAIRtracks, and have integrated it into a novel track search service, called TrackFind. We demonstrate practical usage by importing datasets through TrackFind into existing examples of relevant analytical tools for genomic tracks: EPICO and the GSuite HyperBrowser. We here provide a first iteration of a draft standard for genomic track metadata, as well as the accompanying software ecosystem. It can easily be adapted or extended to future needs of the research community regarding data, methods and tools, balancing the requirements of both data submitters and analytical end-users.
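A JSON Schema such as FAIRtracks enforces, among other things, that required metadata fields are present and correctly typed. The hand-rolled sketch below illustrates that style of validation with stdlib Python only; the required fields are invented for illustration and are not the actual FAIRtracks schema.

```python
# Hand-rolled sketch of JSON-Schema-style validation for track metadata;
# the required fields below are illustrative, not the FAIRtracks schema.
REQUIRED = {
    "identifier": str,
    "genome_assembly": str,
    "file_url": str,
}

def validate(doc):
    """Return a list of validation errors (an empty list means valid)."""
    errors = []
    for field, ftype in REQUIRED.items():
        if field not in doc:
            errors.append(f"missing required field: {field}")
        elif not isinstance(doc[field], ftype):
            errors.append(f"field {field} must be {ftype.__name__}")
    return errors

good = {"identifier": "trk-001", "genome_assembly": "GRCh38",
        "file_url": "https://example.org/peaks.bed"}
bad = {"identifier": "trk-002"}
```

In practice one would validate against the published schema with a JSON Schema validator rather than hand-written checks; the point here is only the shape of the contract between data submitters and consuming services such as TrackFind.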
Topics: Ecosystem; Genome; Genomics; Metadata; Software
PubMed: 34249331
DOI: 10.12688/f1000research.28449.1

Bioinformatics (Oxford, England), Oct 2022
MOTIVATION
Computational systems biology analyses typically make use of multiple software packages and their dependencies, which are often run across heterogeneous compute environments. This can introduce differences in performance and reproducibility. Capturing metadata (e.g. package versions, GPU model) currently requires repetitious code and is difficult to store centrally for analysis. Even where virtual environments and containers are used, updates over time mean that versioning metadata should still be captured within analysis pipelines to guarantee reproducibility.
RESULTS
Microbench is a simple and extensible Python package to automate metadata capture to a file or Redis database. Captured metadata can include execution time, software package versions, environment variables, hardware information, Python version and more, with plugins. We present three case studies demonstrating Microbench usage to benchmark code execution and examine environment metadata for reproducibility purposes.
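The kind of per-run metadata capture Microbench automates can be sketched with a stdlib decorator. This is an illustration of the idea only, not the Microbench API; see the package documentation for its actual interface and plugins.

```python
import functools
import io
import json
import platform
import sys
import time

# Stdlib sketch of decorator-based run-metadata capture, illustrating the
# idea behind Microbench (not its actual API): each call appends one JSON
# line with timing and environment details to an output stream.
def capture_metadata(outfile):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = func(*args, **kwargs)
            record = {
                "function": func.__name__,
                "runtime_s": time.time() - start,
                "python_version": sys.version.split()[0],
                "platform": platform.platform(),
            }
            outfile.write(json.dumps(record) + "\n")  # one JSON line per run
            return result
        return wrapper
    return decorator

log = io.StringIO()

@capture_metadata(log)
def add(a, b):
    return a + b

result = add(2, 3)
```

Writing one JSON line per invocation makes the captured metadata trivial to aggregate centrally, e.g. by loading the lines into a dataframe or pushing them to a Redis list.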
AVAILABILITY AND IMPLEMENTATION
Install from the Python Package Index using pip install microbench. Source code is available from https://github.com/alubbock/microbench.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Benchmarking; Metadata; Reproducibility of Results; Software; Systems Biology
PubMed: 36000837
DOI: 10.1093/bioinformatics/btac580

BMC Bioinformatics, Jan 2019
BACKGROUND
The development of high-throughput sequencing and analysis has accelerated multi-omics studies of thousands of microbial species, metagenomes, and infectious disease pathogens. Omics studies are enabling genotype-phenotype association studies that identify genetic determinants of pathogen virulence and drug resistance, as well as phylogenetic studies designed to track the origin and spread of disease outbreaks. These omics studies are complex and often employ multiple assay technologies, including genomics, metagenomics, transcriptomics, proteomics, and metabolomics. To maximize the impact of omics studies, it is essential that data be accompanied by detailed contextual metadata (e.g., specimen, spatial-temporal, and phenotypic characteristics) in clear, organized, and consistent formats. Over the years, many metadata standards have arisen from various standardization initiatives, such as the Genomic Standards Consortium's minimal information standards (MIxS) and the GSCID/BRC Project and Sample Application Standard. Some tools exist for tracking metadata, but they do not provide event-based capabilities to configure, collect, validate, and distribute metadata. To address this gap in the scientific community, an event-based, data-driven application, OMeta, was created that allows users to quickly configure, collect, validate, distribute, and integrate metadata.
RESULTS
A data-driven web application, OMeta, has been developed for use by researchers, consisting of a browser-based interface, a command-line interface (CLI), and server-side components that provide an intuitive platform for configuring, capturing, viewing, and sharing metadata. Project and sample metadata can be set based on existing standards or on project goals. Recorded information includes details on the biological samples, procedures, protocols, and experimental technologies. This information can be organized by events, including sample collection, sample quantification, sequencing assay, and analysis results. OMeta supports several field presentation types (checkbox, file, drop-down, and ontology), and fields can be configured to use the National Center for Biomedical Ontology (NCBO), a biomedical ontology server. Furthermore, OMeta maintains a complete audit trail of all changes made by users and allows metadata export in comma-separated value (CSV) format for convenient deposition of data into public databases.
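Event-based metadata tracking with an audit trail and CSV export can be sketched in a few lines. The event types and field names below are invented for illustration and are not OMeta's actual data model.

```python
import csv
import io
from datetime import datetime, timezone

# Sketch of event-based metadata tracking in the spirit of OMeta; event
# and field names are illustrative only. Appending (never mutating) events
# naturally preserves a complete audit trail.
events = []

def record_event(sample_id, event_type, user, **fields):
    """Append one timestamped, attributed metadata event."""
    events.append({
        "sample_id": sample_id,
        "event": event_type,
        "user": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    })

record_event("S1", "sample_collection", "alice", site="nasal swab")
record_event("S1", "sequencing_assay", "bob", platform="Illumina MiSeq")

# Export as CSV for deposition into public databases; the header is the
# union of all fields seen, and absent fields are left blank.
fieldnames = sorted({k for e in events for k in e})
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(events)
```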
CONCLUSIONS
We present OMeta, a web-based software application built on data-driven principles for configuring and customizing data standards and for capturing, curating, and sharing metadata.
Topics: Biological Ontologies; Databases, Factual; Metadata; Metagenomics; Phylogeny; Software; User-Computer Interface; Whole Genome Sequencing
PubMed: 30612540
DOI: 10.1186/s12859-018-2580-9

Studies in Health Technology and Informatics, May 2021
The data produced during a research project are too often collected for the sole purpose of the study, hindering profitable reuse in similar contexts. The growing need to counteract this trend has recently led to the formalization of the FAIR principles, which aim to make (meta)data Findable, Accessible, Interoperable and Reusable, for humans and machines. Since their introduction, efforts have been ongoing to encourage the adoption of the FAIR principles and to implement solutions based on them. This paper reports on the FAIR-compliant registry we developed to collect and serve metadata describing clinical trials. The design of the registry is based on the FAIR Data Point (FDP) specifications, the state-of-the-art reference for FAIRified metadata sharing. To map the metadata relevant to our use case, we extended the DCAT-based semantic model of the FDP, adopting well-established ontologies in the biomedical and clinical domain, such as the Semanticscience Integrated Ontology (SIO). The current implementation is based on the Molgenis software and provides both a user interface and a REST API for metadata discovery. At present, the registry is being loaded with the metadata of the 18 clinical studies included in the 'I FAIR Program', a project aimed at disseminating FAIR best practices among clinical researchers in Sardinia (Italy). After a testing phase, the registry will be made publicly available, while the new model and the source code will be released as open source.
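A DCAT-based metadata record of the kind a FAIR Data Point serves is, at its core, a JSON-LD document describing a dataset and its distributions. The sketch below is illustrative: the property selection is a minimal subset, and the study title and URLs are invented placeholders.

```python
import json

# Sketch of a DCAT-style JSON-LD record for a clinical study, of the kind a
# FAIR Data Point serves. Property choices are a minimal illustrative subset;
# the title and URLs are invented placeholders.
study = {
    "@context": {"dcat": "http://www.w3.org/ns/dcat#",
                 "dct": "http://purl.org/dc/terms/"},
    "@type": "dcat:Dataset",
    "dct:identifier": "study-001",
    "dct:title": "Example clinical study",
    "dct:publisher": "Example research centre",
    "dcat:distribution": [{
        "@type": "dcat:Distribution",
        "dcat:accessURL": "https://registry.example.org/study-001",
    }],
}

serialized = json.dumps(study, indent=2)
```

Because the record is plain JSON-LD over stable vocabularies (DCAT, Dublin Core), both humans browsing a UI and machines crawling a REST API can resolve and interpret it, which is what makes the metadata Findable and Accessible in the FAIR sense.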
Topics: Biomedical Research; Humans; Italy; Metadata; Registries; Software
PubMed: 34042684
DOI: 10.3233/SHTI210281

European Journal of Radiology, Sep 2023
PURPOSE
The ever-increasing volume of medical imaging data and the growing interest in Big Data research bring challenges to data organization, categorization, and retrieval. Although the radiological value chain is almost entirely digital, data structuring has largely been performed pragmatically, with naming and metadata standards insufficient for the stringent needs of image analysis. To enable automated data management independent of naming and metadata, this study focused on developing a convolutional neural network (CNN) that classifies medical images based solely on voxel data.
METHOD
A 3D CNN (3D-ResNet18) was trained using a dataset of 31,602 prostate MRI volumes covering 10 different sequence types from 1243 patients. A five-fold cross-validation approach with patient-based splits was chosen for training and testing. Training was repeated with a gradual reduction in training data, assessing classification accuracies to determine the minimum training data required for sufficient performance. The trained model and developed method were tested on three external datasets.
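The key point of patient-based splits is that all volumes from one patient land in the same fold, so no patient appears in both training and test data. A minimal stdlib sketch (the patient IDs are illustrative; the study's actual split procedure is not specified beyond being patient-based):

```python
# Sketch of five-fold cross-validation with patient-based splits: folds are
# built over patient IDs, not volumes, so every volume of a patient stays in
# the same fold and no patient leaks between training and test sets.
def patient_folds(patient_ids, k=5):
    """Assign each unique patient to one of k folds, round-robin."""
    folds = [[] for _ in range(k)]
    for i, pid in enumerate(sorted(set(patient_ids))):
        folds[i % k].append(pid)
    return folds

patients = [f"P{i:04d}" for i in range(1243)]  # 1243 patients, as in the study
folds = patient_folds(patients, k=5)

# Fold 0 as the held-out test set; the rest is training data.
test_fold = set(folds[0])
train = [p for p in patients if p not in test_fold]
```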
RESULTS
The model achieved an overall accuracy of 99.88 % ± 0.13 % in classifying typical prostate MRI sequence types. When being trained with approximately 10 % of the original cohort (112 patients), the CNN still achieved an accuracy of 97.43 % ± 2.10 %. In external testing the model achieved sensitivities of > 90 % for 10/15 tested sequence types.
CONCLUSIONS
The herein developed CNN enabled automatic and reliable sequence identification in prostate MRI. Ultimately, such CNN models for voxel-based sequence identification could substantially enhance the management of medical imaging data, improve workflow efficiency and data quality, and allow for robust clinical AI workflows.
Topics: Male; Humans; Prostate; Metadata; Magnetic Resonance Imaging; Neural Networks, Computer; Image Processing, Computer-Assisted
PubMed: 37453274
DOI: 10.1016/j.ejrad.2023.110964

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2022
The integration of genomic metadata is at once an important, difficult, and well-recognized challenge. It is important because a wealth of public data repositories is available to drive biological and clinical research, and combining information from various heterogeneous and widely dispersed sources is paramount to a number of biological discoveries. It is difficult because the domain is complex and there is no agreement among the various metadata definitions, which refer to different vocabularies and ontologies. It is well recognized in the bioinformatics community because, in common practice, repositories are accessed one by one, learning their specific metadata definitions through long and tedious effort, and such practice is error-prone. In this paper, we describe META-BASE, an architecture for integrating metadata extracted from a variety of genomic data sources, based upon a structured transformation process. We present a variety of innovative techniques for data extraction, cleaning, normalization and enrichment. We propose a general, open and extensible pipeline that can easily incorporate any number of new data sources, and present the resulting repository, which already integrates several important sources and is exposed through practical user interfaces that respond to biological researchers' needs.
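The normalization and enrichment steps of such a pipeline can be sketched as mapping heterogeneous raw values onto a controlled vocabulary and attaching an ontology term. This is only in the spirit of META-BASE, not its implementation; the synonym table and ontology ID below are illustrative examples.

```python
# Sketch of the normalization + enrichment step of a metadata integration
# pipeline in the spirit of META-BASE: raw source-specific values are
# cleaned, mapped to a canonical term, and enriched with an ontology ID.
# The synonym table and ontology IDs are illustrative examples only.
SYNONYMS = {
    "homo sapiens": "Homo sapiens",
    "human": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
}
ONTOLOGY_IDS = {"Homo sapiens": "NCBITaxon:9606"}

def normalize(raw_value):
    """Clean a raw value, map it to a canonical term, and enrich it."""
    cleaned = " ".join(raw_value.strip().lower().split())  # cleaning step
    canonical = SYNONYMS.get(cleaned)                      # normalization
    if canonical is None:
        return {"value": raw_value, "normalized": False}
    return {"value": canonical, "normalized": True,
            "ontology_id": ONTOLOGY_IDS.get(canonical)}    # enrichment

result = normalize("  HUMAN ")
```

Keeping the pipeline table-driven like this is what makes it extensible: supporting a new source mostly means adding its value mappings, not new code.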
Topics: Computational Biology; Genomics; Information Storage and Retrieval; Metadata
PubMed: 32750853
DOI: 10.1109/TCBB.2020.2998954

Studies in Health Technology and Informatics, Sep 2019
The utilisation of metadata repositories increasingly promotes secondary use of routinely collected data. However, this has not yet solved the problem of data exchange across organisational boundaries: for flawless data exchange, the local description of a metadata set must itself be exchangeable. In previous work, a metadata exchange language, QL4MDR, was developed. This work aimed to examine the applicability of that exchange language. For this purpose, existing MDR implementations were identified, systematically inspected, and roughly divided into two categories to distinguish between data integration and query integration. It was shown that all of the implementations can be adapted to QL4MDR. The integration of metadata is an important first step; it enables the exchange of information that is urgently needed for the further processing of instance data, from metadata mappings to transformation rules.
Topics: Metadata
PubMed: 31483257
DOI: 10.3233/SHTI190808

Studies in Health Technology and Informatics, Aug 2019
Secondary use of electronic health record (EHR) data requires a detailed description of metadata, especially when data collection and data re-use are organizationally and technically far apart. This paper describes the concept of the SMITH consortium that includes conventions, processes, and tools for describing and managing metadata using common standards for semantic interoperability. It deals in particular with the chain of processing steps of data from existing information systems and provides an overview of the planned use of metadata, medical terminologies, and semantic services in the consortium.
Topics: Data Collection; Electronic Health Records; Germany; Information Systems; Metadata; Semantics
PubMed: 31438215
DOI: 10.3233/SHTI190518