-
Studies in Health Technology and... Aug 2019
The FAIR principles require the reporting of rich metadata. However, when researchers obtain data from external data owners for secondary use, the FAIR principles call for a different implementation than when researchers describe their own data. In this paper, we specify how FAIR metadata can be implemented for secondary data analyses and suggest relevant metadata.
Topics: Metadata
PubMed: 31438187
DOI: 10.3233/SHTI190490
-
BMC Bioinformatics Sep 2016
BACKGROUND
Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but for biosurveillance and public health. However, current freely available tools for analyzing this data are geared towards bioinformaticians and/or do not provide summaries and visualizations needed to readily interpret results.
RESULTS
We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers.
CONCLUSIONS
This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution.
Topics: Databases, Factual; Disease Outbreaks; Genome, Microbial; Genomics; Humans; Metadata; Software
PubMed: 27634291
DOI: 10.1186/s12859-016-1231-2
-
Acta Crystallographica. Section D,... Feb 2024
Cryo-electron microscopy (cryo-EM) has witnessed radical progress in the past decade, driven by developments in hardware and software. While current software packages include processing pipelines that simplify the image-processing workflow, they do not prioritize the in-depth analysis of crucial metadata, limiting troubleshooting for challenging data sets. The widely used RELION software package lacks a graphical native representation of the underlying metadata. Here, two web-based tools are introduced: relion_live.py, which offers real-time feedback on data collection, aiding swift decision-making during data acquisition, and relion_analyse.py, a graphical interface to represent RELION projects by plotting essential metadata including interactive data filtration and analysis. A useful script for estimating ice thickness and data quality during movie pre-processing is also presented. These tools empower researchers to analyse data efficiently and allow informed decisions during data collection and processing.
Topics: Cryoelectron Microscopy; Metadata; Image Processing, Computer-Assisted; Software; Internet
PubMed: 38265874
DOI: 10.1107/S2059798323010902
-
AMIA ... Annual Symposium Proceedings.... 2016
The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata is provided for each dataset and is the sole source of information for finding and retrieving data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency in applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and consistency. Also, metadata that underwent modifications following their original creation were of higher quality. HealthData.gov did not uniformly apply the Dublin Core Metadata Initiative standard, a widely accepted metadata standard, to the metadata. These findings suggested that the HealthData.gov metadata suffered from quality issues, particularly in information that was not frequently updated. The results supported the need for policies to standardize metadata and contributed to the development of automated measures of metadata quality.
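As an illustration of what an automated completeness measure might look like, the sketch below scores a metadata record against the 15 elements of the Dublin Core Metadata Element Set. The abstract does not describe the study's actual measures, so the scoring rule, the function names, and the sample record are all assumptions made for this example.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_elements(record: dict) -> list:
    """Return the Dublin Core elements that are absent or blank in the record."""
    return [
        element
        for element in DUBLIN_CORE_ELEMENTS
        if not str(record.get(element) or "").strip()
    ]


def completeness(record: dict) -> float:
    """Fraction of Dublin Core elements that carry a non-blank value.

    Assumed scoring rule: every element counts equally.
    """
    filled = len(DUBLIN_CORE_ELEMENTS) - len(missing_elements(record))
    return filled / len(DUBLIN_CORE_ELEMENTS)


# Hypothetical record, not taken from HealthData.gov.
sample_record = {
    "title": "Hospital readmission rates",
    "creator": "Example Agency",
    "date": "2013-05-01",
    "description": "   ",  # whitespace only, so it counts as missing
}

if __name__ == "__main__":
    print(f"completeness: {completeness(sample_record):.2f}")
    print("missing:", ", ".join(missing_elements(sample_record)))
```

Weighting every element equally is the simplest choice; a real assessment would probably weight the elements needed for finding and retrieving datasets more heavily.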
Topics: Datasets as Topic; Delivery of Health Care; Metadata; Quality Control; United States
PubMed: 28269883
DOI: No ID Found
-
Trials Nov 2016
BACKGROUND
A large number of stakeholders have accepted the need for greater transparency in clinical research and, in the context of various initiatives and systems, have developed a diverse and expanding number of repositories for storing the data and documents created by clinical studies (collectively known as data objects). To make the best use of such resources, we assert that it is also necessary for stakeholders to agree and deploy a simple, consistent metadata scheme.
METHODS
The relevant data objects and their likely storage are described, and the requirements for metadata to support data sharing in clinical research are identified. Issues concerning persistent identifiers, for both studies and data objects, are explored.
RESULTS
A scheme is proposed that is based on the DataCite standard, with extensions to cover the needs of clinical researchers, specifically to provide (a) study identification data, including links to clinical trial registries; (b) data object characteristics and identifiers; and (c) data covering location, ownership and access to the data object. The components of the metadata scheme are described.
CONCLUSIONS
The metadata schema is proposed as a natural extension of a widely agreed standard to fill a gap not tackled by other standards related to clinical research (e.g., Clinical Data Interchange Standards Consortium, Biomedical Research Integrated Domain Group). The proposal could be integrated with, but is not dependent on, other moves to better structure data in clinical research.
Topics: Biomedical Research; Cooperative Behavior; Databases, Factual; Humans; Information Dissemination; Information Storage and Retrieval; Metadata
PubMed: 27881150
DOI: 10.1186/s13063-016-1686-5
-
PloS One 2023
TV drama, through synchronization with social phenomena, allows the audience to resonate with the characters and desire to watch the next episode. In particular, drama ratings can be the criterion for advertisers to invest in ad placement and a predictor of subsequent economic efficiency in the surrounding areas. To identify the dissemination patterns of social information about dramas, this study used machine learning to predict drama ratings and the contribution of various drama metadata, including broadcast year, broadcast season, TV stations, day of the week, broadcast time slot, genre, screenwriters, status as an original work or sequel, actors and facial features on posters. A total of 800 Japanese TV dramas broadcast during prime time between 2003 and 2020 were collected for analysis. Four machine learning classifiers, including naïve Bayes, artificial neural network, support vector machine, and random forest, were used to combine the metadata. With facial features, the accuracy of the random forest model increased from 75.80% to 77.10%, which shows that poster information can improve the accuracy of the overall predicted ratings. Using only posters to predict ratings with a convolutional neural network still obtained an accuracy rate of 71.70%. More insights about the correlations between drama metadata and social information dissemination patterns were explored.
Topics: Bayes Theorem; Metadata; Drama; Machine Learning; Information Dissemination; Support Vector Machine
PubMed: 38032993
DOI: 10.1371/journal.pone.0288932
-
Journal of the American College of... Mar 2016
Topics: Data Mining; Datasets as Topic; Diagnostic Imaging; Machine Learning; Meta-Analysis as Topic; Metadata; Radiology; Radiology Information Systems
PubMed: 26944035
DOI: 10.1016/j.jacr.2015.12.013
-
The Journal of Nursing Education Mar 2016
Topics: Education, Nursing; Metadata; Nursing; Nursing Research
PubMed: 26926211
DOI: 10.3928/01484834-20160216-01
-
Studies in Health Technology and... 2016
Interoperability between systems and data sharing between domains are becoming increasingly important. The portal medical-data-models.org offers more than 5,300 UMLS-annotated forms in CDISC ODM format to support interoperability, and several additional export formats are available. CDISC's ODM and the Questionnaire resource of HL7's FHIR framework were analyzed, a mapping between their elements was created, and a converter was implemented. The converter was integrated into the portal with download options for FHIR Questionnaire in XML or JSON. New FHIR applications can now use this large library of forms.
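The following minimal sketch shows how an element mapping of this kind could look in code. It is not the portal's converter: it handles only ODM `ItemDef` elements, and the `TYPE_MAP` table, the function name, and the sample document are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

ODM_URI = "http://www.cdisc.org/ns/odm/v1.3"  # ODM 1.3 XML namespace
ODM_NS = {"odm": ODM_URI}

# Assumed, simplified mapping from ODM DataType values to FHIR item types.
TYPE_MAP = {
    "text": "string",
    "string": "string",
    "integer": "integer",
    "float": "decimal",
    "date": "date",
    "datetime": "dateTime",
    "time": "time",
    "boolean": "boolean",
}


def odm_items_to_questionnaire(odm_xml: str) -> dict:
    """Build a bare FHIR Questionnaire from the ItemDef elements of an ODM file."""
    root = ET.fromstring(odm_xml)
    items = []
    for item_def in root.iter(f"{{{ODM_URI}}}ItemDef"):
        question = item_def.find("odm:Question/odm:TranslatedText", ODM_NS)
        if question is not None and question.text:
            text = question.text.strip()
        else:
            # Fall back to the item name when no question text is present.
            text = item_def.get("Name", "")
        items.append({
            "linkId": item_def.get("OID"),
            "text": text,
            # Unknown data types default to "string" (an assumption).
            "type": TYPE_MAP.get(item_def.get("DataType", ""), "string"),
        })
    return {"resourceType": "Questionnaire", "status": "draft", "item": items}


# Hypothetical ODM fragment used only to exercise the function.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S.1"><MetaDataVersion OID="MDV.1" Name="v1">
    <ItemDef OID="I.AGE" Name="Age" DataType="integer">
      <Question><TranslatedText xml:lang="en">Age in years</TranslatedText></Question>
    </ItemDef>
    <ItemDef OID="I.NOTE" Name="Note" DataType="text"/>
  </MetaDataVersion></Study>
</ODM>"""

if __name__ == "__main__":
    print(json.dumps(odm_items_to_questionnaire(SAMPLE_ODM), indent=2))
```

A complete converter would also need to carry over item groups, code lists, and the UMLS annotations, none of which this sketch attempts.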
Topics: Electronic Health Records; Health Level Seven; Medical Record Linkage; Metadata; Semantics; Surveys and Questionnaires; Systems Integration
PubMed: 27577424
DOI: No ID Found
-
International Journal of Molecular... Apr 2023
Studying the association of gene function, diseases, and regulatory gene network reconstruction demands data compatibility. Data from different databases follow distinct schemas and are accessible in heterogeneous ways. Although the experiments differ, data may still be related to the same biological entities. Some entities may not be strictly biological, such as geolocations of habitats or paper references, but they provide a broader context for other entities. The same entities from different datasets can share similar properties, which may or may not be found within other datasets. Joint, simultaneous data fetching from multiple data sources is complicated for the end user or, in many cases, unsupported and inefficient due to differences in data structures and ways of accessing the data. We propose BioGraph, a new model that enables connecting and retrieving information from linked biological data originating from diverse datasets. We have tested the model on metadata collected from five diverse public datasets and successfully constructed a knowledge graph containing more than 17 million model objects, of which 2.5 million are individual biological entity objects. The model enables the selection of complex patterns and retrieval of matched results that can be discovered only by joining the data from multiple sources.
Topics: Metadata; Databases, Factual
PubMed: 37108117
DOI: 10.3390/ijms24086954