-
Scientific Data Feb 2019
We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well-known databases: BioSample, a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples, a repository managed by the European Bioinformatics Institute (EBI). We tested whether 11.4 M sample metadata records in the two repositories are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the metadata. Most metadata field names and their values are not standardized or controlled. Even simple binary or numeric fields are often populated with inadequate values of different data types. By clustering metadata field names, we discovered there are often many distinct ways to represent the same aspect of a sample. Overall, the metadata we analyzed reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. The significant aberrancies that we found in the metadata are likely to impede search and secondary use of the associated datasets.
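The kind of check described above, testing whether a binary or numeric field actually holds a value of its declared type, might look roughly like the following. This is a minimal sketch, not the authors' method: the function name, the type labels, and the accepted boolean spellings are all assumptions made for illustration.

```python
import math


def is_valid_value(value: str, expected_type: str) -> bool:
    """Return True if the raw string fits the declared field type.

    The type labels ("boolean", "numeric") and the accepted boolean
    spellings are hypothetical, chosen only for this sketch.
    """
    text = value.strip().lower()
    if expected_type == "boolean":
        return text in {"true", "false"}
    if expected_type == "numeric":
        try:
            number = float(text)
        except ValueError:
            return False
        # Reject "nan" and "inf", which float() parses but which are
        # not usable measurements.
        return math.isfinite(number)
    raise ValueError(f"unknown expected type: {expected_type}")
```

A free-text placeholder such as "not collected" in a numeric field would fail this check, which is the sort of mismatch the study reports.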
Topics: Biological Specimen Banks; Data Accuracy; Metadata
PubMed: 30778255
DOI: 10.1038/sdata.2019.21 -
Studies in Health Technology and... May 2021
Metadata repositories are an indispensable component of data integration infrastructures and support semantic interoperability between knowledge organization systems. Standards for metadata representation such as ISO/IEC 11179, as well as the Resource Description Framework (RDF) and the Simple Knowledge Organization System (SKOS) by the World Wide Web Consortium, were published to ensure metadata interoperability, maintainability and sustainability. The FAIR guidelines were composed to explicate those aspects in four principles divided into fifteen sub-principles. The ISO/IEC 21526 standard extends the 11179 standard for the domain of health care and mandates that SKOS be used for certain scenarios. In medical informatics, the composition of health care SKOS classification schemes is often managed by documentalists and data scientists. They use editors, which support them in producing comprehensive and valid metadata. Current metadata editors either do not properly support the SKOS resource annotations, require server applications or make use of additional databases for metadata storage. These characteristics are contrary to the application independence and versatility of raw Unicode SKOS files, e.g. the custom text arrangement, extensibility or copy & paste editing. We provide an application that adds navigation, auto completion and validity check capabilities on top of a regular Unicode text editor.
Topics: Databases, Factual; Delivery of Health Care; Medical Informatics; Metadata; Vocabulary, Controlled
PubMed: 34042881
DOI: 10.3233/SHTI210056 -
American Journal of Biological... Apr 2024
Review
OBJECTIVES
Ancient human dental calculus is a unique, nonrenewable biological resource encapsulating key information about the diets, lifestyles, and health conditions of past individuals and populations. With compounding calls for its destructive analysis, it is imperative to refine the ways in which the scientific community documents, samples, and analyzes dental calculus so as to maximize its utility to the public and scientific community.
MATERIALS AND METHODS
Our research team conducted an IRB-approved survey of dental calculus researchers with diverse academic backgrounds, research foci, and analytical specializations.
RESULTS
This survey reveals variation in how metadata is collected and utilized across different subdisciplines and highlights how these differences have profound implications for dental calculus research. Moreover, the survey suggests the need for more communication between those who excavate, curate, and analyze biomolecular data from dental calculus.
DISCUSSION
Challenges in cross-disciplinary communication limit researchers' ability to effectively utilize samples in rigorous and reproducible ways. Specifically, the lack of standardized skeletal and dental metadata recording and contamination avoidance procedures hinders downstream anthropological applications, as well as the pursuit of broader paleodemographic and paleoepidemiological inquiries that rely on more complete information about the individuals sampled. To provide a path forward toward more ethical and standardized dental calculus sampling and documentation approaches, we review the current methods by which skeletal and dental metadata are recorded. We also describe trends in sampling and contamination-control approaches. Finally, we use that information to suggest new guidelines for ancient dental calculus documentation and sampling strategies that will improve research practices in the future.
Topics: Humans; Dental Calculus; Metadata; Anthropology; Communication; Documentation
PubMed: 37994571
DOI: 10.1002/ajpa.24871 -
Scientific Reports Mar 2022
Metadata analysis indicates biased estimation of genetic parameters and gains using conventional pedigree information instead of genomic-based approaches in tree breeding.
Forest tree improvement helps provide adapted planting stock to ensure growth productivity, fibre quality and carbon sequestration through reforestation and afforestation activities. However, there is increasing doubt that conventional pedigree provides the most accurate estimates for selection and prediction of performance of improved planting stock. When the additive genetic relationships among relatives are estimated using pedigree information, it is not possible to take account of Mendelian sampling due to the random segregation of parental alleles. The use of DNA markers distributed genome-wide (multi-locus genotypes) makes it possible to estimate the realized additive genomic relationships, which take account of the Mendelian sampling and possible pedigree errors. We reviewed a series of papers on conifer and broadleaf tree species in which both pedigree-based and marker-based estimates of genetic parameters have been reported. Using metadata analyses, we show that for heritability and genetic gains, the estimates obtained using only the pedigree information are generally biased upward compared to those obtained using DNA markers distributed genome-wide, and that genotype-by-environment (GxE) interaction can be underestimated for low to moderate heritability traits. As high-throughput genotyping becomes economically affordable, we recommend expanding the use of genomic selection to obtain more accurate estimates of genetic parameters and gains.
Topics: Alleles; Genetic Markers; Genotype; Metadata; Models, Genetic; Phenotype; Plant Breeding; Trees
PubMed: 35273188
DOI: 10.1038/s41598-022-06681-y -
Bioinformatics (Oxford, England) May 2022
SUMMARY
To advance biomedical research, increasingly large amounts of complex data need to be discovered and integrated. This requires syntactic and semantic validation to ensure shared understanding of relevant entities. This article describes the ELIXIR biovalidator, which extends the syntactic validation of the widely used AJV library with ontology-based validation of JSON documents.
AVAILABILITY AND IMPLEMENTATION
Source code: https://github.com/elixir-europe/biovalidator, Release: v1.9.1, License: Apache License 2.0, Deployed at: https://www.ebi.ac.uk/biosamples/schema/validator/validate.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metadata; Semantics; Software; Biological Science Disciplines
PubMed: 35380605
DOI: 10.1093/bioinformatics/btac195 -
Nucleic Acids Research Jan 2019
The BioSamples database at EMBL-EBI provides a central hub for sample metadata storage and linkage to other EMBL-EBI resources. BioSamples has recently undergone major changes, both in terms of data content and supporting infrastructure. The data content has more than doubled from around 2 million samples in 2014 to just over 5 million samples in 2018. Fast, reciprocal data exchange was fully established between sister Biosample databases and other INSDC partners, enabling a worldwide common representation and centralization of sample metadata. The BioSamples platform has been upgraded to accommodate anticipated increases in the number of submissions via GA4GH driver projects such as the Human Cell Atlas and the EGA, as well as from mirroring of NCBI dbGaP data. The BioSamples database is now the authoritative repository for all INSDC sample metadata, an ELIXIR Deposition Database for Biomolecular Data and the EMBL-EBI sample metadata hub. To support faster turnaround for sample submission, and to increase scalability and resilience, we have upgraded the BioSamples database backend storage, APIs and user interface. Finally, the website has been redesigned to allow search and retrieval of records based on specific filters, such as 'disease' or 'organism'. These changes are targeted at answering current use cases as well as providing functionalities for future emerging and anticipated developments. Availability: The BioSamples database is freely available at http://www.ebi.ac.uk/biosamples. Content is distributed under the EMBL-EBI Terms of Use available at https://www.ebi.ac.uk/about/terms-of-use.
Topics: Biological Specimen Banks; Computational Biology; Databases, Genetic; Databases, Nucleic Acid; Genomics; Humans; Information Storage and Retrieval; Internet; Metadata; User-Computer Interface
PubMed: 30407529
DOI: 10.1093/nar/gky1061 -
Metabolomics : Official Journal of the... Nov 2022
INTRODUCTION
The structural identification of metabolites represents one of the current bottlenecks in non-targeted liquid chromatography-mass spectrometry (LC-MS) based metabolomics. The Metabolomics Standard Initiative has developed a multilevel system to report confidence in metabolite identification, which involves the use of MS, MS/MS and orthogonal data. Limitations due to similar or same fragmentation pattern (e.g. isomeric compounds) can be overcome by the additional orthogonal information of the retention time (RT), since it is a system property that is different for each chromatographic setup.
OBJECTIVES
In contrast to MS data, sharing of RT data is not as widespread. The quality of data and its (re-)useability depend very much on the quality of the metadata. We aimed to evaluate the coverage and quality of this metadata from public metabolomics repositories.
METHODS
We acquired an overview of the current reporting of chromatographic separation conditions. For this purpose, we defined the following information as important details that have to be provided: column name and dimension, flow rate, temperature, composition of eluents and gradient.
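A completeness check against the required details listed above could be sketched as follows. The field keys and the record layout are hypothetical, since the abstract does not specify how the descriptions were encoded; this only illustrates the idea of flagging a description as incomplete when any required detail is absent.

```python
# Hypothetical keys for the required details named in the abstract:
# column name and dimension, flow rate, temperature, eluent
# composition, and gradient.
REQUIRED_FIELDS = (
    "column_name",
    "column_dimension",
    "flow_rate",
    "temperature",
    "eluent_composition",
    "gradient",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat None and whitespace-only strings as missing, but keep
        # legitimate falsy values such as a temperature of 0.
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def is_complete(record: dict) -> bool:
    """A description counts as complete only if nothing is missing."""
    return not missing_fields(record)
```

Note that such a check only detects absent details; it would not catch the ambiguous or incorrect entries, which need manual or rule-based review.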
RESULTS
We found that 70% of descriptions of the chromatographic setups are incomplete (according to our definition) and an additional 10% of the descriptions contained ambiguous and/or incorrect information. Accordingly, only about 20% of the descriptions allow further (re-)use of the data, e.g. for RT prediction. Therefore, we have started to develop a unified and standardized notation for chromatographic metadata with detailed and specific description of eluents, columns and gradients.
CONCLUSION
Reporting of chromatographic metadata is currently not unified. Our recommended suggestions for metadata reporting will enable more standardization and automatization in future reporting.
Topics: Metadata; Metabolomics; Tandem Mass Spectrometry; Chromatography, Liquid; Temperature
PubMed: 36436113
DOI: 10.1007/s11306-022-01956-x -
Studies in Health Technology and... 2018
Interoperable metadata is key for the management of genomic information. We propose a flexible approach that we contribute to the standardization by ISO/IEC of a new format for efficient and secure compressed storage and transmission of genomic information.
Topics: Genomics; Metadata
PubMed: 29678035
DOI: No ID Found -
Journal of Biomolecular Techniques : JBT Apr 2022
Data management is a critical challenge required to improve the rigor and reproducibility of large projects. Adhering to Findable, Accessible, Interoperable, and Reusable (FAIR) standards provides a baseline for meeting these requirements. Although many existing repositories handle data in a FAIR-compliant manner, there are limited tools in the public domain to handle the metadata burden required to connect data from multi-omic projects that span multiple institutions and are deposited in diverse repositories. One promising approach is the SEEK platform, which allows for diverse metadata and provides an established repository. SEEK is challenged by the assumption of single deposition events where a sample is immutable once entered in the database. This is structured for published data but presents a limitation for ongoing studies where multiple sequential events may occur in a single sample at different sites. To address this issue, we have created a modified wrapper around the SEEK platform that allows for active data management by establishing more discrete sample types that are mutable to permit the expansion of the types of metadata, allowing researchers to track additional information. The use of discrete nodes also converts assays from nodes to edges, creating a network model of the study and more accurately representing the experimental process. With these changes to SEEK, users are able to collect and organize the information that researchers need to improve reusability and reproducibility as well as make data and metadata available to the scientific community through public repositories.
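The network model mentioned above, with samples as nodes and assays as the edges between them, can be illustrated with a small sketch. This is not the SEEK API or the authors' wrapper; the class, method names, and sample identifiers are hypothetical and serve only to show the nodes-and-edges idea.

```python
from collections import defaultdict


class StudyGraph:
    """Samples are nodes; each assay is a labeled edge between two samples."""

    def __init__(self) -> None:
        # Maps a source sample to a list of (assay, derived sample) pairs.
        self._edges = defaultdict(list)

    def add_assay(self, source: str, assay: str, derived: str) -> None:
        """Record that running `assay` on `source` produced `derived`."""
        self._edges[source].append((assay, derived))

    def lineage(self, sample: str) -> list[tuple[str, str, str]]:
        """Return every (source, assay, derived) step reachable from `sample`."""
        steps = []
        stack = [sample]
        seen = {sample}
        while stack:
            current = stack.pop()
            for assay, derived in self._edges.get(current, []):
                steps.append((current, assay, derived))
                if derived not in seen:
                    seen.add(derived)
                    stack.append(derived)
        return steps


graph = StudyGraph()
graph.add_assay("tissue-1", "RNA extraction", "rna-1")
graph.add_assay("rna-1", "sequencing", "reads-1")
steps = graph.lineage("tissue-1")
```

Modeling assays as edges keeps each intermediate sample as its own node, so sequential events at different sites can be appended without rewriting earlier entries.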
Topics: Databases, Factual; Metadata; Reproducibility of Results
PubMed: 35836998
DOI: 10.7171/3fc1f5fe.db404124 -
AMIA ... Annual Symposium Proceedings.... 2017
In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.
Topics: Biological Ontologies; Biomedical Research; Data Accuracy; Data Analysis; Metadata; Methods
PubMed: 29854196
DOI: No ID Found