-
Studies in History and Philosophy of... Dec 2022
Otto Neurath's role in the so-called protocol sentence debates is typically framed as primarily an epistemologically radical rejection of empiricist foundationalism. However, less well recognized is that from this debate, Neurath emerges with a conception of protocol statements that functions as a radical reconceptualization of evidence. Whilst recognizably still empiricist, Neurath's conception of evidence breaks with many of the key assumptions that predominate within the empiricist tradition. In rejecting the assumption of an epistemologically privileged relationship between an observer and their own observation reports, Neurath shifts the emphasis onto the importance of contextualizing information that guarantees the stability of observation reports. In so doing, he not only provides a conception of evidence better suited to the actual role of evidence in science, but also anticipates contemporary discussion of the importance of evidential metadata.
Topics: Metadata
PubMed: 36272271
DOI: 10.1016/j.shpsa.2022.09.007 -
Bioinformatics (Oxford, England) Sep 2022
MOTIVATION
Environmental DNA (eDNA), as a rapidly expanding research field, stands to benefit from shared resources including sampling protocols, study designs, discovered sequences, and taxonomic assignments to sequences. High-quality community shareable eDNA resources rely heavily on comprehensive metadata documentation that captures the complex workflows covering field sampling, molecular biology lab work, and bioinformatic analyses. Few sources document database development for comprehensive eDNA metadata covering these workflows, and no open-source software is available.
RESULTS
We present medna-metadata, an open-source, modular system that aligns with the Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles for scholarly data reuse. It supports the database and application development of a standardized metadata collection structure that encapsulates critical aspects of field data collection, wet lab processing, and bioinformatic analysis. Medna-metadata is showcased with metabarcoding data from the Gulf of Maine (Polinski et al., 2019).
AVAILABILITY AND IMPLEMENTATION
The source code of the medna-metadata web application is hosted on GitHub (https://github.com/Maine-eDNA/medna-metadata). Medna-metadata is a docker-compose installable package. Documentation can be found at https://medna-metadata.readthedocs.io/en/latest/?badge=latest. The application is implemented in Python, PostgreSQL and PostGIS, RabbitMQ, and NGINX, with all major browsers supported. A demo can be found at https://demo.metadata.maine-edna.org/.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metadata; DNA, Environmental; Data Management; Software; Databases, Factual
PubMed: 35960154
DOI: 10.1093/bioinformatics/btac556 -
Scientific Data Oct 2022
Metadata describe information about data source, type of creation, structure, status and semantics and are prerequisite for preservation and reuse of medical data. To overcome the hurdle of disparate data sources and repositories with heterogeneous data formats a metadata crosswalk was initiated, based on existing standards. FAIR Principles were included, as well as data format specifications. The metadata crosswalk is the foundation of data provision between a Medical Data Integration Center (MeDIC) and researchers, providing a selection of metadata information for research design and requests. Based on the crosswalk, metadata items were prioritized and categorized to demonstrate that not one single predefined standard meets all requirements of a MeDIC and only a maximum data set of metadata is suitable for use. The development of a convergence format including the maximum data set is the anticipated solution for an automated transformation of metadata in a MeDIC.
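The crosswalk idea can be illustrated with a minimal sketch. It assumes a crosswalk is a mapping from each metadata item to the standards that define it; the item names and the standards `standard_a`, `standard_b` and `standard_c` are hypothetical placeholders, not taken from the paper. The sketch shows how a "maximum data set" (the union of all items) can be compared against what any single standard covers.

```python
from typing import Dict, Set

# Hypothetical crosswalk: metadata item -> standards that define it.
# All names here are illustrative assumptions, not the paper's data.
CROSSWALK: Dict[str, Set[str]] = {
    "data_source": {"standard_a", "standard_b"},
    "creation_type": {"standard_a"},
    "structure": {"standard_b", "standard_c"},
    "status": {"standard_c"},
    "semantics": {"standard_a", "standard_c"},
}


def maximum_data_set(crosswalk: Dict[str, Set[str]]) -> Set[str]:
    """Return every item that at least one standard defines."""
    return {item for item, standards in crosswalk.items() if standards}


def coverage(crosswalk: Dict[str, Set[str]], standard: str) -> Set[str]:
    """Return the items that one given standard defines."""
    return {item for item, standards in crosswalk.items() if standard in standards}


def missing_items(crosswalk: Dict[str, Set[str]], standard: str) -> Set[str]:
    """Return items in the maximum data set that the standard lacks."""
    return maximum_data_set(crosswalk) - coverage(crosswalk, standard)


if __name__ == "__main__":
    for name in ("standard_a", "standard_b", "standard_c"):
        print(name, "is missing", sorted(missing_items(CROSSWALK, name)))
```

In this toy data every standard is missing at least one item, which mirrors the abstract's point that no single predefined standard is sufficient on its own.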
Topics: Metadata; Information Storage and Retrieval; Semantics; Reference Standards
PubMed: 36307424
DOI: 10.1038/s41597-022-01792-7 -
BMC Medical Informatics and Decision... Mar 2019
BACKGROUND
Heterogeneous healthcare instance data can hardly be integrated without harmonizing its schema-level metadata. Many medical research projects and organizations use metadata repositories to edit, store and reuse data elements. However, existing metadata repositories differ regarding software implementation and have shortcomings when it comes to exchanging metadata. This work aims to define a uniform interface with a technical interlingua between the different MDR implementations in order to enable and facilitate the exchange of metadata, to query over distributed systems and to promote cooperation. To design a unified interface for multiple existing MDRs, a standardized data model must be agreed on. The ISO 11179 is an international standard for the representation of metadata, and since most MDR systems claim to be at least partially compliant, it is suitable for defining an interface thereupon. Therefore, each repository must be able to define which parts can be served and the interface must be able to handle highly linked data. GraphQL is a data access layer and defines query techniques designed to navigate easily through complex data structures.
RESULTS
We propose QLMDR, an ISO 11179-3 compatible GraphQL query language. The GraphQL schema for QLMDR is derived from the ISO 11179 standard and defines objects, fields, queries and mutation types. Entry points within the schema define the path through the graph to enable search functionality, while mutation types, which allow metadata to be created, updated and deleted, promote exchange. QLMDR is the foundation for the uniform interface, which is implemented in a modern web-based interface prototype.
CONCLUSIONS
We have introduced a uniform query interface for metadata repositories combining the ISO 11179 standard for metadata repositories and the GraphQL query language. A reference implementation based on the existing Samply.MDR was implemented. The interface facilitates access to metadata, enables better interaction with metadata, and provides a basis for connecting existing repositories. We invite other ISO 11179-based metadata repositories to take this approach into account.
Topics: Health Information Interoperability; Humans; Medical Informatics Applications; Metadata
PubMed: 30885183
DOI: 10.1186/s12911-019-0794-z -
Naunyn-Schmiedeberg's Archives of... Jun 2024
An increasing fake paper problem is a cause for concern in the scientific community. These papers look scientific but contain manipulated data or are completely fictitious. So-called paper mills produce fake papers on a large scale and publish them in the name of people who buy authorship. The aim of this study was to learn more about the characteristics of fake papers at the metadata level. We also investigated whether some of these characteristics could be used to detect fake papers. For that purpose, we examined metadata of 12 fake papers that were retracted by Naunyn-Schmiedeberg's Archives of Pharmacology (NSAP) in recent years. We also compared many of these metadata with those of a reference group of 733 articles published by NSAP. It turned out that in many characteristics the fake papers we examined did not differ substantially from the other articles. It was only noticeable that the fake papers came almost exclusively from a certain country, used non-institutional email addresses more often than average, and referenced dubious literature significantly more often. However, these three features are only of limited use in identifying fake papers. We were also able to show that fake papers not only contaminate the scientific record while they are unidentified but also continue to do so even after retraction. Our results indicate that fake papers are well made and resemble honest papers even at the metadata level. Because they contaminate the scientific record in the long term and this cannot be fully contained even by their retraction, it is particularly important to identify them before publication. Further research on the topic of fake papers is therefore urgently needed.
Topics: Metadata; Pharmacology; Periodicals as Topic; Scientific Misconduct; Humans; Retraction of Publication as Topic; Authorship
PubMed: 37994948
DOI: 10.1007/s00210-023-02850-6 -
Nucleic Acids Research Jul 2022
Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into searchable signatures along with signatures extracted from Genotype-Tissue Expression (GTEx) and Gene Expression Omnibus (GEO), connections between drugs, genes, pathways and diseases can be illuminated. SigCom LINCS is a webserver that serves over a million gene expression signatures processed, analyzed, and visualized from LINCS, GTEx, and GEO. SigCom LINCS is built with Signature Commons, a cloud-agnostic skeleton Data Commons with a focus on serving searchable signatures. SigCom LINCS provides a rapid signature similarity search for mimickers and reversers given sets of up and down genes, a gene set, a single gene, or any search term. Additionally, users of SigCom LINCS can perform a metadata search to find and analyze subsets of signatures and find information about genes and drugs. SigCom LINCS is findable, accessible, interoperable, and reusable (FAIR) with metadata linked to standard ontologies and vocabularies. In addition, all the data and signatures within SigCom LINCS are available via a well-documented API. In summary, SigCom LINCS, available at https://maayanlab.cloud/sigcom-lincs, is a rich webserver resource for accelerating drug and target discovery in systems pharmacology.
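To make the notion of "mimickers" and "reversers" concrete, here is a minimal sketch of one way a query of up and down gene sets could be scored against a stored signature. It is an assumption for illustration only: the overlap-based score and the functions `direction_score` and `classify` are hypothetical and are not SigCom LINCS's actual algorithm or API, and the example signatures are invented.

```python
from typing import Set


def direction_score(query_up: Set[str], query_down: Set[str],
                    sig_up: Set[str], sig_down: Set[str]) -> int:
    """Count concordant overlaps minus discordant overlaps.

    Concordant: genes up in both, or down in both.
    Discordant: genes up in one and down in the other.
    """
    concordant = len(query_up & sig_up) + len(query_down & sig_down)
    discordant = len(query_up & sig_down) + len(query_down & sig_up)
    return concordant - discordant


def classify(score: int) -> str:
    """Label a signature by the sign of its score (illustrative rule)."""
    if score > 0:
        return "mimicker"
    if score < 0:
        return "reverser"
    return "neutral"


if __name__ == "__main__":
    # Invented example query and signatures.
    query_up, query_down = {"TP53", "MYC"}, {"EGFR"}
    same_direction = direction_score(query_up, query_down, {"TP53", "MYC"}, {"EGFR"})
    opposite_direction = direction_score(query_up, query_down, {"EGFR"}, {"TP53"})
    print(classify(same_direction), classify(opposite_direction))
```

A production search would use a statistically grounded similarity measure over far larger signatures; the sign-based rule here only conveys the direction of the comparison.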
Topics: Transcriptome; Metadata; Search Engine
PubMed: 35524556
DOI: 10.1093/nar/gkac328 -
Nature Methods Dec 2021
Topics: Humans; Image Processing, Computer-Assisted; Medical Informatics; Metadata; Multimodal Imaging; Software
PubMed: 34635849
DOI: 10.1038/s41592-021-01288-z -
Methods of Information in Medicine May 2020
BACKGROUND
The clinical research data lifecycle, from data collection to analysis results, functions in silos that restrict traceability. Traceability is a requirement for regulated clinical research studies and an important attribute of nonregulated studies. Current clinical research software tools provide limited metadata traceability capabilities and are unable to query variables across all phases of the data lifecycle.
OBJECTIVES
To develop a metadata traceability framework that can help query and visualize traceability metadata, identify traceability gaps, and validate metadata traceability to improve data lineage and reproducibility within clinical research studies.
METHODS
This research follows the design science research paradigm where the objective is to create and evaluate an information technology (IT) artifact that explicitly addresses an organizational problem or opportunity. The implementation and evaluation of the IT artifact demonstrate the feasibility of both the design process and the final designed product.
RESULTS
We present Trace-XML, a metadata traceability framework that extends standard clinical research metadata models and adapts graph traversal algorithms to provide clinical research study traceability queries, validation, and visualization. Trace-XML was evaluated using analytical and qualitative methods. The analytical methods show that Trace-XML accurately and completely assesses metadata traceability within a clinical research study. A qualitative study used thematic analysis of interview data to show that Trace-XML adds utility to a researcher's ability to evaluate metadata traceability within a study.
CONCLUSION
Trace-XML benefits include features that (1) identify traceability gaps in clinical study metadata, (2) validate metadata traceability within a clinical study, and (3) query and visualize traceability metadata. The key themes that emerged from the qualitative evaluation affirm that Trace-XML adds utility to the task of creating and assessing end-to-end clinical research study traceability.
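The role of graph traversal in finding traceability gaps can be sketched as follows. This is a minimal illustration under stated assumptions, not Trace-XML's implementation: the lineage graph, the variable names, the `collection.` prefix convention and the functions `trace_back` and `find_gaps` are all hypothetical. Each variable maps to the variables it was derived from, and a gap is an upstream variable that has no recorded predecessor yet is not a data collection source.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage graph: variable -> variables it is derived from.
LINEAGE: Dict[str, List[str]] = {
    "analysis.mean_sbp": ["tabulation.SBP"],
    "tabulation.SBP": ["collection.sysbp"],
    "collection.sysbp": [],
    "analysis.bmi": ["tabulation.BMI"],
    "tabulation.BMI": ["tabulation.WEIGHT", "tabulation.HEIGHT"],
    "tabulation.WEIGHT": ["collection.weight"],
    "collection.weight": [],
    # No predecessor recorded for HEIGHT: an intentional gap for the demo.
    "tabulation.HEIGHT": [],
}


def trace_back(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search returning all variables upstream of start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def find_gaps(graph: Dict[str, List[str]], start: str,
              source_prefix: str = "collection.") -> Set[str]:
    """Return variables in the lineage of start that have no predecessor
    and are not collection-phase sources."""
    lineage = trace_back(graph, start) | {start}
    return {v for v in lineage
            if not graph.get(v) and not v.startswith(source_prefix)}


if __name__ == "__main__":
    print(sorted(trace_back(LINEAGE, "analysis.mean_sbp")))
    print(sorted(find_gaps(LINEAGE, "analysis.bmi")))
```

Representing lineage as "derived from" edges keeps the backward query from an analysis result to its collected source a plain traversal, which is the kind of end-to-end question the abstract describes.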
Topics: Algorithms; Biomedical Research; Data Accuracy; Data Collection; Humans; Information Dissemination; Metadata; Reproducibility of Results; Software
PubMed: 32894879
DOI: 10.1055/s-0040-1714393 -
PloS One 2023
Bibliographic references containing citation information of academic literature play an important role as a medium connecting earlier and recent studies. As references contain machine-readable metadata such as author name, title, or publication year, they have been widely used in the field of citation information services including search services for scholarly information and research trend analysis. Many institutions around the world manually extract and continuously accumulate reference metadata to provide various scholarly services. However, manual collection of reference metadata every year continues to be a burden because of the associated cost and time consumption. With the accumulation of a large volume of academic literature, several tools, including GROBID and CERMINE, that automatically extract reference metadata have been released. However, these tools have some limitations. For example, they are only applicable to references written in English, the types of extractable metadata are limited for each tool, and the performance of the tools is insufficient to replace the manual extraction of reference metadata. Therefore, in this study, we focused on constructing a high-quality corpus to automatically extract metadata from multilingual journal article references. Using our constructed corpus, we trained and evaluated a BERT-based transfer-learning model. Furthermore, we compared the performance of the BERT-based model with that of the existing model, GROBID. Currently, our corpus contains 3,815,987 multilingual references, mainly in English and Korean, with labels for 13 different metadata types. According to our experiment, the BERT-based model trained using our corpus showed excellent performance in extracting metadata not only from journal references written in English but also in other languages, particularly Korean. This corpus is available at http://doi.org/10.23057/47.
Topics: Metadata; Writing; Multilingualism; Information Services
PubMed: 36662818
DOI: 10.1371/journal.pone.0280637 -
Studies in Health Technology and... Aug 2019
The FAIR principles require the reporting of rich metadata. However, when researchers reuse data from external data owners for secondary purposes, the FAIR principles require a different implementation than when researchers describe their own data. In this paper, we specify how FAIR metadata can be implemented for secondary data analyses and provide a suggestion for relevant metadata.
Topics: Metadata
PubMed: 31438187
DOI: 10.3233/SHTI190490