provenance - OpenMD.com Journal Search

Data Provenance in Biomedical Research: Scoping Review.

Journal of Medical Internet Research Mar 2023

Review

Authors: Marco Johns, Thierry Meurers, Felix N Wirth...

BACKGROUND

Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research.

OBJECTIVE

The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption.

METHODS

Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures.

RESULTS

We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. The important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV.

CONCLUSIONS

The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.

Topics: Humans; Biomedical Research; Metadata; PubMed; Reproducibility of Results; Software

PubMed: 36972116
DOI: 10.2196/42289

Data Provenance in Healthcare: Approaches, Challenges, and Future Directions.

Sensors (Basel, Switzerland) Jul 2023

Review

Authors: Mansoor Ahmed, Amil Rohani Dar, Markus Helfert...

Data provenance means recording data origins and the history of data generation and processing. In healthcare, data provenance is one of the essential processes that make it possible to track the sources and reasons behind any problem with a user's data. With the emergence of the General Data Protection Regulation (GDPR), data provenance in healthcare systems should be implemented to give users more control over data. This systematic literature review (SLR) studies the impacts of data provenance in healthcare and GDPR-compliance-based data provenance through a systematic review of peer-reviewed articles. The SLR discusses the technologies and methodologies used to achieve data provenance. We then explore different technologies that are applied in the healthcare domain and how they achieve data provenance. Finally, we identify key research gaps and outline future research directions.

Topics: Biomedical Research; Delivery of Health Care

PubMed: 37514788
DOI: 10.3390/s23146495

Data Provenance Standards and Recommendations for FAIR Data.

Studies in Health Technology and... Jun 2020

Review

Authors: Malte-Levin Jauer, Thomas M Deserno

This article reviews the main characteristics of five widely used data provenance models and recommendations. We suggest a set of six provenance properties that should be satisfied by any provenance model as a basis for further implementation of provenance mechanisms, supporting the findable, accessible, interoperable and reusable (FAIR) principles for both research and health data.

Topics: Databases, Factual

PubMed: 32570597
DOI: 10.3233/SHTI200380

Provenance Data Management in Health Information Systems: A Systematic Literature Review.

Journal of Personalized Medicine Jun 2023

Review

Authors: Márcio José Sembay, Douglas Dyllon Jeronimo de Macedo, Laércio Pioli Júnior...

AIMS

This article aims to perform a Systematic Literature Review (SLR) to better understand the structures of different methods, techniques, models, methodologies, and technologies related to provenance data management in health information systems (HISs). The SLR developed here seeks to answer the questions that contribute to describing the results.

METHOD

An SLR was performed on six databases using a search string. The backward and forward snowballing technique was also used. Eligible studies were all articles in English that presented on the use of different methods, techniques, models, methodologies, and technologies related to provenance data management in HISs. The quality of the included articles was assessed to obtain a better connection to the topic studied.

RESULTS

Of the 239 studies retrieved, 14 met the inclusion criteria described in this SLR. In order to complement the retrieved studies, 3 studies were included using the backward and forward snowballing technique, totaling 17 studies dedicated to the construction of this research. Most of the selected studies were published as conference papers, which is common when involving computer science in HISs. There was a more frequent use of data provenance models from the PROV family in different HISs combined with different technologies, among which blockchain and middleware stand out. Despite the advantages found, the lack of technological structure, data interoperability problems, and the technical unpreparedness of working professionals are still challenges encountered in the management of provenance data in HISs.

CONCLUSION

We conclude that different methods, techniques, models, and combined technologies exist. They are presented in a proposed taxonomy that gives researchers a new understanding of the management of provenance data in HISs.

PubMed: 37373980
DOI: 10.3390/jpm13060991

ProvOER model: A provenance model for Open Educational Resources.

Heliyon Feb 2023

Review

Authors: Renata Ribeiro Dos Santos, Marilde Terezinha Prado Santos, Ricardo Rodrigues Ciferri...

Open Educational Resources (OER) can be adapted and combined to create new resources that better meet the specific needs of different kinds of users and scenarios. In this sense, OER strongly contribute to generating and sharing educational knowledge. Because a new OER can be created through revision and remix activities, the original OER and the transformation process should be adequately identified. This way, the user of the OER has enough information about the history of the resource and, thus, can use it with confidence and security. In this context, determining data provenance, which describes the history of data from its origin to its current state, becomes very relevant. For OER, there are examples of metadata standards and digital repositories that help to obtain the data provenance. However, the information collected is insufficient to identify the entire history of the provenance of OER. This article proposes a Provenance Model for OER called the ProvOER Model, which allows the documentation and identification of the provenance of OER. For this purpose, a minimum set of metadata was defined that reflects the intrinsic properties of OER and the activities that created a new OER. The experiments showed that the ProvOER Model produced a suitable representation of the provenance of OER. In addition, the ProvOER Model allowed identifying the original OER used in a revise or remix activity and the continuous stretch used to create a new resource.

PubMed: 36755614
DOI: 10.1016/j.heliyon.2023.e13311

Decentralised provenance for healthcare data.

International Journal of Medical... Sep 2020

Authors: Andrea Margheri, Massimiliano Masi, Abdallah Miladi...

OBJECTIVE

The creation and exchange of patients' Electronic Healthcare Records have developed significantly in the last decade. Patients' records are however distributed in data silos across multiple healthcare facilities, posing technical and clinical challenges that may endanger patients' safety. Current healthcare sharing systems ensure interoperability of patients' records across facilities, but they have limits in presenting doctors with the clinical context of the data in the records. We design and implement a platform for managing provenance tracking of Electronic Healthcare Records based on blockchain technology, compliant with the latest healthcare standards and following the patient-informed consent preferences.

METHODS

The platform rests on two pillars: the use of international standards such as Integrating the Healthcare Enterprise (IHE), Health Level Seven International (HL7) and Fast Healthcare Interoperability Resources (FHIR) to achieve interoperability, and the use of a provenance creation process that, by design, avoids personal data storage within the blockchain. The platform consists of: (1) a smart contract implemented within the Hyperledger Fabric blockchain that manages provenance according to W3C PROV for medical documents in standardised formats (e.g. a CDA document, a FHIR resource, a DICOM study, etc.); (2) a Java Proxy that intercepts all the document submissions and retrievals for which provenance shall be evaluated; (3) a service used to retrieve the PROV document.
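The idea of recording provenance without placing personal data on the ledger can be illustrated with a minimal Python sketch. It is not the authors' implementation, which uses a Hyperledger Fabric smart contract and a Java proxy. The `InMemoryLedger` class, the function names, and the identifier prefixes are hypothetical stand-ins. The sketch stores only a SHA-256 digest of each document, together with entity, activity, and agent entries named after the core W3C PROV concepts.

```python
import hashlib
import json
from dataclasses import dataclass, field


def document_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest that stands in for the document."""
    return hashlib.sha256(content).hexdigest()


def make_prov_record(content: bytes, activity: str, agent: str, timestamp: str) -> dict:
    """Build a PROV-style record that references the document only by digest."""
    entity_id = "doc:" + document_digest(content)  # hypothetical identifier scheme
    return {
        "entity": {entity_id: {}},
        "activity": {activity: {"endTime": timestamp}},
        "agent": {agent: {}},
        "wasGeneratedBy": {"entity": entity_id, "activity": activity},
        "wasAssociatedWith": {"activity": activity, "agent": agent},
    }


@dataclass
class InMemoryLedger:
    """Hypothetical stand-in for the blockchain: a dict keyed by document digest."""

    records: dict = field(default_factory=dict)

    def submit(self, content: bytes, activity: str, agent: str, timestamp: str) -> str:
        """Append a provenance record for the document and return its digest key."""
        key = document_digest(content)
        record = make_prov_record(content, activity, agent, timestamp)
        self.records.setdefault(key, []).append(record)
        return key

    def retrieve(self, content: bytes) -> list:
        """Return all provenance records stored for this exact document content."""
        return list(self.records.get(document_digest(content), []))

    def export(self) -> str:
        """Serialize the ledger contents as JSON for inspection."""
        return json.dumps(self.records, sort_keys=True)


ledger = InMemoryLedger()
document = b"<ClinicalDocument>sample content</ClinicalDocument>"
key = ledger.submit(document, "act:submit-1", "agent:proxy", "2020-01-01T00:00:00Z")
history = ledger.retrieve(document)
```

Because the ledger key is derived from the document content, a retrieval step can look up provenance for a document it holds without the ledger ever containing the document itself.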

RESULTS

We integrated our decentralised platform with the SpiritEHR engine, an enterprise-grade healthcare system, and we stored and retrieved the available documents in Mandel's sample CDA repository, which contained no protected health information. Using a cloud-based blockchain solution, we observed that the overhead added to the typical processing time of reading and writing medical data is in the order of milliseconds. Moreover, the integration of the Proxy at the level of exchanged messages in EHR systems allows transparent usage of provenance data in multiple health computing domains such as decision making, data reconciliation, and patient consent auditing.

CONCLUSIONS

By using international healthcare standards and a cloud-based blockchain deployment, we delivered a solution that can manage provenance of patients' records via transparent integration within the routine operations on healthcare data.

Topics: Delivery of Health Care; Electronic Health Records; Health Facilities; Health Level Seven; Humans; Information Storage and Retrieval

PubMed: 32540775
DOI: 10.1016/j.ijmedinf.2020.104197

Lightweight Distributed Provenance Model for Complex Real-world Environments.

Scientific Data Aug 2022

Authors: Rudolf Wittner, Cecilia Mascia, Matej Gallo...

Provenance is information describing the lineage of an object, such as a dataset or biological material. Since these objects can be passed between organizations, each organization can document only parts of the object's life cycle. As a result, the interconnection of distributed provenance parts forms distributed provenance chains. Depending on the actual provenance content, complete provenance chains can provide traceability and contribute to reproducibility and FAIRness of research objects. In this paper, we define a lightweight provenance model based on W3C PROV that enables generation of distributed provenance chains in complex, multi-organizational environments. The application of the model is demonstrated with a use case spanning several steps of a real-world research pipeline - starting with the acquisition of a specimen, its processing and storage, histological examination, and the generation/collection of associated data (images, annotations, clinical data), ending with training an AI model for the detection of tumors in the images. The proposed model has become an open conceptual foundation of the currently developed ISO 23494 standard on provenance for the biotechnology domain.
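How separately documented provenance parts can be joined into one chain is sketched below in Python. This is an illustrative assumption, not the model defined in the paper. The `ProvenancePart` class, the `trace_chain` function, and all identifiers and organization names are hypothetical. Each organization records only its own step, and the chain is rebuilt by following each part's input identifier back to the part that produced it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenancePart:
    """One organization's record of a single step (hypothetical structure)."""

    organization: str
    activity: str
    output_id: str            # identifier of the object this step produced
    input_id: Optional[str]   # identifier of the object it consumed, if any


def trace_chain(parts: List[ProvenancePart], object_id: str) -> List[ProvenancePart]:
    """Walk backwards from object_id and return the chain, oldest step first.

    The walk stops at the first object for which no part is available, so a
    missing part yields a shorter, incomplete chain.
    """
    by_output = {part.output_id: part for part in parts}
    chain: List[ProvenancePart] = []
    seen = set()
    current: Optional[str] = object_id
    while current is not None and current in by_output:
        if current in seen:
            raise ValueError("cycle detected at " + current)
        seen.add(current)
        part = by_output[current]
        chain.append(part)
        current = part.input_id
    chain.reverse()
    return chain


# Hypothetical pipeline loosely following the use case in the abstract.
parts = [
    ProvenancePart("hospital", "specimen acquisition", "specimen:1", None),
    ProvenancePart("biobank", "processing and storage", "block:1", "specimen:1"),
    ProvenancePart("pathology lab", "histological examination", "image:1", "block:1"),
    ProvenancePart("research group", "AI model training", "model:1", "image:1"),
]

full_chain = trace_chain(parts, "model:1")
# Without the biobank's part, the chain cannot reach back to the specimen.
partial_chain = trace_chain([p for p in parts if p.organization != "biobank"], "model:1")
```

The second call shows why complete chains matter for traceability: once one organization's part is missing, everything upstream of it becomes unreachable.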

PubMed: 35977957
DOI: 10.1038/s41597-022-01537-6

A Provenance Task Abstraction Framework.

IEEE Computer Graphics and Applications 2019

Authors: Christian Bors, John Wenskovitch, Michelle Dowling...

Visual analytics tools integrate provenance recording to externalize analytic processes or user insights. Provenance can be captured on varying levels of detail, and in turn activities can be characterized at different granularities. However, current approaches do not support inferring activities that can only be characterized across multiple levels of provenance. We propose a task abstraction framework that consists of a three-stage approach, composed of 1) initializing a provenance task hierarchy, 2) parsing the provenance hierarchy by using an abstraction mapping mechanism, and 3) leveraging the task hierarchy in an analytical tool. Furthermore, we identify implications to accommodate iterative refinement, context, variability, and uncertainty during all stages of the framework. We describe a use case which exemplifies our abstraction framework, demonstrating how context can influence the provenance hierarchy to support analysis. The article concludes with an agenda, raising and discussing challenges that need to be considered for successfully implementing such a framework.

PubMed: 31603814
DOI: 10.1109/MCG.2019.2945720

The End-to-End Provenance Project.

Patterns (New York, N.Y.) May 2020

Authors: Aaron M Ellison, Emery R Boose, Barbara S Lerner...

Data provenance is a machine-readable summary of the collection and computational history of a dataset. Data provenance confers or adds value to a dataset, helps reproduce computational analyses, or validates scientific conclusions. The people of the End-to-End Provenance Project are a community of professionals who have developed software tools to collect and use data provenance.

PubMed: 33205093
DOI: 10.1016/j.patter.2020.100016

Provenance and Dynamic Consents for the Management of Medical Data.

Studies in Health Technology and... Nov 2022

Authors: Jaime Delgado, Silvia Llorente

Medical data describe patient health information, in both healthy and disease conditions. In any case, health institutions need to ask for patient consent in order to provide their services. Patients usually give consent on a one-time basis, for a specific usage. If the medical data are later used for research, the original patient consent does not apply and further consent is required. On the other hand, provenance of medical data to verify the origin of health procedures is desirable, as digital health is increasing. We propose the HIPAMS modular architecture to provide both provenance and dynamic consents for medical data, as described in this paper.

Topics: Humans; Informed Consent

PubMed: 36325859
DOI: 10.3233/SHTI220978