- Journal of Medical Internet Research Mar 2023
Review
BACKGROUND
Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research.
OBJECTIVE
The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption.
METHODS
Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures.
RESULTS
We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. The important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV.
CONCLUSIONS
The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.
Topics: Humans; Biomedical Research; Metadata; PubMed; Reproducibility of Results; Software
PubMed: 36972116
DOI: 10.2196/42289
- Sensors (Basel, Switzerland) Jul 2023
Review
Data provenance means recording data origins and the history of data generation and processing. In healthcare, data provenance is one of the essential processes that make it possible to track the sources and reasons behind any problem with a user's data. With the emergence of the General Data Protection Regulation (GDPR), data provenance in healthcare systems should be implemented to give users more control over data. This systematic literature review (SLR) examines the impact of data provenance in healthcare and GDPR-compliance-based data provenance, drawing on peer-reviewed articles. The SLR discusses the technologies and methodologies used to achieve data provenance. We then explore the technologies applied in the healthcare domain and how they achieve data provenance. Finally, we identify key research gaps and outline future research directions.
Topics: Biomedical Research; Delivery of Health Care
PubMed: 37514788
DOI: 10.3390/s23146495
- Studies in Health Technology and... Jun 2020
Review
This article reviews the main characteristics of five widely used data provenance models and recommendations. We suggest a set of six provenance properties that should be satisfied by any provenance model as a basis for further implementation of provenance mechanisms, supporting the findable, accessible, interoperable, and reusable (FAIR) principles for both research and health data.
Topics: Databases, Factual
PubMed: 32570597
DOI: 10.3233/SHTI200380
- Journal of Personalized Medicine Jun 2023
Review
AIMS
This article aims to perform a Systematic Literature Review (SLR) to better understand the structures of different methods, techniques, models, methodologies, and technologies related to provenance data management in health information systems (HISs). The SLR developed here seeks to answer the questions that contribute to describing the results.
METHOD
An SLR was performed on six databases using a search string. The backward and forward snowballing technique was also used. Eligible studies were all articles in English that reported the use of different methods, techniques, models, methodologies, and technologies related to provenance data management in HISs. The quality of the included articles was assessed to obtain a better connection to the topic studied.
RESULTS
Of the 239 studies retrieved, 14 met the inclusion criteria described in this SLR. To complement the retrieved studies, 3 studies were added using the backward and forward snowballing technique, for a total of 17 studies included in this review. Most of the selected studies were published as conference papers, as is common for computer science research on HISs. Data provenance models from the PROV family were used most frequently in different HISs, combined with different technologies, among which blockchain and middleware stand out. Despite the advantages found, the lack of technological structure, data interoperability problems, and the technical unpreparedness of working professionals remain challenges in the management of provenance data in HISs.
CONCLUSION
The review identified different methods, techniques, models, and combined technologies, which are organized in a proposed taxonomy that gives researchers a new understanding of the management of provenance data in HISs.
PubMed: 37373980
DOI: 10.3390/jpm13060991
- Heliyon Feb 2023
Review
Open Educational Resources (OER) can be adapted and combined to create new resources that better meet the specific needs of different kinds of users and scenarios. In this sense, OER strongly contribute to generating and sharing educational knowledge. Because a new OER can be created through revision and remix activities, the original OER and the transformation process should be adequately identified. This way, the user of the OER has enough information about the history of the resource and, thus, can use it with confidence and security. In this context, determining data provenance, which describes the history of data from its origin to its current state, becomes very relevant. For OER, there are examples of metadata standards and digital repositories that help to obtain the data provenance. However, the information collected is insufficient to identify the entire history of the provenance of OER. This article proposes a Provenance Model for OER called the ProvOER Model, which allows the documentation and identification of the provenance of OER. For this purpose, a minimum set of metadata was defined that reflects the intrinsic properties of an OER and the activities that created a new OER. The experiments showed that the ProvOER Model produced a suitable representation of the provenance of OER. In addition, the ProvOER Model allowed identifying the original OER used in a revise or remix activity and the continuous stretch used to create a new resource.
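The idea of identifying the original OER behind a chain of revise and remix activities can be illustrated with a minimal sketch. This is not the ProvOER Model itself: the `Derivation` record, its fields, and the `original_sources` function are hypothetical names chosen for illustration, and the abstract does not specify the actual metadata set.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record of one activity that produced a new OER."""
    activity: str             # "revise" or "remix" (assumed vocabulary)
    result: str               # identifier of the newly created OER
    sources: Tuple[str, ...]  # identifiers of the OER that were used


def original_sources(resource_id: str, derivations: Iterable[Derivation]) -> Set[str]:
    """Follow derivations backwards until reaching OER with no recorded origin."""
    by_result = {d.result: d for d in derivations}
    originals: Set[str] = set()
    seen: Set[str] = set()
    stack = [resource_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in faulty records
        seen.add(current)
        derivation = by_result.get(current)
        if derivation is None:
            originals.add(current)  # nothing produced it: treat as an original
        else:
            stack.extend(derivation.sources)
    return originals
```

A resource with no recorded derivation is treated as its own original, which keeps the lookup total even when the history is incomplete.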
PubMed: 36755614
DOI: 10.1016/j.heliyon.2023.e13311
- International Journal of Medical... Sep 2020
OBJECTIVE
The creation and exchange of patients' Electronic Healthcare Records have developed significantly in the last decade. Patients' records are however distributed in data silos across multiple healthcare facilities, posing technical and clinical challenges that may endanger patients' safety. Current healthcare sharing systems ensure interoperability of patients' records across facilities, but they have limits in presenting doctors with the clinical context of the data in the records. We design and implement a platform for managing provenance tracking of Electronic Healthcare Records based on blockchain technology, compliant with the latest healthcare standards and following the patient-informed consent preferences.
METHODS
The platform leverages two pillars: the use of international standards such as Integrating the Healthcare Enterprise (IHE), Health Level Seven International (HL7), and Fast Healthcare Interoperability Resources (FHIR) to achieve interoperability, and the use of a provenance creation process that, by design, avoids personal data storage within the blockchain. The platform consists of: (1) a smart contract implemented within the Hyperledger Fabric blockchain that manages provenance according to W3C PROV for medical documents in standardised formats (e.g. a CDA document, a FHIR resource, a DICOM study, etc.); (2) a Java Proxy that intercepts all the document submissions and retrievals for which provenance shall be evaluated; (3) a service used to retrieve the PROV document.
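One common way to keep personal data out of a ledger is to record only a cryptographic digest of a document together with PROV-style references. The sketch below illustrates that general idea in plain Python; it is an assumption for illustration, not the platform's smart contract, and it does not use the Hyperledger Fabric API. The function names and record fields are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def make_provenance_record(document: bytes, document_id: str,
                           activity: str, agent_id: str) -> dict:
    """Build a record that references a document without storing its content."""
    return {
        "entity": document_id,                             # PROV-style entity reference
        "digest": hashlib.sha256(document).hexdigest(),    # fingerprint only, no payload
        "activity": activity,                              # e.g. "submission" or "retrieval"
        "agent": agent_id,                                 # organization or system responsible
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def matches(record: dict, document: bytes) -> bool:
    """Check whether a document is the one the record refers to."""
    return record["digest"] == hashlib.sha256(document).hexdigest()
```

Anyone holding the document can confirm that it corresponds to the record, while the record alone reveals nothing readable about the patient.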
RESULTS
We integrated our decentralised platform with the SpiritEHR engine, an enterprise-grade healthcare system, and we stored and retrieved the available documents in the Mandel's sample CDA repository, which contained no protected health information. Using a cloud-based blockchain solution, we observed that the overhead added to the typical processing time of reading and writing medical data is in the order of milliseconds. Moreover, the integration of the Proxy at the level of exchanged messages in EHR systems allows transparent usage of provenance data in multiple health computing domains such as decision making, data reconciliation, and patient consent auditing.
CONCLUSIONS
By using international healthcare standards and a cloud-based blockchain deployment, we delivered a solution that can manage provenance of patients' records via transparent integration within the routine operations on healthcare data.
Topics: Delivery of Health Care; Electronic Health Records; Health Facilities; Health Level Seven; Humans; Information Storage and Retrieval
PubMed: 32540775
DOI: 10.1016/j.ijmedinf.2020.104197
- Scientific Data Aug 2022
Provenance is information describing the lineage of an object, such as a dataset or biological material. Since these objects can be passed between organizations, each organization can document only parts of the object's life cycle. As a result, interconnection of distributed provenance parts forms distributed provenance chains. Depending on the actual provenance content, complete provenance chains can provide traceability and contribute to reproducibility and FAIRness of research objects. In this paper, we define a lightweight provenance model based on W3C PROV that enables generation of distributed provenance chains in complex, multi-organizational environments. The application of the model is demonstrated with a use case spanning several steps of a real-world research pipeline, starting with the acquisition of a specimen, its processing and storage, histological examination, and the generation/collection of associated data (images, annotations, clinical data), and ending with training an AI model for the detection of tumors in the images. The proposed model has become an open conceptual foundation of the currently developed ISO 23494 standard on provenance for the biotechnology domain.
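The notion of a distributed provenance chain, where each organization documents only its own part, can be sketched as follows. This is a simplified illustration under stated assumptions, not the model defined in the paper: each organization's part is reduced to hypothetical (derived, source) pairs, every object has at most one source, and the organization and object names are invented.

```python
from typing import Dict, List, Tuple

# Each organization documents only its own part: (derived_object, source_object) pairs.
Bundle = List[Tuple[str, str]]


def trace_lineage(obj: str, bundles: Dict[str, Bundle]) -> List[Tuple[str, str, str]]:
    """Walk backwards from obj and return (organization, derived, source) steps."""
    # Index every documented derivation by the object it produced.
    index = {}
    for org, bundle in bundles.items():
        for derived, source in bundle:
            index[derived] = (org, source)

    steps = []
    visited = set()
    current = obj
    while current in index and current not in visited:
        visited.add(current)  # stop if faulty records form a loop
        org, source = index[current]
        steps.append((org, current, source))
        current = source
    return steps


# Hypothetical parts held by three organizations in a pipeline like the use case.
bundles = {
    "biobank": [("sample", "specimen")],
    "pathology_lab": [("slide_image", "sample")],
    "ai_lab": [("model", "slide_image")],
}
lineage = trace_lineage("model", bundles)
```

The chain is only complete when every organization's part is available; a missing part simply ends the walk early, which mirrors the traceability gap the abstract describes.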
PubMed: 35977957
DOI: 10.1038/s41597-022-01537-6
- IEEE Computer Graphics and Applications 2019
Visual analytics tools integrate provenance recording to externalize analytic processes or user insights. Provenance can be captured at varying levels of detail, and in turn activities can be characterized at different granularities. However, current approaches do not support inferring activities that can only be characterized across multiple levels of provenance. We propose a task abstraction framework that consists of a three-stage approach, composed of 1) initializing a provenance task hierarchy, 2) parsing the provenance hierarchy by using an abstraction mapping mechanism, and 3) leveraging the task hierarchy in an analytical tool. Furthermore, we identify implications to accommodate iterative refinement, context, variability, and uncertainty during all stages of the framework. We describe a use case which exemplifies our abstraction framework, demonstrating how context can influence the provenance hierarchy to support analysis. The article concludes with an agenda, raising and discussing challenges that need to be considered for successfully implementing such a framework.
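The three stages named above can be sketched in a heavily simplified form. The hierarchy, the event names, and the mapping below are all hypothetical, and the sketch omits the context, variability, and uncertainty handling that the article discusses; it only shows how raw provenance events might be lifted to tasks at two levels.

```python
from collections import Counter

# Stage 1 (assumed content): a task hierarchy, low-level task -> high-level task.
HIERARCHY = {
    "filter": "explore",
    "zoom": "explore",
    "annotate": "record_insight",
}

# Stage 2 (assumed content): abstraction mapping, raw event -> low-level task.
MAPPING = {
    "brush": "filter",
    "slider_change": "filter",
    "wheel": "zoom",
    "add_note": "annotate",
}


def abstract(events, level="high"):
    """Map raw provenance events to tasks at the requested level."""
    tasks = []
    for event in events:
        task = MAPPING.get(event)
        if task is None:
            continue  # events without a mapping are ignored in this sketch
        if level == "high":
            task = HIERARCHY.get(task, task)
        tasks.append(task)
    return tasks


def summarize(events, level="high"):
    """Stage 3: use the abstracted tasks, here as a simple frequency summary."""
    return Counter(abstract(events, level))
```

Keeping the hierarchy and the mapping as separate tables means either can be refined without touching the other, which is one way to accommodate the iterative refinement the article calls for.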
PubMed: 31603814
DOI: 10.1109/MCG.2019.2945720
- Patterns (New York, N.Y.) May 2020
Data provenance is a machine-readable summary of the collection and computational history of a dataset. Data provenance confers or adds value to a dataset, helps reproduce computational analyses, or validates scientific conclusions. The people of the End-to-End Provenance Project are a community of professionals who have developed software tools to collect and use data provenance.
PubMed: 33205093
DOI: 10.1016/j.patter.2020.100016
- Studies in Health Technology and... Nov 2022
Medical data describe patient health information, in both healthy and disease conditions. In any case, health institutions need to ask for patient consent in order to provide their services. Patients usually give consent on a one-time basis, for a specific usage. If the medical data are later used for research, the original patient consent does not apply and further consent is required. In addition, as digital health grows, provenance of medical data is desirable to verify the origin of health procedures. We propose the HIPAMS modular architecture, described in this paper, to provide both provenance and dynamic consent for medical data.
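The point that consent given for one usage does not cover a later research use can be illustrated with a small purpose-based check. This is a sketch under stated assumptions and not the HIPAMS architecture: the `Consent` record, its fields, and `is_permitted` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Consent:
    """Hypothetical consent record tied to a single purpose."""
    patient_id: str
    purpose: str                      # e.g. "treatment" or "research"
    granted_on: date
    revoked_on: Optional[date] = None


def is_permitted(consents: Iterable[Consent], patient_id: str,
                 purpose: str, on: date) -> bool:
    """Return True only if a matching, unrevoked consent covers this purpose and date."""
    for consent in consents:
        if consent.patient_id != patient_id or consent.purpose != purpose:
            continue  # consent for another patient or purpose does not apply
        active = consent.granted_on <= on and (
            consent.revoked_on is None or on < consent.revoked_on
        )
        if active:
            return True
    return False
```

Because each consent names one purpose, a request for research use fails until the patient grants a separate consent, and a revocation takes effect from its date onward.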
Topics: Humans; Informed Consent
PubMed: 36325859
DOI: 10.3233/SHTI220978