-
Studies in Health Technology and... May 2022
The distributed nature of modern research emphasizes the importance of collecting and sharing the history of digital and physical material, to improve the reproducibility of experiments and the quality and reusability of results. Yet the current methodologies for recording provenance information are applied in a largely scattered way, leading to silos of provenance information at different granularities. To tackle this fragmentation, we developed the Common Provenance Model, a set of guidelines for generating interoperable provenance information and for reconstructing and navigating a continuous provenance chain. This work presents the first version of the model, available online, based on the W3C PROV Data Model and the Provenance Composition pattern.
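As an illustration of what navigating a continuous provenance chain can look like, the following minimal sketch stores provenance statements as plain tuples named after W3C PROV Data Model relations (`wasGeneratedBy`, `used`, `wasDerivedFrom`) and walks the derivation links back to the origin. It does not reproduce the Common Provenance Model or the Provenance Composition pattern; the identifiers (`sample:1`, `data:raw`, and so on) and the helper `derivation_chain` are hypothetical.

```python
# Each statement is (relation, subject, object), using PROV-DM relation names.
# All identifiers below are hypothetical examples.
records = [
    ("used", "act:sequencing", "sample:1"),            # activity used an entity
    ("wasGeneratedBy", "data:raw", "act:sequencing"),  # entity generated by activity
    ("wasDerivedFrom", "data:raw", "sample:1"),        # entity derived from entity
    ("wasDerivedFrom", "data:clean", "data:raw"),
    ("wasDerivedFrom", "report:final", "data:clean"),
]


def derivation_chain(statements, entity):
    """Follow wasDerivedFrom links from `entity` back to its earliest source."""
    parent = {
        subject: obj
        for relation, subject, obj in statements
        if relation == "wasDerivedFrom"
    }
    chain = [entity]
    seen = {entity}
    while chain[-1] in parent:
        source = parent[chain[-1]]
        if source in seen:  # guard against accidental cycles
            break
        chain.append(source)
        seen.add(source)
    return chain


chain = derivation_chain(records, "report:final")
print(" <- ".join(chain))
```

The sketch assumes each entity has at most one direct source; a real provenance graph can branch, in which case a graph traversal would replace the simple loop.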
Topics: Biological Science Disciplines; Reproducibility of Results
PubMed: 35612111
DOI: 10.3233/SHTI220489
-
Patterns (New York, N.Y.) May 2020
Data provenance is a machine-readable summary of the collection and computational history of a dataset. Data provenance confers or adds value to a dataset, helps reproduce computational analyses, or validates scientific conclusions. The people of the End-to-End Provenance Project are a community of professionals who have developed software tools to collect and use data provenance.
PubMed: 33205093
DOI: 10.1016/j.patter.2020.100016
-
The FEBS Journal Jan 2022
The FEBS Journal, a leading multidisciplinary journal in the life sciences, publishes high-impact papers on diverse topics relating to molecular mechanisms underpinning biological processes. Here, Editor-in-Chief Seamus Martin discusses the critical importance of data provenance and data integrity to the scientific method and reviews some of the highlights from 2021 at The FEBS Journal.
PubMed: 34982855
DOI: 10.1111/febs.16332
-
Studies in Health Technology and... 2018
Review
Healthcare directories are vital for interoperability among healthcare providers, researchers and patients. Past efforts at directory services have not provided the tools to allow integration of the diverse data sources. Many are overly strict, incompatible with legacy databases, and do not provide Data Provenance. A more architecture-independent system is needed to enable secure, GDPR-compatible (8) service discovery across organizational boundaries. We review our development of a portable Data Provenance Toolkit supporting provenance within Health Information Exchange (HIE) systems. The Toolkit has been integrated with client software and successfully leveraged in clinical data integration. The Toolkit validates provenance stored in a Blockchain or Directory record and creates provenance signatures, providing standardized provenance that moves with the data. This healthcare directory suite implements discovery of healthcare data by HIE and EHR systems via FHIR. Shortcomings of past directory efforts include the inability to map complex datasets and to enable interoperability via exchange endpoint discovery. By delivering data without dictating how it is stored, we improve exchange and facilitate discovery on a multi-national level through open-source, fully interoperable tools. With the development of Data Provenance resources, we enhance exchange and improve security and usability throughout the health data continuum.
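The idea of a provenance signature that travels with the data and can later be validated might be sketched as follows. This is not the Toolkit's actual scheme, which the abstract does not describe; the sketch assumes a shared secret key and an HMAC-SHA256 over a canonical JSON form of the record, and the names `sign_provenance`, `verify_provenance`, and the record fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_provenance(record: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 signature over a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_provenance(record: dict, signature: str, key: bytes) -> bool:
    """Check that the signature matches the record (constant-time comparison)."""
    return hmac.compare_digest(sign_provenance(record, key), signature)


# Hypothetical record and key, for illustration only.
key = b"demo-shared-secret"
record = {"source": "clinic-a", "resource": "Observation/123", "created": "2018-01-01"}
signature = sign_provenance(record, key)

tampered = dict(record, source="clinic-b")
print(verify_provenance(record, signature, key))    # original record
print(verify_provenance(tampered, signature, key))  # modified record
```

A shared-key HMAC keeps the example short; a deployment spanning organizations would more plausibly use public-key signatures so that verifiers do not need the signing secret.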
Topics: Computer Systems; Databases, Factual; Delivery of Health Care; Electronic Health Records; Humans; Software; Systems Integration
PubMed: 29866978
DOI: No ID Found
-
JMIR Research Protocols Nov 2021
BACKGROUND
Provenance supports the understanding of data genesis, and it is a key factor to ensure the trustworthiness of digital objects containing (sensitive) scientific data. Provenance information contributes to a better understanding of scientific results and fosters collaboration on existing data as well as data sharing. This encompasses defining comprehensive concepts and standards for transparency and traceability, reproducibility, validity, and quality assurance during clinical and scientific data workflows and research.
OBJECTIVE
The aim of this scoping review is to investigate existing evidence regarding approaches and criteria for provenance tracking as well as disclosing current knowledge gaps in the biomedical domain. This review covers modeling aspects as well as metadata frameworks for meaningful and usable provenance information during creation, collection, and processing of (sensitive) scientific biomedical data. This review also covers the examination of quality aspects of provenance criteria.
METHODS
This scoping review will follow the methodological framework by Arksey and O'Malley. Relevant publications will be obtained by querying PubMed and Web of Science. All English-language papers published between January 1, 2006, and March 23, 2021, will be included. Data retrieval will be accompanied by a manual search for grey literature. Potential publications will then be exported into reference management software, and duplicates will be removed. Afterwards, the obtained set of papers will be transferred into a systematic review management tool. All publications will be screened, extracted, and analyzed: title and abstract screening will be carried out by 4 independent reviewers. A majority vote is required to accept a paper as eligible based on the defined inclusion and exclusion criteria. Full-text reading will be performed independently by 2 reviewers and, in the last step, key information will be extracted on a pretested template. If agreement cannot be reached, the conflict will be resolved by a domain expert. Charted data will be analyzed by categorizing and summarizing the individual data items based on the research questions. Tabular or graphical overviews will be given, if applicable.
RESULTS
The reporting follows the extension of the Preferred Reporting Items for Systematic reviews and Meta-Analyses statements for Scoping Reviews. Electronic database searches in PubMed and Web of Science resulted in 469 matches after deduplication. As of September 2021, the scoping review is in the full-text screening stage. The data extraction using the pretested charting template will follow the full-text screening stage. We expect the scoping review report to be completed by February 2022.
CONCLUSIONS
Information about the origin of healthcare data has a major impact on the quality and the reusability of scientific results as well as follow-up activities. This protocol outlines plans for a scoping review that will provide information about current approaches, challenges, or knowledge gaps with provenance tracking in biomedical sciences.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID)
DERR1-10.2196/31750.
PubMed: 34813494
DOI: 10.2196/31750
-
Critical Reviews in Analytical Chemistry Sep 2023
Review
Soil is one type of Earth material demonstrating a wide range of physical, chemical, and biological properties. As the compositional profile of soil is a product of interaction between numerous abiotic and biotic components, it tends to be unique to its geographic origin. Hence, soil is paramount for predicting source or origin in forensic provenance and intelligence, food provenance, biosecurity, and archaeology. In the context of forensic investigation, source tracing of soil could be executed by a comparison or provenance analysis. Soil compositional fingerprints acquired using analytical methods must be carefully interpreted using suitable mathematical and statistical tools, since multiple sources other than provenance can contribute to the variability of soil. This article reviews recent trends in soil sampling and data interpretation strategies proposed for source tracing of soil evidence. Performances of soil provenance indicators are also described. Then, perspectives on possible research directions guiding forensic soil provenance are proposed. This timely critical review reveals the essential ideas and gaps in forensic soil provenance to stimulate the development of more efficient and effective provenance strategies.
PubMed: 37672265
DOI: 10.1080/10408347.2023.2253473
-
Patterns (New York, N.Y.) Aug 2020
Deep learning, a set of approaches using artificial neural networks, has generated rapid recent advancements in machine learning. Deep learning does, however, have the potential to reduce the reproducibility of scientific results. Model outputs are critically dependent on the data and processing approach used to initially generate the model, but this provenance information is usually lost during model training. To avoid a future reproducibility crisis, we need to improve our deep-learning model management. The FAIR principles for data stewardship and software/workflow implementation give excellent high-level guidance on ensuring effective reuse of data and software. We suggest some specific guidelines for the generation and use of deep-learning models in science and explain how these relate to the FAIR principles. We then present dtoolAI, a Python package that we have developed to implement these guidelines. The package implements automatic capture of provenance information during model training and simplifies model distribution.
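Capturing provenance at training time can be illustrated with a minimal sketch that records a hash of the training data, the hyperparameters, and the runtime version alongside the model. This is not the dtoolAI API, which the abstract does not show; the functions `capture_provenance` and `train_with_provenance`, the placeholder model, and the example inputs are all hypothetical.

```python
import hashlib
import json
import platform


def capture_provenance(training_data: bytes, hyperparameters: dict) -> dict:
    """Summarize what went into a training run in a machine-readable record."""
    return {
        "data_sha256": hashlib.sha256(training_data).hexdigest(),
        "hyperparameters": dict(hyperparameters),
        "python_version": platform.python_version(),
    }


def train_with_provenance(training_data: bytes, hyperparameters: dict):
    """Run a (placeholder) training step and return the model with its provenance."""
    provenance = capture_provenance(training_data, hyperparameters)
    model = {"weights": [0.0]}  # stand-in for a real trained model
    return model, provenance


# Hypothetical inputs, for illustration only.
data = b"x,y\n1,2\n3,4\n"
params = {"learning_rate": 0.01, "epochs": 5}
model, provenance = train_with_provenance(data, params)

# The record serializes to JSON, so it can be stored next to the model file.
print(json.dumps(provenance, indent=2, sort_keys=True))
```

Recording the data hash before training starts means the provenance record exists even if the run fails, and distributing the JSON with the model keeps the link between the two.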
PubMed: 33205122
DOI: 10.1016/j.patter.2020.100073
-
Frontiers in Genetics 2022
Fair and equitable benefit sharing of genetic resources is an expectation of the Nagoya Protocol. Although the Nagoya Protocol does not yet formally apply to Digital Sequence Information ("DSI"), discussions are currently underway regarding the inclusion of such data through ongoing Convention on Biological Diversity ("CBD") negotiations. While Indigenous Peoples and Local Communities ("IPLC") expect the value generated from genomic data to be subject to benefit sharing arrangements, a range of views are currently being expressed by Nation States, IPLC and other stakeholders. The use of DSI gives rise to unique considerations, creating a gray area as to how it should be considered under the Nagoya Protocol's Access and Benefit Sharing ("ABS") principles. One way for benefit sharing to be enhanced is through the connection of data to proper provenance information. A significant development is the use of digital labeling systems to ensure that the origin of samples is appropriately disclosed. The Traditional Knowledge and Biocultural Labels initiative offers a practical option for data provided to genomic databases. In particular, the BioCultural Labels ("BC Labels") are a mechanism for Indigenous communities to identify and maintain provenance, origin and authority over biocultural material and data generated from Indigenous land and waters held in research, cultural institutions and data repositories. This form of cultural metadata adds value to the research endeavor, and the creation of Indigenous fields within databases adds transparency and accountability to the research environment.
PubMed: 36212139
DOI: 10.3389/fgene.2022.1014044
-
Journal of Grid Computing 2022
In scientific collaboration, data sharing and the exchange of ideas and results are essential to knowledge construction and the development of science. Hence, we must guarantee interoperability, privacy, traceability (reinforcing transparency), and trust. Provenance has been widely recognized for providing a history of the steps taken in scientific experiments. Consequently, we must support traceability, which assists in the reproducibility of scientific results. One of the technologies that can enhance trust in collaborative scientific experimentation is blockchain. This work proposes an architecture, named BlockFlow, based on blockchain, provenance, and cloud infrastructure to bring trust and traceability to the execution of collaborative scientific experiments. The proposed architecture is implemented on Hyperledger, and a scenario about the genomic sequencing of the SARS-CoV-2 coronavirus is used to evaluate the architecture, discussing the benefits of providing traceability and trust in collaborative scientific experimentation. Furthermore, the architecture addresses the heterogeneity of shared data, facilitating the interpretation and analysis of such data by geographically distributed researchers. Through a blockchain-based architecture that supports provenance, we can enhance data sharing, traceability, and trust in collaborative scientific experiments.
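The reason a blockchain can add trust to provenance records can be shown with a minimal hash-chain sketch: each block stores the hash of the previous block, so altering an earlier record invalidates every later link. This is not BlockFlow or the Hyperledger API; the functions `block_hash`, `append_record`, and `is_valid`, and the example records, are hypothetical, and consensus, identities, and networking are deliberately left out.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # conventional placeholder for "no previous block"


def block_hash(body: dict) -> str:
    """Hash the canonical JSON form of a block body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(chain: list, record: dict) -> None:
    """Append a provenance record, linking it to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    body = {"record": record, "prev_hash": prev_hash}
    chain.append({**body, "hash": block_hash(body)})


def is_valid(chain: list) -> bool:
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for block in chain:
        body = {"record": block["record"], "prev_hash": block["prev_hash"]}
        if block["prev_hash"] != prev_hash or block["hash"] != block_hash(body):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical workflow steps, for illustration only.
chain = []
append_record(chain, {"step": "sequencing", "agent": "lab-a"})
append_record(chain, {"step": "assembly", "agent": "lab-b"})
valid_before = is_valid(chain)

chain[0]["record"]["agent"] = "lab-x"  # tamper with an earlier record
valid_after = is_valid(chain)
print(valid_before, valid_after)
```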
PubMed: 36246518
DOI: 10.1007/s10723-022-09626-x
-
The Journal of Consumer Affairs 2022
This article advances the discussion on how this special issue contributes to the consumer well-being literature. Specifically, this article endeavors to present an eclectic account of how pandemics have had a lasting impact on consumer well-being, its provenance, and future research priorities for academics and practitioners. First, it briefly discusses the origin and relevance of the evolving issue of consumer well-being during pandemics. Second, it presents several directions for future research, and third, it offers key insights for policymakers. It includes multiple research priorities that present vastly contrasting manifestations of consumer well-being. This article argues that future research will need to examine the drivers of consumer well-being during pandemics, the mechanisms that underlie the influence of pandemics on consumer well-being, and the boundary conditions that accentuate or mitigate the influence of pandemic-induced factors.
PubMed: 35603324
DOI: 10.1111/joca.12445