Animals: An Open Access Journal from MDPI, Aug 2022
Review
Veterinary forensics is becoming more important in our society as a result of the growing demand for investigations related to crimes against animals or criminal deaths caused by animals. A veterinarian may participate as an expert witness or may be required to provide forensic assistance, contributing specialist knowledge to establish a complete picture of an animal's involvement and allowing the courts to reach a verdict. By applying diverse dental profiling techniques, not only can the species, sex, age-at-death, and body size of an animal be estimated, but also data about its geographical origin (provenance) and the post-mortem interval. This review concentrates on dental techniques that use the characteristics of teeth as a means of identifying freshly deceased and skeletonised animals. Furthermore, it highlights the information about the animal that can be extracted from the post-mortem dental profile.
PubMed: 36009628
DOI: 10.3390/ani12162038
Journal of Biomedical Semantics, Jan 2022
BACKGROUND
Advances in science and technology play an immense role in the way scientific experiments are conducted. Understanding how experiments are performed and how results are derived has become significantly more complex with the recent explosive growth of heterogeneous research data and methods. It is therefore important that the provenance of results is tracked, described, and managed throughout the research lifecycle, from the beginning of an experiment to its end, to ensure the reproducibility of results described in publications. However, there is a lack of interoperable representation of the end-to-end provenance of scientific experiments that interlinks data, processing steps, and results from an experiment's computational and non-computational processes.
RESULTS
We present the "REPRODUCE-ME" data model and ontology to describe the end-to-end provenance of scientific experiments by extending existing standards in the semantic web. The ontology brings together different aspects of the provenance of scientific studies by interlinking non-computational data and steps with computational data and steps to achieve understandability and reproducibility. We explain the important classes and properties of the ontology and how they are mapped to existing ontologies like PROV-O and P-Plan. The ontology is evaluated by answering competency questions over the knowledge base of scientific experiments consisting of computational and non-computational data and steps.
CONCLUSION
We have designed and developed an interoperable way to represent the complete path of a scientific experiment consisting of computational and non-computational steps. We have applied and evaluated our approach to a set of scientific experiments in different subject domains like computational science, biological imaging, and microscopy.
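A minimal sketch of what answering such a competency question might look like in practice, using rdflib over a toy PROV-O/P-Plan-style graph. The experiment URIs are invented, and the query uses only generic PROV-O terms rather than the REPRODUCE-ME ontology's own classes, which are not reproduced here.

```python
# Toy PROV-O/P-Plan-style provenance graph; URIs under EX are invented.
from rdflib import Graph, Namespace, RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
PPLAN = Namespace("http://purl.org/net/p-plan#")
EX = Namespace("http://example.org/experiment/")

g = Graph()
# A non-computational step (sample preparation) feeding a computational one.
g.add((EX.prepareSample, RDF.type, PPLAN.Step))
g.add((EX.runAnalysis, RDF.type, PPLAN.Step))
g.add((EX.rawImage, PROV.wasGeneratedBy, EX.prepareSample))
g.add((EX.runAnalysis, PROV.used, EX.rawImage))
g.add((EX.resultPlot, PROV.wasGeneratedBy, EX.runAnalysis))

# Competency question: which step and which input produced each result?
query = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?result ?step ?input WHERE {
    ?result prov:wasGeneratedBy ?step .
    ?step prov:used ?input .
}
"""
for row in g.query(query):
    print(f"{row.result} was generated by {row.step} using {row.input}")
```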
Topics: Knowledge Bases; Reproducibility of Results; Semantic Web; Semantics
PubMed: 34991705
DOI: 10.1186/s13326-021-00253-1
Heliyon, Feb 2023
Review
Open Educational Resources (OER) can be adapted and combined to create new resources that better meet the specific needs of different kinds of users and scenarios. In this sense, OER strongly contribute to generating and sharing educational knowledge. Because a new OER can be created through revision and remix activities, the original OER and the transformation process should be adequately identified. This way, the user of an OER has enough information about the history of the resource and can therefore use it with confidence and security. In this context, determining data provenance, which describes the history of data from its origin to its current state, becomes very relevant. For OER, there are metadata standards and digital repositories that help to capture data provenance; however, the information they collect is insufficient to identify the entire provenance history of an OER. This article proposes a provenance model for OER, called the ProvOER Model, which allows the provenance of OER to be documented and identified. For this purpose, a minimum set of metadata was defined that reflects the intrinsic properties of an OER and the activities that created a new OER. The experiments showed that the ProvOER Model produced a suitable representation of the provenance of OER. In addition, the ProvOER Model allowed the identification of the original OER used in a revise or remix activity, as well as the continuous stretch used to create the new resource.
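To make the idea concrete, the following is a minimal sketch, loosely modeled on W3C PROV, of the kind of record that could tie a remixed OER back to its sources. The field names are illustrative assumptions, not the ProvOER Model's actual metadata set.

```python
# Sketch of a provenance record for an OER remix; field names are
# illustrative assumptions, loosely following W3C PROV concepts
# (entity, activity, agent), not the ProvOER Model's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    derived_oer: str        # identifier of the newly created resource
    source_oers: list       # identifiers of the original resources
    activity: str           # the transformation: "revise" or "remix"
    agent: str              # who performed the transformation
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = ProvenanceRecord(
    derived_oer="oer:course-unit-42",
    source_oers=["oer:slides-7", "oer:quiz-3"],
    activity="remix",
    agent="orcid:0000-0000-0000-0000",
)
print(record)
```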
PubMed: 36755614
DOI: 10.1016/j.heliyon.2023.e13311
Nature Communications, Jul 2023
Significant challenges remain in the computational processing of data from liquid chromatography-mass spectrometry (LC-MS)-based metabolomic experiments into metabolite features. In this study, we examine the issues of provenance and reproducibility using current software tools. Inconsistency among the tools examined is attributed to deficiencies in mass alignment and in controls of feature quality. To address these issues, we develop the open-source software tool asari for LC-MS metabolomics data processing. Asari is designed around a specific algorithmic framework and set of data structures, and all steps are explicitly trackable. Asari compares favorably to other tools in feature detection and quantification. It offers substantial improvement in computational performance over current tools, and it is highly scalable.
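As a toy illustration of the mass-alignment problem to which the abstract attributes tool inconsistency (and not asari's actual algorithm), consider matching features across two samples within a parts-per-million m/z tolerance:

```python
# Toy mass alignment: group features from two samples whose m/z values
# agree within an instrument tolerance, here 5 ppm. Values are invented.
def within_ppm(mz_a: float, mz_b: float, ppm: float = 5.0) -> bool:
    """True if two m/z values agree within the given ppm tolerance."""
    return abs(mz_a - mz_b) / mz_a * 1e6 <= ppm

sample_a = [181.0707, 282.1190, 524.2650]
sample_b = [181.0710, 282.1250, 524.2648]  # middle feature drifts too far

matched = [(a, b) for a in sample_a for b in sample_b if within_ppm(a, b)]
print(matched)  # only features agreeing within 5 ppm are aligned
```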
Topics: Chromatography, Liquid; Reproducibility of Results; Tandem Mass Spectrometry; Metabolomics
PubMed: 37433854
DOI: 10.1038/s41467-023-39889-1
Bioinformatics (Oxford, England), Nov 2022
Meta-Analysis
MOTIVATION
The volume of public nucleotide sequence data has blossomed over the past two decades and is ripe for re- and meta-analyses to enable novel discoveries. However, reproducible re-use and management of sequence datasets and associated metadata remain critical challenges. We created the open source Python package q2-fondue to enable user-friendly acquisition, re-use and management of public sequence (meta)data while adhering to open data principles.
RESULTS
q2-fondue allows fully provenance-tracked programmatic access to and management of data from the NCBI Sequence Read Archive (SRA). Unlike other packages allowing download of sequence data from the SRA, q2-fondue enables full data provenance tracking from data download to final visualization, integrates with the QIIME 2 ecosystem, prevents data loss upon space exhaustion and allows download of (meta)data given a publication library. To highlight its manifold capabilities, we present executable demonstrations using publicly available amplicon, whole genome and metagenome datasets.
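For orientation, here is a hedged sketch of driving q2-fondue from QIIME 2's Python Artifact API. The action and parameter names follow common QIIME 2 plugin conventions and are assumptions here; the tutorials in the repository below should be treated as authoritative.

```python
# Assumed QIIME 2 Artifact-API usage; action and parameter names may
# differ from q2-fondue's actual interface (see repository tutorials).
from qiime2 import Artifact
from qiime2.plugins import fondue  # requires the q2-fondue plugin installed

# A table of SRA run accessions; the semantic type name is an assumption.
ids = Artifact.import_data("NCBIAccessionIDs", "run_ids.tsv")

# Fetch sequences plus metadata; each output .qza artifact embeds the full
# provenance of this call, which is the feature the abstract emphasizes.
results = fondue.actions.get_all(accession_ids=ids, email="user@example.org")
```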
AVAILABILITY AND IMPLEMENTATION
q2-fondue is available as an open-source BSD-3-licensed Python package at https://github.com/bokulich-lab/q2-fondue. Usage tutorials are available in the same repository. All Jupyter notebooks used in this article are available under https://github.com/bokulich-lab/q2-fondue-examples.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Software; Base Sequence; Ecosystem; Metadata; Metagenome
PubMed: 36130056
DOI: 10.1093/bioinformatics/btac639
Neuroinformatics, Jan 2022
Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves can be very large in scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis should include not only a textual description, but also a formal record of the computations which produced the result, including accessible data and software with runtime parameters, environment, and personnel involved. This article describes FAIRSCAPE, a reusable computational framework, enabling simplified access to modern scalable cloud-based components. FAIRSCAPE fully implements the FAIR data principles and extends them to provide fully FAIR Evidence, including machine-interpretable provenance of datasets, software and computations, as metadata for all computed results. The FAIRSCAPE microservices framework creates a complete Evidence Graph for every computational result, including persistent identifiers with metadata, resolvable to the software, computations, and datasets used in the computation, and stores a URI to the root of the graph in the result's metadata. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves provenance across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects are assigned persistent IDs, including software. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software.
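The following is a minimal sketch, not FAIRSCAPE's actual output format, of the kind of machine-interpretable evidence metadata described above: a JSON-LD node with a persistent identifier linking a computation to the software and datasets it used. The identifiers and EVI property spellings are illustrative assumptions.

```python
# Sketch of an evidence-graph node as JSON-LD; IDs and EVI term
# spellings are invented stand-ins, not FAIRSCAPE's real output.
import json

evidence_node = {
    "@context": {"evi": "https://w3id.org/EVI#"},
    "@id": "ark:/99999/result-123",            # hypothetical persistent ID
    "@type": "evi:Computation",
    "evi:usedSoftware": {"@id": "ark:/99999/software-7"},
    "evi:usedDataset": [{"@id": "ark:/99999/dataset-4"}],
    "evi:generated": {"@id": "ark:/99999/dataset-9"},
}
print(json.dumps(evidence_node, indent=2))
```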
Topics: Metadata; Reproducibility of Results; Software; Workflow
PubMed: 34264488
DOI: 10.1007/s12021-021-09529-4
Journal of Medical Internet Research, Nov 2023
BACKGROUND
In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are important throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge of these processes can lead to validity risks and reduce confidence in, and the quality of, the processed data. The need to implement maintainable data management practices is undisputed, but there is little clarity about their current status.
OBJECTIVE
Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for assessing the maturity of data management practices and offer recommendations to enable the trustworthy dissemination and reuse of routine health care data.
METHODS
In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire, tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method complies with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist.
RESULTS
Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to provide more clarity and to help define enhanced data management strategies.
CONCLUSIONS
The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy, with data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of FAIRer, well-maintained health research data of high quality.
Topics: Humans; Data Management; Delivery of Health Care; Medical Informatics; Surveys and Questionnaires
PubMed: 37938878
DOI: 10.2196/48809
Sensors (Basel, Switzerland), Feb 2020
Although current estimates depict steady growth in the Internet of Things (IoT), many works portray a technology that is as yet immature in terms of security. Attacks using low-performance devices, the application of new technologies and data analysis to infer private data, and the lack of development in some aspects of security leave a wide field for improvement. The advent of semantic technologies for IoT offers a new set of possibilities and challenges, such as data markets, aggregators, processors, and search engines, which raise the need for security. New regulations, such as the GDPR, also call for novel approaches to data security covering personal data. In this work, we present DS4IoT, a data-security ontology for IoT, which covers the representation of data-security concepts with the novel approach of doing so from the perspective of data, and which introduces new concepts such as regulations, certifications, and provenance alongside classical concepts such as access control methods and authentication mechanisms. In the process we followed ontological methodologies as well as semantic web best practices, resulting in an ontology that serves as a common vocabulary for data annotation, distinguishes itself from previous works by its bottom-up approach, and covers new, current, and interesting data-security concepts, favouring implicit over explicit knowledge representation. Finally, this work is validated by proof of concept, by mapping the DS4IoT ontology to the NGSI-LD data model in the frame of the IoTCrawler EU project.
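To illustrate the kind of mapping the validation describes, here is a minimal sketch of an NGSI-LD-style entity annotated with data-security terms. The ds4iot property names are invented stand-ins, not the published ontology's actual vocabulary.

```python
# NGSI-LD-style entity with hypothetical ds4iot security annotations;
# property names are illustrative, not the real DS4IoT terms.
import json

entity = {
    "id": "urn:ngsi-ld:TemperatureSensor:001",
    "type": "TemperatureSensor",
    "temperature": {"type": "Property", "value": 21.5},
    # Security annotations drawn from an assumed ds4iot vocabulary:
    "ds4iot:accessControl": {"type": "Property", "value": "RBAC"},
    "ds4iot:regulation": {"type": "Property", "value": "GDPR"},
    "ds4iot:provenance": {"type": "Relationship",
                          "object": "urn:ngsi-ld:Device:gateway-17"},
    "@context": [
        "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"],
}
print(json.dumps(entity, indent=2))
```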
PubMed: 32024127
DOI: 10.3390/s20030801
ArXiv, Aug 2023
The prevalence of machine learning in biomedical research is rapidly growing, yet the trustworthiness of such research is often overlooked. While some previous works have investigated the ability of adversarial attacks to degrade model performance in medical imaging, the ability to falsely improve performance via recently-developed "enhancement attacks" may be a greater threat to biomedical machine learning. In the spirit of developing attacks to better understand trustworthiness, we developed two techniques to drastically enhance prediction performance of classifiers with minimal changes to features: 1) general enhancement of prediction performance, and 2) enhancement of a particular method over another. Our enhancement framework falsely improved classifiers' accuracy from 50% to almost 100% while maintaining high feature similarities between original and enhanced data (Pearson's r > 0.99). Similarly, the method-specific enhancement framework was effective in falsely improving the performance of one method over another. For example, a simple neural network outperformed logistic regression by 17% on our enhanced dataset, although no performance differences were present in the original dataset. Crucially, the original and enhanced data were still similar (r = 0.99). Our results demonstrate the feasibility of minor data manipulations to achieve any desired prediction performance, which presents an interesting ethical challenge for the future of biomedical machine learning. These findings emphasize the need for more robust data provenance tracking and other precautionary measures to ensure the integrity of biomedical machine learning research. Code is available at https://github.com/mattrosenblatt7/enhancement_EPIMI.
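The following toy reconstruction captures the spirit of a general enhancement attack as described above; it is not the authors' released code (linked in the abstract). A small fixed pattern added with class-dependent sign makes random labels almost perfectly predictable, while each enhanced sample remains highly correlated with its original:

```python
# Toy enhancement attack: inject a small class-signed pattern into
# signal-free data so a classifier finds near-perfect structure.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p, eps = 400, 500, 0.1              # perturbation small per feature
X = rng.normal(size=(n, p))            # features with no class signal
y = rng.integers(0, 2, size=n)         # random labels -> ~50% accuracy

pattern = rng.normal(size=p)
pattern /= np.linalg.norm(pattern)     # fixed unit direction
X_enh = X + eps * np.sqrt(p) * np.outer(2 * y - 1, pattern)

clf = LogisticRegression(max_iter=1000)
print("original accuracy:", cross_val_score(clf, X, y, cv=5).mean())
print("enhanced accuracy:", cross_val_score(clf, X_enh, y, cv=5).mean())
print("mean per-sample Pearson r:",
      np.mean([np.corrcoef(X[i], X_enh[i])[0, 1] for i in range(n)]))
```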
PubMed: 36713237
Journal of Medical Internet Research, Feb 2023
BACKGROUND
Wearable devices generate rich streams of patient-generated health data but have limited ability to store and process these data. Currently, individual users or data aggregators are unable to monetize such data or contribute them to wider analytics use cases. When combined with clinical health data, these data can improve the predictive power of data-driven analytics and offer many benefits for improving the quality of care. We propose and provide a marketplace mechanism to make these data available while benefiting data providers.
OBJECTIVE
We aimed to propose the concept of a decentralized marketplace for patient-generated health data that can improve provenance, data accuracy, security, and privacy. Using a proof-of-concept prototype built with the InterPlanetary File System (IPFS) and Ethereum smart contracts, we aimed to demonstrate decentralized marketplace functionality on the blockchain. We also aimed to illustrate and demonstrate the benefits of such a marketplace.
METHODS
We used a design science research methodology to define and prototype our decentralized marketplace, building the system with the Ethereum blockchain, the Solidity smart contract programming language, the web3.js library, and Node.js with the MetaMask application.
RESULTS
We designed and implemented a prototype of a decentralized health care marketplace catering to health data. We used IPFS to store data, provided an encryption scheme for the data, and provided smart contracts to communicate with users on the Ethereum blockchain. We met the design goals we set out to accomplish in this study.
CONCLUSIONS
A decentralized marketplace for trading patient-generated health data can be created using smart-contract technology and IPFS-based data storage. Such a marketplace can improve the quality, availability, and provenance of such data and satisfy privacy, access, auditability, and security needs when compared with centralized systems.
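As a rough sketch of the listing flow such a marketplace implies, transposed to Python (the authors prototyped with web3.js and Node.js): encrypted data are pinned to IPFS and the resulting content identifier is registered in a marketplace contract. The contract address, ABI, and listItem function are hypothetical stand-ins, and the snippet assumes a local IPFS daemon and Ethereum node.

```python
# Sketch of listing encrypted data in a hypothetical marketplace contract.
import ipfshttpclient
from web3 import Web3

# Hypothetical contract details; a real deployment supplies the deployed
# address and full ABI of the marketplace contract.
MARKET_ADDRESS = "0x0000000000000000000000000000000000000000"
MARKET_ABI = [{
    "name": "listItem", "type": "function", "stateMutability": "nonpayable",
    "inputs": [{"name": "cid", "type": "string"},
               {"name": "price", "type": "uint256"}],
    "outputs": [],
}]

client = ipfshttpclient.connect()              # local IPFS daemon
cid = client.add_bytes(b"<encrypted health-data payload>")

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # local Ethereum node
market = w3.eth.contract(address=MARKET_ADDRESS, abi=MARKET_ABI)

# Register the content identifier and an asking price on-chain; buyers can
# later fetch the ciphertext from IPFS and obtain the key off-chain.
tx_hash = market.functions.listItem(cid, w3.to_wei(0.01, "ether")).transact(
    {"from": w3.eth.accounts[0]})
w3.eth.wait_for_transaction_receipt(tx_hash)
```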
Topics: Humans; Blockchain; Data Accuracy; Patients; Privacy; Programming Languages
PubMed: 36848185
DOI: 10.2196/42743