Animals: An Open Access Journal from MDPI, Aug 2022
Review
Veterinary forensics is becoming more important in our society as a result of the growing demand for investigations related to crimes against animals or criminal deaths caused by animals. A veterinarian may participate as an expert witness or may be required to provide forensic assistance, contributing specialist knowledge to establish a complete picture of an animal's involvement and allowing the courts to reach a verdict. By applying diverse dental profiling techniques, not only can the species, sex, age-at-death, and body size of an animal be estimated, but also data about its geographical origin (provenance) and the post-mortem interval. This review concentrates on dental techniques that use the characteristics of teeth as a means of identifying freshly deceased and skeletonised animals. Furthermore, it highlights the information about the animal that can be extracted from the post-mortem dental profile.
PubMed: 36009628
DOI: 10.3390/ani12162038
Journal of Biomedical Semantics, Jan 2022
BACKGROUND
Advances in science and technology play an immense role in the way scientific experiments are conducted. Understanding how experiments are performed and how results are derived has become significantly more complex with the recent explosive growth of heterogeneous research data and methods. It is therefore important that the provenance of results is tracked, described, and managed throughout the research lifecycle, from the beginning of an experiment to its end, to ensure the reproducibility of results described in publications. However, there is a lack of interoperable representation of the end-to-end provenance of scientific experiments that interlinks data, processing steps, and results from an experiment's computational and non-computational processes.
RESULTS
We present the "REPRODUCE-ME" data model and ontology to describe the end-to-end provenance of scientific experiments by extending existing standards in the semantic web. The ontology brings together different aspects of the provenance of scientific studies by interlinking non-computational data and steps with computational data and steps to achieve understandability and reproducibility. We explain the important classes and properties of the ontology and how they are mapped to existing ontologies like PROV-O and P-Plan. The ontology is evaluated by answering competency questions over the knowledge base of scientific experiments consisting of computational and non-computational data and steps.
CONCLUSION
We have designed and developed an interoperable way to represent the complete path of a scientific experiment consisting of computational and non-computational steps. We have applied and evaluated our approach to a set of scientific experiments in different subject domains like computational science, biological imaging, and microscopy.
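A minimal sketch of what answering such a competency question might look like in practice, using rdflib over a toy PROV-O/P-Plan-style graph. The experiment URIs are invented, and the query uses only generic PROV-O terms rather than the REPRODUCE-ME ontology's own classes, which are not reproduced here.

```python
# Toy PROV-O/P-Plan-style provenance graph; URIs under EX are invented.
from rdflib import Graph, Namespace, RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
PPLAN = Namespace("http://purl.org/net/p-plan#")
EX = Namespace("http://example.org/experiment/")

g = Graph()
# A non-computational step (sample preparation) feeding a computational one.
g.add((EX.prepareSample, RDF.type, PPLAN.Step))
g.add((EX.runAnalysis, RDF.type, PPLAN.Step))
g.add((EX.rawImage, PROV.wasGeneratedBy, EX.prepareSample))
g.add((EX.runAnalysis, PROV.used, EX.rawImage))
g.add((EX.resultPlot, PROV.wasGeneratedBy, EX.runAnalysis))

# Competency question: which step and which input produced each result?
query = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?result ?step ?input WHERE {
    ?result prov:wasGeneratedBy ?step .
    ?step prov:used ?input .
}
"""
for row in g.query(query):
    print(f"{row.result} was generated by {row.step} using {row.input}")
```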
Topics: Knowledge Bases; Reproducibility of Results; Semantic Web; Semantics
PubMed: 34991705
DOI: 10.1186/s13326-021-00253-1
Heliyon, Feb 2023
Review
Open Educational Resources (OER) can be adapted and combined to create new resources that better meet the specific needs of different kinds of users and scenarios. In this sense, OER strongly contribute to generating and sharing educational knowledge. Because a new OER can be created through revision and remix activities, the original OER and the transformation process should be adequately identified. This way, the user of an OER has enough information about the history of the resource and can therefore use it with confidence and security. In this context, determining data provenance, which describes the history of data from its origin to its current state, becomes very relevant. For OER, there are metadata standards and digital repositories that help to capture data provenance; however, the information they collect is insufficient to identify the entire provenance history of an OER. This article proposes a provenance model for OER, called the ProvOER Model, which allows the provenance of OER to be documented and identified. For this purpose, a minimum set of metadata was defined that reflects the intrinsic properties of an OER and the activities that created a new OER. The experiments showed that the ProvOER Model produced a suitable representation of the provenance of OER. In addition, the ProvOER Model allowed the identification of the original OER used in a revise or remix activity, as well as the continuous stretch used to create the new resource.
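To make the idea concrete, the following is a minimal sketch, loosely modeled on W3C PROV, of the kind of record that could tie a remixed OER back to its sources. The field names are illustrative assumptions, not the ProvOER Model's actual metadata set.

```python
# Sketch of a provenance record for an OER remix; field names are
# illustrative assumptions, loosely following W3C PROV concepts
# (entity, activity, agent), not the ProvOER Model's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    derived_oer: str        # identifier of the newly created resource
    source_oers: list       # identifiers of the original resources
    activity: str           # the transformation: "revise" or "remix"
    agent: str              # who performed the transformation
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = ProvenanceRecord(
    derived_oer="oer:course-unit-42",
    source_oers=["oer:slides-7", "oer:quiz-3"],
    activity="remix",
    agent="orcid:0000-0000-0000-0000",
)
print(record)
```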
PubMed: 36755614
DOI: 10.1016/j.heliyon.2023.e13311
Nature Communications, Jul 2023
Significant challenges remain in the computational processing of data from liquid chromatography-mass spectrometry (LC-MS)-based metabolomic experiments into metabolite features. In this study, we examine the issues of provenance and reproducibility using current software tools. Inconsistency among the tools examined is attributed to deficiencies in mass alignment and in controls of feature quality. To address these issues, we develop the open-source software tool asari for LC-MS metabolomics data processing. Asari is designed around a specific algorithmic framework and set of data structures, and all steps are explicitly trackable. Asari compares favorably to other tools in feature detection and quantification. It offers substantial improvement in computational performance over current tools, and it is highly scalable.
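As a toy illustration of the mass-alignment problem to which the abstract attributes tool inconsistency (and not asari's actual algorithm), consider matching features across two samples within a parts-per-million m/z tolerance:

```python
# Toy mass alignment: group features from two samples whose m/z values
# agree within an instrument tolerance, here 5 ppm. Values are invented.
def within_ppm(mz_a: float, mz_b: float, ppm: float = 5.0) -> bool:
    """True if two m/z values agree within the given ppm tolerance."""
    return abs(mz_a - mz_b) / mz_a * 1e6 <= ppm

sample_a = [181.0707, 282.1190, 524.2650]
sample_b = [181.0710, 282.1250, 524.2648]  # middle feature drifts too far

matched = [(a, b) for a in sample_a for b in sample_b if within_ppm(a, b)]
print(matched)  # only features agreeing within 5 ppm are aligned
```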
Topics: Chromatography, Liquid; Reproducibility of Results; Tandem Mass Spectrometry; Metabolomics
PubMed: 37433854
DOI: 10.1038/s41467-023-39889-1
Bioinformatics (Oxford, England), Nov 2022
Meta-Analysis
MOTIVATION
The volume of public nucleotide sequence data has blossomed over the past two decades and is ripe for re- and meta-analyses to enable novel discoveries. However, reproducible re-use and management of sequence datasets and associated metadata remain critical challenges. We created the open source Python package q2-fondue to enable user-friendly acquisition, re-use and management of public sequence (meta)data while adhering to open data principles.
RESULTS
q2-fondue allows fully provenance-tracked programmatic access to and management of data from the NCBI Sequence Read Archive (SRA). Unlike other packages allowing download of sequence data from the SRA, q2-fondue enables full data provenance tracking from data download to final visualization, integrates with the QIIME 2 ecosystem, prevents data loss upon space exhaustion and allows download of (meta)data given a publication library. To highlight its manifold capabilities, we present executable demonstrations using publicly available amplicon, whole genome and metagenome datasets.
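For orientation, here is a hedged sketch of driving q2-fondue from QIIME 2's Python Artifact API. The action and parameter names follow common QIIME 2 plugin conventions and are assumptions here; the tutorials in the repository below should be treated as authoritative.

```python
# Assumed QIIME 2 Artifact-API usage; action and parameter names may
# differ from q2-fondue's actual interface (see repository tutorials).
from qiime2 import Artifact
from qiime2.plugins import fondue  # requires the q2-fondue plugin installed

# A table of SRA run accessions; the semantic type name is an assumption.
ids = Artifact.import_data("NCBIAccessionIDs", "run_ids.tsv")

# Fetch sequences plus metadata; each output .qza artifact embeds the full
# provenance of this call, which is the feature the abstract emphasizes.
results = fondue.actions.get_all(accession_ids=ids, email="user@example.org")
```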
AVAILABILITY AND IMPLEMENTATION
q2-fondue is available as an open-source BSD-3-licensed Python package at https://github.com/bokulich-lab/q2-fondue. Usage tutorials are available in the same repository. All Jupyter notebooks used in this article are available under https://github.com/bokulich-lab/q2-fondue-examples.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Software; Base Sequence; Ecosystem; Metadata; Metagenome
PubMed: 36130056
DOI: 10.1093/bioinformatics/btac639
Neuroinformatics, Jan 2022
Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves can be very large in scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis should include not only a textual description, but also a formal record of the computations which produced the result, including accessible data and software with runtime parameters, environment, and personnel involved. This article describes FAIRSCAPE, a reusable computational framework, enabling simplified access to modern scalable cloud-based components. FAIRSCAPE fully implements the FAIR data principles and extends them to provide fully FAIR Evidence, including machine-interpretable provenance of datasets, software and computations, as metadata for all computed results. The FAIRSCAPE microservices framework creates a complete Evidence Graph for every computational result, including persistent identifiers with metadata, resolvable to the software, computations, and datasets used in the computation, and stores a URI to the root of the graph in the result's metadata. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves provenance across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects are assigned persistent IDs, including software. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software.
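The following is a minimal sketch, not FAIRSCAPE's actual output format, of the kind of machine-interpretable evidence metadata described above: a JSON-LD node with a persistent identifier linking a computation to the software and datasets it used. The identifiers and EVI property spellings are illustrative assumptions.

```python
# Sketch of an evidence-graph node as JSON-LD; IDs and EVI term
# spellings are invented stand-ins, not FAIRSCAPE's real output.
import json

evidence_node = {
    "@context": {"evi": "https://w3id.org/EVI#"},
    "@id": "ark:/99999/result-123",            # hypothetical persistent ID
    "@type": "evi:Computation",
    "evi:usedSoftware": {"@id": "ark:/99999/software-7"},
    "evi:usedDataset": [{"@id": "ark:/99999/dataset-4"}],
    "evi:generated": {"@id": "ark:/99999/dataset-9"},
}
print(json.dumps(evidence_node, indent=2))
```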
Topics: Metadata; Reproducibility of Results; Software; Workflow
PubMed: 34264488
DOI: 10.1007/s12021-021-09529-4
Journal of Medical Internet Research, Nov 2023
BACKGROUND
In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are important throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge of these processes can lead to validity risks and reduce confidence in, and the quality of, the processed data. The need to implement maintainable data management practices is undisputed, but there is little clarity about their current status.
OBJECTIVE
Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for assessing the maturity of data management practices and offer recommendations to enable the trustworthy dissemination and reuse of routine health care data.
METHODS
In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire, tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method complies with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist.
RESULTS
Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to provide more clarity and to help define enhanced data management strategies.
CONCLUSIONS
The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy, with data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of FAIRer, well-maintained health research data of high quality.
Topics: Humans; Data Management; Delivery of Health Care; Medical Informatics; Surveys and Questionnaires
PubMed: 37938878
DOI: 10.2196/48809
Sensors (Basel, Switzerland), Feb 2020
Although current estimates depict steady growth in the Internet of Things (IoT), many works portray a technology that is as yet immature in terms of security. Attacks using low-performance devices, the application of new technologies and data analysis to infer private data, and the lack of development in some aspects of security leave a wide field for improvement. The advent of semantic technologies for IoT offers a new set of possibilities and challenges, such as data markets, aggregators, processors, and search engines, which raise the need for security. New regulations, such as the GDPR, also call for novel approaches to data security covering personal data. In this work, we present DS4IoT, a data-security ontology for IoT, which covers the representation of data-security concepts with the novel approach of doing so from the perspective of data, and which introduces new concepts such as regulations, certifications, and provenance alongside classical concepts such as access control methods and authentication mechanisms. In the process we followed ontological methodologies as well as semantic web best practices, resulting in an ontology that serves as a common vocabulary for data annotation, distinguishes itself from previous works by its bottom-up approach, and covers new, current, and interesting data-security concepts, favouring implicit over explicit knowledge representation. Finally, this work is validated by proof of concept, by mapping the DS4IoT ontology to the NGSI-LD data model in the frame of the IoTCrawler EU project.
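To illustrate the kind of mapping the validation describes, here is a minimal sketch of an NGSI-LD-style entity annotated with data-security terms. The ds4iot property names are invented stand-ins, not the published ontology's actual vocabulary.

```python
# NGSI-LD-style entity with hypothetical ds4iot security annotations;
# property names are illustrative, not the real DS4IoT terms.
import json

entity = {
    "id": "urn:ngsi-ld:TemperatureSensor:001",
    "type": "TemperatureSensor",
    "temperature": {"type": "Property", "value": 21.5},
    # Security annotations drawn from an assumed ds4iot vocabulary:
    "ds4iot:accessControl": {"type": "Property", "value": "RBAC"},
    "ds4iot:regulation": {"type": "Property", "value": "GDPR"},
    "ds4iot:provenance": {"type": "Relationship",
                          "object": "urn:ngsi-ld:Device:gateway-17"},
    "@context": [
        "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"],
}
print(json.dumps(entity, indent=2))
```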
PubMed: 32024127
DOI: 10.3390/s20030801
ArXiv, Aug 2023
The prevalence of machine learning in biomedical research is rapidly growing, yet the trustworthiness of such research is often overlooked. While some previous works have investigated the ability of adversarial attacks to degrade model performance in medical imaging, the ability to falsely improve performance via recently-developed "enhancement attacks" may be a greater threat to biomedical machine learning. In the spirit of developing attacks to better understand trustworthiness, we developed two techniques to drastically enhance prediction performance of classifiers with minimal changes to features: 1) general enhancement of prediction performance, and 2) enhancement of a particular method over another. Our enhancement framework falsely improved classifiers' accuracy from 50% to almost 100% while maintaining high feature similarities between original and enhanced data (Pearson's r > 0.99). Similarly, the method-specific enhancement framework was effective in falsely improving the performance of one method over another. For example, a simple neural network outperformed logistic regression by 17% on our enhanced dataset, although no performance differences were present in the original dataset. Crucially, the original and enhanced data were still similar (r = 0.99). Our results demonstrate the feasibility of minor data manipulations to achieve any desired prediction performance, which presents an interesting ethical challenge for the future of biomedical machine learning. These findings emphasize the need for more robust data provenance tracking and other precautionary measures to ensure the integrity of biomedical machine learning research. Code is available at https://github.com/mattrosenblatt7/enhancement_EPIMI.
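The following toy reconstruction captures the spirit of a general enhancement attack as described above; it is not the authors' released code (linked in the abstract). A small fixed pattern added with class-dependent sign makes random labels almost perfectly predictable, while each enhanced sample remains highly correlated with its original:

```python
# Toy enhancement attack: inject a small class-signed pattern into
# signal-free data so a classifier finds near-perfect structure.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p, eps = 400, 500, 0.1              # perturbation small per feature
X = rng.normal(size=(n, p))            # features with no class signal
y = rng.integers(0, 2, size=n)         # random labels -> ~50% accuracy

pattern = rng.normal(size=p)
pattern /= np.linalg.norm(pattern)     # fixed unit direction
X_enh = X + eps * np.sqrt(p) * np.outer(2 * y - 1, pattern)

clf = LogisticRegression(max_iter=1000)
print("original accuracy:", cross_val_score(clf, X, y, cv=5).mean())
print("enhanced accuracy:", cross_val_score(clf, X_enh, y, cv=5).mean())
print("mean per-sample Pearson r:",
      np.mean([np.corrcoef(X[i], X_enh[i])[0, 1] for i in range(n)]))
```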
PubMed: 36713237
Journal of Medical Internet Research, Feb 2023
BACKGROUND
Wearable devices generate rich streams of patient-generated health data but have limited ability to store and process these data. Currently, individual users or data aggregators are unable to monetize such data or contribute them to wider analytics use cases. When combined with clinical health data, these data can improve the predictive power of data-driven analytics and offer many benefits for improving the quality of care. We propose and provide a marketplace mechanism to make these data available while benefiting data providers.
OBJECTIVE
We aimed to propose the concept of a decentralized marketplace for patient-generated health data that can improve provenance, data accuracy, security, and privacy. Using a proof-of-concept prototype built with the InterPlanetary File System (IPFS) and Ethereum smart contracts, we aimed to demonstrate decentralized marketplace functionality on the blockchain. We also aimed to illustrate and demonstrate the benefits of such a marketplace.
METHODS
We used a design science research methodology to define and prototype our decentralized marketplace, building the system with the Ethereum blockchain, the Solidity smart contract programming language, the web3.js library, and Node.js with the MetaMask application.
RESULTS
We designed and implemented a prototype of a decentralized health care marketplace catering to health data. We used IPFS to store data, provided an encryption scheme for the data, and provided smart contracts to communicate with users on the Ethereum blockchain. We met the design goals we set out to accomplish in this study.
CONCLUSIONS
A decentralized marketplace for trading patient-generated health data can be created using smart-contract technology and IPFS-based data storage. Such a marketplace can improve the quality, availability, and provenance of such data and satisfy privacy, access, auditability, and security needs when compared with centralized systems.
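As a rough sketch of the listing flow such a marketplace implies, transposed to Python (the authors prototyped with web3.js and Node.js): encrypted data are pinned to IPFS and the resulting content identifier is registered in a marketplace contract. The contract address, ABI, and listItem function are hypothetical stand-ins, and the snippet assumes a local IPFS daemon and Ethereum node.

```python
# Sketch of listing encrypted data in a hypothetical marketplace contract.
import ipfshttpclient
from web3 import Web3

# Hypothetical contract details; a real deployment supplies the deployed
# address and full ABI of the marketplace contract.
MARKET_ADDRESS = "0x0000000000000000000000000000000000000000"
MARKET_ABI = [{
    "name": "listItem", "type": "function", "stateMutability": "nonpayable",
    "inputs": [{"name": "cid", "type": "string"},
               {"name": "price", "type": "uint256"}],
    "outputs": [],
}]

client = ipfshttpclient.connect()              # local IPFS daemon
cid = client.add_bytes(b"<encrypted health-data payload>")

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # local Ethereum node
market = w3.eth.contract(address=MARKET_ADDRESS, abi=MARKET_ABI)

# Register the content identifier and an asking price on-chain; buyers can
# later fetch the ciphertext from IPFS and obtain the key off-chain.
tx_hash = market.functions.listItem(cid, w3.to_wei(0.01, "ether")).transact(
    {"from": w3.eth.accounts[0]})
w3.eth.wait_for_transaction_receipt(tx_hash)
```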
Topics: Humans; Blockchain; Data Accuracy; Patients; Privacy; Programming Languages
PubMed: 36848185
DOI: 10.2196/42743