Heliyon, Feb 2023
Review
Open Educational Resources (OER) can be adapted and combined to create new resources that better meet the specific needs of different kinds of users and scenarios. In this sense, OER strongly contribute to generating and sharing educational knowledge. Because a new OER can be created through revise and remix activities, the original OER and the transformation process should be adequately identified. This way, the user of an OER has enough information about the history of the resource and can use it with confidence. In this context, determining data provenance, which describes the history of a piece of data from its origin to its current state, becomes very relevant. For OER, there are metadata standards and digital repositories that help capture data provenance; however, the information they collect is insufficient to identify the entire provenance history of an OER. This article proposes a provenance model for OER, called the ProvOER Model, which allows the documentation and identification of OER provenance. For this purpose, a minimum set of metadata was defined that reflects the intrinsic properties of an OER and the activities that created it. The experiments showed that the ProvOER Model produced a suitable representation of OER provenance. In addition, the ProvOER Model allowed identifying both the original OER used in a revise or remix activity and the contiguous portion of it used to create the new resource.
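The kind of record the ProvOER Model argues for can be sketched as a small data structure. The field names below are invented for illustration and are not the model's actual metadata elements:

```python
from datetime import date

def derive_oer(original, activity, agent):
    """Build a provenance record for an OER created by revising or
    remixing an existing one. All field names here are hypothetical."""
    return {
        "title": f"{original['title']} (derived)",
        "derived_from": original["identifier"],  # link back to the source OER
        "activity": activity,                    # e.g. "revise" or "remix"
        "agent": agent,                          # who performed the activity
        "date": date.today().isoformat(),
    }

original = {"identifier": "oer:123", "title": "Intro to Provenance"}
new_oer = derive_oer(original, "remix", "J. Editor")
```

A consumer of `new_oer` can follow `derived_from` back to the original resource, which is exactly the traceability the abstract describes.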
PubMed: 36755614
DOI: 10.1016/j.heliyon.2023.e13311
Bioinformatics (Oxford, England), Nov 2022
Meta-Analysis
MOTIVATION
The volume of public nucleotide sequence data has blossomed over the past two decades and is ripe for re- and meta-analyses to enable novel discoveries. However, reproducible re-use and management of sequence datasets and associated metadata remain critical challenges. We created the open source Python package q2-fondue to enable user-friendly acquisition, re-use and management of public sequence (meta)data while adhering to open data principles.
RESULTS
q2-fondue allows fully provenance-tracked programmatic access to and management of data from the NCBI Sequence Read Archive (SRA). Unlike other packages allowing download of sequence data from the SRA, q2-fondue enables full data provenance tracking from data download to final visualization, integrates with the QIIME 2 ecosystem, prevents data loss upon space exhaustion and allows download of (meta)data given a publication library. To highlight its manifold capabilities, we present executable demonstrations using publicly available amplicon, whole genome and metagenome datasets.
AVAILABILITY AND IMPLEMENTATION
q2-fondue is available as an open-source BSD-3-licensed Python package at https://github.com/bokulich-lab/q2-fondue. Usage tutorials are available in the same repository. All Jupyter notebooks used in this article are available under https://github.com/bokulich-lab/q2-fondue-examples.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Software; Base Sequence; Ecosystem; Metadata; Metagenome
PubMed: 36130056
DOI: 10.1093/bioinformatics/btac639
Neuroinformatics, Jan 2022
Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves often can be very large scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis should include not only a textual description, but also a formal record of the computations which produced the result, including accessible data and software with runtime parameters, environment, and personnel involved. This article describes FAIRSCAPE, a reusable computational framework enabling simplified access to modern scalable cloud-based components. FAIRSCAPE fully implements the FAIR data principles and extends them to provide fully FAIR Evidence, including machine-interpretable provenance of datasets, software, and computations, as metadata for all computed results. The FAIRSCAPE microservices framework creates a complete Evidence Graph for every computational result, including persistent identifiers with metadata, resolvable to the software, computations, and datasets used in the computation; and stores a URI to the root of the graph in the result's metadata. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves provenance across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects are assigned persistent IDs, including software. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software.
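The evidence-graph idea can be illustrated with a toy structure: persistent IDs (fake ARK-style strings here; the node names and fields are invented, not FAIRSCAPE's actual schema) resolve to records that point at the computation and inputs behind each result, and a simple walk recovers the full evidence for a result:

```python
# Toy evidence graph: a result points at the computation that generated it,
# which in turn points at the datasets and software it used.
evidence = {
    "ark:/result1": {"type": "Dataset", "generatedBy": ["ark:/comp1"]},
    "ark:/comp1": {"type": "Computation",
                   "used": ["ark:/data1", "ark:/soft1"]},
    "ark:/data1": {"type": "Dataset"},
    "ark:/soft1": {"type": "Software"},
}

def collect_evidence(node_id, graph, seen=None):
    """Walk the graph from a result to everything it depends on."""
    if seen is None:
        seen = set()
    if node_id in seen:
        return seen
    seen.add(node_id)
    node = graph[node_id]
    for link in node.get("generatedBy", []) + node.get("used", []):
        collect_evidence(link, graph, seen)
    return seen
```

Calling `collect_evidence("ark:/result1", evidence)` returns every ID the result transitively depends on, which is the "complete Evidence Graph for every computational result" in miniature.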
Topics: Metadata; Reproducibility of Results; Software; Workflow
PubMed: 34264488
DOI: 10.1007/s12021-021-09529-4
AMIA ... Annual Symposium Proceedings..., 2017
Scientific reproducibility is critical for biomedical research, as it enables us to advance science by building on previous results, helps ensure the success of increasingly expensive drug trials, and allows funding agencies to make informed decisions. However, there is a growing "crisis" of reproducibility, as evidenced by a recent Nature survey of more than 1500 researchers, which found that 70% of researchers were not able to replicate results from other research groups and more than 50% were not able to reproduce their own research results. In 2016, the National Institutes of Health (NIH) announced the "Rigor and Reproducibility" guidelines to support reproducibility in biomedical research. A key component of these guidelines is the recording and analysis of "provenance" information, which describes the origin or history of data and plays a central role in ensuring scientific reproducibility. As part of the NIH Big Data to Knowledge (BD2K)-funded data provenance project, we have developed a new informatics framework called Provenance for Clinical and Healthcare Research (ProvCaRe) to extract, model, and analyze provenance information from published literature describing research studies. Using sleep medicine research studies that have made their data available through the National Sleep Research Resource (NSRR), we have developed an automated pipeline to identify and extract provenance metadata from published literature, which is made available for analysis in the ProvCaRe knowledgebase. NSRR is the largest repository of sleep data, holding over 40,000 studies involving 36,000 participants, and we used 75 published articles describing 6 research studies to populate the ProvCaRe knowledgebase. We evaluated the ProvCaRe knowledgebase, containing 28,474 "provenance triples", using hypothesis-driven queries to identify and rank research studies based on the provenance information extracted from published articles.
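A "provenance triple" is simply a subject-predicate-object statement. A minimal sketch of storing and querying such triples (the example triples are invented for illustration, not actual ProvCaRe content) might look like:

```python
# Each triple asserts one fact about a study's provenance.
triples = [
    ("study:42", "usedInstrument", "polysomnography"),
    ("study:42", "hasPopulation", "adults"),
    ("study:7", "usedInstrument", "actigraphy"),
]

def match(pattern, store):
    """Return triples matching a (subject, predicate, object) pattern,
    where None acts as a wildcard."""
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]

# Hypothesis-style query: which studies recorded which instruments?
instruments = match((None, "usedInstrument", None), triples)
```

Queries against a store like this are what "hypothesis-driven queries" over the knowledgebase amount to, at toy scale.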
Topics: Algorithms; Biological Ontologies; Biomedical Research; Guidelines as Topic; Health Services Research; Humans; Knowledge Bases; Metadata; National Institutes of Health (U.S.); Reproducibility of Results; Semantics; Sleep; United States
PubMed: 29854241
DOI: No ID Found
Journal of Medical Internet Research, Nov 2023
BACKGROUND
In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are important throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge of provenance can lead to validity risks and reduce the confidence in and quality of the processed data. The need to implement maintainable data management practices is undisputed, but the current status of these practices remains unclear.
OBJECTIVE
Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We propose a framework for the maturity status of data management practices and present recommendations to enable the trustful dissemination and reuse of routine health care data.
METHODS
In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire, tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist.
RESULTS
Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to provide more clarity and to help define enhanced data management strategies.
CONCLUSIONS
The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy that upholds data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of FAIRer, well-maintained health research data of high quality.
Topics: Humans; Data Management; Delivery of Health Care; Medical Informatics; Surveys and Questionnaires
PubMed: 37938878
DOI: 10.2196/48809
Sensors (Basel, Switzerland), Feb 2020
Although current estimates depict steady growth in the Internet of Things (IoT), many works portray an as yet immature technology in terms of security. Attacks using low-performance devices, the application of new technologies and data analysis to infer private data, and the lack of development in some aspects of security offer a wide field for improvement. The advent of semantic technologies for IoT offers a new set of possibilities and challenges, like data markets, aggregators, processors, and search engines, which raise the need for security. New regulations, such as the GDPR, also call for novel approaches to data security covering personal data. In this work, we present DS4IoT, a data-security ontology for IoT, which covers the representation of data-security concepts with the novel approach of doing so from the perspective of data, introducing new concepts such as regulations, certifications, and provenance alongside classical concepts such as access control methods and authentication mechanisms. In the process we followed ontological methodologies as well as semantic web best practices, resulting in an ontology that serves as a common vocabulary for data annotation and that not only distinguishes itself from previous works by its bottom-up approach but also covers new, current, and interesting concepts of data security, favouring implicit over explicit knowledge representation. Finally, this work is validated by proof of concept, by mapping the DS4IoT ontology to the NGSI-LD data model in the frame of the IoTCrawler EU project.
PubMed: 32024127
DOI: 10.3390/s20030801
Frontiers in Pharmacology, 2018
Review
Plants were an essential part of foraging for food and health, and for centuries remained the only medicines available to people from remote mountain regions. Their correct botanical provenance is an essential basis for understanding ethnic cultures, as well as for the chemical identification of novel bioactive molecules with therapeutic effects. This work describes the use of herbal medicines in the Beskid mountain ranges located south of Krakow and Lviv, two influential medieval centers of the apothecary tradition in the region. Local botanical remedies shared by the Boyko, Lemko, and Gorale ethnic groups were part of the medieval European system of medicine, used according to their Dioscoridean and Galenic qualities. Within the context of ethnic plant medicine and botanical classification, this review identified strong preferences for the local use of St John's wort (Hypericum perforatum L.), wormwood (Artemisia absinthium L.), garlic (Allium sativum L.), gentian (Gentiana L.), lovage (Levisticum officinale W.D.J. Koch), and lesser periwinkle (Vinca minor L.). While the Ukrainian ethnic groups favored the use of guelder-rose (Viburnum opulus L.) and yarrow (Achillea millefolium L.), Polish inhabitants especially valued angelica (Angelica archangelica L.) and carline thistle (Carlina acaulis L.). The region also holds strong potential for the collection, cultivation, and manufacture of medicinal plants and plant-based natural specialty ingredients for the food, health, and cosmetic industries, in part due to its high degree of biodiversity and ecological preservation. Many of these products, including whole food nutritional supplements, will soon complement conventional medicines in the prevention and treatment of diseases, while adding value to agriculture and local economies.
PubMed: 29674964
DOI: 10.3389/fphar.2018.00295
Proceedings. International Conference..., Apr 2019
Data provenance tools capture the steps used to produce analyses. However, scientists must choose among workflow provenance systems, which allow arbitrary code but only track provenance at the granularity of files; provenance APIs, which provide tuple-level provenance but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization, but support only a limited subset of data science tasks. None of these solutions is well suited for tracing errors introduced during common ETL, record alignment, and matching tasks over data types such as strings and images. Scientists need new capabilities to identify the sources of errors, find why different code versions produce different results, and identify which parameter values affect output. We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content from data objects. PROVision extends database-style provenance techniques to capture equivalences, support optimizations, and enable selective evaluation. We formalize our extensions, implement them in the PROVision system, and validate their effectiveness and scalability for common ETL and matching tasks.
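Database-style tuple-level provenance of the kind PROVision extends can be sketched in a few lines: each row carries the set of source-row identifiers it came from, filters preserve that set, and joins union the sets of both inputs. This is a generic illustration of the technique, not PROVision's actual implementation:

```python
def scan(rows, source):
    """Tag each input row with a singleton provenance set."""
    return [(row, {f"{source}:{i}"}) for i, row in enumerate(rows)]

def select(tagged, predicate):
    """Filtering preserves each surviving row's provenance."""
    return [(row, prov) for row, prov in tagged if predicate(row)]

def join(left, right, key_l, key_r):
    """A joined row's provenance is the union of its inputs' provenance."""
    return [
        ((l, r), pl | pr)
        for l, pl in left
        for r, pr in right
        if l[key_l] == r[key_r]
    ]

people = scan([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}], "people")
visits = scan([{"person": 1, "site": "clinic"}], "visits")
joined = join(people, visits, "id", "person")
```

Given an erroneous output row, reading off its provenance set immediately names the source rows to inspect, which is the troubleshooting workflow the abstract motivates.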
PubMed: 31595143
DOI: 10.1109/ICDE.2019.00025
Journal of Medical Internet Research, Feb 2023
BACKGROUND
Wearable devices generate rich streams of patient health data but have limited ability to store and process such data. Currently, individual users or data aggregators are unable to monetize or contribute such data to wider analytics use cases. When combined with clinical health data, such data can improve the predictive power of data-driven analytics and can proffer many benefits to improve the quality of care. We propose and provide a marketplace mechanism to make these data available while benefiting data providers.
OBJECTIVE
We aimed to propose the concept of a decentralized marketplace for patient-generated health data that can improve provenance, data accuracy, security, and privacy. Using a proof-of-concept prototype with the InterPlanetary File System (IPFS) and Ethereum smart contracts, we aimed to demonstrate decentralized marketplace functionality on the blockchain. We also aimed to illustrate and demonstrate the benefits of such a marketplace.
METHODS
We used a design science research methodology to define and prototype our decentralized marketplace, using the Ethereum blockchain, the Solidity smart-contract programming language, the web3.js library, and Node.js with the MetaMask application.
RESULTS
We designed and implemented a prototype of a decentralized health care marketplace catering to health data. We used IPFS to store data, provided an encryption scheme for the data, and provided smart contracts on the Ethereum blockchain to communicate with users. We met the design goals we set out to accomplish in this study.
CONCLUSIONS
A decentralized marketplace for trading patient-generated health data can be created using smart-contract technology and IPFS-based data storage. Such a marketplace can improve quality, availability, and provenance and satisfy data privacy, access, auditability, and security needs for such data when compared with centralized systems.
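One reason IPFS helps with provenance and data accuracy is content addressing: an object's address is derived from its bytes, so any tampering changes the address. The sketch below illustrates the idea with a plain SHA-256 digest (real IPFS content identifiers use multihash/CID encoding, which this toy omits):

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from the content itself, as content-addressed
    stores like IPFS do. A toy stand-in for a real CID."""
    return hashlib.sha256(data).hexdigest()

record = b'{"steps": 9812, "heart_rate": 64}'
addr = content_address(record)

# The same bytes always resolve to the same address, so a buyer who
# fetched `record` via `addr` can verify it was not altered in transit.
```

A marketplace smart contract then only needs to store and trade these short addresses on chain, while the (encrypted) payloads live off chain in IPFS.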
Topics: Humans; Blockchain; Data Accuracy; Patients; Privacy; Programming Languages
PubMed: 36848185
DOI: 10.2196/42743
PloS One, 2024
This study, conducted in China in November 2020, aimed to explore the variation in growth traits among different provenances and families and to select elite materials of Juglans mandshurica. Seeds of 44 families from six J. mandshurica provenances in Heilongjiang and Jilin provinces were sown in the nursery and then transplanted to the field. At the age of 5 years, seven growth traits were assessed, followed by a comprehensive analysis and the selection of provenances and families. Analysis of variance revealed statistically significant (P < 0.01) differences in the seven growth traits among provenances and families, justifying further breeding efforts. The genetic coefficient of variation (GCV) for all traits ranged from 5.44% (branch angle) to 21.95% (tree height), whereas the phenotypic coefficient of variation (PCV) ranged from 13.74% (tapering) to 38.50% (branch number per node), indicating considerable variability across the traits. Further, all the studied traits except stem straightness, branch angle, and branch number per node showed high heritability (tree height, ground diameter, mean crown width, and tapering: over 0.7 ± 0.073), indicating that variation in these traits is primarily driven by genetic factors. Correlation analysis revealed strong positive correlations (r > 0.8) between tree height and ground diameter (r = 0.86), tree height and mean crown width (r = 0.82), and ground diameter and mean crown width (r = 0.83), suggesting that these relationships can be employed for more precise prediction of the growth and morphological characteristics of trees, as well as the selection of superior materials. There was also a strong correlation between temperature factors and growth traits. Based on the comprehensive scores in this study, Sanchazi was selected as the elite provenance. Using top-percentile selection criteria, SC1, SC8, DJC15, and DQ18 were selected as elite families.
These selected families exhibit genetic gains of over 10% in tree height, ground diameter, and mean crown width, signifying their potential in forestry for enhancing timber production and shortening production cycles, thereby contributing to sustainable forest management. In this study, the growth traits of J. mandshurica were found to exhibit stable variation, and there were correlations between these traits. The selected elite provenance and families of J. mandshurica showed faster growth, which is advantageous for the subsequent breeding and promotion of improved J. mandshurica varieties.
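The quantities reported above follow standard quantitative-genetics definitions: a coefficient of variation is 100·sqrt(variance)/mean, and broad-sense heritability is the ratio of genetic to phenotypic variance. A minimal sketch with invented numbers (not the study's data):

```python
import math

def coeff_of_variation(variance, mean):
    """Coefficient of variation in percent: 100 * sqrt(V) / mean.
    With genetic variance this gives the GCV; with phenotypic
    variance, the PCV."""
    return 100 * math.sqrt(variance) / mean

def broad_sense_heritability(genetic_var, phenotypic_var):
    """H^2 = V_G / V_P: the share of phenotypic variance that is genetic."""
    return genetic_var / phenotypic_var

# Illustrative values only:
gcv = coeff_of_variation(4.0, 10.0)            # genetic CV
pcv = coeff_of_variation(16.0, 10.0)           # phenotypic CV
h2 = broad_sense_heritability(4.0, 16.0)       # heritability
```

Because GCV uses only the genetic component of variance, it is always at most the PCV for the same trait, which matches the ranges reported in the abstract.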
Topics: Juglans; Plant Breeding; Trees; Forests; China
PubMed: 38451964
DOI: 10.1371/journal.pone.0298918