Indian Journal of Occupational and..., 2023
Reproducibility is a preferred aim in any scientific research, including occupational health research. Data management is an important and essential step in marching towards reproducibility. Good data management helps us stay organized, improves transparency and quality, and fosters collaboration. Here we discuss how to organize and prepare for data management and how data management facilitates interoperability and accessibility, followed by the storing and dissemination of data. We wrap up by providing pointers on what needs to be included in data management plans.
PubMed: 38390491
DOI: 10.4103/ijoem.ijoem_342_22
Open Research Europe, 2023
This document outlines the types of data collected for the Digital Ludeme Project, an ERC-funded research project that aims to improve our understanding of the development of games throughout human history through computational analysis of the available (partial) historical data of games. This document outlines how this data is collected, formatted and stored, and how it can be accessed. It is the aim of the Digital Ludeme Project to provide a data resource of unprecedented depth and scope for the benefit of historical games researchers worldwide. Special attention is paid to the FAIR Guiding Principles for scientific data management and stewardship.
PubMed: 38550771
DOI: 10.12688/openreseurope.16524.1
Annals of Medicine, Dec 2024
BACKGROUND
The construction of a robust healthcare information system is fundamental to enhancing countries' capabilities in the surveillance and control of hepatitis B virus (HBV). Making use of China's rapidly expanding primary healthcare system, this innovative approach using big data and machine learning (ML) could help advance the World Health Organization's (WHO) goal of eliminating HBV infection by reaching 90% diagnosis and treatment rates by 2030. We aimed to develop and validate HBV detection models using routine clinical data to improve the detection of HBV and support the development of effective interventions to mitigate the impact of this disease in China.
METHODS
Relevant data records extracted from the Family Medicine Clinic of the University of Hong Kong-Shenzhen Hospital's Hospital Information System were structured using state-of-the-art Natural Language Processing techniques. Several ML models were used to develop HBV risk assessment models. The performance of the ML models was then interpreted using Shapley values (SHAP) and validated using cohort data randomly divided at a ratio of 2:1 within a five-fold cross-validation framework.
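As a rough sketch of the validation scheme described in the methods (a random 2:1 division of the cohort with five-fold cross-validation), the following plain-Python fragment generates the corresponding index splits. The function names and seed are illustrative and not taken from the study's code.

```python
import random

def split_derivation_validation(n_cases, ratio=(2, 1), seed=42):
    """Randomly divide case indices into derivation and validation
    sets at the given ratio (2:1 as in the study)."""
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)
    cut = n_cases * ratio[0] // sum(ratio)
    return indices[:cut], indices[cut:]

def five_fold_indices(indices, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Each case lands in exactly one fold, so every record is used once for testing and k-1 times for training across the k rounds.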
RESULTS
The patterns of physical complaints of patients with and without HBV infection were identified by processing 158,988 clinic attendance records. After removing cases without any clinical parameters from the derivation sample (n = 105,992), 27,392 cases were analysed using six modelling methods. A simplified model for HBV using patients' physical complaints and parameters was developed with good discrimination (AUC = 0.78) and calibration (goodness of fit test p-value >0.05).
CONCLUSIONS
Suspected case detection models for HBV, showing potential for clinical deployment, have been developed to improve HBV surveillance in primary care settings in China.
Topics: Humans; Hepatitis B virus; Big Data; Machine Learning; China; Risk Assessment
PubMed: 38340309
DOI: 10.1080/07853890.2024.2314237
Scientific Reports, Oct 2023
Nowadays, several companies prefer storing their data on multiple data centers with replication, for many reasons. Data that spans various data centers ensures the fastest possible response time for customers and workforces who are geographically separated. It also protects the information from loss if a single data center experiences a disaster. However, the amount of data is increasing at a rapid pace, which leads to challenges in storage, analysis, and various processing tasks. In this paper, we propose and design a geographically distributed data management framework to manage the massive data stored and distributed among geo-distributed data centers. The goal of the proposed framework is to enable efficient use of the distributed data blocks for various data analysis tasks. The architecture of the proposed framework is composed of a grid of geo-distributed data centers connected to a data controller (DCtrl). The DCtrl is responsible for organizing and managing the block replicas across the geo-distributed data centers. We use the BDMS system as the installed system on the distributed data centers. BDMS stores a big data file as a set of random sample data blocks, each being a random sample of the whole data file. Then, the DCtrl distributes these data blocks into multiple data centers with replication. To analyze a big data file distributed under the proposed framework, we randomly select, on any data center, a sample of data blocks replicated from the other data centers. We use simulation results to demonstrate the performance of the proposed framework in big data analysis across geo-distributed data centers.
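The block-selection step the framework relies on can be illustrated with a minimal sketch, assuming a DCtrl-style map from block IDs to the data centers holding their replicas. The function `sample_blocks` and its signature are hypothetical, not part of BDMS or the paper's implementation.

```python
import random

def sample_blocks(replica_map, sample_size, seed=0):
    """replica_map: {block_id: [data_center, ...]}, as a DCtrl might
    maintain it. Returns {block_id: chosen_data_center} for a random
    sample of blocks, picking one replica of each block at random.
    Because every block is itself a random sample of the whole file,
    any such subset supports approximate analysis of the file."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(replica_map), sample_size)
    return {block: rng.choice(replica_map[block]) for block in chosen}
```

In a real deployment the replica choice would favor the nearest or least-loaded data center rather than a uniform random pick.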
PubMed: 37853092
DOI: 10.1038/s41598-023-44789-x
Healthcare (Basel, Switzerland), Oct 2023 (Review)
Generative artificial intelligence (AI) and large language models (LLMs), exemplified by ChatGPT, are promising for revolutionizing data and information management in healthcare and medicine. However, there is scant literature guiding their integration for non-AI professionals. This study conducts a scoping literature review to address the critical need for guidance on integrating generative AI and LLMs into healthcare and medical practices. It elucidates the distinct mechanisms underpinning these technologies, such as Reinforcement Learning from Human Feedback (RLHF), few-shot learning, and chain-of-thought reasoning, which differentiate them from traditional, rule-based AI systems. Achieving these benefits requires an inclusive, collaborative co-design process that engages all pertinent stakeholders, including clinicians and consumers. Although global research is examining both opportunities and challenges, including ethical and legal dimensions, LLMs offer promising advancements in healthcare by enhancing data management, information retrieval, and decision-making processes. Continued innovation in data acquisition, model fine-tuning, prompt strategy development, evaluation, and system implementation is imperative for realizing the full potential of these technologies. Organizations should proactively engage with these technologies to improve healthcare quality, safety, and efficiency, adhering to ethical and legal guidelines for responsible application.
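One of the mechanisms the review names, few-shot learning, amounts to packing worked examples into the prompt so the model can imitate them in context. A minimal, library-free sketch follows; the prompt format and function name are illustrative, not from the paper.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: a task instruction, a handful of
    worked (input, output) examples, then the new query left open for
    the model to complete."""
    lines = [task, ""]
    for example_input, example_output in examples:
        lines += [f"Input: {example_input}", f"Output: {example_output}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)
```

Chain-of-thought prompting follows the same pattern, except the example outputs include intermediate reasoning steps rather than bare answers.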
PubMed: 37893850
DOI: 10.3390/healthcare11202776
Environmental Research, Nov 2023
Within collaborative projects, such as the EU-funded Horizon 2020 EXIMIOUS project (Mapping Exposure-Induced Immune Effects: Connecting the Exposome and the Immunome), the collection and analysis of large volumes of data pose challenges in the domain of data management, with regard to both ethical and legal aspects. However, researchers often lack the right tools and/or an accurate understanding of the ethical/legal framework to independently address such challenges. With guidance and support within and between the partner institutes (the researchers and the ethical and legal teams) in the EXIMIOUS project, we have been able to understand and solve most challenges during the first two project years. This has fed into the development of a Data Management Plan and the establishment of data management platforms in accordance with the ethical and legal framework laid down by the EU and the different national regulations of the partners involved. Through this elaborate exercise, we have acquired tools that allow us to make our research data FAIR (Findable, Accessible, Interoperable, and Reusable), while at the same time ensuring data privacy and security (GDPR compliance). Herein we share our experience of creating and managing the data workflow through an open research communication, with the aim of helping other researchers build the data management frameworks for their own projects. Based on the measures adopted in EXIMIOUS to ensure FAIR data management, we also put together a checklist, "DMP CHECK", containing a series of recommendations based on our experience.
PubMed: 37597835
DOI: 10.1016/j.envres.2023.116886
PLoS Computational Biology, Sep 2023
Research data are accumulating rapidly, and with them comes the challenge of fully reproducible science. As a consequence, implementation of high-quality management of scientific data has become a global priority. The FAIR (Findable, Accessible, Interoperable and Reusable) principles provide practical guidelines for maximizing the value of research data; however, processing data using workflows (systematic executions of a series of computational tools) is equally important for good data management. The FAIR principles have recently been adapted to research software (the FAIR4RS Principles) to promote the reproducibility and reusability of any type of research software. Here, we propose a set of 10 quick tips, drafted by experienced workflow developers, that will help researchers apply the FAIR4RS principles to workflows. The tips have been arranged according to the FAIR acronym, clarifying the purpose of each tip with respect to the FAIR4RS principles. Altogether, these tips can be seen as practical guidelines for workflow developers who aim to contribute to more reproducible and sustainable computational science, with a positive impact on the open science and FAIR community.
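In the spirit of such tips (though not reproducing any of them verbatim), a machine-readable workflow description that pins a version, states a license, and derives a checksum-based identifier surrogate might look like the sketch below. The field names are assumptions for illustration, not a published schema.

```python
import hashlib
import json

def describe_workflow(name, version, license_id, steps):
    """Produce a minimal machine-readable workflow description:
    an explicit version and license (Reusable), declared tools per
    step (Interoperable), and a content checksum that can serve as
    a crude identifier surrogate (Findable) until a DOI or similar
    persistent identifier is minted."""
    record = {
        "name": name,
        "version": version,    # pin the exact release being described
        "license": license_id, # state the terms of reuse explicitly
        "steps": steps,        # each step names its tool and version
    }
    blob = json.dumps(record, sort_keys=True).encode()
    record["checksum"] = hashlib.sha256(blob).hexdigest()
    return record
```

In practice one would register such a record in a workflow registry so that the workflow is findable and accessible independently of any single lab's infrastructure.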
PubMed: 37768885
DOI: 10.1371/journal.pcbi.1011369
Journal of Medical Internet Research, Nov 2023
BACKGROUND
In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are important throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge of provenance can lead to validity risks and reduce confidence in, and the quality of, the processed data. The need to implement maintainable data management practices is undisputed, but there is a great lack of clarity about their current status.
OBJECTIVE
Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for assessing the maturity of data management practices and offer recommendations to enable trustful dissemination and reuse of routine health care data.
METHODS
In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire, tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist.
RESULTS
Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within non-harmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to bring more clarity and to help define enhanced data management strategies.
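The kinds of contextual information found missing (clear responsibilities, environment details, inputs and outputs per step) could be captured with a minimal provenance record attached to each workflow step. This sketch is illustrative only and is not code used by the MIRACUM DICs.

```python
import datetime
import platform
import sys

def provenance_record(step_name, responsible, inputs, outputs):
    """Capture who ran a workflow step, on which data, when, and in
    which computational environment, so that downstream users can
    trace how a data element was produced."""
    return {
        "step": step_name,
        "responsible": responsible,  # explicit responsibility per step
        "inputs": inputs,            # data elements consumed
        "outputs": outputs,          # data elements produced
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "environment": {             # computational context of the run
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }
```

Chaining such records (each step's inputs matching an earlier step's outputs) yields the provenance trail whose absence the study flags.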
CONCLUSIONS
The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy, with data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of FAIRer, well-maintained health research data of high quality.
Topics: Humans; Data Management; Delivery of Health Care; Medical Informatics; Surveys and Questionnaires
PubMed: 37938878
DOI: 10.2196/48809
Frontiers in Genetics, 2023
With regard to the use and transfer of research participants' personal information, samples and other data nationally and internationally, it is necessary to construct a data management plan. One of the key objectives of a data management plan is to explain the governance of clinical, biochemical, laboratory, molecular and other sources of data according to the regulations and policies of all relevant stakeholders. It also seeks to describe the processes involved in protecting the personal information of research participants, especially those from vulnerable populations. In most data management plans, the framework therefore consists of describing the collection, organization, use, storage, contextualization, preservation, sharing and access of/to research data and/or samples. It may also include a description of data management resources, including those associated with analyzed samples, and identifies responsible parties for the establishment, implementation and overall management of the data management strategy. Importantly, the data management plan serves to highlight potential problems with the collection, sharing, and preservation of research data. However, there are different forms of data management plans, and requirements may vary due to funder guidelines and the nature of the study under consideration. This paper leverages the detailed data management plans constructed for the 'NESHIE study' and is a first attempt at providing a comprehensive template applicable to research focused on vulnerable populations, particularly those within LMICs, that includes a multi-omics approach to achieve the study aims. More particularly, this template, available for download as a supplementary document, provides a modifiable outline for future projects that involve similar sensitivities, whether in clinical research or clinical trials. It includes a description of the management not only of the data generated through standard clinical practice, but also that which is generated through the analysis of a variety of samples being collected from research participants and analyzed using multi-omics approaches.
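The DMP sections the abstract enumerates (collection, organization, use, storage, contextualization, preservation, sharing, and access) can be rendered as a fill-in checklist for draft plans. The prompts below are illustrative paraphrases, not taken from the NESHIE template itself.

```python
# Section names follow the abstract; the question per section is an
# illustrative prompt, not wording from the published template.
DMP_SECTIONS = {
    "collection": "What data and samples are collected, and how?",
    "organization": "How are files and records named and structured?",
    "use": "Who may use the data, and for which analyses?",
    "storage": "Where are data held, and with what security controls?",
    "contextualization": "What metadata makes the data interpretable?",
    "preservation": "How long are data retained, and in what repository?",
    "sharing": "Under what conditions are data shared externally?",
    "access": "Who grants and audits access, especially for vulnerable groups?",
}

def missing_sections(dmp):
    """Return the template sections a draft DMP has not yet answered."""
    return [section for section in DMP_SECTIONS if not dmp.get(section)]
```

Running such a check before submission surfaces the gaps a funder's DMP review would otherwise flag.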
PubMed: 38130874
DOI: 10.3389/fgene.2023.1273975
Clinical Research in Cardiology: ..., May 2024 (Review)
The sharing and documentation of cardiovascular research data are essential for efficient use and reuse of data, thereby aiding scientific transparency, accelerating the progress of cardiovascular research and healthcare, and contributing to the reproducibility of research results. However, challenges remain. This position paper, written on behalf of and approved by the German Cardiac Society and German Centre for Cardiovascular Research, summarizes our current understanding of the challenges in cardiovascular research data management (RDM). These challenges include lack of time, awareness, incentives, and funding for implementing effective RDM; lack of standardization in RDM processes; a need to better identify meaningful and actionable data among the increasing volume and complexity of data being acquired; and a lack of understanding of the legal aspects of data sharing. While several tools exist to increase the degree to which data are findable, accessible, interoperable, and reusable (FAIR), more work is needed to lower the threshold for effective RDM not just in cardiovascular research but in all biomedical research, with data sharing and reuse being factored in at every stage of the scientific process. A culture of open science with FAIR research data should be fostered through education and training of early-career and established research professionals. Ultimately, FAIR RDM requires permanent, long-term effort at all levels. If outcomes can be shown to be superior and to promote better (and better value) science, modern RDM will make a positive difference to cardiovascular science and practice. The full position paper is available in the supplementary materials.
Topics: Humans; Data Management; Reproducibility of Results; Heart; Cardiovascular System; Biomedical Research
PubMed: 37847314
DOI: 10.1007/s00392-023-02303-3