Sensors (Basel, Switzerland) Aug 2022 (Review)
Review
The Internet of Things encompasses all connected objects, from small embedded systems with low computational power and storage capacity to more powerful ones, as well as moving objects such as drones and autonomous vehicles. The concept of the Internet of Everything expands on this idea by adding people, data, and processing. Adoption of such systems is growing rapidly, bringing with it questions about the security and privacy of these objects. A natural answer to data integrity, confidentiality, and single-point-of-failure vulnerability is the use of blockchains. Blockchains can serve as an immutable data layer for storing information, avoid single points of failure through decentralization, and provide strong security and cryptographic tools for IoE. However, adopting blockchain technology in such heterogeneous systems containing lightweight devices raises several challenges and practical issues. Indeed, most solutions proposed to adapt blockchains to resource-constrained devices struggle to maintain decentralization or security. The most interesting are probably the Layer 2 solutions, which build off-chain systems tightly coupled to the blockchain. Among these, zk-rollups are a promising new generation of Layer 2/off-chain schemes that can remove the last obstacles to blockchain adoption in IoT or, more generally, in IoE. By increasing scalability and enabling rule customization while preserving the same security as the Layer 1 blockchain, zk-rollups overcome restrictions on the use of blockchains for IoE. Despite the promise illustrated by recent systems from startups and private companies, very few scientific publications explain or apply this little-known technology, especially for non-financial systems. In this context, the objective of our paper is to fill this gap for IoE systems in two steps. We first provide a synthesized review of recent proposals to improve scalability, covering both on-chain (consensus, blockchain organization, …) and off-chain (sidechains, rollups) solutions, and argue that zk-rollups are the most promising. Second, we focus on IoE, describing several key features (scalability, dynamicity, data management, …) illustrated with various general IoE use cases.
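To make the rollup mechanics concrete, here is a minimal sketch in Python of the data flow described above, with all names hypothetical: device-to-device transactions are executed off-chain in batches, and only compact state commitments plus a validity proof (stubbed here) are posted to the Layer 1 chain. A real zk-rollup would replace the hash-based commitment with a Merkle root and the placeholder proof with a zk-SNARK/STARK verified by a Layer 1 contract.

```python
# Toy zk-rollup data flow (illustrative only; names are hypothetical).
import hashlib
import json

def state_root(balances: dict) -> str:
    # Compact commitment to the full off-chain state
    # (stand-in for a Merkle root in a real rollup).
    return hashlib.sha256(json.dumps(balances, sort_keys=True).encode()).hexdigest()

def apply_batch(balances: dict, txs: list) -> dict:
    # Off-chain execution of a batch of transfers by the rollup operator.
    new = dict(balances)
    for sender, receiver, amount in txs:
        assert new.get(sender, 0) >= amount, "insufficient balance"
        new[sender] -= amount
        new[receiver] = new.get(receiver, 0) + amount
    return new

pre = {"sensor-a": 10, "sensor-b": 5}
txs = [("sensor-a", "sensor-b", 3), ("sensor-b", "sensor-a", 1)]
post = apply_batch(pre, txs)

# On-chain: only the roots and a succinct validity proof are published,
# so Layer 1 never re-executes the batch.
l1_record = {
    "pre_root": state_root(pre),
    "post_root": state_root(post),
    "proof": "<zk validity proof over the batch>",  # placeholder, not a real proof
}
print(l1_record)
```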
Topics: Blockchain; Computer Security; Confidentiality; Data Management; Humans; Privacy
PubMed: 36080950
DOI: 10.3390/s22176493
Journal of Healthcare Engineering 2022
BACKGROUND
The use of novel medications and methods to prevent, diagnose, treat, and manage diabetes requires confirmation of safety and efficacy in a well-designed study prior to widespread adoption. Diabetes clinical trials are the studies that examine these issues. The aim of the present study was to develop a web-based system for data management in diabetes clinical trials.
METHODS
The present research was a mixed-methods study conducted in 2019. To identify the data elements and functions required to develop the system, 60 researchers completed a questionnaire. The designed system was evaluated using two methods: the usability of the system was initially evaluated by a group of researchers (n=6) using the think-aloud method, and after system improvement, the system functions were evaluated by other researchers (n=30) using a questionnaire.
RESULTS
The main data elements which were required to develop a case report form included "study data," "participant's personal data," and "clinical data." The functional requirements of the system were "managing the study," "creating case report forms," "data management," "data quality control," and "data security and confidentiality." After using the system, researchers rated the system functions at a "good" level (6.3 ± 0.73) on a seven-point Likert scale.
CONCLUSION
Given the complexity of the data management processes in diabetes clinical trials and the widespread use of information technologies in research, the use of clinical data management systems in diabetes clinical trials seems inevitable. The system developed in the current study can facilitate and improve the process of creating and managing case report forms as well as collecting data in diabetes clinical trials.
Topics: Clinical Trials as Topic; Data Management; Diabetes Mellitus; Humans; Research Design; Surveys and Questionnaires
PubMed: 35251579
DOI: 10.1155/2022/8421529
Big Data Jun 2023
Big data management is a key enabling factor for enterprises that want to compete in the global market. Data coming from enterprise production processes, if properly analyzed, can boost enterprise management and optimization, guaranteeing faster processes, better customer management, and lower overheads/costs. Guaranteeing a proper big data pipeline is the holy grail of big data, often hindered by the difficulty of evaluating the correctness of pipeline results. The problem is even worse when big data pipelines are provided as a service in the cloud and must comply with both laws and users' requirements. To this aim, assurance techniques can complement big data pipelines, providing the means to guarantee that they behave correctly, toward the deployment of big data pipelines fully compliant with laws and users' requirements. In this article, we define an assurance solution for big data based on service-level agreements, in which a semiautomatic approach supports users from the definition of requirements, through the negotiation of the terms regulating the provisioned services, to their continuous refinement.
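As a rough, hypothetical illustration of this idea (not the authors' actual framework), SLA terms can be modeled as machine-checkable predicates that are continuously evaluated against metrics reported by the provisioned pipeline; an empty violation list means the pipeline currently behaves as negotiated.

```python
# Hedged sketch of SLA-based assurance; all term names and metrics are invented.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SlaTerm:
    name: str
    check: Callable[[dict], bool]  # predicate over observed pipeline metrics

terms = [
    SlaTerm("max-latency-ms", lambda m: m["latency_ms"] <= 500),
    SlaTerm("encrypted-at-rest", lambda m: m["encryption"] == "AES-256"),
    SlaTerm("eu-data-residency", lambda m: m["region"].startswith("eu-")),
]

def assure(metrics: dict) -> list:
    # A continuous-refinement loop would re-run this as new metrics arrive.
    return [t.name for t in terms if not t.check(metrics)]

observed = {"latency_ms": 620, "encryption": "AES-256", "region": "eu-west-1"}
print(assure(observed))  # ['max-latency-ms'] -> the latency term is violated
```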
Topics: Big Data; Data Management
PubMed: 36862683
DOI: 10.1089/big.2021.0369
Journal of Medical Internet Research Nov 2023
BACKGROUND
In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are important throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge can lead to validity risks and reduce the confidence in and quality of the processed data. The need to implement maintainable data management practices is undisputed, but there is little clarity about their current status.
OBJECTIVE
Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for assessing the maturity of data management practices and give recommendations to enable trustworthy dissemination and reuse of routine health care data.
METHODS
In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire, tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist.
RESULTS
Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to provide more clarity and to help define enhanced data management strategies.
CONCLUSIONS
The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy, with data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of FAIRer, well-maintained health research data of high quality.
Topics: Humans; Data Management; Delivery of Health Care; Medical Informatics; Surveys and Questionnaires
PubMed: 37938878
DOI: 10.2196/48809
Clinical Research in Cardiology May 2024 (Review)
Review
The sharing and documentation of cardiovascular research data are essential for efficient use and reuse of data, thereby aiding scientific transparency, accelerating the progress of cardiovascular research and healthcare, and contributing to the reproducibility of research results. However, challenges remain. This position paper, written on behalf of and approved by the German Cardiac Society and German Centre for Cardiovascular Research, summarizes our current understanding of the challenges in cardiovascular research data management (RDM). These challenges include lack of time, awareness, incentives, and funding for implementing effective RDM; lack of standardization in RDM processes; a need to better identify meaningful and actionable data among the increasing volume and complexity of data being acquired; and a lack of understanding of the legal aspects of data sharing. While several tools exist to increase the degree to which data are findable, accessible, interoperable, and reusable (FAIR), more work is needed to lower the threshold for effective RDM not just in cardiovascular research but in all biomedical research, with data sharing and reuse being factored in at every stage of the scientific process. A culture of open science with FAIR research data should be fostered through education and training of early-career and established research professionals. Ultimately, FAIR RDM requires permanent, long-term effort at all levels. If outcomes can be shown to be superior and to promote better (and better value) science, modern RDM will make a positive difference to cardiovascular science and practice. The full position paper is available in the supplementary materials.
Topics: Humans; Data Management; Reproducibility of Results; Heart; Cardiovascular System; Biomedical Research
PubMed: 37847314
DOI: 10.1007/s00392-023-02303-3
Environmental Monitoring and Assessment Dec 2023
A scientifically informed approach to decision-making is key to ensuring the sustainable management of ecosystems, especially in the light of increasing human pressure on habitats and species. Protected areas, with their long-term institutional mandate for biodiversity conservation, play an important role as data providers, for example, through the long-term monitoring of natural resources. However, poor data management often limits the use and reuse of this wealth of information. In this paper, we share lessons learned in managing long-term data from the Italian Alpine national parks. Our analysis and examples focus on specific issues faced by managers of protected areas, which partially differ from those faced by academic researchers, predominantly owing to different mission, governance, and temporal perspectives. Rigorous data quality control, the use of appropriate data management tools, and acquisition of the necessary skills remain the main obstacles. Common protocols for data collection offer great opportunities for the future, and complete recovery and documentation of time series is an urgent priority. Notably, before data can be shared, protected areas should improve their data management systems, a task that can be achieved only with adequate resources and a long-term vision. We suggest strategies that protected areas, funding agencies, and the scientific community can embrace to address these problems. The added value of our work lies in promoting engagement with managers of protected areas and in reporting and analysing their concrete requirements and problems, thereby contributing to the ongoing discussion on data management and sharing through a bottom-up approach.
Topics: Humans; Ecosystem; Conservation of Natural Resources; Data Management; Environmental Monitoring; Biodiversity
PubMed: 38051448
DOI: 10.1007/s10661-023-11851-0
Journal of Pharmaceutical Sciences May 2022 (Review)
Review
Recent advancements in data engineering, data science, and secure cloud storage can transform the current state of global Chemistry, Manufacturing, and Controls (CMC) regulatory activities into automated online digital processes. Modernizing regulatory activities will facilitate simultaneous global submissions and concurrent collaborative reviews, significantly reducing global licensing timelines and variability in globally registered product details. This article describes advancements made within the pharmaceutical industry, from theoretical concepts to the use of structured content and data in CMC submissions. The term Structured Content and Data Management (SCDM) describes the end-to-end scientific data lifecycle: capture in source systems, aggregation into a consolidated repository, and transformation into semantically structured blocks with metadata defining relationships between scientific data and business contexts. Automation of regulatory authoring (termed Structured Content Authoring) is feasible because SCDM makes data both human- and machine-readable. It will offer health authorities access to digital data beyond the current standard of PDF documents and, for a review process, SCDM would "enrich the effectiveness, efficiency, and consistency of regulatory quality oversight" (Yu et al., 2019). SCDM is a novel solution for content and data management in regulatory submissions and can enable faster access to critical therapies worldwide.
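Purely as an illustrative sketch (the field names and structure are assumptions, not the published SCDM model), a structured content block might bind a captured scientific value to its source system and regulatory context, making it machine-readable for review tooling and renderable into authored submission text:

```python
# Hypothetical SCDM-style block; every field name here is an assumption.
import json

scdm_block = {
    "data": {"attribute": "assay_purity", "value": 99.2, "unit": "%"},
    "metadata": {
        "source_system": "LIMS",           # where the value was captured
        "product": "example-drug-substance",
        "ctd_section": "3.2.S.4.4",        # regulatory/business context
        "version": 3,
    },
}

# Machine-readable form for health-authority tooling...
print(json.dumps(scdm_block))

# ...and the same block rendered as authored text (Structured Content Authoring).
d = scdm_block["data"]
print(f"The {d['attribute'].replace('_', ' ')} was {d['value']}{d['unit']}.")
```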
Topics: Commerce; Data Management; Drug Industry; Humans
PubMed: 34610323
DOI: 10.1016/j.xphs.2021.09.046
BMC Bioinformatics Nov 2023
Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, such as protein-protein interactions and gene expression. However, for robust performance on recently emerging large health datasets, new biclustering algorithms must be scalable and fast. We present a rapid unsupervised biclustering (RUBic) algorithm that achieves this objective with a novel encoding and search strategy. RUBic significantly reduces computational overhead on both synthetic and experimental datasets, showing clear computational benefits with respect to several state-of-the-art biclustering algorithms. On 100 synthetic binary datasets, our method took [Formula: see text] s to extract 494,872 biclusters. On the human PPI database of size [Formula: see text], our method generates 1840 biclusters in [Formula: see text] s. On a central nervous system embryonic tumor gene expression dataset of size 712,940, our algorithm takes 101 min to produce 747,069 biclusters, while recent competing algorithms take significantly more time to produce the same result. RUBic is also evaluated on five different gene expression datasets and shows a significant speed-up in execution time with respect to existing approaches for extracting significant KEGG-enriched biclusters. RUBic can operate in two modes, base and flex: base mode generates maximal biclusters, while flex mode generates fewer clusters, faster, selected by their biological significance with respect to KEGG pathways. The code is available at ( https://github.com/CMATERJU-BIOINFO/RUBic ) for academic use only.
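For readers new to the problem setting, the sketch below shows the simplest form of encoding-based biclustering of a binary matrix: each row is encoded by the set of columns where it holds a 1, and rows sharing an encoding span an all-ones bicluster. This illustrates the task only; it is not the RUBic algorithm.

```python
# Minimal illustration of biclustering a binary matrix (not RUBic itself).
from collections import defaultdict

matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]

def pattern_biclusters(m, min_rows=2, min_cols=2):
    # Encode each row by the tuple of columns where it contains a 1;
    # rows with identical encodings span an all-ones submatrix.
    groups = defaultdict(list)
    for i, row in enumerate(m):
        cols = tuple(j for j, v in enumerate(row) if v)
        groups[cols].append(i)
    return [(rows, list(cols)) for cols, rows in groups.items()
            if len(rows) >= min_rows and len(cols) >= min_cols]

print(pattern_biclusters(matrix))  # [([0, 2], [0, 1, 3])]
```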
Topics: Humans; Algorithms; Databases, Factual; Cluster Analysis; Data Management; Gene Expression Profiling
PubMed: 37974081
DOI: 10.1186/s12859-023-05534-3
Journal of Medical Internet Research Mar 2021
This paper aims to provide a perspective on data sharing practices in the context of the COVID-19 pandemic. The scientific community has made several important inroads in the fight against COVID-19, and there are over 2500 clinical trials registered globally. Within the context of the rapidly changing pandemic, we are seeing a large number of trials conducted without results being made available. It is likely that a plethora of trials have stopped early, not for statistical reasons but due to lack of feasibility. Trials stopped early for feasibility are, by definition, statistically underpowered and thereby prone to inconclusive findings. Statistical power is not necessarily linear with the total sample size, and even small reductions in patient numbers or events can have a substantial impact on the research outcomes. Given the profusion of clinical trials investigating identical or similar treatments across different geographical and clinical contexts, one must also consider that the likelihood of a substantial number of false-positive and false-negative trials, emerging with the increasing overall number of trials, adds to public perceptions of uncertainty. This issue is complicated further by the evolving nature of the pandemic, wherein baseline assumptions on control group risk factors used to develop sample size calculations are far more challenging than those in the case of well-documented diseases. The standard answer to these challenges during nonpandemic settings is to assess each trial for statistical power and risk-of-bias and then pool the reported aggregated results using meta-analytic approaches. This solution simply will not suffice for COVID-19. Even with random-effects meta-analysis models, it will be difficult to adjust for the heterogeneity of different trials with aggregated reported data alone, especially given the absence of common data standards and outcome measures. To date, several groups have proposed structures and partnerships for data sharing. As COVID-19 has forced reconsideration of policies, processes, and interests, this is the time to advance scientific cooperation and shift the clinical research enterprise toward a data-sharing culture to maximize our response in the service of public health.
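To see why power is nonlinear in sample size, consider a hedged numerical sketch with hypothetical trial numbers: a two-arm comparison of event proportions (30% vs. 20%) under the usual normal approximation. Cutting each arm from 250 to 150 patients drops power from roughly 74% to roughly 52%, which is why trials stopped early for feasibility are prone to inconclusive findings.

```python
# Normal-approximation power for a two-sided, two-sample test of proportions.
# Numbers are illustrative, not taken from any specific COVID-19 trial.
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p2) / se - z_alpha)

for n in (250, 200, 150):
    print(n, round(power_two_proportions(0.30, 0.20, n), 2))
# 250 -> 0.74, 200 -> 0.64, 150 -> 0.52: a 40% smaller trial
# loses over 20 points of power.
```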
Topics: COVID-19; Clinical Trials as Topic; Data Management; Humans; Information Dissemination; Pandemics; Research Design; SARS-CoV-2
PubMed: 33684053
DOI: 10.2196/26718
Journal of Medical Internet Research Nov 2021 (Review)
Review
BACKGROUND
Bipolar disorder (BD) is the 10th most common cause of frailty in young individuals and is a source of morbidity and mortality worldwide. Patients with BD have a life expectancy 9 to 17 years lower than that of the general population. BD is a major mental disorder, but it can be misdiagnosed as depressive disorder, which makes affected patients difficult to treat. Approximately 60% of patients with BD are treated for depression. However, machine learning provides advanced techniques that may support better diagnosis of BD.
OBJECTIVE
This review aims to explore the machine learning algorithms used for the detection and diagnosis of bipolar disorder and its subtypes.
METHODS
The study protocol adopted the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. We explored 3 databases, namely Google Scholar, ScienceDirect, and PubMed. To enhance the search, we performed backward screening of all the references of the included studies. Based on the predefined selection criteria, 2 levels of screening were performed: title and abstract review, and full review of the articles that met the inclusion criteria. Data extraction was performed independently by all investigators. To synthesize the extracted data, a narrative synthesis approach was followed.
RESULTS
We retrieved 573 potential articles from the 3 databases. After preprocessing and screening, only 33 articles that met our inclusion criteria were identified. The most commonly used data belonged to the clinical category (19, 58%). We identified different machine learning models used in the selected studies, including classification models (18, 55%), regression models (5, 16%), model-based clustering methods (2, 6%), natural language processing (1, 3%), clustering algorithms (1, 3%), and deep learning-based models (3, 9%). Magnetic resonance imaging data were most commonly used for classifying bipolar patients compared to other groups (11, 34%), whereas microarray expression data sets and genomic data were the least commonly used. Reported accuracies ranged from a minimum of 64% to a maximum of 98%.
CONCLUSIONS
This scoping review provides an overview of recent studies that used machine learning models to diagnose patients with BD, regardless of their demographics or whether they were compared with patients with other psychiatric diagnoses. Further research can be conducted to provide clinical decision support in the health industry.
Topics: Algorithms; Bipolar Disorder; Data Management; Humans; Machine Learning; Natural Language Processing
PubMed: 34806996
DOI: 10.2196/29749