F1000Research, 2022
BACKGROUND
Knowing the needs of the bioimaging community with respect to research data management (RDM) is essential for identifying measures that enable adoption of the FAIR (findable, accessible, interoperable, reusable) principles for microscopy and bioimage analysis data across disciplines. As an initiative within Germany's National Research Data Infrastructure, we conducted this community survey in summer 2021 to assess the state of the art of bioimaging RDM and the community's needs.
METHODS
An online survey was conducted with a mixed question-type design. We created a questionnaire tailored to relevant topics of the bioimaging community, including specific questions on bioimaging methods and bioimage analysis as well as more general questions on RDM principles and tools. 203 survey entries were included in the analysis, covering the perspectives of various life and biomedical science disciplines and of participants at different career levels.
RESULTS
The results highlight the importance and value of bioimaging RDM and data sharing. However, the practical implementation of FAIR practices is impeded by technical hurdles, lack of knowledge, and uncertainty about the legal aspects of data sharing. The survey participants request metadata guidelines and annotation tools and endorse the use of image data management platforms. At present, OMERO (Open Microscopy Environment Remote Objects) is the best-known and most widely used platform. Most respondents rely on image processing and analysis, which they regard as the most time-consuming step of the bioimage data workflow. While knowledge about and implementation of electronic lab notebooks and data management plans is limited, respondents acknowledge their potential value for data handling and publication.
CONCLUSIONS
The bioimaging community acknowledges and endorses the value of RDM and data sharing. Still, there is a need for information, guidance, and standardization to foster the adoption of FAIR data handling. This survey may help inspire targeted measures to close this gap.
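OMERO, named above as the most widely used image data management platform, exposes a Python API (omero-py) for programmatic access to managed image data. The following is only a hedged sketch of what such access looks like; the hostname and credentials are placeholders and nothing here is taken from the survey itself.

```python
# Minimal sketch of programmatic access to an OMERO server via omero-py.
# Hostname and credentials are placeholders; adapt to your institution's deployment.
from omero.gateway import BlitzGateway

conn = BlitzGateway("demo_user", "demo_password",
                    host="omero.example.org", port=4064)
if conn.connect():
    try:
        # Walk the project/dataset hierarchy that OMERO manages.
        for project in conn.getObjects("Project"):
            print("Project:", project.getName())
            for dataset in project.listChildren():
                print("  Dataset:", dataset.getName(),
                      "-", dataset.countChildren(), "images")
    finally:
        conn.close()
else:
    print("Could not connect to the OMERO server")
```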
Topics: Humans; Data Management; Metadata; Information Dissemination; Surveys and Questionnaires; Workflow
PubMed: 36405555
DOI: 10.12688/f1000research.121714.2
American Journal of Biological..., Nov 2022
OBJECTIVES
Previous research has shown that while missing data are common in bioarchaeological studies, they are seldom handled using statistically rigorous methods. The primary objective of this article is to evaluate the ability of imputation to manage missing data and encourage the use of advanced statistical methods in bioarchaeology and paleopathology. An overview of missing data management in biological anthropology is provided, followed by a test of imputation and deletion methods for handling missing data.
MATERIALS AND METHODS
Missing data were simulated on complete datasets of ordinal (n = 287) and continuous (n = 369) bioarchaeological data. Missing values were imputed using five imputation methods (mean, predictive mean matching, random forest, expectation maximization, and stochastic regression), and the success of each at recovering the parameters of the original dataset was compared with that of pairwise and listwise deletion.
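As a rough, self-contained illustration of the comparison described above (not the authors' code or data), the sketch below simulates missing values in a continuous variable, applies mean imputation, and contrasts the recovered sample size and mean with listwise deletion.

```python
# Illustrative sketch: mean imputation vs. listwise deletion on simulated missingness.
# The variables and missingness rate are invented; the study used real
# bioarchaeological datasets and several additional imputation methods.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
full = pd.DataFrame({
    "femur_length": rng.normal(450, 20, 369),    # hypothetical continuous trait
    "humerus_length": rng.normal(320, 15, 369),
})

# Knock out 20% of values completely at random.
amputated = full.mask(rng.random(full.shape) < 0.20)

# Listwise deletion: drop any individual with a missing value.
listwise = amputated.dropna()

# Mean imputation: replace each missing value with the column mean.
imputed = amputated.fillna(amputated.mean())

print(f"original mean: {full['femur_length'].mean():.2f}")
print(f"listwise  n = {len(listwise)}, mean: {listwise['femur_length'].mean():.2f}")
print(f"imputed   n = {len(imputed)}, mean: {imputed['femur_length'].mean():.2f}")
```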
RESULTS
In all instances, listwise deletion was least successful at approximating the original parameters. Imputation was more effective for continuous data than for ordinal data. Overall, no single method performed best, and the amount of missing data proved a stronger predictor of imputation success.
DISCUSSION
These findings support the use of imputation methods over deletion for handling missing bioarchaeological and paleopathology data, especially when the data are continuous. Whereas deletion methods reduce sample size, imputation maintains sample size, improving statistical power and preventing bias from being introduced into the dataset.
Topics: Archaeology; Sample Size; Research Design; Data Management; Bias
PubMed: 36790608
DOI: 10.1002/ajpa.24614
BMJ Open, Aug 2022
OBJECTIVES
This article aims to measure the willingness of the Swiss public to participate in personalised health research, and their preferences regarding data management and governance.
SETTING
Results are presented from a nationwide survey of members of the Swiss public.
PARTICIPANTS
15 106 randomly selected Swiss residents received the survey in September 2019. The response rate was 34.1% (n=5156). Respondent age ranged from 18 to 79 years, with fairly uniform spread across sex and age categories between 25 and 64 years.
PRIMARY AND SECONDARY OUTCOME MEASURES
Willingness to participate in personalised health research and opinions regarding data management and governance.
RESULTS
Most respondents preferred to be contacted and reconsented for each new project using their data (39%, 95% CI: 37.4% to 40.7%), or stated that their preference depends on the project type (29.4%, 95% CI: 27.9% to 31%). Additionally, a majority preferred that their data or samples be stored anonymously (52%, 95% CI: 50.3% to 53.8%) rather than in coded form (43.4%, 95% CI: 41.7% to 45.1%). Of those who preferred that their data be anonymised, most also indicated a wish to be recontacted for each new project (36.8%, 95% CI: 34.5% to 39.2%); however, these preferences are in conflict. Most respondents desired to personally own their data. Finally, most Swiss respondents trust their doctors, along with researchers at universities, to protect their data.
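For readers unfamiliar with the interval notation above: the reported ranges are 95% confidence intervals for proportions. The sketch below works through the standard Wald interval; because the number of respondents answering each item is not given in this summary, the n used here is a placeholder and the result will not exactly match the paper's intervals.

```python
# Wald 95% confidence interval for a proportion (illustrative only).
# n_item is a placeholder for the number of respondents who answered this item;
# the width of the paper's intervals suggests item-level n below the overall n = 5156.
import math

p = 0.39        # observed proportion (39% preferred per-project reconsent)
n_item = 3400   # hypothetical item-level sample size
z = 1.96        # ~97.5th percentile of the standard normal

se = math.sqrt(p * (1 - p) / n_item)
lower, upper = p - z * se, p + z * se
print(f"95% CI: {lower:.3f} to {upper:.3f}")   # roughly 0.374 to 0.406
```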
CONCLUSION
Insight into public preference can enable Swiss biobanks and research institutions to create management and governance strategies that match the expectations and preferences of potential participants. Models allowing participants to choose how to interact with the process, while more complex, may increase individual willingness to provide data to biobanks.
Topics: Adolescent; Adult; Aged; Biological Specimen Banks; Data Management; Humans; Middle Aged; Surveys and Questionnaires; Switzerland; Trust; Young Adult
PubMed: 36028266
DOI: 10.1136/bmjopen-2022-060844
Therapeutic Innovation & Regulatory..., Sep 2021
BACKGROUND
The causes, degree and disruptive nature of mid-study database updates and other pain points were evaluated to understand if and how the clinical data management function is managing rapid growth in data volume and diversity.
METHODS
Tufts Center for the Study of Drug Development (Tufts CSDD)-in collaboration with IBM Watson Health-conducted an online global survey between September and October 2020.
RESULTS
One hundred ninety-four verified responses were analyzed. Planned and unplanned mid-study updates were the top challenges mentioned, and their management was time intensive. Respondents reported an average of 4.1 planned and 3.7 unplanned mid-study updates per clinical trial.
CONCLUSION
Mid-study database updates are disruptive and present a major opportunity to accelerate cycle times and improve efficiency, particularly as protocol designs become more flexible and the diversity of data, most notably unstructured data, increases.
Topics: Data Management; Drug Development; Humans; Pain; Surveys and Questionnaires
PubMed: 33963525
DOI: 10.1007/s43441-021-00301-z
Computer Methods and Programs in..., Nov 2021
BACKGROUND AND OBJECTIVES
In the last decade, clinical trial management systems have become an essential support tool for data management and analysis in clinical research. However, these clinical tools have design limitations, since they currently cannot adapt to the continuous changes in trial practice that arise from the heterogeneous and dynamic nature of clinical research data. These systems are usually proprietary solutions provided by vendors for specific tasks. In this work, we propose FIMED, a software solution for the flexible management of clinical data from multiple trials, moving towards personalized medicine, which can contribute positively by improving the quality and ease of clinical researchers' work in clinical trials.
METHODS
This tool allows a dynamic and incremental design of patients' profiles in the context of clinical trials, providing a flexible user interface that hides the complexity of using databases. Clinical researchers are able to define personalized data schemas according to their needs and clinical study specifications. Thus, FIMED allows clinical data from multiple, separate trials to be incorporated and analyzed.
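FIMED's internal implementation is not described in this summary, so the following is only a hedged sketch of what a dynamically defined, per-study patient schema can look like when clinical data are stored as documents rather than fixed relational tables; the field names and the validation helper are invented for illustration.

```python
# Hypothetical sketch of a flexible, per-study clinical data schema.
# Not FIMED code: it only illustrates researcher-defined schemas validated at
# ingest time, with new fields added as a trial evolves.
from datetime import date

melanoma_schema = {
    "patient_id": str,
    "enrollment_date": date,
    "braf_mutation": bool,        # study-specific field
    "tumor_thickness_mm": float,  # study-specific field
}

def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of schema violations for one patient record."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors

record = {"patient_id": "P-0001", "enrollment_date": date(2021, 3, 2),
          "braf_mutation": True, "tumor_thickness_mm": 1.8}
print(validate(record, melanoma_schema))   # [] -> record conforms

# Mid-study, the schema can be extended without migrating existing records.
melanoma_schema["ldh_level"] = float
print(validate(record, melanoma_schema))   # ['missing field: ldh_level']
```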
RESULTS
The efficiency of the software has been demonstrated with a real-world use case for a clinical assay in melanoma, which has been anonymized to provide a user demonstration. FIMED currently provides three data analysis and visualization components, enabling clinical exploration of gene expression data: heatmap visualization, clustered heatmap visualization, and gene regulatory network inference and visualization. An instance of this tool is freely available on the web at https://khaos.uma.es/fimed. It can be accessed with a demo user account, "researcher", using the password "demo".
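As a hedged illustration of the kind of views mentioned above (a heatmap and a clustered heatmap of an expression matrix), the sketch below uses seaborn on random data; it is not FIMED's code, and the gene and sample labels are made up.

```python
# Illustrative heatmap / clustered heatmap of a small gene-expression matrix.
# Random values stand in for real expression data.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
expr = pd.DataFrame(rng.normal(size=(10, 6)),
                    index=[f"GENE_{i}" for i in range(10)],
                    columns=[f"sample_{j}" for j in range(6)])

sns.heatmap(expr, cmap="vlag")       # plain heatmap
plt.show()
sns.clustermap(expr, cmap="vlag")    # hierarchically clustered heatmap
plt.show()
```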
CONCLUSION
This paper shows FIMED as a flexible and user-friendly way of managing multidimensional clinical research data. Hence, without loss of generality, FIMED is flexible enough to be used in the context of any other disease where clinical data and assays are involved.
Topics: Data Management; Databases, Factual; Gene Regulatory Networks; Humans; Internet; Software; User-Computer Interface
PubMed: 34740063
DOI: 10.1016/j.cmpb.2021.106496
Journal of Chemical Information and..., Jan 2022
Projects in chemo- and bioinformatics often consist of scattered data of various types that are difficult to access in a meaningful way for efficient data analysis. The data are usually too diverse even to be manipulated effectively. Sdfconf is data manipulation and analysis software that addresses this problem in a logical and robust manner. Other software commonly used for such tasks is either not designed with molecular and/or conformational data in mind or provides only a narrow set of tasks to be accomplished. Furthermore, many tools are only available within commercial software packages. Sdfconf is a flexible, robust, and free-of-charge tool for linking data from various sources for meaningful and efficient manipulation and analysis of molecule data sets. Sdfconf packages molecular structures and metadata into a complete ensemble, from which one can access both the whole data set and individual molecules and/or conformations. In this software note, we offer some practical examples of the utilization of sdfconf.
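Sdfconf's own command set is not shown in this summary, so the sketch below uses RDKit instead, a different and widely available toolkit, purely to illustrate the kind of molecule-plus-metadata ensemble an SDF file carries; the file name is a placeholder.

```python
# Reading molecules and their per-record metadata from an SDF file with RDKit.
# This is NOT sdfconf; it only illustrates the structure + metadata pairing
# that sdfconf manipulates. 'conformers.sdf' is a placeholder file name.
from rdkit import Chem

supplier = Chem.SDMolSupplier("conformers.sdf")
for mol in supplier:
    if mol is None:                 # skip unparsable records
        continue
    props = mol.GetPropsAsDict()    # tag/value metadata stored in the SDF
    print(Chem.MolToSmiles(mol), props)
```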
Topics: Computational Biology; Data Analysis; Data Management; Software
PubMed: 34932340
DOI: 10.1021/acs.jcim.1c01051
Sensors (Basel, Switzerland), Aug 2022
Review
The Internet of Things includes all connected objects, from small embedded systems with low computational power and storage capacity to powerful ones, as well as moving objects such as drones and autonomous vehicles. The concept of the Internet of Everything (IoE) expands on this idea by adding people, data, and processing. The adoption of such systems is exploding and becoming ever more significant, bringing with it questions about the security and privacy of these objects. A natural answer to data integrity, confidentiality, and single-point-of-failure concerns is the use of blockchains. Blockchains can serve as an immutable data layer for storing information, avoid single points of failure through decentralization, and provide strong security and cryptographic tools for IoE. However, adopting blockchain technology in such heterogeneous systems containing lightweight devices presents several challenges and practical issues that need to be overcome. Indeed, most solutions proposed to adapt blockchains to resource-constrained devices struggle to maintain decentralization or security. The most interesting are probably the Layer 2 solutions, which build off-chain systems strongly connected to the blockchain. Among these, zk-rollups are a promising new generation of Layer 2/off-chain schemes that can remove the last obstacles to blockchain adoption in IoT or, more generally, in IoE. By increasing scalability and enabling rule customization while preserving the same security as the Layer 1 blockchain, zk-rollups overcome restrictions on the use of blockchains for IoE. Despite their promise, illustrated by recent systems proposed by startups and private companies, very few scientific publications explain or apply this little-known technology, especially for non-financial systems. In this context, the objective of our paper is to fill this gap for IoE systems in two steps. We first propose a concise review of recent proposals to improve scalability, including on-chain (consensus, blockchain organization, …) and off-chain (sidechains, rollups) solutions, and we show that zk-rollups are the most promising. In a second step, we focus on IoE by describing several important features (scalability, dynamicity, data management, …), illustrated with various general IoE use cases.
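To make the rollup idea concrete — many off-chain transactions compressed into a single on-chain commitment — the hedged sketch below computes a Merkle root over a batch of transactions; it deliberately omits the zero-knowledge validity proof that distinguishes zk-rollups from plain batching, and all data are invented.

```python
# Toy sketch of the batching half of a rollup: N off-chain transactions are
# summarized by one Merkle root, which is all that gets posted on-chain.
# The zk part (a succinct validity proof for the batch) is not modeled here.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate last node if odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical IoE transactions (e.g., signed sensor readings) batched off-chain.
batch = [f"device-{i}:reading={20 + i}".encode() for i in range(8)]
commitment = merkle_root(batch)
print("on-chain commitment:", commitment.hex())
```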
Topics: Blockchain; Computer Security; Confidentiality; Data Management; Humans; Privacy
PubMed: 36080950
DOI: 10.3390/s22176493
Big Data, Jun 2023
Big data management is a key enabling factor for enterprises that want to compete in the global market. Data coming from enterprise production processes, if properly analyzed, can boost enterprise management and optimization, guaranteeing faster processes, better customer management, and lower overheads and costs. Guaranteeing a proper big data pipeline is the holy grail of big data, often hindered by the difficulty of evaluating the correctness of the pipeline's results. This problem is even worse when big data pipelines are provided as a service in the cloud and must comply with both laws and users' requirements. To this aim, assurance techniques can complement big data pipelines, providing the means to guarantee that they behave correctly and supporting the deployment of big data pipelines fully compliant with laws and users' requirements. In this article, we define an assurance solution for big data based on service-level agreements, in which a semiautomatic approach supports users from the definition of requirements to the negotiation of the terms regulating the provisioned services, and the continuous refinement thereof.
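The article's semiautomatic SLA machinery is not reproduced here; the sketch below only illustrates the underlying assurance idea, checking measured pipeline behavior against agreed service-level terms, with term names, thresholds, and metrics invented for illustration.

```python
# Hypothetical sketch of an SLA-based assurance check for a big data pipeline:
# measured pipeline metrics are verified against agreed service-level terms.
# Term names, thresholds, and metrics are all invented.

sla_terms = {
    "max_batch_latency_s": 300,       # batch must finish within 5 minutes
    "min_record_completeness": 0.98,  # share of non-null mandatory fields
    "data_kept_in_eu": True,          # legal/compliance requirement
}

measured = {
    "max_batch_latency_s": 412,
    "min_record_completeness": 0.995,
    "data_kept_in_eu": True,
}

def check_sla(terms: dict, metrics: dict) -> list[str]:
    """Return the list of violated terms."""
    violations = []
    for term, required in terms.items():
        observed = metrics.get(term)
        if isinstance(required, bool):
            ok = observed is required
        elif term.startswith("max_"):
            ok = observed is not None and observed <= required
        else:  # "min_" style terms
            ok = observed is not None and observed >= required
        if not ok:
            violations.append(f"{term}: required {required}, observed {observed}")
    return violations

print(check_sla(sla_terms, measured))
# ['max_batch_latency_s: required 300, observed 412']
```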
Topics: Big Data; Data Management
PubMed: 36862683
DOI: 10.1089/big.2021.0369
Journal of Medical Internet Research, Nov 2023
BACKGROUND
In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are important throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge can lead to validity risks and reduce confidence in, and the quality of, the processed data. The need to implement maintainable data management practices is undisputed, but there is little clarity about the current status.
OBJECTIVE
Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for the maturity status of data management practices and present recommendations to enable a trustful dissemination and reuse of routine health care data.
METHODS
In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire that we tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist.
RESULTS
Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to reach more clarity and to help define enhanced data management strategies.
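As a hedged illustration of the kind of contextual and computational-environment information the interviews found missing, the sketch below captures a minimal provenance record for one workflow step; the field set is an assumption for illustration, not the MIRACUM consortium's specification.

```python
# Minimal sketch of capturing provenance for one data-integration step.
# The recorded fields (inputs, code version, responsibility, environment) address
# the gaps named above; they are illustrative, not a MIRACUM-mandated schema.
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def file_sha256(path: str) -> str:
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def provenance_record(step: str, inputs: list[str], outputs: list[str],
                      code_version: str, responsible: str) -> dict:
    return {
        "step": step,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "inputs": {p: file_sha256(p) for p in inputs},
        "outputs": {p: file_sha256(p) for p in outputs},
        "code_version": code_version,   # e.g. a git commit hash
        "responsible": responsible,     # who ran / owns this step
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }

# Hypothetical usage after a transformation step has produced its output file:
# record = provenance_record("map_lab_values_to_loinc",
#                            inputs=["raw_labs.csv"], outputs=["labs_fhir.json"],
#                            code_version="a1b2c3d", responsible="DIC data engineering team")
# print(json.dumps(record, indent=2))
```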
CONCLUSIONS
The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy, with data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of FAIRer, well-maintained health research data of high quality.
Topics: Humans; Data Management; Delivery of Health Care; Medical Informatics; Surveys and Questionnaires
PubMed: 37938878
DOI: 10.2196/48809
Clinical Research in Cardiology ..., May 2024
Review
The sharing and documentation of cardiovascular research data are essential for efficient use and reuse of data, thereby aiding scientific transparency, accelerating the progress of cardiovascular research and healthcare, and contributing to the reproducibility of research results. However, challenges remain. This position paper, written on behalf of and approved by the German Cardiac Society and German Centre for Cardiovascular Research, summarizes our current understanding of the challenges in cardiovascular research data management (RDM). These challenges include lack of time, awareness, incentives, and funding for implementing effective RDM; lack of standardization in RDM processes; a need to better identify meaningful and actionable data among the increasing volume and complexity of data being acquired; and a lack of understanding of the legal aspects of data sharing. While several tools exist to increase the degree to which data are findable, accessible, interoperable, and reusable (FAIR), more work is needed to lower the threshold for effective RDM not just in cardiovascular research but in all biomedical research, with data sharing and reuse being factored in at every stage of the scientific process. A culture of open science with FAIR research data should be fostered through education and training of early-career and established research professionals. Ultimately, FAIR RDM requires permanent, long-term effort at all levels. If outcomes can be shown to be superior and to promote better (and better value) science, modern RDM will make a positive difference to cardiovascular science and practice. The full position paper is available in the supplementary materials.
Topics: Humans; Data Management; Reproducibility of Results; Heart; Cardiovascular System; Biomedical Research
PubMed: 37847314
DOI: 10.1007/s00392-023-02303-3