Journal of Chemical Information and... Jul 2023
A great advantage of computational research is its reproducibility and reusability. However, an enormous amount of computational research data in heterogeneous catalysis is barricaded due to logistical limitations. Sufficient provenance and characterization of data and computational environment, with uniform organization and easy accessibility, can allow the development of software tools for integration across the multiscale modeling workflow. Here, we develop the Chemical Kinetics Database, CKineticsDB, a state-of-the-art datahub for multiscale modeling, designed to be compliant with the FAIR guiding principles for scientific data management. CKineticsDB utilizes a MongoDB back-end for extensibility and adaptation to varying data formats, with a referencing-based data model to reduce redundancy in storage. We have developed a Python software program for data processing operations, with built-in features to extract data for common applications. CKineticsDB evaluates the incoming data for quality and uniformity, retains curated information from simulations, enables accurate regeneration of publication results, optimizes storage, and allows the selective retrieval of files based on domain-relevant catalyst and simulation parameters. CKineticsDB provides data from multiple scales of theory (ab initio calculations, thermochemistry, and microkinetic models) to accelerate the development of new reaction pathways, kinetic analysis of reaction mechanisms, and catalysis discovery, along with several data-driven applications.
Topics: Data Management; Kinetics; Reproducibility of Results; Software
PubMed: 37436913
DOI: 10.1021/acs.jcim.3c00123
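The referencing-based data model the abstract describes can be sketched in a few lines: shared records (e.g., the computational environment) are stored once, and each simulation points to them by id. The field names below are hypothetical, not CKineticsDB's actual schema; with a MongoDB back-end the same pattern would use ObjectId references between pymongo collections.

```python
import hashlib
import json

# Two in-memory "collections": shared environment records and simulation records.
# With MongoDB these would be separate collections linked by ObjectId references.
environments = {}   # id -> environment document, stored once
simulations = []    # each simulation references an environment by id

def env_id(env: dict) -> str:
    """Derive a stable id from the environment record's content."""
    return hashlib.sha256(json.dumps(env, sort_keys=True).encode()).hexdigest()[:12]

def insert_simulation(sim: dict, env: dict) -> None:
    """Store the environment once; later simulations just reference it."""
    eid = env_id(env)
    environments.setdefault(eid, env)
    simulations.append({**sim, "environment_ref": eid})

# Hypothetical example records, for illustration only.
env = {"code": "VASP", "version": "6.3", "functional": "PBE"}
insert_simulation({"system": "CO on Pt(111)", "energy_eV": -1.43}, env)
insert_simulation({"system": "CO on Pd(111)", "energy_eV": -1.21}, env)

# The shared environment is stored only once, despite two simulations using it.
print(len(environments), len(simulations))  # -> 1 2
```

Referencing trades a second lookup at read time for the storage savings of never duplicating the shared record, which is the redundancy reduction the abstract refers to.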
The Annals of Thoracic Surgery Nov 2019
Topics: Data Management; Humans; Surgeons
PubMed: 31653295
DOI: 10.1016/j.athoracsur.2019.04.076
PLoS One 2022
Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science and engineering workflows used in research remain poorly documented and often unavailable, involving ad hoc scripts and manual steps, hindering reproducibility and stifling progress. We introduce Sim2Ls (pronounced simtools) and the Sim2L Python library that allow developers to create and share end-to-end computational workflows with well-defined and verified inputs and outputs. The Sim2L library makes Sim2Ls, their requirements, and their services discoverable, verifies inputs and outputs, and automatically stores results in a globally-accessible simulation cache and results database. This simulation ecosystem is available in nanoHUB, an open platform that also provides publication services for Sim2Ls, a computational environment for developers and users, and the hardware to execute runs and store results at no cost. We exemplify the use of Sim2Ls using two applications and discuss best practices towards FAIR simulation workflows and associated data.
Topics: Computer Simulation; Data Management; Ecosystem; Reproducibility of Results; Software; Workflow
PubMed: 35271613
DOI: 10.1371/journal.pone.0264492
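The core idea of workflows with well-defined, verified inputs and outputs can be sketched generically. The decorator below is a hypothetical illustration of that verification step only, not the actual Sim2L API; a published Sim2L additionally handles discoverability, caching, and result storage on nanoHUB.

```python
from functools import wraps

def verified_workflow(inputs: dict, outputs: dict):
    """Declare input/output names and types; reject runs that violate them.
    A sketch of the verification idea, not the Sim2L library's API."""
    def decorate(fn):
        @wraps(fn)
        def run(**kwargs):
            # Verify inputs: every declared input present, with the declared type.
            for name, typ in inputs.items():
                if name not in kwargs:
                    raise ValueError(f"missing input: {name}")
                if not isinstance(kwargs[name], typ):
                    raise TypeError(f"input {name} must be {typ.__name__}")
            result = fn(**kwargs)
            # Verify outputs the same way before handing results back.
            for name, typ in outputs.items():
                if name not in result or not isinstance(result[name], typ):
                    raise TypeError(f"output {name} missing or not {typ.__name__}")
            return result
        return run
    return decorate

@verified_workflow(inputs={"length_nm": float, "steps": int},
                   outputs={"energy_eV": float})
def relax(length_nm, steps):
    # Stand-in for an actual simulation; the numbers are illustrative.
    return {"energy_eV": -0.1 * length_nm * steps}

print(relax(length_nm=2.0, steps=10))  # {'energy_eV': -2.0}
```

Because the declaration lives next to the workflow itself, callers and caching layers can rely on the same input/output contract, which is what makes results comparable and reusable across runs.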
Journal of Integrative Bioinformatics Dec 2022
Core facilities have to offer technologies that best serve the needs of their users and give them a competitive advantage in research. They have to set up and maintain tens to a hundred instruments, which produce large amounts of data and serve thousands of active projects and customers. Particular emphasis has to be given to the reproducibility of the results. Increasingly, the entire process, from building the research hypothesis, through conducting the experiments and measurements, to data exploration and analysis, is driven by very few experts in various scientific fields. Still, the ability to perform the entire data exploration in real time on a personal computer is often hampered by the heterogeneity of the software, the structure and formats of the output data, and the enormous data sizes. These factors shape the design and architecture of the implemented software stack. At the Functional Genomics Center Zurich (FGCZ), a joint state-of-the-art research and training facility of ETH Zurich and the University of Zurich, we have developed the B-Fabric system, which for more than a decade has served an entire life sciences community with fundamental data science support. In this paper, we sketch how such a system can be used to glue together data (including metadata), computing infrastructures (clusters and clouds), and visualization software to support instant data exploration and visual analysis. We illustrate our approach, as implemented in daily operation, with visualization applications for mass spectrometry data.
Topics: Data Management; Reproducibility of Results; Software; Genomics
PubMed: 36073980
DOI: 10.1515/jib-2022-0031
Biological Chemistry Apr 2023
While the FAIR (Findable, Accessible, Interoperable, and Re-usable) principles are well accepted in the scientific community, there are still many challenges in implementing them in the day-to-day scientific process. Data management of microscopy images poses special challenges due to the volume, variety, and many proprietary formats. In particular, appropriate metadata collection, a basic requirement for FAIR data, is a real challenge for scientists due to its technical and content-related aspects. Researchers benefit here from an interdisciplinary research network with centralized data management. The typically multimodal structure requires generalized data management and the corresponding acquisition of metadata. Here we report on the establishment of an appropriate infrastructure for the research network by a core facility, and on the development and integration of a software tool, MDEmic, that allows easy and convenient processing of the metadata of microscopy images while providing high flexibility in terms of customizing metadata sets. Since it is also in the interest of the core facility to apply standards for metadata scope and serialization formats, and thereby realize successful and sustainable data management for bioimaging, we report on our efforts within the community to define standards for metadata and interfaces and to reduce the barriers of daily data management.
Topics: Data Management; Software; Metadata
PubMed: 36853922
DOI: 10.1515/hsz-2022-0304
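The customizable metadata sets described above can be pictured as a base record that facilities extend with their own fields before serialization. The keys below are purely illustrative, not MDEmic's or the OME data model's actual fields:

```python
import json

# A minimal, customizable metadata set for one microscopy image.
# Keys are hypothetical; real bioimaging metadata follows community
# standards such as the OME data model.
base_metadata = {
    "image_id": "plate1_well_A3",
    "microscope": "confocal",
    "objective_magnification": 63,
    "channels": ["DAPI", "GFP"],
}

def with_custom_fields(metadata: dict, custom: dict) -> dict:
    """Extend a base metadata set with facility- or project-specific fields."""
    merged = dict(metadata)
    merged.update(custom)
    return merged

record = with_custom_fields(base_metadata, {"project": "NFDI survey demo"})
# A plain-text serialization like this is what makes the record portable
# between acquisition software, repositories, and analysis tools.
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```

Keeping the base set fixed while isolating customization in a separate layer is one way to stay standards-compliant without blocking project-specific annotation.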
GigaScience Dec 2022
The importance of effective research data management (RDM) strategies to support the generation of Findable, Accessible, Interoperable, and Reusable (FAIR) neuroscience data grows with each advance in data acquisition techniques and research methods. To maximize the impact of diverse research strategies, multidisciplinary, large-scale neuroscience research consortia face a number of unsolved challenges in RDM. While open science principles are largely accepted, it is practically difficult for researchers to prioritize RDM over other pressing demands. The implementation of a coherent, executable RDM plan for consortia spanning animal, human, and clinical studies is becoming increasingly challenging. Here, we present an RDM strategy implemented for the Heidelberg Collaborative Research Consortium. Our consortium combines basic and clinical research in diverse populations (animals and humans) and produces highly heterogeneous and multimodal research data (e.g., neurophysiology, neuroimaging, genetics, behavior). We present a concrete strategy for initiating early-stage RDM and FAIR data generation for large-scale collaborative research consortia, with a focus on sustainable solutions that incentivize incremental RDM while respecting research-specific requirements.
Topics: Animals; Humans; Data Management; Neuroimaging; Research Personnel
PubMed: 37401720
DOI: 10.1093/gigascience/giad049
Physiological Reviews Jul 2024 (Review)
Effective data management is crucial for scientific integrity and reproducibility, a cornerstone of scientific progress. Well-organized and well-documented data enable validation and building on results. Data management encompasses activities including organization, documentation, storage, sharing, and preservation. Robust data management establishes credibility, fostering trust within the scientific community and benefiting researchers' careers. In experimental biomedicine, comprehensive data management is vital due to the typically intricate protocols, extensive metadata, and large datasets. Low-throughput experiments, in particular, require careful management to address variations and errors in protocols and raw data quality. Transparent and accountable research practices rely on accurate documentation of procedures, data collection, and analysis methods. Proper data management ensures long-term preservation and accessibility of valuable datasets. Well-managed data can be revisited, contributing to cumulative knowledge and potential new discoveries. Publicly funded research has an added responsibility for transparency, resource allocation, and avoiding redundancy. Meeting funding agency expectations increasingly requires rigorous methodologies, adherence to standards, comprehensive documentation, and widespread sharing of data, code, and other auxiliary resources. This review provides critical insights into raw and processed data, metadata, high-throughput versus low-throughput datasets, a common language for documentation, experimental and reporting guidelines, efficient data management systems, sharing practices, and relevant repositories. We systematically present available resources and optimal practices for wide use by experimental biomedical researchers.
Topics: Biomedical Research; Information Dissemination; Humans; Animals; Data Management
PubMed: 38451234
DOI: 10.1152/physrev.00043.2023
Therapeutic Innovation & Regulatory... Sep 2021
BACKGROUND
The causes, degree and disruptive nature of mid-study database updates and other pain points were evaluated to understand if and how the clinical data management function is managing rapid growth in data volume and diversity.
METHODS
Tufts Center for the Study of Drug Development (Tufts CSDD)-in collaboration with IBM Watson Health-conducted an online global survey between September and October 2020.
RESULTS
One hundred ninety-four verified responses were analyzed. Planned and unplanned mid-study updates were the most frequently mentioned challenges, and their management was time-intensive. Respondents reported an average of 4.1 planned and 3.7 unplanned mid-study updates per clinical trial.
CONCLUSION
Mid-study database updates are disruptive and present a major opportunity to accelerate cycle times and improve efficiency, particularly as protocol designs become more flexible and the diversity of data, most notably unstructured data, increases.
Topics: Data Management; Drug Development; Humans; Pain; Surveys and Questionnaires
PubMed: 33963525
DOI: 10.1007/s43441-021-00301-z
Trials Mar 2022
BACKGROUND
Clinical trials play an important role in expanding the knowledge of diabetes prevention, diagnosis, and treatment, and data management is one of the main issues in clinical trials. Lack of appropriate planning for data management in clinical trials may negatively influence achieving the desired results. The aim of this study was to explore data management processes in diabetes clinical trials in three research institutes in Iran.
METHOD
This was a qualitative study conducted in 2019. In this study, data were collected through in-depth semi-structured interviews with 16 researchers in three endocrinology and metabolism research institutes. To analyze data, the method of thematic analysis was used.
RESULTS
The five themes that emerged from data analysis included (1) clinical trial data collection, (2) technologies used in data management, (3) data security and confidentiality management, (4) data quality management, and (5) data management standards. In general, the findings indicated that no clear and standard process was used for data management in diabetes clinical trials, and each research center executed its own methods and processes.
CONCLUSION
According to the results, the common methods of data management in diabetes clinical trials included a set of paper-based processes. It seems that using information technology can help facilitate data management processes in a variety of clinical trials, including diabetes clinical trials.
Topics: Data Management; Diabetes Mellitus; Humans; Iran; Qualitative Research; Research Personnel
PubMed: 35241149
DOI: 10.1186/s13063-022-06110-5
F1000Research 2022
BACKGROUND
Knowing the needs of the bioimaging community with respect to research data management (RDM) is essential for identifying measures that enable adoption of the FAIR (findable, accessible, interoperable, reusable) principles for microscopy and bioimage analysis data across disciplines. As an initiative within Germany's National Research Data Infrastructure, we conducted this community survey in summer 2021 to assess the state of the art of bioimaging RDM and the community needs.
METHODS
An online survey was conducted with a mixed question-type design. We created a questionnaire tailored to relevant topics of the bioimaging community, including specific questions on bioimaging methods and bioimage analysis, as well as more general questions on RDM principles and tools. 203 survey entries were included in the analysis, covering the perspectives of various life and biomedical science disciplines and of participants at different career levels.
RESULTS
The results highlight the importance and value of bioimaging RDM and data sharing. However, the practical implementation of FAIR practices is impeded by technical hurdles, lack of knowledge, and insecurity about the legal aspects of data sharing. The survey participants request metadata guidelines and annotation tools and endorse the usage of image data management platforms. At present, OMERO (Open Microscopy Environment Remote Objects) is the best known and most widely used platform. Most respondents rely on image processing and analysis, which they regard as the most time-consuming step of the bioimage data workflow. While knowledge about and implementation of electronic lab notebooks and data management plans is limited, respondents acknowledge their potential value for data handling and publication.
CONCLUSIONS
The bioimaging community acknowledges and endorses the value of RDM and data sharing. Still, there is a need for information, guidance, and standardization to foster the adoption of FAIR data handling. This survey may help inspire targeted measures to close this gap.
Topics: Humans; Data Management; Metadata; Information Dissemination; Surveys and Questionnaires; Workflow
PubMed: 36405555
DOI: 10.12688/f1000research.121714.2