Journal of Chemical Information and... Jul 2023
A great advantage of computational research is its reproducibility and reusability. However, an enormous amount of computational research data in heterogeneous catalysis is barricaded due to logistical limitations. Sufficient provenance and characterization of data and computational environment, with uniform organization and easy accessibility, can allow the development of software tools for integration across the multiscale modeling workflow. Here, we develop the Chemical Kinetics Database, CKineticsDB, a state-of-the-art datahub for multiscale modeling, designed to be compliant with the FAIR guiding principles for scientific data management. CKineticsDB utilizes a MongoDB back-end for extensibility and adaptation to varying data formats, with a referencing-based data model to reduce redundancy in storage. We have developed a Python software program for data processing operations, with built-in features to extract data for common applications. CKineticsDB evaluates the incoming data for quality and uniformity, retains curated information from simulations, enables accurate regeneration of publication results, optimizes storage, and allows the selective retrieval of files based on domain-relevant catalyst and simulation parameters. CKineticsDB provides data from multiple scales of theory (ab initio calculations, thermochemistry, and microkinetic models) to accelerate the development of new reaction pathways, kinetic analysis of reaction mechanisms, and catalysis discovery, along with several data-driven applications.
Topics: Data Management; Kinetics; Reproducibility of Results; Software
PubMed: 37436913
DOI: 10.1021/acs.jcim.3c00123
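The referencing-based data model the abstract describes can be sketched in a few lines: shared records (e.g., the computational environment) are stored once, and each simulation points to them by id. The field names below are hypothetical, not CKineticsDB's actual schema; with a MongoDB back-end the same pattern would use ObjectId references between pymongo collections.

```python
import hashlib
import json

# Two in-memory "collections": shared environment records and simulation records.
# With MongoDB these would be separate collections linked by ObjectId references.
environments = {}   # id -> environment document, stored once
simulations = []    # each simulation references an environment by id

def env_id(env: dict) -> str:
    """Derive a stable id from the environment record's content."""
    return hashlib.sha256(json.dumps(env, sort_keys=True).encode()).hexdigest()[:12]

def insert_simulation(sim: dict, env: dict) -> None:
    """Store the environment once; later simulations just reference it."""
    eid = env_id(env)
    environments.setdefault(eid, env)
    simulations.append({**sim, "environment_ref": eid})

# Hypothetical example records, for illustration only.
env = {"code": "VASP", "version": "6.3", "functional": "PBE"}
insert_simulation({"system": "CO on Pt(111)", "energy_eV": -1.43}, env)
insert_simulation({"system": "CO on Pd(111)", "energy_eV": -1.21}, env)

# The shared environment is stored only once, despite two simulations using it.
print(len(environments), len(simulations))  # -> 1 2
```

Referencing trades a second lookup at read time for the storage savings of never duplicating the shared record, which is the redundancy reduction the abstract refers to.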
The Annals of Thoracic Surgery Nov 2019
Topics: Data Management; Humans; Surgeons
PubMed: 31653295
DOI: 10.1016/j.athoracsur.2019.04.076
PLoS One 2022
Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science and engineering workflows used in research remain poorly documented and often unavailable, involving ad hoc scripts and manual steps, hindering reproducibility and stifling progress. We introduce Sim2Ls (pronounced simtools) and the Sim2L Python library that allow developers to create and share end-to-end computational workflows with well-defined and verified inputs and outputs. The Sim2L library makes Sim2Ls, their requirements, and their services discoverable, verifies inputs and outputs, and automatically stores results in a globally-accessible simulation cache and results database. This simulation ecosystem is available in nanoHUB, an open platform that also provides publication services for Sim2Ls, a computational environment for developers and users, and the hardware to execute runs and store results at no cost. We exemplify the use of Sim2Ls using two applications and discuss best practices towards FAIR simulation workflows and associated data.
Topics: Computer Simulation; Data Management; Ecosystem; Reproducibility of Results; Software; Workflow
PubMed: 35271613
DOI: 10.1371/journal.pone.0264492
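The core idea of workflows with well-defined, verified inputs and outputs can be sketched generically. The decorator below is a hypothetical illustration of that verification step only, not the actual Sim2L API; a published Sim2L additionally handles discoverability, caching, and result storage on nanoHUB.

```python
from functools import wraps

def verified_workflow(inputs: dict, outputs: dict):
    """Declare input/output names and types; reject runs that violate them.
    A sketch of the verification idea, not the Sim2L library's API."""
    def decorate(fn):
        @wraps(fn)
        def run(**kwargs):
            # Verify inputs: every declared input present, with the declared type.
            for name, typ in inputs.items():
                if name not in kwargs:
                    raise ValueError(f"missing input: {name}")
                if not isinstance(kwargs[name], typ):
                    raise TypeError(f"input {name} must be {typ.__name__}")
            result = fn(**kwargs)
            # Verify outputs the same way before handing results back.
            for name, typ in outputs.items():
                if name not in result or not isinstance(result[name], typ):
                    raise TypeError(f"output {name} missing or not {typ.__name__}")
            return result
        return run
    return decorate

@verified_workflow(inputs={"length_nm": float, "steps": int},
                   outputs={"energy_eV": float})
def relax(length_nm, steps):
    # Stand-in for an actual simulation; the numbers are illustrative.
    return {"energy_eV": -0.1 * length_nm * steps}

print(relax(length_nm=2.0, steps=10))  # {'energy_eV': -2.0}
```

Because the declaration lives next to the workflow itself, callers and caching layers can rely on the same input/output contract, which is what makes results comparable and reusable across runs.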
Journal of Integrative Bioinformatics Dec 2022
Core facilities have to offer technologies that best serve the needs of their users and give them a competitive advantage in research. They have to set up and maintain tens to a hundred instruments, which produce large amounts of data and serve thousands of active projects and customers. Particular emphasis has to be given to the reproducibility of the results. Increasingly, the entire process, from building the research hypothesis, through conducting the experiments and measurements, to data exploration and analysis, is driven by very few experts in various scientific fields. Still, the ability to perform the entire data exploration in real time on a personal computer is often hampered by the heterogeneity of the software, the structure and formats of the output data, and the enormous data sizes. These factors shape the design and architecture of the implemented software stack. At the Functional Genomics Center Zurich (FGCZ), a joint state-of-the-art research and training facility of ETH Zurich and the University of Zurich, we have developed the B-Fabric system, which for more than a decade has served an entire life sciences community with fundamental data science support. In this paper, we sketch how such a system can be used to glue together data (including metadata), computing infrastructures (clusters and clouds), and visualization software to support instant data exploration and visual analysis. We illustrate our approach, as implemented in daily operation, with visualization applications for mass spectrometry data.
Topics: Data Management; Reproducibility of Results; Software; Genomics
PubMed: 36073980
DOI: 10.1515/jib-2022-0031
Biological Chemistry Apr 2023
While the FAIR (Findable, Accessible, Interoperable, and Re-usable) principles are well accepted in the scientific community, there are still many challenges in implementing them in the day-to-day scientific process. Data management of microscopy images poses special challenges due to the volume, variety, and many proprietary formats. In particular, appropriate metadata collection, a basic requirement for FAIR data, is a real challenge for scientists due to its technical and content-related aspects. Researchers benefit here from an interdisciplinary research network with centralized data management. The typically multimodal structure requires generalized data management and the corresponding acquisition of metadata. Here we report on the establishment of an appropriate infrastructure for the research network by a core facility, and on the development and integration of a software tool, MDEmic, that allows easy and convenient processing of the metadata of microscopy images while providing high flexibility in terms of customizing metadata sets. Since it is also in the interest of the core facility to apply standards for metadata scope and serialization formats, and thereby realize successful and sustainable data management for bioimaging, we report on our efforts within the community to define standards for metadata and interfaces and to reduce the barriers of daily data management.
Topics: Data Management; Software; Metadata
PubMed: 36853922
DOI: 10.1515/hsz-2022-0304
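The customizable metadata sets described above can be pictured as a base record that facilities extend with their own fields before serialization. The keys below are purely illustrative, not MDEmic's or the OME data model's actual fields:

```python
import json

# A minimal, customizable metadata set for one microscopy image.
# Keys are hypothetical; real bioimaging metadata follows community
# standards such as the OME data model.
base_metadata = {
    "image_id": "plate1_well_A3",
    "microscope": "confocal",
    "objective_magnification": 63,
    "channels": ["DAPI", "GFP"],
}

def with_custom_fields(metadata: dict, custom: dict) -> dict:
    """Extend a base metadata set with facility- or project-specific fields."""
    merged = dict(metadata)
    merged.update(custom)
    return merged

record = with_custom_fields(base_metadata, {"project": "NFDI survey demo"})
# A plain-text serialization like this is what makes the record portable
# between acquisition software, repositories, and analysis tools.
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```

Keeping the base set fixed while isolating customization in a separate layer is one way to stay standards-compliant without blocking project-specific annotation.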
GigaScience Dec 2022
The importance of effective research data management (RDM) strategies to support the generation of Findable, Accessible, Interoperable, and Reusable (FAIR) neuroscience data grows with each advance in data acquisition techniques and research methods. To maximize the impact of diverse research strategies, multidisciplinary, large-scale neuroscience research consortia face a number of unsolved challenges in RDM. While open science principles are largely accepted, it is practically difficult for researchers to prioritize RDM over other pressing demands. The implementation of a coherent, executable RDM plan for consortia spanning animal, human, and clinical studies is becoming increasingly challenging. Here, we present an RDM strategy implemented for the Heidelberg Collaborative Research Consortium. Our consortium combines basic and clinical research in diverse populations (animals and humans) and produces highly heterogeneous and multimodal research data (e.g., neurophysiology, neuroimaging, genetics, behavior). We present a concrete strategy for initiating early-stage RDM and FAIR data generation for large-scale collaborative research consortia, with a focus on sustainable solutions that incentivize incremental RDM while respecting research-specific requirements.
Topics: Animals; Humans; Data Management; Neuroimaging; Research Personnel
PubMed: 37401720
DOI: 10.1093/gigascience/giad049
Physiological Reviews Jul 2024 (Review)
Effective data management is crucial for scientific integrity and reproducibility, a cornerstone of scientific progress. Well-organized and well-documented data enable validation and building on results. Data management encompasses activities including organization, documentation, storage, sharing, and preservation. Robust data management establishes credibility, fostering trust within the scientific community and benefiting researchers' careers. In experimental biomedicine, comprehensive data management is vital due to the typically intricate protocols, extensive metadata, and large datasets. Low-throughput experiments, in particular, require careful management to address variations and errors in protocols and raw data quality. Transparent and accountable research practices rely on accurate documentation of procedures, data collection, and analysis methods. Proper data management ensures long-term preservation and accessibility of valuable datasets. Well-managed data can be revisited, contributing to cumulative knowledge and potential new discoveries. Publicly funded research has an added responsibility for transparency, resource allocation, and avoiding redundancy. Meeting funding agency expectations increasingly requires rigorous methodologies, adherence to standards, comprehensive documentation, and widespread sharing of data, code, and other auxiliary resources. This review provides critical insights into raw and processed data, metadata, high-throughput versus low-throughput datasets, a common language for documentation, experimental and reporting guidelines, efficient data management systems, sharing practices, and relevant repositories. We systematically present available resources and optimal practices for wide use by experimental biomedical researchers.
Topics: Biomedical Research; Information Dissemination; Humans; Animals; Data Management
PubMed: 38451234
DOI: 10.1152/physrev.00043.2023
Therapeutic Innovation & Regulatory... Sep 2021
BACKGROUND
The causes, degree and disruptive nature of mid-study database updates and other pain points were evaluated to understand if and how the clinical data management function is managing rapid growth in data volume and diversity.
METHODS
Tufts Center for the Study of Drug Development (Tufts CSDD)-in collaboration with IBM Watson Health-conducted an online global survey between September and October 2020.
RESULTS
One hundred ninety-four verified responses were analyzed. Planned and unplanned mid-study updates were the most frequently mentioned challenges, and their management was time-intensive. Respondents reported an average of 4.1 planned and 3.7 unplanned mid-study updates per clinical trial.
CONCLUSION
Mid-study database updates are disruptive and present a major opportunity to accelerate cycle times and improve efficiency, particularly as protocol designs become more flexible and the diversity of data, most notably unstructured data, increases.
Topics: Data Management; Drug Development; Humans; Pain; Surveys and Questionnaires
PubMed: 33963525
DOI: 10.1007/s43441-021-00301-z
Trials Mar 2022
BACKGROUND
Clinical trials play an important role in expanding the knowledge of diabetes prevention, diagnosis, and treatment, and data management is one of the main issues in clinical trials. Lack of appropriate planning for data management in clinical trials may negatively influence achieving the desired results. The aim of this study was to explore data management processes in diabetes clinical trials in three research institutes in Iran.
METHOD
This was a qualitative study conducted in 2019. In this study, data were collected through in-depth semi-structured interviews with 16 researchers in three endocrinology and metabolism research institutes. To analyze data, the method of thematic analysis was used.
RESULTS
The five themes that emerged from data analysis included (1) clinical trial data collection, (2) technologies used in data management, (3) data security and confidentiality management, (4) data quality management, and (5) data management standards. In general, the findings indicated that no clear and standard process was used for data management in diabetes clinical trials, and each research center executed its own methods and processes.
CONCLUSION
According to the results, the common methods of data management in diabetes clinical trials included a set of paper-based processes. It seems that using information technology can help facilitate data management processes in a variety of clinical trials, including diabetes clinical trials.
Topics: Data Management; Diabetes Mellitus; Humans; Iran; Qualitative Research; Research Personnel
PubMed: 35241149
DOI: 10.1186/s13063-022-06110-5
F1000Research 2022
BACKGROUND
Knowing the needs of the bioimaging community with respect to research data management (RDM) is essential for identifying measures that enable adoption of the FAIR (findable, accessible, interoperable, reusable) principles for microscopy and bioimage analysis data across disciplines. As an initiative within Germany's National Research Data Infrastructure, we conducted this community survey in summer 2021 to assess the state of the art of bioimaging RDM and the community needs.
METHODS
An online survey was conducted with a mixed question-type design. We created a questionnaire tailored to relevant topics of the bioimaging community, including specific questions on bioimaging methods and bioimage analysis, as well as more general questions on RDM principles and tools. 203 survey entries were included in the analysis, covering the perspectives of various life and biomedical science disciplines and of participants at different career levels.
RESULTS
The results highlight the importance and value of bioimaging RDM and data sharing. However, the practical implementation of FAIR practices is impeded by technical hurdles, lack of knowledge, and insecurity about the legal aspects of data sharing. The survey participants request metadata guidelines and annotation tools and endorse the usage of image data management platforms. At present, OMERO (Open Microscopy Environment Remote Objects) is the best known and most widely used platform. Most respondents rely on image processing and analysis, which they regard as the most time-consuming step of the bioimage data workflow. While knowledge about and implementation of electronic lab notebooks and data management plans is limited, respondents acknowledge their potential value for data handling and publication.
CONCLUSIONS
The bioimaging community acknowledges and endorses the value of RDM and data sharing. Still, there is a need for information, guidance, and standardization to foster the adoption of FAIR data handling. This survey may help inspire targeted measures to close this gap.
Topics: Humans; Data Management; Metadata; Information Dissemination; Surveys and Questionnaires; Workflow
PubMed: 36405555
DOI: 10.12688/f1000research.121714.2