Journal of Pharmaceutical Sciences, May 2022
Review
Recent advancements in data engineering, data science, and secure cloud storage can transform the current state of global Chemistry, Manufacturing, and Controls (CMC) regulatory activities into automated online digital processes. Modernizing regulatory activities will facilitate simultaneous global submissions and concurrent collaborative reviews, significantly reducing global licensing timelines and variability in globally registered product details. This article describes advancements made within the pharmaceutical industry, from theoretical concepts to the use of structured content and data in CMC submissions. The term Structured Content and Data Management (SCDM) describes the end-to-end scientific data lifecycle, from capture in source systems, through aggregation into a consolidated repository, to transformation into semantically structured blocks with metadata defining relationships between scientific data and business contexts. Automation of regulatory authoring (termed Structured Content Authoring) is feasible because SCDM makes data both human and machine readable. It would offer health authorities access to the digital data beyond the current standard of PDF documents and, for the review process, SCDM would "enrich the effectiveness, efficiency, and consistency of regulatory quality oversight" (Yu et al., 2019). SCDM is a novel solution for content and data management in regulatory submissions and can enable faster access to critical therapies worldwide.
Topics: Commerce; Data Management; Drug Industry; Humans
PubMed: 34610323
DOI: 10.1016/j.xphs.2021.09.046 -
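As an editorial illustration of the structured-block idea described in the entry above, the Python sketch below shows one way a single CMC data point might be held as a semantically structured block with contextual metadata, then rendered in both machine-readable and human-readable form. All class names, field names, and values are invented for illustration and are not drawn from the article.

```python
# Illustrative only: a hypothetical structured content block for one CMC data
# point, with metadata linking the value to its scientific and business context.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class StructuredBlock:
    block_id: str                                  # hypothetical stable identifier
    attribute: str                                 # e.g. "assay result"
    value: float
    unit: str
    context: dict = field(default_factory=dict)    # links to product, batch, method

block = StructuredBlock(
    block_id="blk-0001",
    attribute="assay result",
    value=99.2,
    unit="%",
    context={"product": "DrugX", "batch": "B123", "method": "HPLC-UV"},
)

# Machine-readable serialization, e.g. for exchange with a review system.
print(json.dumps(asdict(block), indent=2))

# Human-readable rendering, e.g. for a submission document.
print(f"{block.attribute.title()} for {block.context['product']} "
      f"batch {block.context['batch']}: {block.value} {block.unit} "
      f"({block.context['method']}).")
```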
Sensors (Basel, Switzerland), Jun 2021
Review
Pipelines play an important role in the national and international transportation of natural gas, petroleum products, and other energy resources. Pipelines are installed in diverse environments and consequently face various damage challenges, such as environmental electrochemical reactions, welding defects, and external force damage. Defects such as metal loss, pitting, and cracks compromise a pipeline's integrity and cause serious safety issues, which must be prevented before they occur to ensure safe operation. In recent years, different non-destructive testing (NDT) methods have been developed for in-line pipeline inspection, including magnetic flux leakage (MFL) testing, ultrasonic testing (UT), electromagnetic acoustic transducer (EMAT) technology, and eddy current (EC) testing. Single-modality or integrated NDT systems, known as pipeline inspection gauges (PIGs), as well as un-piggable robotic inspection systems, have been developed. Moreover, data management in conjunction with historical data is becoming important for condition-based pipeline maintenance. In this study, various inspection methods associated with non-destructive testing are investigated. The state of the art of PIGs, un-piggable robots, and their instrumental applications is systematically compared. Furthermore, data models and management are utilized for defect quantification, classification, failure prediction, and maintenance. Finally, the challenges, problems, and development trends of pipeline inspection and data management are derived and discussed.
Topics: Acoustics; Data Management; Electromagnetic Phenomena; Transportation
PubMed: 34205033
DOI: 10.3390/s21113862 -
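The review above notes that inspection data are used for defect quantification and classification. The sketch below is a generic illustration of that idea, not a method from the review: it thresholds a simulated one-dimensional MFL trace to flag and size candidate metal-loss indications. The signal, detection rule, and sizing are all assumed.

```python
# Illustrative sketch: flag candidate metal-loss indications in a simulated 1-D
# MFL trace by thresholding against the signal statistics.
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(0.0, 0.05, 2000)     # baseline sensor noise
signal[800:830] += 0.6                    # simulated large defect response
signal[1500:1510] += 0.4                  # simulated smaller defect response

threshold = signal.mean() + 3 * signal.std()   # assumed detection rule
above = signal > threshold

# Group consecutive above-threshold samples into indications and report extent.
d = np.diff(above.astype(int))
starts = np.flatnonzero(d == 1) + 1
ends = np.flatnonzero(d == -1) + 1
for s, e in zip(starts, ends):
    print(f"indication at samples {s}-{e}: length {e - s}, "
          f"peak amplitude {signal[s:e].max():.2f}")
```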
Journal of Biotechnology, Nov 2021
Collaborative research is common practice in the modern life sciences. In most projects, several researchers from multiple universities collaborate on a specific topic. Frequently, these research projects produce a wealth of data that requires central and secure storage, which should also allow for easy sharing among project participants. Only under the best circumstances does this come with minimal technical overhead for the researchers. Moreover, the need for data to be analyzed in a reproducible way often poses a challenge for researchers without a data science background and can thus become an overly time-consuming process. Here, we report on the integration of CyVerse Austria (CAT), a new cyberinfrastructure for a local community of life science researchers, and provide two examples of how it can be used to facilitate FAIR data management and reproducible analytics for teaching and research. In particular, we describe in detail how CAT can be used (i) as a teaching platform with a defined software environment and data management/sharing possibilities, and (ii) to build a data analysis pipeline using Docker technology, tailored to the needs and interests of the researcher.
Topics: Austria; Data Management; Software
PubMed: 34400238
DOI: 10.1016/j.jbiotec.2021.08.004 -
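The entry above describes building reproducible analysis pipelines with Docker. As a minimal, hypothetical sketch (the image name, tool, and paths are placeholders, and this is not the CAT setup), one pipeline step can be run inside a pinned container so the software environment travels with the analysis:

```python
# Illustrative sketch: run one analysis step inside a container so the software
# environment is version-pinned and the step is reproducible. The image tag,
# paths, and command are hypothetical placeholders.
import subprocess
from pathlib import Path

data_dir = Path("data").resolve()        # host folder with shared project data
results_dir = Path("results").resolve()
results_dir.mkdir(exist_ok=True)

cmd = [
    "docker", "run", "--rm",
    "-v", f"{data_dir}:/input:ro",       # mount input data read-only
    "-v", f"{results_dir}:/output",
    "example/rnaseq-qc:1.0",             # hypothetical pinned image tag
    "qc-tool", "--in", "/input", "--out", "/output",
]
subprocess.run(cmd, check=True)
```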
PLoS ONE, 2022
Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science and engineering workflows used in research remain poorly documented and often unavailable, involving ad hoc scripts and manual steps, hindering reproducibility and stifling progress. We introduce Sim2Ls (pronounced simtools) and the Sim2L Python library that allow developers to create and share end-to-end computational workflows with well-defined and verified inputs and outputs. The Sim2L library makes Sim2Ls, their requirements, and their services discoverable, verifies inputs and outputs, and automatically stores results in a globally accessible simulation cache and results database. This simulation ecosystem is available in nanoHUB, an open platform that also provides publication services for Sim2Ls, a computational environment for developers and users, and the hardware to execute runs and store results at no cost. We exemplify the use of Sim2Ls with two applications and discuss best practices towards FAIR simulation workflows and associated data.
Topics: Computer Simulation; Data Management; Ecosystem; Reproducibility of Results; Software; Workflow
PubMed: 35271613
DOI: 10.1371/journal.pone.0264492 -
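The entry above centers on workflows with well-defined and verified inputs and outputs. The snippet below illustrates only that general idea in plain Python; it is not the Sim2L API, and all names, types, and ranges are invented.

```python
# Illustrative only: the general idea of a workflow with declared, range-checked
# inputs and named outputs. This is NOT the Sim2L API; names are invented.
def run_workflow(inputs: dict, spec: dict) -> dict:
    for name, rules in spec.items():
        if name not in inputs:
            raise ValueError(f"missing input: {name}")
        value = inputs[name]
        if not isinstance(value, rules["type"]):
            raise TypeError(f"{name} must be {rules['type'].__name__}")
        lo, hi = rules["range"]
        if not (lo <= value <= hi):
            raise ValueError(f"{name}={value} outside allowed range {rules['range']}")
    # The actual simulation would run here; a toy result is returned instead.
    return {"total_energy": -1.23 * inputs["n_atoms"]}

spec = {
    "temperature": {"type": float, "range": (0.0, 2000.0)},   # kelvin
    "n_atoms": {"type": int, "range": (1, 10_000)},
}
print(run_workflow({"temperature": 300.0, "n_atoms": 64}, spec))
```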
Biological Chemistry, Apr 2023
While the FAIR (Findable, Accessible, Interoperable, and Re-usable) principles are well accepted in the scientific community, there are still many challenges in implementing them in the day-to-day scientific process. Data management of microscopy images poses special challenges due to their volume, variety, and many proprietary formats. In particular, appropriate metadata collection, a basic requirement for FAIR data, is a real challenge for scientists due to its technical and content-related aspects. Researchers benefit here from an interdisciplinary research network with centralized data management. The typically multimodal structure requires generalized data management and the corresponding acquisition of metadata. Here we report on the establishment of an appropriate infrastructure for the research network by a Core Facility, and on the development and integration of a software tool, MDEmic, that allows easy and convenient processing of microscopy image metadata while providing high flexibility for customizing metadata sets. Since it is also in the interest of the core facility to apply standards for metadata scope and serialization formats in order to realize successful and sustainable data management for bioimaging, we report on our efforts within the community to define standards for metadata and interfaces and to lower the barriers of daily data management.
Topics: Data Management; Software; Metadata
PubMed: 36853922
DOI: 10.1515/hsz-2022-0304 -
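To make the metadata discussion above concrete, the sketch below shows a minimal, serializable metadata record for one microscopy image. The key names are assumptions loosely inspired by OME-style metadata rather than a prescribed standard set, and they are not taken from MDEmic.

```python
# Illustrative sketch: a minimal metadata record for one microscopy image,
# stored alongside the image so downstream tools can parse it. Key names are
# assumptions, not a defined standard.
import json

record = {
    "image_id": "exp042_pos3_t001",
    "acquisition_date": "2023-01-17T09:42:00",
    "instrument": {"microscope": "confocal", "objective": "63x/1.4 oil"},
    "channels": [
        {"name": "DAPI", "excitation_nm": 405, "emission_nm": 461},
        {"name": "GFP", "excitation_nm": 488, "emission_nm": 507},
    ],
    "pixel_size_um": {"x": 0.102, "y": 0.102, "z": 0.3},
    "sample": {"organism": "Homo sapiens", "preparation": "fixed"},
}

with open("exp042_pos3_t001.meta.json", "w") as fh:
    json.dump(record, fh, indent=2)
```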
Journal of Visualized Experiments: JoVE, Jun 2023
Review
Transmission electron microscopy (TEM) enables users to study materials at their fundamental, atomic scale. Complex experiments routinely generate thousands of images with numerous parameters that require time-consuming and complicated analysis. AXON Synchronicity is a machine-vision synchronization (MVS) software solution designed to address the pain points inherent to TEM studies. Once installed on the microscope, it enables the continuous synchronization of images and metadata generated by the microscope, detector, and in situ systems during an experiment. This connectivity enables the application of machine-vision algorithms that apply a combination of spatial, beam, and digital corrections to center and track a region of interest within the field of view and provide immediate image stabilization. In addition to the substantial improvement in resolution afforded by such stabilization, metadata synchronization enables the application of computational and image analysis algorithms that calculate variables between images. This calculated metadata can be used to analyze trends or identify key areas of interest within a dataset, leading to new insights and the development of more sophisticated machine-vision capabilities in the future. One such module that builds on this calculated metadata is dose calibration and management. The dose module provides state-of-the-art calibration, tracking, and management of both the electron fluence (e-/Å²·s) and the cumulative dose (e-/Å²) delivered to specific areas of the sample on a pixel-by-pixel basis. This enables a comprehensive overview of the interaction between the electron beam and the sample. Experiment analysis is streamlined through dedicated analysis software in which datasets consisting of images and corresponding metadata are easily visualized, sorted, filtered, and exported. Combined, these tools facilitate efficient collaboration and experimental analysis, encourage data mining, and enhance the microscopy experience.
Topics: Data Management; Workflow; Software; Microscopy, Electron, Transmission; Algorithms
PubMed: 37427942
DOI: 10.3791/65446 -
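The dose module described above tracks electron fluence (e-/Å²·s) and cumulative dose (e-/Å²). The following back-of-envelope sketch shows the underlying arithmetic with assumed beam parameters; it is illustrative only and not the AXON implementation.

```python
# Back-of-envelope dose arithmetic (not the AXON implementation): electron
# fluence rate in e-/Å²·s from beam current and illuminated area, and the
# cumulative dose in e-/Å² accumulated over a series of frames.
ELEMENTARY_CHARGE = 1.602176634e-19      # coulombs per electron

def fluence_rate(beam_current_a: float, illuminated_area_a2: float) -> float:
    """Electrons per square angstrom per second."""
    electrons_per_second = beam_current_a / ELEMENTARY_CHARGE
    return electrons_per_second / illuminated_area_a2

# Assumed example values: a 1 nA beam spread over a 500 nm x 500 nm field of view.
area_a2 = (500 * 10) ** 2                # 500 nm = 5000 Å, squared
rate = fluence_rate(1e-9, area_a2)
print(f"fluence rate: {rate:.2f} e-/Å²·s")

# Cumulative dose over 100 frames of 0.1 s exposure each.
cumulative = sum(rate * 0.1 for _ in range(100))
print(f"cumulative dose: {cumulative:.1f} e-/Å²")
```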
GigaScience, Dec 2022
The importance of effective research data management (RDM) strategies to support the generation of Findable, Accessible, Interoperable, and Reusable (FAIR) neuroscience data grows with each advance in data acquisition techniques and research methods. To maximize the impact of diverse research strategies, multidisciplinary, large-scale neuroscience research consortia face a number of unsolved challenges in RDM. While open science principles are largely accepted, it is practically difficult for researchers to prioritize RDM over other pressing demands. The implementation of a coherent, executable RDM plan for consortia spanning animal, human, and clinical studies is becoming increasingly challenging. Here, we present an RDM strategy implemented for the Heidelberg Collaborative Research Consortium. Our consortium combines basic and clinical research in diverse populations (animals and humans) and produces highly heterogeneous and multimodal research data (e.g., neurophysiology, neuroimaging, genetics, behavior). We present a concrete strategy for initiating early-stage RDM and FAIR data generation for large-scale collaborative research consortia, with a focus on sustainable solutions that incentivize incremental RDM while respecting research-specific requirements.
Topics: Animals; Humans; Data Management; Neuroimaging; Research Personnel
PubMed: 37401720
DOI: 10.1093/gigascience/giad049 -
Nucleic Acids Research, Jan 2023
The Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/) at the Department of Energy (DOE) Joint Genome Institute (JGI) continues to provide support for users to perform comparative analysis of isolate and single-cell genomes, metagenomes, and metatranscriptomes. In addition to datasets produced by the JGI, IMG v.7 also includes datasets imported from public sources such as NCBI GenBank, SRA, and the DOE National Microbiome Data Collaborative (NMDC), or submitted by external users. In the past couple of years, we have continued our efforts to help the user community by improving the annotation pipeline, upgrading the contents with new reference database versions, and adding new analysis functionalities such as advanced scaffold search, Average Nucleotide Identity (ANI) for high-quality metagenome bins, a new cassette search, an improved gene neighborhood display, and improvements to metatranscriptome data display and analysis. We also extended the collaboration and integration efforts with other DOE-funded projects such as NMDC and the DOE Systems Biology Knowledgebase (KBase).
Topics: Genomics; Data Management; Genome, Bacterial; Software; Genome, Archaeal; Databases, Genetic; Metagenome
PubMed: 36382399
DOI: 10.1093/nar/gkac976 -
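The entry above mentions Average Nucleotide Identity (ANI) for metagenome bins. The toy sketch below illustrates the ANI concept only: it compares equal-length fragments position by position instead of performing a real alignment, so it is not the IMG/M pipeline, and the sequences are fabricated.

```python
# Toy illustration of the Average Nucleotide Identity (ANI) concept: average
# the percent identity of fixed-length fragments, keeping only fragments that
# are similar enough to count (standing in for a real alignment step).
def fragment_identity(a: str, b: str) -> float:
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def toy_ani(genome_a: str, genome_b: str, frag: int = 20,
            min_identity: float = 70.0) -> float:
    identities = []
    for i in range(0, min(len(genome_a), len(genome_b)) - frag + 1, frag):
        ident = fragment_identity(genome_a[i:i + frag], genome_b[i:i + frag])
        if ident >= min_identity:
            identities.append(ident)
    return sum(identities) / len(identities) if identities else 0.0

a = "ATGCGTACGTTAGCCATGCA" * 5
b = "ATGCGTACGATAGCCATGCA" * 5              # one mismatch per 20-base fragment
print(f"toy ANI: {toy_ani(a, b):.1f}%")     # prints 95.0% for this example
```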
Journal of Biomedical Semantics, Nov 2023
BACKGROUND
Open Science Graphs (OSGs) are scientific knowledge graphs representing different entities of the research lifecycle (e.g. projects, people, research outcomes, institutions) and the relationships among them. They present a contextualized view of current research that supports discovery, re-use, reproducibility, monitoring, transparency and omni-comprehensive assessment. A Data Management Plan (DMP) contains information concerning both the research processes and the data collected, generated and/or re-used during a project's lifetime. Automated solutions and workflows that connect DMPs with the actual data and other contextual information (e.g., publications, funding) are missing from the landscape. The fact that DMPs are submitted as deliverables also limits their findability. In an open and FAIR-enabling research ecosystem, linking information between research processes and research outputs is essential. The ARGOS tool for FAIR data management contributes to the OpenAIRE Research Graph (RG) and utilises its underlying services and trusted sources to progressively automate validation and other aspects of Research Data Management (RDM) practices.
RESULTS
A comparative analysis was conducted between the data models of ARGOS and OpenAIRE Research Graph against the DMP Common Standard. Following this, we extended ARGOS with export format converters and semantic tagging, and the OpenAIRE RG with a DMP entity and semantics between existing entities and relationships. This enabled the integration of ARGOS machine actionable DMPs (ma-DMPs) to the OpenAIRE OSG, enriching and exposing DMPs as FAIR outputs.
CONCLUSIONS
To our knowledge, this paper is the first to describe exposing ma-DMPs in OSGs and linking OSGs with DMPs, introducing the latter as entities in the research lifecycle. Further, it provides insight into ARGOS DMP service interoperability practices and the integrations used to populate the OpenAIRE Research Graph with DMP entities and relationships, strengthening both the FAIRness of outputs and standardized information exchange.
Topics: Humans; Data Management; Reproducibility of Results
PubMed: 37919767
DOI: 10.1186/s13326-023-00297-5 -
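To make the machine-actionable DMP (ma-DMP) idea above concrete, the sketch below writes a pared-down ma-DMP as JSON. Field names loosely follow the RDA DMP Common Standard but should be treated as an approximation; consult the published schema for the authoritative structure, and note that the identifiers and contact details here are placeholders.

```python
# A pared-down machine-actionable DMP expressed as JSON. Field names only
# approximate the RDA DMP Common Standard; all identifiers are placeholders.
import json

ma_dmp = {
    "dmp": {
        "title": "Example project data management plan",
        "created": "2023-06-01T10:00:00",
        "modified": "2023-09-15T14:30:00",
        "dmp_id": {"identifier": "https://doi.org/10.0000/example-dmp", "type": "doi"},
        "contact": {"name": "Jane Doe", "mbox": "jane.doe@example.org"},
        "dataset": [
            {
                "title": "Survey responses",
                "dataset_id": {"identifier": "https://example.org/ds/1", "type": "url"},
                "personal_data": "yes",
                "sensitive_data": "no",
            }
        ],
    }
}

print(json.dumps(ma_dmp, indent=2))
```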
Experimental Neurology, Aug 2024
Effective data management and sharing have become increasingly crucial in biomedical research; however, many laboratory researchers lack the necessary tools and knowledge to address this challenge. This article, written by practicing scientists, provides laboratory researchers with an introductory guide to research data management (RDM) and the importance of FAIR (Findable, Accessible, Interoperable, and Reusable) data-sharing principles. We explore the advantages of implementing organized data management strategies and introduce key concepts such as data standards, data documentation, and the distinction between machine- and human-readable data formats. Furthermore, we offer practical guidance for creating a data management plan and establishing efficient data workflows within the laboratory setting, suitable for labs of all sizes. This includes an examination of requirements analysis, the development of a data dictionary for routine data elements, the implementation of unique subject identifiers, and the formulation of standard operating procedures (SOPs) for seamless data flow. To aid researchers in implementing these practices, we present a simple organizational system as an illustrative example, which can be tailored to suit individual needs and research requirements. By presenting a user-friendly approach, this guide serves as an introduction to the field of RDM and offers practical tips to help researchers effortlessly meet the common data management and sharing mandates that are rapidly becoming prevalent in biomedical research.
Topics: Humans; Biomedical Research; Data Management; Information Dissemination; Research Personnel
PubMed: 38762093
DOI: 10.1016/j.expneurol.2024.114815
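Two of the practices recommended in the entry above, a data dictionary for routine data elements and unique subject identifiers, can be illustrated with the short sketch below. The element names, formats, and ID scheme are assumptions for illustration, not a standard.

```python
# Illustrative sketch: write a small data dictionary for routine data elements
# and generate deterministic, never-reused subject identifiers.
import csv

data_dictionary = [
    {"element": "subject_id", "type": "string", "units": "", "allowed": "LAB-####"},
    {"element": "body_weight", "type": "float", "units": "g", "allowed": ">= 0"},
    {"element": "treatment_group", "type": "string", "units": "", "allowed": "control|drug"},
]

with open("data_dictionary.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["element", "type", "units", "allowed"])
    writer.writeheader()
    writer.writerows(data_dictionary)

def new_subject_id(counter: int, prefix: str = "LAB") -> str:
    """Zero-padded identifier assigned once per subject, e.g. LAB-0042."""
    return f"{prefix}-{counter:04d}"

print([new_subject_id(i) for i in range(1, 4)])   # ['LAB-0001', 'LAB-0002', 'LAB-0003']
```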