American Journal of Biological... Nov 2022
OBJECTIVES
Previous research has shown that while missing data are common in bioarchaeological studies, they are seldom handled using statistically rigorous methods. The primary objective of this article is to evaluate the ability of imputation to manage missing data and encourage the use of advanced statistical methods in bioarchaeology and paleopathology. An overview of missing data management in biological anthropology is provided, followed by a test of imputation and deletion methods for handling missing data.
MATERIALS AND METHODS
Missing data were simulated on complete datasets of ordinal (n = 287) and continuous (n = 369) bioarchaeological data. Missing values were imputed using five imputation methods (mean, predictive mean matching, random forest, expectation maximization, and stochastic regression), and the success of each at recovering the parameters of the original dataset was compared with that of pairwise and listwise deletion.
RESULTS
In all instances, listwise deletion was least successful at approximating the original parameters. Imputation of continuous data was more effective than ordinal data. Overall, no one method performed best and the amount of missing data proved a stronger predictor of imputation success.
DISCUSSION
These findings support the use of imputation methods over deletion for handling missing bioarchaeological and paleopathology data, especially when the data are continuous. Whereas deletion methods reduce sample size, imputation maintains sample size, improving statistical power and preventing bias from being introduced into the dataset.
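To make the comparison above concrete, here is a minimal sketch (not the authors' code) that simulates missingness in a complete continuous dataset and contrasts mean imputation with listwise deletion; the variable names, the simulated data, and the use of pandas and scikit-learn are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): simulate missing values in a
# complete continuous dataset, then compare mean imputation against
# listwise deletion by how well each recovers the original column means.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumed tooling, for illustration

rng = np.random.default_rng(0)

# Hypothetical "complete" dataset of continuous measurements.
complete = pd.DataFrame(rng.normal(loc=50, scale=10, size=(369, 4)),
                        columns=["femur", "tibia", "humerus", "radius"])

# Simulate 20% of values missing completely at random.
mask = rng.random(complete.shape) < 0.20
with_missing = complete.mask(mask)

# Listwise deletion: drop any row with at least one missing value.
listwise = with_missing.dropna()

# Mean imputation: replace each missing value with its column mean.
imputed = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(with_missing),
                       columns=complete.columns)

# Compare recovered means against the original parameters.
print("original means:\n", complete.mean())
print("listwise deletion means:\n", listwise.mean())
print("mean-imputation means:\n", imputed.mean())
```

The deletion column illustrates why listwise deletion costs statistical power: every row with any missing value is discarded, whereas imputation retains the full sample size.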
Topics: Archaeology; Sample Size; Research Design; Data Management; Bias
PubMed: 36790608
DOI: 10.1002/ajpa.24614
Journal of Chemical Information and... Jan 2022
Projects in chemo- and bioinformatics often consist of scattered data of various types that are difficult to access in a meaningful way for efficient data analysis. The data are usually too diverse even to be manipulated effectively. Sdfconf is data manipulation and analysis software that addresses this problem in a logical and robust manner. Other software commonly used for such tasks is either not designed with molecular and/or conformational data in mind or provides only a narrow set of tasks to be accomplished. Furthermore, many tools are available only within commercial software packages. Sdfconf is a flexible, robust, and free-of-charge tool for linking data from various sources for meaningful and efficient manipulation and analysis of molecule data sets. Sdfconf packages molecular structures and metadata into a complete ensemble, from which one can access both the whole data set and individual molecules and/or conformations. In this software note, we offer some practical examples of the utilization of sdfconf.
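The "ensemble" idea described above can be pictured with a small, purely illustrative data structure; this is not sdfconf's actual API, and the class and field names are invented.

```python
# Purely illustrative sketch of the "ensemble" idea (not sdfconf's API):
# molecules group conformations, and each conformation keeps its metadata.
from dataclasses import dataclass, field

@dataclass
class Conformation:
    atoms: list          # e.g. [(element, x, y, z), ...]
    metadata: dict       # e.g. {"energy": -12.3, "source": "docking run 7"}

@dataclass
class Molecule:
    name: str
    conformations: list[Conformation] = field(default_factory=list)

@dataclass
class Ensemble:
    molecules: dict[str, Molecule] = field(default_factory=dict)

    def add(self, name: str, conf: Conformation) -> None:
        # Access a single molecule through the ensemble, creating it if new.
        self.molecules.setdefault(name, Molecule(name)).conformations.append(conf)

    def lowest_energy(self, name: str) -> Conformation:
        # Pick one conformation of one molecule using its attached metadata.
        return min(self.molecules[name].conformations,
                   key=lambda c: c.metadata.get("energy", float("inf")))

ensemble = Ensemble()
ensemble.add("ligand_A", Conformation(atoms=[("C", 0.0, 0.0, 0.0)],
                                      metadata={"energy": -10.5}))
ensemble.add("ligand_A", Conformation(atoms=[("C", 0.1, 0.0, 0.0)],
                                      metadata={"energy": -12.3}))
print(ensemble.lowest_energy("ligand_A").metadata)
```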
Topics: Computational Biology; Data Analysis; Data Management; Software
PubMed: 34932340
DOI: 10.1021/acs.jcim.1c01051
Western Journal of Nursing Research Aug 2020
Data repositories can support secure data management for multi-institutional and geographically dispersed research teams. However, because repositories are primarily designed to provide secure access, storage, and sharing of quantitative data, limited attention has been given to the unique considerations of data repositories for qualitative research. We share our experiences of using a data repository in a large qualitative nursing research study. Over a 27-month period, data collected by this 15-member team from 83 participants included photos, audio recordings and transcripts of interviews, and field notes. The data repository supported the secure collection, storage, and management of over 1,800 data files. However, challenges arose during analysis that required negotiations about the structure and processes of the data repository. We discuss the strengths and limitations of data repositories and introduce practical strategies for developing a data management plan for qualitative research supported by a data repository.
Topics: Data Management; Databases, Factual; Humans; Qualitative Research
PubMed: 31665999
DOI: 10.1177/0193945919881706
Computer Methods and Programs in... Nov 2021
BACKGROUND AND OBJECTIVES
In the last decade, clinical trial management systems have become an essential support tool for data management and analysis in clinical research. However, these clinical tools have design limitations: they currently cannot adapt to the continuous changes in trial practice that arise from the heterogeneous and dynamic nature of clinical research data. These systems are usually proprietary solutions provided by vendors for specific tasks. In this work, we propose FIMED, a software solution for the flexible management of clinical data from multiple trials, moving towards personalized medicine; it can contribute positively by improving the quality and ease of clinical researchers' work in clinical trials.
METHODS
This tool allows a dynamic and incremental design of patients' profiles in the context of clinical trials, providing a flexible user interface that hides the complexity of using databases. Clinical researchers will be able to define personalized data schemas according to their needs and clinical study specifications. Thus, FIMED allows clinical data from multiple trials to be incorporated and analyzed separately.
RESULTS
The efficiency of the software has been demonstrated with a real-world use case involving a clinical assay in melanoma, which has been anonymized to provide a user demonstration. FIMED currently provides three data analysis and visualization components supporting clinical exploration of gene expression data: heatmap visualization, cluster heatmap visualization, and gene regulatory network inference and visualization. An instance of this tool is freely available on the web at https://khaos.uma.es/fimed. It can be accessed with a demo user account, "researcher", using the password "demo".
CONCLUSION
This paper presents FIMED as a flexible and user-friendly way of managing multidimensional clinical research data. Hence, without loss of generality, FIMED is flexible enough to be used in the context of any other disease where clinical data and assays are involved.
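As a rough illustration of the flexible, per-trial patient profiles described in the METHODS section, the sketch below validates a record against a trial-specific schema and then extends it with a new variable; the field names, the validation helper, and the melanoma example values are hypothetical and do not reflect FIMED's actual interface or data.

```python
# Hypothetical sketch of a flexible, per-trial patient profile
# (document-style records rather than a fixed relational schema).
# This is not FIMED's actual interface.

# Each trial defines its own data schema as a simple field -> type mapping.
melanoma_schema = {"patient_id": str, "age": int, "braf_status": str,
                   "gene_expression": dict}

def validate(record: dict, schema: dict) -> dict:
    """Check a record against a trial-specific schema before storing it."""
    for field_name, field_type in schema.items():
        if field_name not in record:
            raise ValueError(f"missing field: {field_name}")
        if not isinstance(record[field_name], field_type):
            raise TypeError(f"{field_name} should be {field_type.__name__}")
    return record

profile = validate({"patient_id": "P-001",
                    "age": 62,
                    "braf_status": "V600E",
                    "gene_expression": {"MITF": 8.2, "SOX10": 6.9}},
                   melanoma_schema)

# New variables can be added later without altering a fixed database schema.
profile["response_to_treatment"] = "partial"
print(profile)
```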
Topics: Data Management; Databases, Factual; Gene Regulatory Networks; Humans; Internet; Software; User-Computer Interface
PubMed: 34740063
DOI: 10.1016/j.cmpb.2021.106496
Environmental Monitoring and Assessment Dec 2023
A scientifically informed approach to decision-making is key to ensuring the sustainable management of ecosystems, especially in the light of increasing human pressure on habitats and species. Protected areas, with their long-term institutional mandate for biodiversity conservation, play an important role as data providers, for example, through the long-term monitoring of natural resources. However, poor data management often limits the use and reuse of this wealth of information. In this paper, we share lessons learned in managing long-term data from the Italian Alpine national parks. Our analysis and examples focus on specific issues faced by managers of protected areas, which partially differ from those faced by academic researchers, predominantly owing to different mission, governance, and temporal perspectives. Rigorous data quality control, the use of appropriate data management tools, and acquisition of the necessary skills remain the main obstacles. Common protocols for data collection offer great opportunities for the future, and complete recovery and documentation of time series is an urgent priority. Notably, before data can be shared, protected areas should improve their data management systems, a task that can be achieved only with adequate resources and a long-term vision. We suggest strategies that protected areas, funding agencies, and the scientific community can embrace to address these problems. The added value of our work lies in promoting engagement with managers of protected areas and in reporting and analysing their concrete requirements and problems, thereby contributing to the ongoing discussion on data management and sharing through a bottom-up approach.
Topics: Humans; Ecosystem; Conservation of Natural Resources; Data Management; Environmental Monitoring; Biodiversity
PubMed: 38051448
DOI: 10.1007/s10661-023-11851-0
Briefings in Bioinformatics Jan 2021
Review
Thousands of new experimental datasets are becoming available every day; in many cases, they are produced within the scope of large cooperative efforts involving a variety of laboratories spread all over the world, and they are typically open for public use. Although the potential collective amount of available information is huge, the effective combination of such public sources is hindered by data heterogeneity, as the datasets exhibit a wide variety of notations and formats concerning both experimental values and metadata. Thus, data integration is becoming a fundamental activity to be performed prior to data analysis and biological knowledge discovery; it consists of successive steps of data extraction, normalization, matching and enrichment. Once applied to heterogeneous data sources, it builds multiple perspectives over the genome, leading to the identification of meaningful relationships that could not be perceived using incompatible data formats. In this paper, we first describe a technological pipeline from data production to data integration; we then propose a taxonomy of genomic data players (based on the distinction between contributors, repository hosts, consortia, integrators and consumers) and apply the taxonomy to describe about 30 important players in genomic data management. We specifically focus on the integrator players, analyse the issues in solving the genomic data integration challenges, and evaluate the computational environments that they provide to follow up data integration by means of visualization and analysis tools.
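The extraction, normalization, matching, and enrichment steps named above can be sketched as a toy pipeline; the record fields, the coordinate-based matching rule, and the two sources are invented for illustration and do not correspond to any particular integrator discussed in the paper.

```python
# Toy sketch of the integration steps named above: extract records from
# heterogeneous sources, normalize notation, match entries that describe
# the same genomic position, and enrich them with each other's metadata.
# Field names and the matching rule are illustrative only.

source_a = [{"chrom": "chr7", "start": "140453136", "gene": "BRAF"}]
source_b = [{"chromosome": "7", "position": 140453136, "assay": "ChIP-seq"}]

def normalize_a(rec):
    # Harmonize chromosome notation and coordinate type for source A.
    return {"chrom": rec["chrom"].removeprefix("chr"),
            "pos": int(rec["start"]), "gene": rec["gene"]}

def normalize_b(rec):
    # Harmonize field names for source B.
    return {"chrom": rec["chromosome"], "pos": rec["position"],
            "assay": rec["assay"]}

def match_and_enrich(recs_a, recs_b):
    # Match on (chromosome, position) and merge the metadata of both sources.
    index = {(r["chrom"], r["pos"]): r for r in recs_a}
    merged = []
    for r in recs_b:
        key = (r["chrom"], r["pos"])
        if key in index:
            merged.append({**index[key], **r})
    return merged

print(match_and_enrich([normalize_a(r) for r in source_a],
                       [normalize_b(r) for r in source_b]))
```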
Topics: Data Management; Genome, Human; Genomics; Humans; Metadata
PubMed: 32496509
DOI: 10.1093/bib/bbaa080
Studies in Health Technology and... Aug 2022
Data collected in clinical registries or through data reuse require some modification in order to suit research needs. Several common operations are frequently applied: selecting relevant patients across the cohort, combining data from multiple sources, adding new variables when needed, and creating unique tables depending on the research purpose. We carried out a qualitative survey by conducting semi-structured interviews with 7 experts in data reuse and proposed a standard workflow for health data management. We implemented an R tutorial based on a synthetic data set, using Jupyter Notebook, for a better understanding of the data management workflow.
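The common operations listed above (selecting patients, combining sources, adding variables, building a purpose-specific table) might look roughly like the pandas sketch below; the paper's own tutorial is written in R, and all table and column names here are invented.

```python
# Rough pandas sketch of the workflow described above (the paper's tutorial
# uses R); all column and table names are invented for illustration.
import pandas as pd

registry = pd.DataFrame({"patient_id": [1, 2, 3],
                         "birth_year": [1950, 1962, 1971],
                         "diagnosis": ["diabetes", "asthma", "diabetes"]})
labs = pd.DataFrame({"patient_id": [1, 2, 3],
                     "hba1c": [7.8, 5.2, 6.9]})

# 1. Select relevant patients across the cohort.
cohort = registry[registry["diagnosis"] == "diabetes"]

# 2. Combine data from multiple sources.
combined = cohort.merge(labs, on="patient_id", how="left")

# 3. Add new variables when needed.
combined["age_in_2022"] = 2022 - combined["birth_year"]

# 4. Create the analysis table for the research purpose at hand.
analysis_table = combined[["patient_id", "age_in_2022", "hba1c"]]
print(analysis_table)
```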
Topics: Data Management; Humans; Workflow
PubMed: 36073461
DOI: 10.3233/SHTI220912
Clinical Research in Cardiology :... May 2024
Review
The sharing and documentation of cardiovascular research data are essential for efficient use and reuse of data, thereby aiding scientific transparency, accelerating the progress of cardiovascular research and healthcare, and contributing to the reproducibility of research results. However, challenges remain. This position paper, written on behalf of and approved by the German Cardiac Society and German Centre for Cardiovascular Research, summarizes our current understanding of the challenges in cardiovascular research data management (RDM). These challenges include lack of time, awareness, incentives, and funding for implementing effective RDM; lack of standardization in RDM processes; a need to better identify meaningful and actionable data among the increasing volume and complexity of data being acquired; and a lack of understanding of the legal aspects of data sharing. While several tools exist to increase the degree to which data are findable, accessible, interoperable, and reusable (FAIR), more work is needed to lower the threshold for effective RDM not just in cardiovascular research but in all biomedical research, with data sharing and reuse being factored in at every stage of the scientific process. A culture of open science with FAIR research data should be fostered through education and training of early-career and established research professionals. Ultimately, FAIR RDM requires permanent, long-term effort at all levels. If outcomes can be shown to be superior and to promote better (and better value) science, modern RDM will make a positive difference to cardiovascular science and practice. The full position paper is available in the supplementary materials.
Topics: Humans; Data Management; Reproducibility of Results; Heart; Cardiovascular System; Biomedical Research
PubMed: 37847314
DOI: 10.1007/s00392-023-02303-3
Big Data Jun 2023
Big data management is a key enabling factor for enterprises that want to compete in the global market. Data coming from enterprise production processes, if properly analyzed, can boost enterprise management and optimization, guaranteeing faster processes, better customer management, and lower overheads and costs. Guaranteeing a proper big data pipeline is the holy grail of big data, but it is often hindered by the difficulty of evaluating the correctness of the pipeline's results. This problem is even worse when big data pipelines are provided as a service in the cloud and must comply with both laws and users' requirements. To this aim, assurance techniques can complement big data pipelines, providing the means to guarantee that they behave correctly and supporting the deployment of big data pipelines that are fully compliant with laws and users' requirements. In this article, we define an assurance solution for big data based on service-level agreements, in which a semiautomatic approach supports users from the definition of requirements to the negotiation of the terms regulating the provisioned services, and the continuous refinement thereof.
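One way to picture SLA-based assurance is a small check of observed pipeline metrics against negotiated terms, as in the sketch below; the term names, thresholds, and metrics are hypothetical and are not the authors' framework.

```python
# Hypothetical sketch of SLA-based assurance for a big data pipeline:
# negotiated terms are checked against metrics observed at run time.
# Term names, thresholds, and metrics are invented for illustration.

sla_terms = {
    "max_latency_seconds": 300,   # pipeline must finish within 5 minutes
    "min_completeness": 0.98,     # at least 98% of records processed
    "data_residency": "EU",       # legal requirement on storage location
}

observed = {
    "max_latency_seconds": 240,
    "min_completeness": 0.995,
    "data_residency": "EU",
}

def check_sla(terms: dict, metrics: dict) -> list[str]:
    """Return the list of violated terms (an empty list means compliant)."""
    violations = []
    for term, required in terms.items():
        value = metrics.get(term)
        if isinstance(required, (int, float)):
            # "min_" terms are lower bounds, everything else an upper bound.
            ok = value >= required if term.startswith("min_") else value <= required
        else:
            ok = value == required
        if not ok:
            violations.append(term)
    return violations

print(check_sla(sla_terms, observed))  # [] -> the pipeline meets the agreed terms
```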
Topics: Big Data; Data Management
PubMed: 36862683
DOI: 10.1089/big.2021.0369
Journal of Pharmaceutical Sciences May 2022
Review
Recent advancements in data engineering, data science, and secure cloud storage can transform the current state of global Chemistry, Manufacturing, and Controls (CMC) regulatory activities into automated, online digital processes. Modernizing regulatory activities will facilitate simultaneous global submissions and concurrent collaborative reviews, significantly reducing global licensing timelines and variability in globally registered product details. This article describes advancements made within the pharmaceutical industry, from theoretical concepts to the utilization of structured content and data in CMC submissions. The term Structured Content and Data Management (SCDM) describes the end-to-end scientific data lifecycle, from capture in source systems, through aggregation into a consolidated repository, to transformation into semantically structured blocks with metadata defining relationships between scientific data and business contexts. Automation of regulatory authoring (termed Structured Content Authoring) is feasible because SCDM makes data both human and machine readable. It will offer health authorities access to digital data beyond the current standard of PDF documents, and, for the review process, SCDM would "enrich the effectiveness, efficiency, and consistency of regulatory quality oversight" (Yu et al., 2019). SCDM is a novel solution for content and data management in regulatory submissions and can enable faster access to critical therapies worldwide.
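A semantically structured content block of the kind SCDM describes might be pictured as follows; the block layout, field names, example values, and JSON rendering are assumptions for illustration rather than an actual SCDM or regulatory standard.

```python
# Illustrative sketch of a semantically structured content block with
# metadata linking a scientific value to its business context (not an
# actual SCDM or regulatory standard format).
import json

content_block = {
    "block_id": "drug-substance-spec-001",
    "data": {"attribute": "assay", "result": 99.2, "unit": "% w/w"},
    "metadata": {
        "source_system": "LIMS",            # where the value was captured
        "product": "Example mAb",           # hypothetical product name
        "submission_section": "3.2.S.4.1",  # submission section the block feeds
    },
}

# Machine readable (for automated review tools)...
print(json.dumps(content_block, indent=2))

# ...and human readable (for authored documents).
d, m = content_block["data"], content_block["metadata"]
print(f"{d['attribute'].title()}: {d['result']} {d['unit']} "
      f"(source: {m['source_system']}, section {m['submission_section']})")
```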
Topics: Commerce; Data Management; Drug Industry; Humans
PubMed: 34610323
DOI: 10.1016/j.xphs.2021.09.046