-
NanoImpact Jul 2022
Review
Publishing research data using a findable, accessible, interoperable, and reusable (FAIR) approach is paramount to further innovation in many areas of research, in particular in developing innovative approaches to predict (eco)toxicological risks in (nano or advanced) material design, where efficient use of existing data is essential. The use of tools assessing the FAIRness of data helps the future improvement of data FAIRness and therefore their re-use. This paper reviews ten FAIR assessment tools that have been evaluated and characterized using two datasets from the nanomaterials and microplastics risk assessment domain. The tools were grouped into four categories: online self-assessment survey-based, offline self-assessment survey-based, online (semi-)automated, and other tools. We found that the online self-assessment tools can be used for a quick scan of a user's dataset due to their ease of use, little need for experience and short time investment. When a user is looking to assess full databases, and not just datasets, for their FAIRness, (semi-)automated tools are more practical. The offline assessment tools were found to be limited and unreliable due to a lack of guidance and an under-developed state. To further characterize the usability, two datasets were run through all tools to check the similarity in the tools' results. As most of the tools differ in their implementation of the FAIR principles, a large variety in outcomes was obtained. Furthermore, it was observed that only one tool gives recommendations to the user on how to improve the FAIRness of the evaluated dataset. This paper gives clear recommendations for both the user and the developer of FAIR assessment tools.
Topics: Data Management; Databases, Factual; Plastics; Risk Assessment; Self-Assessment
PubMed: 35717894
DOI: 10.1016/j.impact.2022.100402
-
Database : the Journal of Biological... Sep 2022
The rapid advancement of sequencing technology, including next-generation sequencing (NGS), has greatly improved sequencing efficiency and decreased cost. Consequently, huge amounts of genomic, transcriptomic and epigenetic data concerning cotton species have been generated and released. These large-scale data provide immense opportunities for the study of cotton genomic structure and evolution, population genetic diversity and genome-wide mining of excellent genes for important traits. However, the complexity of NGS data also causes distress, as the data cannot be utilized easily. Here, we present the cotton omics data platform COTTONOMICS (http://cotton.zju.edu.cn/), an easily accessible web database that integrates 32.5 TB of omics data including seven assembled genomes, resequencing data from 1180 allotetraploid cotton accessions and RNA-sequencing (RNA-seq), small RNA-sequencing (smRNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), DNase hypersensitive sites sequencing (DNase-seq) and bisulfite sequencing (BS-seq) data. COTTONOMICS allows users to employ various search scenarios and retrieve information concerning the cotton genomes, genomic variation (single nucleotide polymorphisms (SNPs) and insertions and deletions (InDels)), gene expression, smRNA expression, epigenetic regulation and quantitative trait loci (QTLs). The user-friendly web interface offers a variety of modules for storing, retrieving, analyzing and visualizing cotton multi-omics data to diverse ends, thereby enabling users to decipher cotton population genetics and identify potential novel genes that influence agronomically beneficial traits. Database URL: http://cotton.zju.edu.cn.
Topics: Data Management; Deoxyribonucleases; Epigenesis, Genetic; High-Throughput Nucleotide Sequencing; RNA
PubMed: 36094905
DOI: 10.1093/database/baac080
-
GigaScience Dec 2022
Scientists employing omics in life science studies face challenges such as the modeling of multiassay studies, recording of all relevant parameters, and managing many samples with their metadata. They must manage many large files that are the results of the assays or subsequent computation. Users with diverse backgrounds, ranging from computational scientists to wet-lab scientists, have dissimilar needs when it comes to data access, with programmatic interfaces being favored by the former and graphical ones by the latter. We introduce SODAR, the system for omics data access and retrieval. SODAR is a software package that addresses these challenges by providing a web-based graphical user interface for managing multiassay studies and describing them using the ISA (Investigation, Study, Assay) data model and the ISA-Tab file format. Data storage is handled using the iRODS data management system, which handles large quantities of files and substantial amounts of data. SODAR also offers programmable APIs and command-line access for metadata and file storage. SODAR supports complex omics integration studies and can be easily installed. The software is written in Python 3 and freely available at https://github.com/bihealth/sodar-server under the MIT license.
Topics: Multiomics; Metadata; Software; Information Storage and Retrieval; Data Management
PubMed: 37498129
DOI: 10.1093/gigascience/giad052
-
Nucleic Acids Research Jan 2023
The Human Microbial Metabolome Database (MiMeDB) (https://mimedb.org) is a comprehensive, multi-omic, microbiome resource that connects: (i) microbes to microbial genomes; (ii) microbial genomes to microbial metabolites; (iii) microbial metabolites to the human exposome and (iv) all of these 'omes' to human health. MiMeDB was established to consolidate the growing body of data connecting the human microbiome and the chemicals it produces to both health and disease. MiMeDB contains detailed taxonomic, microbiological and body-site location data on most known human microbes (bacteria and fungi). This microbial data is linked to extensive genomic and proteomic sequence data that is closely coupled to colourful interactive chromosomal maps. The database also houses detailed information about all the known metabolites generated by these microbes, their structural, chemical and spectral properties, the reactions and enzymes responsible for these metabolites and the primary exposome sources (food, drug, cosmetic, pollutant, etc.) that ultimately lead to the observed microbial metabolites in humans. Additional, extensively referenced data about the known or presumptive health effects, measured biosample concentrations and human protein targets for these compounds is provided. All of this information is housed in a richly annotated, highly interactive, visually pleasing database that has been designed to be easy to search, easy to browse and easy to navigate. Currently MiMeDB contains data on 626 health effects or bioactivities, 1904 microbes, 3112 references, 22 054 reactions, 24 254 metabolites or exposure chemicals, 648 861 MS and NMR spectra, 6.4 million genes and 7.6 billion DNA bases. We believe that MiMeDB represents the kind of integrated, multi-omic or systems biology database that is needed to enable comprehensive multi-omic integration.
Topics: Humans; Metabolomics; Proteomics; Metabolome; Databases, Factual; Data Management
PubMed: 36215042
DOI: 10.1093/nar/gkac868
-
Scientific Data Jun 2024
The demand for open data and open science is on the rise, fueled by expectations from the scientific community, calls to increase transparency and reproducibility in research findings, and developments such as the Final Data Management and Sharing Policy from the U.S. National Institutes of Health and a memorandum on increasing public access to federally funded research, issued by the U.S. Office of Science and Technology Policy. This paper explores the pivotal role of data repositories in biomedical research and open science, emphasizing their importance in managing, preserving, and sharing research data. Our objective is to familiarize readers with the functions of data repositories, set expectations for their services, and provide an overview of methods to evaluate their capabilities. The paper serves to introduce fundamental concepts and community-based guiding principles and aims to equip researchers, repository operators, funders, and policymakers with the knowledge to select appropriate repositories for their data management and sharing needs and foster a foundation for the open sharing and preservation of research data.
Topics: Biomedical Research; Data Management; Information Dissemination
PubMed: 38871749
DOI: 10.1038/s41597-024-03449-z
-
Philosophical Transactions. Series A,... Oct 2022
Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of 'following the science' are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.
Topics: COVID-19; Data Management; Humans; Pandemics; Software; Workflow
PubMed: 35965468
DOI: 10.1098/rsta.2021.0300
-
JCO Clinical Cancer Informatics Mar 2021
For central cancer registries to become a more significant public health resource, they must evolve to capture more timely, accurate, and extensive data. Key stakeholders have called for a faster time to deliver work products, data extensions such as social determinants of health, and more relevant information for cancer control programs at the local level. The proposed model consists of near real-time reporting stages to replace the current time- and labor-intensive efforts to populate a complete cancer case abstract on the basis of the 12- and 24-month data submission timelines. The first stage collects a cancer diagnosis minimum data set sufficient to describe population incidence and prevalence, which is then followed by a second stage capturing subsequent case updates and treatment data. A third stage procures targeted information in response to identified research projects' needs. The model also provides for further supplemental reports as may be defined to gather additional data. All stages leverage electronic health records' widespread development and the many emerging standards for data content, including national policies related to healthcare and technical standards for interoperability, such as the Fast Healthcare Interoperability Resources specifications, to automate and accelerate reporting to central cancer registries. The emergence of application programming interfaces that allow for more interoperability among systems would be leveraged, leading to more efficient information sharing. Adopting this model will expedite cancer data availability to improve cancer control while supporting data integrity and flexibility in data items. It presents a long-term and feasible solution that addresses the extensive burden and unsustainable manual data collection requirements placed on Certified Tumor Registrars at disease reporting entities nationally.
Topics: Data Collection; Data Management; Electronic Health Records; Humans; Neoplasms; Registries
PubMed: 33760641
DOI: 10.1200/CCI.20.00177
-
Journal of Biomedical Informatics Apr 2023
Data stewardship is a term that is understood in heterogeneous ways. In recent organisational developments and efforts to build infrastructures and hire professional staff for research data management in various scientific fields in Europe, data stewardship is understood as mainly aiming at optimising data management in line with the FAIR principles (findability, accessibility, interoperability, reusability) for purposes of reuse in the interests of the scientific community and the public. In addition, especially in the health and biomedical sciences, some understandings of data stewardship mainly focus on the responsibility to respect the informational rights of data subjects. Following on from these different understandings and from recent developments to include ever more stakeholders in data stewardship, we propose a comprehensive understanding of data stewardship. According to this comprehensive understanding, data stewardship includes responsibilities towards all pertinent stakeholders, together with the obligation to consider and respect equally their legitimate rights and interests, in order to build and maintain an efficient, trusted and fair data ecosystem. We also point out some of the practical challenges implied in such a comprehensive understanding.
Topics: Humans; Ecosystem; Europe; Data Management
PubMed: 36935012
DOI: 10.1016/j.jbi.2023.104337
-
Data Management Plans in the genomics research revolution of Africa: Challenges and recommendations.
Journal of Biomedical Informatics Oct 2021
Review
Drafting and writing a data management plan (DMP) is increasingly seen as a key part of the academic research process. A DMP is a document that describes how a researcher will collect, document, describe, share, and preserve the data that will be generated as part of a research project. The DMP illustrates the importance of utilizing best practices through all stages of working with data while ensuring accessibility, quality, and longevity of the data. The benefits of writing a DMP include compliance with funder and institutional mandates; making research more transparent (for reproduction and validation purposes) and FAIR (findable, accessible, interoperable, reusable); and protecting data subjects and complying with the General Data Protection Regulation (GDPR) and/or local data protection policies. In this review, we highlight the importance of a DMP in modern biomedical research, explaining both the rationale and current best practices associated with DMPs. In addition, we outline various funders' requirements concerning DMPs and discuss open-source tools that facilitate the development and implementation of a DMP. Finally, we discuss DMPs in the context of African research, and the considerations that need to be made in this regard.
Topics: Africa; Biomedical Research; Data Management; Genomics; Humans; Research Design
PubMed: 34506960
DOI: 10.1016/j.jbi.2021.103900
-
Journal of Healthcare Engineering 2021
Because e-health applications involve more than one actor and a wireless component, stronger security and safety are expected. Moreover, ensuring data confidentiality within different services becomes a key requirement. In this paper, we propose to collect data from health and fitness smart devices deployed in connection with the proposed IoT blockchain platform. The use of these devices helps us in extracting an amount of highly valuable health data that are filtered, analyzed, and stored in electronic health records (EHRs). Different actors of the platform (coaches, patients, and doctors) collaborate to provide an on-time diagnosis and treatment for various diseases in an easy and cost-effective way. Our main purpose is to provide distributed, secure, and authorized access to these sensitive data using the Ethereum blockchain technology. We have designed an integrated low-powered IoT blockchain platform for a healthcare application to store and review EHRs. This architecture, based on the Ethereum blockchain, includes a web and mobile application allowing the patient as well as the medical and paramedical staff to have secure access to health information. The Ethereum node is implemented on an embedded platform, which should provide an efficient, flexible, and secure system despite the limited resources and low power consumption of the multiprocessor platform.
Topics: Blockchain; Confidentiality; Data Management; Delivery of Health Care; Electronic Health Records; Humans
PubMed: 34336176
DOI: 10.1155/2021/9978863