PLoS One 2021
The Protection of Personal Information Act (POPIA) 2013 came into force in South Africa on 1 July 2020. It seeks to strengthen the protection of personal information, including health information, when it is processed. While POPIA is to be welcomed, there are concerns about the impact it will have on the processing of health information. To ensure that the National Health Laboratory Service (NHLS) complies with these new, strict processing requirements, and that compliance does not negatively affect its current screening, treatment, surveillance and research mandate, it was decided to consider developing an NHLS POPIA Code of Conduct for Personal Health. As part of developing such a Code, and to better understand the challenges faced in processing personal health information in South Africa, 19 semi-structured interviews with stakeholders were conducted between June and September 2020. Overall, respondents welcomed the introduction of POPIA. However, they felt that there are tensions between strengthening data protection and using personal information for individual patient care, treatment programmes, and research. Respondents reported a need to rethink the management of personal health information in South Africa and identified 5 issues that need to be addressed at national and institutional levels: understanding the importance of personal information; understanding POPIA and data protection; improving data quality; improving transparency in data use; and improving accountability in data use. The application of POPIA to the processing of personal health information is challenging, complex, and likely costly. However, personal health information must be appropriately managed to ensure not only that the privacy of the data subject is protected, but equally that the information is used as a resource in the individual's and the wider public interest.
Topics: Confidentiality; Data Management; Health Records, Personal; Humans; Information Dissemination; Personally Identifiable Information; South Africa
PubMed: 34928950
DOI: 10.1371/journal.pone.0260341
Journal of Biotechnology Nov 2021
Collaborative research is common practice in modern life sciences. In most projects, several researchers from multiple universities collaborate on a specific topic. These projects frequently produce a wealth of data that requires central and secure storage, which should also allow easy sharing among project participants. Only under the best circumstances does this come with minimal technical overhead for the researchers. Moreover, the need to analyze data reproducibly often poses a challenge for researchers without a data science background and can therefore be overly time-consuming. Here, we report on the integration of CyVerse Austria (CAT), a new cyberinfrastructure for a local community of life science researchers, and provide two examples of how it can be used to facilitate FAIR data management and reproducible analytics for teaching and research. In particular, we describe in detail how CAT can be used (i) as a teaching platform with a defined software environment and data management/sharing capabilities, and (ii) to build a data analysis pipeline, using Docker technology, tailored to the needs and interests of the researcher.
Topics: Austria; Data Management; Software
PubMed: 34400238
DOI: 10.1016/j.jbiotec.2021.08.004
GigaScience Dec 2022
Scientists employing omics in life science studies face challenges such as the modeling of multiassay studies, recording of all relevant parameters, and managing many samples with their metadata. They must manage many large files that are the results of the assays or subsequent computation. Users with diverse backgrounds, ranging from computational scientists to wet-lab scientists, have dissimilar needs when it comes to data access, with programmatic interfaces being favored by the former and graphical ones by the latter. We introduce SODAR, the system for omics data access and retrieval. SODAR is a software package that addresses these challenges by providing a web-based graphical user interface for managing multiassay studies and describing them using the ISA (Investigation, Study, Assay) data model and the ISA-Tab file format. Data storage is handled by the iRODS data management system, which can manage large numbers of files and substantial amounts of data. SODAR also offers programmatic APIs and command-line access for metadata and file storage. SODAR supports complex omics integration studies and can be easily installed. The software is written in Python 3 and freely available at https://github.com/bihealth/sodar-server under the MIT license.
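As a rough illustration of the ISA-Tab format that SODAR builds on, the Python sketch below renders a minimal tab-separated study table. It is not SODAR's API and models only the Study level; a complete ISA-Tab submission also needs an investigation file and assay files. The column headers (Source Name, Characteristics[organism], Protocol REF, Sample Name) follow ISA-Tab conventions as we understand them, while the protocol name "sample collection" and the sample identifiers are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Sample:
    source_name: str  # the biological source, e.g. a patient or a culture
    organism: str     # recorded in a Characteristics[organism] column
    sample_name: str  # the material taken from that source


@dataclass
class Study:
    identifier: str  # ISA-Tab study files are conventionally named s_<name>.txt
    samples: list[Sample] = field(default_factory=list)


def write_study_table(study: Study) -> str:
    """Render a minimal, tab-separated ISA-Tab-style study table as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Source Name", "Characteristics[organism]", "Protocol REF", "Sample Name"])
    for s in study.samples:
        # "sample collection" is a hypothetical protocol linking each source to its sample
        writer.writerow([s.source_name, s.organism, "sample collection", s.sample_name])
    return buf.getvalue()


study = Study("s_example", [Sample("patient1", "Homo sapiens", "patient1-blood")])
print(write_study_table(study), end="")
```

Keeping each row as source, protocol, sample mirrors how ISA-Tab chains materials through processes; assay tables extend the same row rightwards towards data files.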
Topics: Multiomics; Metadata; Software; Information Storage and Retrieval; Data Management
PubMed: 37498129
DOI: 10.1093/gigascience/giad052
Nucleic Acids Research Jan 2023
The Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/) at the Department of Energy (DOE) Joint Genome Institute (JGI) continues to support users in performing comparative analysis of isolate and single-cell genomes, metagenomes, and metatranscriptomes. In addition to datasets produced by the JGI, IMG v.7 also includes datasets imported from public sources such as NCBI GenBank, SRA, and the DOE National Microbiome Data Collaborative (NMDC), or submitted by external users. In the past couple of years, we have continued our effort to help the user community by improving the annotation pipeline, upgrading the contents with new reference database versions, and adding new analysis functionalities such as advanced scaffold search, Average Nucleotide Identity (ANI) for high-quality metagenome bins, new cassette search, improved gene neighborhood display, and improvements to metatranscriptome data display and analysis. We also extended the collaboration and integration efforts with other DOE-funded projects such as NMDC and the DOE Biology Knowledgebase (KBase).
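Average Nucleotide Identity is, in outline, the mean percent identity over aligned fragments shared by two genomes. The toy Python sketch below assumes the fragment alignments already exist (real tools such as BLAST-based ANIb or fastANI find them, apply their own identity and coverage cut-offs, and differ in how they treat gaps); it is not IMG/M's implementation. Here gap columns are simply ignored and an optional, hypothetical `min_identity` filter drops weak fragments.

```python
def fragment_identity(a: str, b: str) -> float:
    """Percent identity of two aligned fragments, ignoring columns with a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned fragments must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def average_nucleotide_identity(alignments: list[tuple[str, str]],
                                min_identity: float = 0.0) -> float:
    """Unweighted mean identity over fragments that pass the (hypothetical) identity filter."""
    identities = [fragment_identity(a, b) for a, b in alignments]
    kept = [i for i in identities if i >= min_identity]
    if not kept:
        raise ValueError("no fragments passed the identity filter")
    return sum(kept) / len(kept)


# Two toy fragments: 3 of 4 positions match, then 3 of 3 gap-free positions match.
print(average_nucleotide_identity([("ACGT", "ACGA"), ("AC-T", "ACGT")]))
```

Averaging per fragment rather than per base keeps one long alignment from dominating the score; weighting by alignment length is an equally defensible alternative.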
Topics: Genomics; Data Management; Genome, Bacterial; Software; Genome, Archaeal; Databases, Genetic; Metagenome
PubMed: 36382399
DOI: 10.1093/nar/gkac976
Journal of Biomedical Semantics Nov 2023
BACKGROUND
Open Science Graphs (OSGs) are scientific knowledge graphs representing different entities of the research lifecycle (e.g., projects, people, research outcomes, institutions) and the relationships among them. They present a contextualized view of current research that supports discovery, re-use, reproducibility, monitoring, transparency and omni-comprehensive assessment. A Data Management Plan (DMP) contains information about both the research processes and the data collected, generated and/or re-used during a project's lifetime. Automated solutions and workflows that connect DMPs with the actual data and other contextual information (e.g., publications, funding) are missing from the landscape. Because DMPs are submitted as deliverables, their findability is also limited. In an open and FAIR-enabling research ecosystem, linking information between research processes and research outputs is essential. ARGOS, a tool for FAIR data management, contributes to the OpenAIRE Research Graph (RG) and uses its underlying services and trusted sources to progressively automate Research Data Management (RDM) practices and their validation.
RESULTS
A comparative analysis was conducted of the data models of ARGOS and the OpenAIRE Research Graph against the DMP Common Standard. Following this, we extended ARGOS with export format converters and semantic tagging, and extended the OpenAIRE RG with a DMP entity and with semantics between existing entities and relationships. This enabled the integration of ARGOS machine-actionable DMPs (ma-DMPs) into the OpenAIRE OSG, enriching DMPs and exposing them as FAIR outputs.
CONCLUSIONS
To our knowledge, this paper is the first to expose ma-DMPs in OSGs and to link OSGs and DMPs, introducing DMPs as entities in the research lifecycle. Further, it provides insight into the interoperability practices and integrations of the ARGOS DMP service that populate the OpenAIRE Research Graph with DMP entities and relationships, strengthening both the FAIRness of outputs and standardized information exchange.
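To make "machine-actionable DMP" concrete, the Python sketch below builds a minimal JSON document shaped after the RDA DMP Common Standard and reports missing required keys. The key names and required-field lists reflect our reading of the standard's JSON schema (v1.x) and should be checked against the official schema; ARGOS's own export converters are not reproduced here, and all names, identifiers and e-mail addresses in the example are placeholders.

```python
import json
from datetime import datetime, timezone

# Required keys as we understand the RDA DMP Common Standard JSON schema (assumption).
REQUIRED_DMP_KEYS = {"title", "created", "modified", "dmp_id", "contact",
                     "dataset", "ethical_issues_exist", "language"}
REQUIRED_DATASET_KEYS = {"title", "dataset_id", "personal_data", "sensitive_data"}


def make_ma_dmp(title: str, contact_name: str, contact_email: str, orcid: str,
                dmp_doi: str, datasets: list[dict]) -> dict:
    """Assemble a minimal ma-DMP dictionary ready for json.dumps."""
    now = datetime.now(timezone.utc).replace(microsecond=0).isoformat()
    return {
        "dmp": {
            "title": title,
            "created": now,
            "modified": now,
            "language": "eng",                  # ISO 639-3 language code
            "ethical_issues_exist": "unknown",  # expected values: yes / no / unknown
            "dmp_id": {"identifier": dmp_doi, "type": "doi"},
            "contact": {
                "name": contact_name,
                "mbox": contact_email,
                "contact_id": {"identifier": orcid, "type": "orcid"},
            },
            "dataset": datasets,
        }
    }


def missing_keys(ma_dmp: dict) -> list[str]:
    """List required keys absent from the DMP object and from each dataset entry."""
    dmp = ma_dmp.get("dmp", {})
    problems = sorted(f"dmp.{k}" for k in REQUIRED_DMP_KEYS.difference(dmp))
    for i, ds in enumerate(dmp.get("dataset", [])):
        problems += sorted(f"dmp.dataset[{i}].{k}" for k in REQUIRED_DATASET_KEYS.difference(ds))
    return problems


doc = make_ma_dmp("Example DMP", "Jane Doe", "jane@example.org", "0000-0000-0000-0000",
                  "10.0000/example-dmp",
                  [{"title": "Raw data", "personal_data": "no", "sensitive_data": "no",
                    "dataset_id": {"identifier": "10.0000/example-data", "type": "doi"}}])
print(json.dumps(doc, indent=2))
print(missing_keys(doc))
```

Because every field has a fixed name and type, a graph such as the OpenAIRE RG can ingest the document and link its `dmp_id` and `dataset_id` identifiers to existing entities without human interpretation.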
Topics: Humans; Data Management; Reproducibility of Results
PubMed: 37919767
DOI: 10.1186/s13326-023-00297-5
Frontiers in Public Health 2021
ODK provides software and standards that are popular solutions for off-grid electronic data collection, and it has substantial code overlap and interoperability with a number of related software products, including CommCare, Enketo, Ona, SurveyCTO, and KoBoToolbox. These tools provide open-source options for off-grid use in public health data collection, management, analysis, and reporting. During the 2018-2020 Ebola epidemic in the North Kivu and Ituri regions of the Democratic Republic of Congo, we used these tools to support the DRC Ministère de la Santé RDC and the World Health Organization in their efforts to administer an experimental vaccine (VSV-Zebov-GP) as part of their strategy to control the transmission of infection. New functions were developed to facilitate the use of ODK and Enketo in large-scale data collection, aggregation, monitoring, and near-real-time analysis during clinical research in health emergencies. We present enhancements to ODK that include a built-in audit trail, a framework and companion app for biometric registration of ISO/IEC 19794-2 fingerprint templates, enhanced performance features, better scalability for studies featuring millions of form submissions, increased options for parallelization of research projects, and pipelines for automated management and analysis of data. We also developed novel encryption protocols for enhanced web-form security in Enketo. Against the backdrop of a complex and challenging epidemic response, our enhanced platform of open tools was used to collect and manage data from more than 280,000 eligible study participants who received VSV-Zebov-GP under informed consent. These data were used to determine whether VSV-Zebov-GP was safe and effective and to guide daily field operations. We present open-source developments that make electronic data management during clinical research and health emergencies more viable and robust. These developments will also enhance and expand the functionality of a diverse range of data collection platforms that are based on the ODK software and standards.
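One common way to make an audit trail tamper-evident is hash chaining: each entry stores a hash of the previous entry, so editing an earlier record breaks every later link. The Python sketch below illustrates that generic technique only; the abstract does not say that ODK's built-in audit trail works this way, and the event names and form identifier are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so identical entries always hash identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    entries: list[dict] = field(default_factory=list)

    def append(self, event: str, form_id: str, timestamp: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"event": event, "form_id": form_id,
                 "timestamp": timestamp, "prev_hash": prev_hash}
        entry["hash"] = _digest(entry)  # computed before "hash" is added, so it covers all other fields
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return False if an entry was altered or the chain of previous hashes is broken.

        Note: silently dropping entries from the end of the log is not detected.
        """
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


trail = AuditTrail()
trail.append("form start", "hypothetical_form_v1", "2019-01-01T08:00:00Z")
trail.append("question answered", "hypothetical_form_v1", "2019-01-01T08:01:00Z")
print(trail.verify())
```

Anchoring the latest hash somewhere outside the device (for example, on the server at each submission) would close the truncation gap noted in the docstring.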
Topics: Data Management; Electronics; Epidemics; Hemorrhagic Fever, Ebola; Humans
PubMed: 34805059
DOI: 10.3389/fpubh.2021.665584
Nucleic Acids Research Jan 2023
The Human Microbial Metabolome Database (MiMeDB) (https://mimedb.org) is a comprehensive, multi-omic microbiome resource that connects: (i) microbes to microbial genomes; (ii) microbial genomes to microbial metabolites; (iii) microbial metabolites to the human exposome; and (iv) all of these 'omes' to human health. MiMeDB was established to consolidate the growing body of data connecting the human microbiome and the chemicals it produces to both health and disease. MiMeDB contains detailed taxonomic, microbiological and body-site location data on most known human microbes (bacteria and fungi). These microbial data are linked to extensive genomic and proteomic sequence data that are closely coupled to colourful interactive chromosomal maps. The database also houses detailed information about all the known metabolites generated by these microbes: their structural, chemical and spectral properties, the reactions and enzymes responsible for these metabolites, and the primary exposome sources (food, drug, cosmetic, pollutant, etc.) that ultimately lead to the observed microbial metabolites in humans. Additional, extensively referenced data about the known or presumptive health effects, measured biosample concentrations and human protein targets of these compounds are also provided. All of this information is housed in a richly annotated, highly interactive, visually pleasing database that has been designed to be easy to search, browse and navigate. Currently, MiMeDB contains data on 626 health effects or bioactivities, 1904 microbes, 3112 references, 22 054 reactions, 24 254 metabolites or exposure chemicals, 648 861 MS and NMR spectra, 6.4 million genes and 7.6 billion DNA bases. We believe that MiMeDB represents the kind of integrated, multi-omic or systems biology database that is needed to enable comprehensive multi-omic integration.
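The chain of links described above (microbe to genome, genome to metabolite, metabolite to health effect) amounts to a sequence of joins. The Python sketch below walks such a chain over in-memory link tables; the tables and every identifier in them are hypothetical placeholders, not MiMeDB records or its schema, and the exposome links are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical link tables standing in for a multi-omic database's relations.
MICROBE_TO_GENOME = {"microbe_A": ["genome_A1"], "microbe_B": ["genome_B1"]}
GENOME_TO_METABOLITE = {"genome_A1": ["metabolite_X", "metabolite_Y"],
                        "genome_B1": ["metabolite_Y"]}
METABOLITE_TO_EFFECT = {"metabolite_X": ["effect_1"], "metabolite_Y": ["effect_2"]}


def health_effects_for(microbe: str) -> dict[str, set[str]]:
    """Follow microbe -> genome -> metabolite -> health-effect links.

    Returns each reachable health effect mapped to the metabolites that connect it.
    """
    effects: dict[str, set[str]] = defaultdict(set)
    for genome in MICROBE_TO_GENOME.get(microbe, []):
        for metabolite in GENOME_TO_METABOLITE.get(genome, []):
            for effect in METABOLITE_TO_EFFECT.get(metabolite, []):
                effects[effect].add(metabolite)
    return dict(effects)


print(health_effects_for("microbe_A"))
```

Keeping the connecting metabolites in the result, rather than only the end points, preserves the evidence path a user would want to inspect.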
Topics: Humans; Metabolomics; Proteomics; Metabolome; Databases, Factual; Data Management
PubMed: 36215042
DOI: 10.1093/nar/gkac868
Online sleep diaries: considerations for system development and recommendations for data management.
Sleep Oct 2023
STUDY OBJECTIVES
To present development considerations for online sleep diary systems that result in robust, interpretable, and reliable data; furthermore, to describe data management procedures to address common data entry errors that occur despite those considerations.
METHODS
The online sleep diary capture component of the Sleep Healthy Using the Internet (SHUTi) intervention has been designed to promote data integrity. Features include diary entry restrictions to limit retrospective bias, reminder prompts and data visualizations to support user engagement, and data validation checks to reduce data entry errors. Despite these features, data entry errors still occur. The data management procedures rely largely on programming syntax to minimize researcher effort and maximize reliability and replicability. Presumed data entry errors are identified where users are believed to have selected the wrong date or the wrong AM/PM designation on the 12-hour clock. After these corrections, diaries with unresolvable errors, such as negative total sleep time, are identified.
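The correction logic described above can be sketched in Python as follows. This is not the SHUTi team's syntax: the plausibility window for time in bed, the order in which corrections are tried, and the simplified total sleep time (time in bed minus sleep-onset latency and wake after sleep onset) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical plausibility window for time in bed; not taken from the SHUTi paper.
MIN_TIB = timedelta(minutes=30)
MAX_TIB = timedelta(hours=16)


def parse_clock(day: str, clock: str) -> datetime:
    """Combine an ISO date ('2020-03-01') with a 12-hour clock time ('10:30 PM')."""
    return datetime.strptime(f"{day} {clock}", "%Y-%m-%d %I:%M %p")


def flip_meridiem(t: datetime) -> datetime:
    """Swap AM and PM while keeping the calendar date."""
    return t - timedelta(hours=12) if t.hour >= 12 else t + timedelta(hours=12)


def plausible(bed: datetime, rise: datetime) -> bool:
    return MIN_TIB <= rise - bed <= MAX_TIB


def clean_diary(bed: datetime, rise: datetime, latency_min: int, waso_min: int) -> dict:
    """Try presumed date and AM/PM corrections in a fixed order, then compute TST."""
    candidates = [
        ("none", bed, rise),
        ("bedtime AM/PM", flip_meridiem(bed), rise),
        ("rise time AM/PM", bed, flip_meridiem(rise)),
        ("bedtime date", bed - timedelta(days=1), rise),
    ]
    for correction, b, r in candidates:
        if plausible(b, r):
            tst = (r - b) - timedelta(minutes=latency_min + waso_min)
            if tst < timedelta(0):
                # e.g. reported latency plus awakenings exceed time in bed
                return {"status": "unresolvable", "correction": correction}
            return {"status": "ok" if correction == "none" else "corrected",
                    "correction": correction, "tst_minutes": tst.total_seconds() / 60}
    return {"status": "unresolvable", "correction": None}


# A bedtime entered as AM instead of PM: 20 h in bed becomes a plausible 8 h.
print(clean_diary(parse_clock("2020-03-01", "10:30 AM"),
                  parse_clock("2020-03-02", "6:30 AM"), latency_min=20, waso_min=10))
```

Trying the uncorrected entry first means valid diaries are never altered, and a fixed candidate order keeps the procedure replicable across runs.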
RESULTS
Using one of our fully powered, U.S. national SHUTi randomized controlled trials as an example, we demonstrate the application of these procedures: of 45,598 submitted diaries, 487 (about 1.07%) required modification due to date and/or AM/PM errors, and 27 (about 0.06%) were eliminated due to unresolvable errors.
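The shares of affected diaries can be recomputed directly from the reported counts. Expressed as percentages of all 45,598 submitted diaries they come to roughly 1.07% and 0.059%; the figures 0.01 and <0.001 correspond to these shares written as proportions rather than percentages.

```python
TOTAL_DIARIES = 45_598
MODIFIED = 487    # diaries corrected for date and/or AM/PM errors
ELIMINATED = 27   # diaries removed for unresolvable errors


def pct(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage."""
    return 100 * part / whole


print(f"modified:   {pct(MODIFIED, TOTAL_DIARIES):.2f}%")
print(f"eliminated: {pct(ELIMINATED, TOTAL_DIARIES):.3f}%")
```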
CONCLUSION
To secure the most complete and valid data from online sleep diary systems, it is critical to consider the design of the data collection system and to develop replicable processes to manage data.
CLINICAL TRIAL REGISTRATION
Sleep Healthy Using The Internet for Older Adult Sufferers of Insomnia and Sleeplessness (SHUTiOASIS); https://clinicaltrials.gov/ct2/show/NCT03213132; ClinicalTrials.gov ID: NCT03213132.
Topics: Humans; Aged; Data Management; Retrospective Studies; Reproducibility of Results; Sleep; Sleep Initiation and Maintenance Disorders
PubMed: 37480840
DOI: 10.1093/sleep/zsad199
Philosophical Transactions. Series A,... Oct 2022
Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of 'following the science' are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.
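The core idea of tracing an output back through code to primary data can be sketched with content hashes: each object is identified by a hash of its bytes, and each derived object records which code version produced it from which inputs. The Python sketch below illustrates that generic approach; it is not the FAIR data pipeline's API, and the code-version labels and data payloads are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceStore:
    # content hash -> (producing code version or "primary", hashes of its inputs)
    derived_from: dict[str, tuple[str, list[str]]] = field(default_factory=dict)

    def register_primary(self, data: bytes) -> str:
        h = sha256_of(data)
        self.derived_from.setdefault(h, ("primary", []))
        return h

    def record_run(self, code_version: str, inputs: list[str], output: bytes) -> str:
        """Annotate an output with the code version and inputs that produced it."""
        h = sha256_of(output)
        self.derived_from[h] = (code_version, list(inputs))
        return h

    def trace(self, h: str) -> list[tuple[str, str]]:
        """Depth-first walk from an output back to primary data.

        Returns (content hash, producer) pairs, visiting each object once.
        """
        lineage: list[tuple[str, str]] = []
        visited: set[str] = set()
        stack = [h]
        while stack:
            current = stack.pop()
            if current in visited:
                continue
            visited.add(current)
            producer, inputs = self.derived_from[current]
            lineage.append((current, producer))
            stack.extend(inputs)
        return lineage


store = ProvenanceStore()
cases = store.register_primary(b"case counts (placeholder)")
mobility = store.register_primary(b"mobility data (placeholder)")
model_out = store.record_run("model@abc123", [cases, mobility], b"model output")
report = store.record_run("report@def456", [model_out], b"report")
for obj_hash, producer in store.trace(report):
    print(obj_hash[:12], producer)
```

Identifying data by content hash rather than by file name addresses the "imprecisely identified" problem: any change to an input yields a different hash and therefore a visibly different lineage.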
Topics: COVID-19; Data Management; Humans; Pandemics; Software; Workflow
PubMed: 35965468
DOI: 10.1098/rsta.2021.0300
eNeuro Feb 2023
Research Data Management and Data Sharing for Reproducible Research: Results of a Community Survey of the German National Research Data Infrastructure Initiative Neuroscience.
Science is changing: the volume and complexity of data are increasing, the number of studies is growing and the goal of achieving reproducible results requires new solutions for scientific data management. In the field of neuroscience, the German National Research Data Infrastructure (NFDI-Neuro) initiative aims to develop sustainable solutions for research data management (RDM). To obtain an understanding of the present RDM situation in the neuroscience community, NFDI-Neuro conducted a comprehensive survey among the neuroscience community. Here, we report and analyze the results of the survey. We focused the survey and our analysis on current needs, challenges, and opinions about RDM. The German neuroscience community perceives barriers to RDM and data sharing mainly linked to (1) lack of data and metadata standards, (2) lack of community-adopted provenance tracking methods, (3) lack of secure and privacy-preserving research infrastructure for sensitive data, (4) lack of RDM literacy, and (5) lack of resources (time, personnel, money) for proper RDM. However, an overwhelming majority of community members (91%) indicated that they would be willing to share their data with other researchers and are interested in increasing their RDM skills. Taking advantage of this willingness and overcoming the existing barriers requires the systematic development of standards, tools, and infrastructure; the provision of training, education, and support, as well as additional RDM resources, to the research community; and a constant dialogue with relevant stakeholders, including policy makers, to bring about a culture change through adapted incentivization and regulation.
Topics: Data Management; Biomedical Research; Surveys and Questionnaires; Information Dissemination; Neurosciences
PubMed: 36750361
DOI: 10.1523/ENEURO.0215-22.2023