Bioinformatics (Oxford, England), May 2019
SUMMARY
We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) data for the Alzheimer's Disease Sequencing Project. VCPA consists of two independent but linkable components: a pipeline and a tracking database. The pipeline, implemented in the Workflow Description Language (WDL) and fully optimized for the Amazon Elastic Compute Cloud environment, covers all steps from aligning raw sequence reads through variant calling with GATK. The tracking database lets users view job status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can run the pipeline and the dockerized database to process large WGS/WES datasets on the Amazon cloud with minimal configuration.
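To make the pipeline's core concrete, here is a minimal Python sketch of the align-then-call stages the summary describes. It is illustrative only: VCPA itself is written in WDL, the file names here are hypothetical, and a production run adds duplicate marking, base quality score recalibration, and the QC steps that feed the tracking database.

    # Hypothetical align-then-call sketch (not VCPA's actual WDL workflow).
    import subprocess

    def align(ref, fq1, fq2, out_bam, threads=8):
        # bwa mem writes SAM to stdout; samtools sort produces a sorted BAM
        bwa = subprocess.Popen(["bwa", "mem", "-t", str(threads), ref, fq1, fq2],
                               stdout=subprocess.PIPE)
        subprocess.run(["samtools", "sort", "-o", out_bam, "-"],
                       stdin=bwa.stdout, check=True)
        bwa.stdout.close()
        if bwa.wait() != 0:
            raise RuntimeError("bwa mem failed")
        subprocess.run(["samtools", "index", out_bam], check=True)

    def call_variants(ref, bam, out_gvcf):
        # Per-sample gVCF with GATK HaplotypeCaller
        subprocess.run(["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
                        "-O", out_gvcf, "-ERC", "GVCF"], check=True)

    align("ref.fasta", "sample_R1.fq.gz", "sample_R2.fq.gz", "sample.bam")
    call_variants("ref.fasta", "sample.bam", "sample.g.vcf.gz")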
AVAILABILITY AND IMPLEMENTATION
VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (http://www.niagads.org/VCPA).
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Alzheimer Disease; Data Management; Genomics; High-Throughput Nucleotide Sequencing; Humans; Software
PubMed: 30351394
DOI: 10.1093/bioinformatics/bty894
Clinical and Translational Science, Sep 2023
Review
In drug development, a frequently used phrase is "data-driven". Just as high-test gas fuels a car, so drug development "runs on" high-quality data; hence, good data management practices, which involve case report form design, data entry, data capture, data validation, medical coding, database closure, and database locking, are critically important. This review covers the essentials of clinical data management (CDM) for the United States. It is intended to demystify CDM, which means nothing more esoteric than the collection, organization, maintenance, and analysis of data for clinical trials. The review is written with those who are new to drug development in mind and assumes only a passing familiarity with the terms and concepts that are introduced. However, it may also be relevant to experienced professionals who feel the need to brush up on the basics. For added color and context, the review includes real-world examples with RRx-001, a new molecular entity in phase III with fast-track status in head and neck cancer, and AdAPT-001, an oncolytic adenovirus armed with a transforming growth factor-beta (TGF-β) trap in a phase I/II clinical trial, with both of which the authors, as employees of the biopharmaceutical company EpicentRx, are closely involved. An alphabetized glossary of key terms and acronyms used throughout this review is included for easy reference.
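As one concrete flavor of the data validation step mentioned above, CDM systems run automated edit checks that turn implausible or inconsistent entries into data queries. The sketch below is hypothetical (the field names and plausibility ranges are invented, not drawn from any EpicentRx trial):

    # Hypothetical edit check of the kind a CDM validation step might run.
    def check_record(record: dict) -> list[str]:
        findings = []
        hr = record.get("heart_rate", 0)
        if not 30 <= hr <= 220:                      # plausibility range check
            findings.append("heart_rate out of plausible range")
        visit, consent = record.get("visit_date"), record.get("consent_date")
        if visit and consent and visit < consent:    # cross-field consistency
            findings.append("visit precedes informed consent")
        return findings

    # Each finding would typically become a query sent back to the site.
    print(check_record({"heart_rate": 250,
                        "consent_date": "2023-01-10",
                        "visit_date": "2023-01-05"}))

ISO-formatted date strings compare correctly as plain strings, which keeps the sketch dependency-free.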
Topics: Humans; United States; Data Management
PubMed: 37382299
DOI: 10.1111/cts.13582
Journal of Medical Internet Research, May 2019
BACKGROUND
Blockchain is emerging as an innovative technology for secure data management in many areas, including medical practice. A distributed blockchain network is tolerant of network faults, and the registered data are resistant to tampering and revision. The technology has a high affinity with digital medicine such as mobile health (mHealth) and lends reliability to medical data without labor-intensive third-party contributions. On the other hand, the reliability of medical data is not ensured before its registration on the blockchain network. Furthermore, how clients' mobile devices should be handled and authenticated within the blockchain network so as to avoid impersonation remains an open issue.
OBJECTIVE
The aim of the study was to design and validate an mHealth system that uses blockchain technology to make the security and the scalability of medical data compatible.
METHODS
We designed an mHealth system that sends medical data to the blockchain network via relay servers, an architecture that provides scalability and convenient operation. To ensure the reliability of the data from clients' mobile devices, hash values with a chain structure (a client hashchain) were calculated on the clients' devices and the results were registered on the blockchain network.
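The hashchain idea can be stated in a few lines. The sketch below assumes SHA-256 over a canonical JSON encoding; the paper's exact hash function and serialization are not specified here:

    # Client-side hashchain sketch: each hash covers the record plus the
    # previous hash, so tampering with any record breaks every later link.
    import hashlib, json

    def chain_hash(prev_hash: str, record: dict) -> str:
        payload = prev_hash + json.dumps(record, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    records = [{"day": 1, "sleep_hours": 5.5}, {"day": 2, "sleep_hours": 6.0}]
    h = "0" * 64                     # genesis value (an assumption)
    for rec in records:
        h = chain_hash(h, rec)       # each h is registered on the blockchain

Verification consists of recomputing the chain from the raw records and comparing it with the registered hashes; any mismatch flags the data as illegal.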
RESULTS
The system was deployed in an mHealth application for insomnia treatment, and clinical trials were conducted with insomnia patients. Medical data from the recruited patients were successfully registered on the blockchain network via the relay servers, along with the hashchain calculated on the clients' mobile devices. The correctness of the data was validated by identifying illegal data, which were generated by simulating fraudulent access.
CONCLUSIONS
Our proposed mHealth system, which combines blockchain with a client hashchain, makes security and scalability compatible in the data management of mHealth medical practice.
TRIAL REGISTRATION
UMIN Clinical Trials Registry UMIN000032951; https://upload.umin.ac.jp/cgi-open-bin/ctr_e/ctr_view.cgi?recptno=R000037564 (Archived by WebCite at http://www.webcitation.org/78HP5iFIw).
Topics: Blockchain; Data Management; Humans; Reproducibility of Results; Research Design; Telemedicine; Validation Studies as Topic
PubMed: 31099337
DOI: 10.2196/13385
Journal of Assisted Reproduction and Genetics, Jul 2021
Topics: Artificial Intelligence; Data Management; Fertilization in Vitro; Humans; Reproductive Medicine
PubMed: 33715133
DOI: 10.1007/s10815-021-02122-3
Briefings in Bioinformatics, Jan 2021
Review
With advances in genomic sequencing technology, a large amount of data is publicly available for the research community to extract meaningful and reliable associations between risk genes and disease mechanisms. However, this exponentially growing body of data is spread across more than a thousand heterogeneous repositories, represented in multiple formats and of varying quality, which hinders the differentiation of clinically valid relationships from those that are less well supported and that could lead to incorrect diagnoses. This paper presents how conceptual models can play a key role in efficiently managing genomic data. These data must be accessible, informative, and reliable enough to extract valuable knowledge when identifying evidence that supports the relationship between DNA variants and disease. The approach presented here helps researchers organize, store, and process information, focusing only on the data that are relevant and minimizing the impact that information overload has in clinical and research contexts. A case study (epilepsy) is also presented to demonstrate its application in a real context.
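To illustrate the kind of structure a conceptual model imposes, here is a sketch of typed entities linking a DNA variant to a disease through curated evidence. The entity and attribute names are invented for illustration, not the paper's actual schema:

    # Illustrative typed entities; filtering by evidence count is one way to
    # keep only well-supported links and curb information overload.
    from dataclasses import dataclass, field

    @dataclass
    class Evidence:
        source_db: str          # repository the assertion was drawn from
        pubmed_id: str
        clinical_validity: str  # curation level backing the claim

    @dataclass
    class VariantDiseaseLink:
        variant_id: str         # e.g. an rsID
        gene: str
        disease: str
        evidence: list[Evidence] = field(default_factory=list)

        def well_supported(self, min_sources: int = 2) -> bool:
            # keep only links backed by enough independent evidence
            return len(self.evidence) >= min_sources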
Topics: Data Management; Data Systems; Epilepsy; Genetic Predisposition to Disease; Genomics; Humans
PubMed: 32533135
DOI: 10.1093/bib/bbaa100
Ghana Medical Journal, Jun 2021
The COVID-19 pandemic caused by SARS-CoV-2 is an important subject for global health. Ghana experienced low-to-moderate transmission of the disease from the detection of the first case on March 12, 2020 until mid-July, when the number of cases began to drop. By August 24, 2020, the country's total number of confirmed cases stood at 43,622, with 263 deaths. By the same time, the Noguchi Memorial Institute for Medical Research (NMIMR) of the University of Ghana, the primary testing centre for COVID-19, had tested 285,501 samples, with 28,878 confirmed cases. Owing to database gaps, there were initial challenges with timely reporting and feedback to stakeholders during the peak surveillance period. The gaps resulted from mismatches between samples and their accompanying case investigation forms, samples without case investigation forms and vice versa, heavy data entry requirements, and delayed test results. However, a revamp of data management procedures and systems helped to improve the turnaround time for reporting results to all interested parties and partners. Additionally, inconsistencies such as multiple entries and discrepant patient-sample information were resolved by introducing a barcode-based electronic capture system. Here, we describe the main challenges with COVID-19 data management and analysis in the laboratory and recommend measures for improvement.
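The value of the barcode is that it gives every sample and every case investigation form a single join key. A toy sketch of the reconciliation logic follows (barcode values and field names are hypothetical):

    # Flagging duplicates and sample/form mismatches via a shared barcode key.
    samples = [{"barcode": "GH0001"}, {"barcode": "GH0002"},
               {"barcode": "GH0002"}]                  # duplicate entry
    forms = [{"barcode": "GH0001"}, {"barcode": "GH0003"}]

    sample_ids = [s["barcode"] for s in samples]
    form_ids = {f["barcode"] for f in forms}

    duplicates = {b for b in sample_ids if sample_ids.count(b) > 1}
    samples_without_forms = set(sample_ids) - form_ids
    forms_without_samples = form_ids - set(sample_ids)
    print(duplicates, samples_without_forms, forms_without_samples)
    # {'GH0002'} {'GH0002'} {'GH0003'}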
FUNDING
The work was supported by the Government of Ghana.
Topics: COVID-19; Data Management; Disease Outbreaks; Ghana; Humans; Laboratories; Pandemics; SARS-CoV-2
PubMed: 35233115
DOI: 10.4314/gmj.v55i2s.8
Sichuan Da Xue Xue Bao. Yi Xue Ban = Journal of Sichuan University. Medical Science Edition, Sep 2022
Focusing on the construction of undergraduate programs in big data management and application at medical colleges and universities in the context of New Medical Education, we first analyzed the demand for trained personnel in this specialization and the status of program construction at the national and regional levels. Then, taking Anhui Medical University, a key medical university in Anhui Province, as an example, we introduced the preparations medical colleges and universities make to establish a big data management and application program. Finally, from the perspectives of personnel training objectives, the curriculum system, and the practical teaching system, we presented in detail Anhui Medical University's exploratory efforts to construct a training system for personnel specializing in big data management and application. We mainly reported the work done on the personnel training curriculum, covering general education, professional education, and extracurricular activities, and highlighting the interdisciplinary character of a curriculum that integrates medicine, engineering, and management. We also reported on a practical teaching system that combines in-class teaching with extracurricular activities and incorporates tiered content of increasing challenge: basic, cognitive, comprehensive, and innovative practice levels. This study is expected to provide useful references for the training of personnel specializing in medical big data in the context of New Medical Education.
Topics: Big Data; Curriculum; Data Management; Humans; Schools, Medical; Universities
PubMed: 36224679
DOI: 10.12182/20220960302
BMC Genomics, Dec 2022
BACKGROUND
As the amount of genomic data continues to grow, there is an increasing need for systematic ways to organize, explore, compare, analyze and share this data. Despite this, there is a lack of suitable platforms to meet this need.
RESULTS
OpenGenomeBrowser is a self-hostable, open-source platform that manages access to genomic data and drastically simplifies comparative genomics analyses. It enables users to interactively generate phylogenetic trees, compare gene loci, browse biochemical pathways, perform gene-trait matching, create dot plots, execute BLAST searches, and access the underlying data. It features a flexible user management system, and its modular folder structure enables the organization of genomic data and metadata as well as the automation of analyses. We tested OpenGenomeBrowser with bacterial, archaeal and yeast genomes. We provide a Docker container to make installation and hosting simple. The source code, documentation, and tutorials for OpenGenomeBrowser are available at opengenomebrowser.github.io, and a demo server is freely accessible at opengenomebrowser.bioinformatics.unibe.ch.
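As a rough illustration of how a convention-based folder layout enables automation (the directory and file names below are invented for the sketch, not OpenGenomeBrowser's documented structure):

    # Discover genomes by walking a hypothetical organisms/<org>/genomes/<g>/
    # layout; metadata sits next to each genome as JSON.
    import json
    from pathlib import Path

    def discover_genomes(root):
        for org_dir in Path(root, "organisms").iterdir():
            if not org_dir.is_dir():
                continue
            for genome_dir in (org_dir / "genomes").iterdir():
                meta_file = genome_dir / "genome.json"
                meta = json.loads(meta_file.read_text()) if meta_file.exists() else {}
                yield org_dir.name, genome_dir.name, meta

    for organism, genome, meta in discover_genomes("./data"):
        print(organism, genome, meta.get("assembly_level", "unknown"))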
CONCLUSIONS
To our knowledge, OpenGenomeBrowser is the first self-hostable, database-independent comparative genome browser. It drastically simplifies commonly used bioinformatics workflows and enables convenient as well as fast data exploration.
Topics: Phylogeny; Data Management; Genomics; Genome; Computational Biology; Software
PubMed: 36575383
DOI: 10.1186/s12864-022-09086-3
PLoS One, 2022
Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science and engineering workflows used in research remain poorly documented and often unavailable, involving ad hoc scripts and manual steps, hindering reproducibility and stifling progress. We introduce Sim2Ls (pronounced "simtools") and the Sim2L Python library, which allow developers to create and share end-to-end computational workflows with well-defined and verified inputs and outputs. The Sim2L library makes Sim2Ls, their requirements, and their services discoverable, verifies inputs and outputs, and automatically stores results in a globally accessible simulation cache and results database. This simulation ecosystem is available in nanoHUB, an open platform that also provides publication services for Sim2Ls, a computational environment for developers and users, and the hardware to execute runs and store results at no cost. We exemplify the use of Sim2Ls with two applications and discuss best practices towards FAIR simulation workflows and associated data.
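The pattern is easy to sketch: declare typed inputs, verify them before running, and key a shared results cache on the verified inputs. The names below are hypothetical, not the actual Sim2L API:

    # Declared-I/O workflow pattern (hypothetical names, not the Sim2L API).
    import hashlib, json

    INPUTS = {"temperature_K": float, "steps": int}  # declared, typed inputs
    _cache = {}                                      # stand-in results database

    def run_simulation(**params):
        for name, typ in INPUTS.items():             # verify before running
            if not isinstance(params.get(name), typ):
                raise TypeError(f"{name} must be {typ.__name__}")
        key = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode()).hexdigest()
        if key in _cache:                            # identical run: reuse result
            return _cache[key]
        result = {"mean_energy": -1.2 * params["temperature_K"]}  # placeholder
        _cache[key] = result
        return result

    print(run_simulation(temperature_K=300.0, steps=1000))

Caching on a hash of the canonicalized inputs is what lets identical runs be served from the results database instead of being recomputed.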
Topics: Computer Simulation; Data Management; Ecosystem; Reproducibility of Results; Software; Workflow
PubMed: 35271613
DOI: 10.1371/journal.pone.0264492
Journal of Medical Internet Research, Nov 2021
Review
BACKGROUND
Skin cancer is the most common cancer type affecting humans. Traditional skin cancer diagnosis methods are costly, require a professional physician, and take time. Hence, artificial intelligence (AI) tools, including shallow and deep machine learning methodologies trained to detect and classify skin cancer with computer algorithms and deep neural networks, are being used to aid diagnosis.
OBJECTIVE
The aim of this study was to identify and group the different types of AI-based technologies used to detect and classify skin cancer. The study also examined the reliability of the selected papers by studying the correlation between the data set size and the number of diagnostic classes with the performance metrics used to evaluate the models.
METHODS
We conducted a systematic search for papers using Institute of Electrical and Electronics Engineers (IEEE) Xplore, Association for Computing Machinery Digital Library (ACM DL), and Ovid MEDLINE databases following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines. The studies included in this scoping review had to fulfill several selection criteria: being specifically about skin cancer, detecting or classifying skin cancer, and using AI technologies. Study selection and data extraction were independently conducted by two reviewers. Extracted data were narratively synthesized, where studies were grouped based on the diagnostic AI techniques and their evaluation metrics.
RESULTS
We retrieved 906 papers from the 3 databases, of which 53 were eligible for this review. Shallow AI-based techniques were used in 14 studies, and deep AI-based techniques were used in 39 studies. The studies used up to 11 evaluation metrics to assess the proposed models, where 39 studies used accuracy as the primary evaluation metric. Overall, studies that used smaller data sets reported higher accuracy.
CONCLUSIONS
This paper examined multiple AI-based skin cancer detection models. However, a direct comparison between methods was hindered by the varied use of different evaluation metrics and image types. Performance scores were affected by factors such as data set size, number of diagnostic classes, and techniques. Hence, the reliability of shallow and deep models with higher accuracy scores was questionable since they were trained and tested on relatively small data sets of a few diagnostic classes.
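A small worked example shows why accuracy alone can overstate performance on an imbalanced test set (the numbers are illustrative, not drawn from any reviewed study):

    # A degenerate classifier scores 90% accuracy yet finds no malignancies.
    y_true = [0] * 45 + [1] * 5   # 50 images: 45 benign, 5 malignant
    y_pred = [0] * 50             # model predicts "benign" for everything

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = true_pos / sum(y_true)

    print(accuracy)     # 0.9 -- looks strong
    print(sensitivity)  # 0.0 -- misses every malignant case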
Topics: Algorithms; Artificial Intelligence; Data Management; Humans; Reproducibility of Results; Skin Neoplasms
PubMed: 34821566
DOI: 10.2196/22934