PloS One, 2022
Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science and engineering workflows used in research remain poorly documented and often unavailable; they involve ad hoc scripts and manual steps, hindering reproducibility and stifling progress. We introduce Sim2Ls (pronounced simtools) and the Sim2L Python library that allow developers to create and share end-to-end computational workflows with well-defined and verified inputs and outputs. The Sim2L library makes Sim2Ls, their requirements, and their services discoverable, verifies inputs and outputs, and automatically stores results in a globally accessible simulation cache and results database. This simulation ecosystem is available in nanoHUB, an open platform that also provides publication services for Sim2Ls, a computational environment for developers and users, and the hardware to execute runs and store results at no cost. We demonstrate the use of Sim2Ls with two applications and discuss best practices towards FAIR simulation workflows and associated data.
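The workflow pattern the abstract describes, inputs that are validated before a run and outputs that are returned in a verified form, can be sketched in a few lines. The class and function names below are illustrative assumptions, not the published Sim2L API.

```python
# Minimal sketch of a workflow with declared, verified inputs and outputs,
# in the spirit of the Sim2L pattern the abstract describes. The names here
# (BoundedInput, run_workflow) are illustrative assumptions, not the Sim2L API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BoundedInput:
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def validate(self, value: float) -> float:
        if self.minimum is not None and value < self.minimum:
            raise ValueError(f"{self.name}: {value} below {self.minimum}")
        if self.maximum is not None and value > self.maximum:
            raise ValueError(f"{self.name}: {value} above {self.maximum}")
        return value

TEMPERATURE = BoundedInput("temperature_K", minimum=0.0)

def run_workflow(temperature_K: float) -> dict:
    """Validate inputs, run the simulation step, and return verified outputs."""
    t = TEMPERATURE.validate(temperature_K)
    lattice_constant = 3.615 + 1e-5 * t  # placeholder for the real model
    return {"lattice_constant_A": lattice_constant}

print(run_workflow(300.0))
```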
Topics: Computer Simulation; Data Management; Ecosystem; Reproducibility of Results; Software; Workflow
PubMed: 35271613
DOI: 10.1371/journal.pone.0264492
Journal of Medical Internet Research, Nov 2021
Review
BACKGROUND
Skin cancer is the most common cancer type affecting humans. Traditional skin cancer diagnosis methods are costly, time-consuming, and require a trained physician. Hence, artificial intelligence (AI) tools are being used to aid in diagnosing skin cancer, including shallow and deep machine learning-based methodologies trained to detect and classify skin cancer using computer algorithms and deep neural networks.
OBJECTIVE
The aim of this study was to identify and group the different types of AI-based technologies used to detect and classify skin cancer. The study also examined the reliability of the selected papers by studying the relationship of data set size and number of diagnostic classes to the performance metrics used to evaluate the models.
METHODS
We conducted a systematic search for papers using Institute of Electrical and Electronics Engineers (IEEE) Xplore, Association for Computing Machinery Digital Library (ACM DL), and Ovid MEDLINE databases following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines. The studies included in this scoping review had to fulfill several selection criteria: being specifically about skin cancer, detecting or classifying skin cancer, and using AI technologies. Study selection and data extraction were independently conducted by two reviewers. Extracted data were narratively synthesized, where studies were grouped based on the diagnostic AI techniques and their evaluation metrics.
RESULTS
We retrieved 906 papers from the 3 databases, of which 53 were eligible for this review. Shallow AI-based techniques were used in 14 studies, and deep AI-based techniques were used in 39 studies. The studies used up to 11 evaluation metrics to assess the proposed models, and 39 studies used accuracy as the primary evaluation metric. Overall, studies that used smaller data sets reported higher accuracy.
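Because the review's comparisons hinge on which evaluation metrics each study reports, a short sketch of the common confusion-matrix metrics may be useful; the counts below are illustrative, not figures from the reviewed studies.

```python
# Common classification metrics derived from a binary confusion matrix.
# The counts below are illustrative, not taken from the reviewed studies.
tp, fp, fn, tn = 85, 10, 15, 90  # true/false positives and negatives

accuracy    = (tp + tn) / (tp + fp + fn + tn)
sensitivity = tp / (tp + fn)          # recall / true positive rate
specificity = tn / (tn + fp)          # true negative rate
precision   = tp / (tp + fp)          # positive predictive value
f1          = 2 * precision * sensitivity / (precision + sensitivity)

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} precision={precision:.3f} f1={f1:.3f}")
```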
CONCLUSIONS
This paper examined multiple AI-based skin cancer detection models. However, a direct comparison between methods was hindered by the use of different evaluation metrics and image types across studies. Performance scores were affected by factors such as data set size, number of diagnostic classes, and technique. Hence, the reliability of shallow and deep models with higher accuracy scores was questionable, since they were trained and tested on relatively small data sets covering few diagnostic classes.
Topics: Algorithms; Artificial Intelligence; Data Management; Humans; Reproducibility of Results; Skin Neoplasms
PubMed: 34821566
DOI: 10.2196/22934
Journal of Integrative Bioinformatics, Dec 2022
Core facilities have to offer technologies that best serve the needs of their users and provide them a competitive advantage in research. They have to set up and maintain tens to hundreds of instruments, which produce large amounts of data and serve thousands of active projects and customers. Particular emphasis has to be given to the reproducibility of the results. Increasingly, the entire process, from building the research hypothesis, through conducting the experiments and measurements, to data exploration and analysis, is driven by very few experts in various scientific fields. Still, the ability to perform the entire data exploration in real time on a personal computer is often hampered by the heterogeneity of software, the data structure formats of the output, and the enormous data sizes. These factors shape the design and architecture of the implemented software stack. At the Functional Genomics Center Zurich (FGCZ), a joint state-of-the-art research and training facility of ETH Zurich and the University of Zurich, we have developed the B-Fabric system, which has served an entire life sciences community with fundamental data science support for more than a decade. In this paper, we sketch how such a system can be used to glue together data (including metadata), computing infrastructures (clusters and clouds), and visualization software to support instant data exploration and visual analysis. We illustrate our approach, which is in daily production use, with visualization applications for mass spectrometry data.
Topics: Data Management; Reproducibility of Results; Software; Genomics
PubMed: 36073980
DOI: 10.1515/jib-2022-0031
GigaScience, Dec 2022
The importance of effective research data management (RDM) strategies to support the generation of Findable, Accessible, Interoperable, and Reusable (FAIR) neuroscience data grows with each advance in data acquisition techniques and research methods. To maximize the impact of diverse research strategies, multidisciplinary, large-scale neuroscience research consortia face a number of unsolved challenges in RDM. While open science principles are largely accepted, it is practically difficult for researchers to prioritize RDM over other pressing demands. The implementation of a coherent, executable RDM plan for consortia spanning animal, human, and clinical studies is becoming increasingly challenging. Here, we present an RDM strategy implemented for the Heidelberg Collaborative Research Consortium. Our consortium combines basic and clinical research in diverse populations (animals and humans) and produces highly heterogeneous and multimodal research data (e.g., neurophysiology, neuroimaging, genetics, behavior). We present a concrete strategy for initiating early-stage RDM and FAIR data generation for large-scale collaborative research consortia, with a focus on sustainable solutions that incentivize incremental RDM while respecting research-specific requirements.
Topics: Animals; Humans; Data Management; Neuroimaging; Research Personnel
PubMed: 37401720
DOI: 10.1093/gigascience/giad049
Molecular & Cellular Proteomics: MCP, 2021
Today it is the norm that all relevant proteomics data that support the conclusions in scientific publications are made available in public proteomics data repositories. However, given the increase in the number of clinical proteomics studies, an important emerging topic is the management and dissemination of clinical, and thus potentially sensitive, human proteomics data. Both in the United States and in the European Union, there are legal frameworks protecting the privacy of individuals. Implementing privacy standards for publicly released research data in genomics and transcriptomics has led to processes to control who may access the data, so-called "controlled access" data. In parallel with the technological developments in the field, it is clear that the privacy risks of sharing proteomics data need to be properly assessed and managed. In our view, the proteomics community must be proactive in addressing these issues. Yet a careful balance must be kept. On the one hand, neglecting to address the potential of identifiability in human proteomics data could lead to reputational damage of the field, while on the other hand, erecting barriers to open access to clinical proteomics data will inevitably reduce reuse of proteomics data and could substantially delay critical discoveries in biomedical research. In order to balance these apparently conflicting requirements for data privacy and efficient use and reuse of research efforts through the sharing of clinical proteomics data, development efforts will be needed at different levels including bioinformatics infrastructure, policymaking, and mechanisms of oversight.
Topics: Confidentiality; Data Management; Humans; Information Dissemination; Proteomics
PubMed: 33711481
DOI: 10.1016/j.mcpro.2021.100071
Journal of Healthcare Engineering, 2021
Nowadays, the worldwide adoption of Internet of Things (IoT) technology is accelerating the digital transformation of the healthcare industry. In this context, smart healthcare (s-healthcare) solutions are creating better and more innovative opportunities for healthcare providers to improve patient care. However, these solutions also raise new challenges in terms of security and privacy due to the diversity of stakeholders, the centralized data management, and the resulting lack of trustworthiness, accountability, and control. In this paper, we propose an end-to-end Blockchain-based and privacy-preserving framework called SmartMedChain for data sharing in s-healthcare environments. The Blockchain is built on Hyperledger Fabric and stores encrypted health data using the InterPlanetary File System (IPFS), a distributed data storage solution with high resiliency and scalability. Compared to other proposals, and based on the concept of smart contracts, our solution combines data access control and data usage auditing measures for both Medical IoT data and Electronic Health Records (EHRs) generated by s-healthcare services. In addition, s-healthcare stakeholders can be held accountable through an innovative Privacy Agreement Management scheme that monitors the execution of the service with respect to patient preferences and in accordance with relevant privacy laws. Security analysis and experimental results show that the proposed SmartMedChain is feasible and efficient for s-healthcare environments.
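The storage pattern the abstract outlines, keeping encrypted payloads off-chain in IPFS and anchoring only their content hashes on the ledger, can be sketched as follows. The function names and the record_access() stub are assumptions for illustration; the paper's actual Hyperledger Fabric chaincode interface is not given in the abstract.

```python
# Sketch of the off-chain storage pattern: encrypt a health record, pin the
# ciphertext in IPFS, and anchor only the content hash on the ledger. The
# record_access() stub stands in for the Hyperledger Fabric transaction.
import json
from cryptography.fernet import Fernet  # pip install cryptography
import ipfshttpclient                   # pip install ipfshttpclient

def record_access(cid: str) -> None:
    # Placeholder: in SmartMedChain this would be a Fabric smart contract
    # transaction enforcing the patient's access-control policy.
    print(f"anchored {cid} on ledger")

def store_record(ehr: dict, key: bytes) -> str:
    ciphertext = Fernet(key).encrypt(json.dumps(ehr).encode())
    with ipfshttpclient.connect() as ipfs:  # local IPFS daemon assumed
        cid = ipfs.add_bytes(ciphertext)    # content identifier (hash)
    record_access(cid)                      # hypothetical ledger call
    return cid

key = Fernet.generate_key()  # in practice, managed per patient
print(store_record({"patient_id": "p-001", "glucose_mgdl": 96}, key))
```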
Topics: Blockchain; Data Management; Delivery of Health Care; Electronic Health Records; Humans; Privacy
PubMed: 34777733
DOI: 10.1155/2021/4145512
Database: The Journal of Biological..., Oct 2023
The European Union Data Collection Framework (DCF) states that scientific data-driven assessments are essential to achieve sustainable fisheries. To respond to the DCF call, this study introduces the information systems developed and used by the Institut Català de Recerca per a la Governança del Mar (ICATMAR), the Catalan Institute of Research for the Governance of the Seas. The information systems cover biological monitoring data and its curation, processing, analysis, publication, and web visualization for bottom trawl fisheries. Over 4 years of data collection (2019-2022), the sampling program built a dataset of over 1.1 million sampled individuals, accounting for 24.6 tons of catch. The sampling data are ingested into a database through a data input website, ensuring data management control and quality. Standardized metrics are automatically calculated, and the data are published in the web visualizer, combined with fishing landings and Vessel Monitoring System (VMS) records. As the combination of remote sensing data with fisheries monitoring offers new approaches for ecosystem assessment, the collected fisheries data are also visualized in combination with georeferenced seabed habitats from the European Marine Observation and Data Network (EMODnet) and with climate and sea conditions from the Copernicus Marine Environment Monitoring Service (CMEMS) in the web browser. Three public web-based products have been developed in the visualizer: geolocated bottom trawl samplings, biomass distribution per port or season, and length-frequency charts per species. These information systems aim to give the scientific community, administration, and civil society access to high-quality data for fisheries management, following the Findable, Accessible, Interoperable, Reusable (FAIR) principles and enabling scientific knowledge transfer. Database URL: https://icatmar.github.io/VISAP/ (www.icatmar.cat).
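One of the three visualizer products, length-frequency charts per species, reduces to a simple binning of sampled individuals by length class. A minimal sketch follows; the column names and sample rows are hypothetical, not ICATMAR's actual schema.

```python
# Length-frequency computation: counts of sampled individuals per 1 cm
# length class for one species. Columns and rows here are hypothetical.
import pandas as pd

samples = pd.DataFrame({
    "species":   ["Merluccius merluccius"] * 5 + ["Mullus barbatus"] * 3,
    "length_cm": [12.3, 14.1, 13.8, 22.5, 14.6, 9.2, 10.4, 9.8],
})

def length_frequency(df: pd.DataFrame, species: str, bin_cm: float = 1.0):
    lengths = df.loc[df["species"] == species, "length_cm"]
    bins = lengths // bin_cm * bin_cm  # lower edge of each length class
    return bins.value_counts().sort_index()

print(length_frequency(samples, "Merluccius merluccius"))
```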
Topics: Humans; Animals; Ecosystem; Fisheries; Data Management; Data Collection; Web Browser; Fishes
PubMed: 37864836
DOI: 10.1093/database/baad067
Journal of Pain and Symptom Management, Jul 2022
Review
CONTEXT
Prospective cohort studies of individuals with serious illness and their family members, such as children receiving palliative care and their parents, pose challenges regarding data management.
OBJECTIVE
To describe the design of, and lessons learned regarding, the data management system for the Pediatric Palliative Care Research Network's Shared Data and Research (SHARE) project, a multicenter prospective cohort study of children receiving pediatric palliative care (PPC) and their parents, and to highlight important attributes of this system, with specific considerations for the design of future studies.
METHODS
The SHARE study consists of 643 PPC patients and up to two of their parents, enrolled from April 2017 to December 2020 at seven children's hospitals across the United States. Data regarding demographics, patient symptoms, goals of care, and other characteristics were collected directly from parents or patients at 6 timepoints over a 24-month follow-up period and stored electronically in a centralized location. Using medical record numbers, the primary collected data were linked to administrative hospitalization data containing diagnostic and procedure codes and other data elements. Important attributes of the data infrastructure include linkage of primary and administrative data; centralized availability of multilingual questionnaires; an electronic data collection and storage system; time-stamping of instrument completion; and a separate but connected study administrative database used to track enrollment.
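The linkage step described above, joining primary study data to administrative hospitalization data on the medical record number, is the kind of operation sketched below. The column names and rows are hypothetical, not the SHARE schema.

```python
# Record linkage sketch: join primary study data to administrative
# hospitalization data on the medical record number (mrn).
import pandas as pd

primary = pd.DataFrame({
    "mrn":           ["A100", "A101", "A102"],
    "timepoint":     [1, 1, 2],
    "symptom_score": [4, 7, 5],
})
administrative = pd.DataFrame({
    "mrn":            ["A100", "A101", "A101"],
    "icd10_dx":       ["C71.9", "G80.9", "J18.9"],
    "admission_date": ["2018-02-01", "2017-11-15", "2018-06-03"],
})

# Left join keeps every primary observation, even without an admission match.
linked = primary.merge(administrative, on="mrn", how="left")
print(linked)
```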
CONCLUSIONS
Investigators planning future multicenter prospective cohort studies can consider attributes of the data infrastructure we describe when designing their data management system.
Topics: Child; Cohort Studies; Data Management; Humans; Multicenter Studies as Topic; Palliative Care; Prospective Studies; Surveys and Questionnaires; United States
PubMed: 35339611
DOI: 10.1016/j.jpainsymman.2022.03.006
International Journal of Population..., 2021
Data pooling from pre-existing datasets can be useful to increase study sample size and statistical power in order to answer a research question. However, individual datasets may contain variables that measure the same construct differently, posing challenges for data pooling. Variable harmonization, an approach that can generate comparable datasets from heterogeneous sources, can address this issue in some circumstances. As an illustrative example, this paper describes the data harmonization strategies that helped generate comparable datasets across two Canadian pregnancy cohort studies: All Our Families and the Alberta Pregnancy Outcomes and Nutrition study. Variables were harmonized considering multiple features across the datasets: the construct measured, the question asked and response options, the measurement scale used, the frequency of measurement, the timing of measurement, and the data structure. Variables were classified as completely matching, partially matching, or completely unmatched across the datasets based on these features. Variables that were an exact match were pooled as is. Partially matching variables were harmonized or processed into a common format across the datasets, considering the frequency of measurement, the timing of measurement, the measurement scale used, and the response options. Variables that were completely unmatched could not be harmonized into a single variable. The variable harmonization strategies used to generate comparable cohort datasets for data pooling are applicable to other data sources. Future studies may employ or evaluate these strategies, which permit researchers to answer novel research questions in a statistically efficient, timely, and cost-efficient manner that could not be achieved using a single data source.
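Recoding a partially matching variable into a common format, as described above, can be sketched as follows. The smoking variable, category labels, and mapping are hypothetical examples, not variables from the two cohorts.

```python
# Harmonizing a partially matching variable: two cohorts ask about smoking
# with different response options, recoded to the lowest-resolution common
# format ("currently smokes"). Labels and mappings are hypothetical.
import pandas as pd

cohort_a = pd.Series(["never", "former", "current", "never"], name="smoking")
cohort_b = pd.Series(["no", "yes", "no"], name="smoke_now")

map_a = {"never": 0, "former": 0, "current": 1}
map_b = {"no": 0, "yes": 1}

harmonized = pd.concat([
    pd.DataFrame({"cohort": "A", "smokes_currently": cohort_a.map(map_a)}),
    pd.DataFrame({"cohort": "B", "smokes_currently": cohort_b.map(map_b)}),
], ignore_index=True)
print(harmonized)
```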
Topics: Alberta; Cohort Studies; Data Collection; Data Management; Female; Humans; Pregnancy; Sample Size
PubMed: 34888420
DOI: 10.23889/ijpds.v6i1.1680
Trials, Mar 2022
BACKGROUND
Clinical trials play an important role in expanding the knowledge of diabetes prevention, diagnosis, and treatment, and data management is one of the main issues in clinical trials. A lack of appropriate planning for data management may prevent a trial from achieving the desired results. The aim of this study was to explore data management processes in diabetes clinical trials at three research institutes in Iran.
METHOD
This was a qualitative study conducted in 2019. Data were collected through in-depth semi-structured interviews with 16 researchers at three endocrinology and metabolism research institutes and were analyzed using thematic analysis.
RESULTS
The five themes that emerged from the data analysis were (1) clinical trial data collection, (2) technologies used in data management, (3) data security and confidentiality management, (4) data quality management, and (5) data management standards. In general, the findings indicated that no clear, standard process was used for data management in diabetes clinical trials; each research center followed its own methods and processes.
CONCLUSION
According to the results, the common methods of data management in diabetes clinical trials consisted largely of paper-based processes. It seems that using information technology could help facilitate data management processes in a variety of clinical trials, including diabetes clinical trials.
Topics: Data Management; Diabetes Mellitus; Humans; Iran; Qualitative Research; Research Personnel
PubMed: 35241149
DOI: 10.1186/s13063-022-06110-5