-
Digital Health 2024
OBJECTIVES
The study aimed to propose a multimodal model that incorporates both macroscopic and microscopic images and analyze its influence on clinicians' decision-making with different levels of experience.
METHODS
First, we constructed a multimodal dataset for five skin disorders. Next, we trained unimodal models on three different types of images and selected the best-performing models as the base learners. Then, we used a soft voting strategy to create the multimodal model. Finally, 12 clinicians were divided into three groups, with each group including one director dermatologist, one dermatologist-in-charge, one resident dermatologist, and one general practitioner. They were asked to diagnose the skin disorders in four unaided situations (macroscopic images only, dermatopathological images only, macroscopic and dermatopathological images, all images and metadata), and three aided situations (macroscopic images with model 1 aid, dermatopathological images with model 2&3 aid, all images with multimodal model 4 aid). The clinicians' diagnosis accuracy and time for each diagnosis were recorded.
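The soft voting step described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name `soft_vote`, the equal default weights, and the probability values for the five classes are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(probabilities: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> Tuple[int, List[float]]:
    """Average per-class probabilities across models; return (argmax, averaged)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    if weights is None:
        # Assumption: every base learner gets the same weight.
        weights = [1.0] * n_models
    total = sum(weights)
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Hypothetical class probabilities from three unimodal base learners
# (one row per model, one column per skin disorder).
model_outputs = [
    [0.50, 0.30, 0.10, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.05, 0.05],
]
label, fused = soft_vote(model_outputs)
print(label, [round(p, 3) for p in fused])
```

Here the first model alone would pick class 0, but the averaged probabilities favor class 1, which is how soft voting lets confident base learners outweigh an uncertain one.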
RESULTS
Among the trained models, the vision transformer (ViT) achieved the best performance, with accuracies of 0.8636, 0.9545, and 0.9673 and AUCs of 0.9823, 0.9952, and 0.9989 on the training set, respectively. However, on the external validation set, the ViT models only achieved accuracies of 0.70, 0.90, and 0.94, respectively. The multimodal model performed well compared to the unimodal models, achieving an accuracy of 0.98 on the external validation set. The results of logit regression analysis indicate that all models are helpful to clinicians in making diagnostic decisions [Odds Ratios (OR) > 1], while metadata does not provide assistance to clinicians (OR < 1). Linear analysis results indicate that metadata significantly increases clinicians' diagnosis time (P < 0.05), while model assistance does not (P > 0.05).
CONCLUSIONS
The results of this study suggest that the multimodal model effectively improves clinicians' diagnostic performance without significantly increasing the diagnostic time. However, further large-scale prospective studies are necessary.
PubMed: 38784049
DOI: 10.1177/20552076241257087 -
Scientific Data May 2024
Datasets consist of measurement data and metadata. Metadata provides context, essential for understanding and (re-)using data. Various metadata standards exist for different methods, systems and contexts. However, relevant information resides at differing stages across the data-lifecycle. Often, this information is defined and standardized only at publication stage, which can lead to data loss and workload increase. In this study, we developed Metadatasheet, a metadata standard based on interviews with members of two biomedical consortia and systematic screening of data repositories. It aligns with the data-lifecycle allowing synchronous metadata recording within Microsoft Excel, a widespread data recording software. Additionally, we provide an implementation, the Metadata Workbook, that offers user-friendly features like automation, dynamic adaption, metadata integrity checks, and export options for various metadata standards. By design and due to its extensive documentation, the proposed metadata standard simplifies recording and structuring of metadata for biomedical scientists, promoting practicality and convenience in data management. This framework can accelerate scientific progress by enhancing collaboration and knowledge transfer throughout the intermediate steps of data creation.
Topics: Biomedical Research; Data Management; Metadata; Software
PubMed: 38778016
DOI: 10.1038/s41597-024-03349-2 -
Frontiers in Cellular and Infection... 2024
INTRODUCTION
Sharing microbiome data among researchers fosters new innovations and reduces cost for research. Practically, this means that the (meta)data will have to be standardized, transparent and readily available for researchers. The microbiome data and associated metadata will then be described with regard to composition and origin, in order to maximize the possibilities for application in various contexts of research. Here, we propose a set of tools and protocols to develop a real-time FAIR (Findable, Accessible, Interoperable and Reusable) compliant database for the handling and storage of human microbiome and host-associated data.
METHODS
The conflicts arising from privacy laws with respect to metadata, possible human genome sequences in the metagenome shotgun data and FAIR implementations are discussed. Alternate pathways for achieving compliance in such conflicts are analyzed. Sample-traceable and sensitive microbiome data, such as DNA sequences or geolocalized metadata, are identified, and the role of the GDPR (General Data Protection Regulation) is considered. For the construction of the database, procedures have been realized to make data FAIR compliant, while preserving privacy of the participants providing the data.
RESULTS AND DISCUSSION
An open-source development platform, Supabase, was used to implement the microbiome database. Researchers can deploy this real-time database to access, upload, download and interact with human microbiome data in a FAIR compliant manner. In addition, a large language model (LLM) powered by ChatGPT was developed and deployed to enable knowledge dissemination and non-expert usage of the database.
Topics: Humans; Microbiota; Databases, Factual; Metadata; Metagenome; Information Dissemination; Computational Biology; Metagenomics; Databases, Genetic
PubMed: 38774631
DOI: 10.3389/fcimb.2024.1384809 -
Journal of Environmental Management Jun 2024
Meta-Analysis
Drawing upon an extensive body of valuation literature focused on water quality, this paper performs a meta-analysis benefit transfer exercise aimed at quantifying willingness to pay (WTP) for an enhancement in drinking water quality for households that have been directly exposed to Perfluoroalkyl Substances (PFAS) over recent decades in Italy. The analysis compiles metadata of 72 WTP estimates extracted from 40 previous valuation studies conducted in advanced economies. The benefit transfer is realized estimating a meta regression model (MRM) which includes both study design and socio-economic explanatory variables, according to the Weak Structural Utility Theoretic approach. To determine the most suitable MRM specification, a comparative evaluation of various model configurations is developed exploiting the Least Absolute Shrinkage and Selection Operator (LASSO) selection criterion, and assessing their predictive performances in terms of transfer error and explanatory power. The mean transfer error (MTE) and the adjusted R-squared of the preferred MRM are in line with past published meta-analyses (0.665 and 0.607, respectively). The parameters estimated in the model align with both economic theory and intuition. The benefit transfer process results in an estimated annual WTP of € 250.80 per household for improved drinking water quality in the PFAS-affected area and an aggregated value of social benefits from PFAS decontamination of around € 12 million.
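To make the transfer error metric concrete, the sketch below computes a mean transfer error as the average absolute relative difference between predicted and observed WTP values. This is a commonly used definition and is assumed here; the abstract does not state the exact formula the authors applied. The function name `mean_transfer_error` and all numeric values are hypothetical.

```python
from typing import Sequence


def mean_transfer_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute relative error between predicted and observed WTP values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed definition: |predicted - observed| / observed, averaged over studies.
    errors = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)


# Hypothetical WTP values (in euros), not data from the meta-analysis.
observed_wtp = [100.0, 200.0, 50.0]
predicted_wtp = [120.0, 150.0, 60.0]
mte = mean_transfer_error(observed_wtp, predicted_wtp)
print(round(mte, 3))
```

Under this definition, a value of 0.665 would mean that out-of-sample predictions deviate from observed WTP estimates by about two thirds of the observed value on average.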
Topics: Drinking Water; Water Quality; Water Pollutants, Chemical; Humans; Italy; Fluorocarbons
PubMed: 38772240
DOI: 10.1016/j.jenvman.2024.121143 -
PloS One 2024
Typical machine learning classification benchmark problems often ignore the full input data structures present in real-world classification problems. Here we aim to represent additional information as "hints" for classification. We show that under a specific realistic conditional independence assumption, the hint information can be included by late fusion. In two experiments involving image classification with hints taking the form of text metadata, we demonstrate the feasibility and performance of the fusion scheme. We fuse the output of pre-trained image classifiers with the output of pre-trained text models. We show that calibration of the pre-trained models is crucial for the performance of the fused model. We compare the performance of the fusion scheme with a mid-level fusion scheme based on support vector machines and find that these two methods tend to perform quite similarly, albeit the late fusion scheme has only negligible computational costs.
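One standard way to express late fusion under a conditional independence assumption is sketched below. If the image x and the hint h are independent given the class y, Bayes' rule gives p(y | x, h) proportional to p(y | x) p(y | h) / p(y). Whether this matches the paper's exact formulation is an assumption; the function name `late_fusion`, the uniform prior, and the probability values are hypothetical.

```python
from typing import List, Sequence


def late_fusion(p_image: Sequence[float],
                p_text: Sequence[float],
                prior: Sequence[float]) -> List[float]:
    """Fuse two calibrated posteriors, assuming inputs are independent given the class."""
    # p(y | x, h) is proportional to p(y | x) * p(y | h) / p(y)
    unnormalized = [pi * pt / pr for pi, pt, pr in zip(p_image, p_text, prior)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Hypothetical calibrated outputs for a three-class problem.
p_image = [0.6, 0.3, 0.1]   # pre-trained image classifier
p_text = [0.2, 0.7, 0.1]    # pre-trained text model applied to the hint
prior = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform class prior

fused = late_fusion(p_image, p_text, prior)
best = max(range(len(fused)), key=fused.__getitem__)
print(best, [round(p, 3) for p in fused])
```

Because the fused score multiplies the two posteriors, overconfident inputs distort the result directly, which is consistent with the abstract's point that calibration of the pre-trained models matters. The fusion itself is a handful of multiplications per class, so its cost is negligible next to running the models.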
Topics: Support Vector Machine; Machine Learning; Algorithms; Image Processing, Computer-Assisted; Humans
PubMed: 38771772
DOI: 10.1371/journal.pone.0301360 -
Microbiology Spectrum May 2024
Mycobacterium bovis causes animal tuberculosis in livestock and wildlife, with an impact on animal health and production, wildlife management, and public health. In this work, we sampled a multi-host tuberculosis community from the official hotspot risk area of Portugal over 16 years, generating the largest available data set in the country. Using phylogenetic and ecological modeling, we aimed to reconstruct the history of circulating lineages across the livestock-wildlife interface to inform intervention and the implementation of genomic surveillance within the official eradication plan. We find evidence for the co-circulation of European 1 (Eu1), Eu2, and Eu3 clonal complexes, with Eu3 providing sufficient temporal signal for further phylogenetic investigation. The Eu3 most recent common ancestor (bovine) was dated in the 1990s, subsequently transitioning to wildlife (red deer and wild boar). Isolate clustering based on sample metadata was used to inform phylogenetic inference, unraveling frequent transmission between two clusters that represent an ecological corridor of previously unrecognized importance in Portugal. The latter was associated with transmission at the livestock-wildlife interface toward locations with higher temperature and precipitation, lower agriculture and road density, and lower host densities. This is the first analysis of Eu3 complex in Iberia, shedding light on background ecological factors underlying long-term transmission and informing where efforts could be focused within the larger hotspot risk area of Portugal.
IMPORTANCE
Efforts to strengthen surveillance and control of animal tuberculosis (TB) are ongoing worldwide. Here, we developed an eco-phylodynamic framework based on discrete phylogenetic approaches informed by whole-genome sequence data representing a multi-host transmission system at the livestock-wildlife interface, within a rich ecological landscape in Portugal, to understand transmission processes and translate this knowledge into disease management benefits. We find evidence for the co-circulation of several clades, with frequent transmission of the Eu3 lineage among cattle and wildlife populations. Most transition events between different ecological settings took place toward host, climate and land use gradients, underscoring animal TB expansion and a potential corridor of unrecognized importance for maintenance. Results stress that animal TB is an established wildlife disease without ecological barriers, showing that control measures in place are insufficient to prevent long-distance transmission and spillover across multi-host communities, demanding new interventions targeting livestock-wildlife interactions.
PubMed: 38771094
DOI: 10.1128/spectrum.03829-23 -
Data in Brief Jun 2024
CONTEXT
The Google Play Store is widely recognized as one of the largest platforms for downloading applications, both free and paid. On a daily basis, millions of users avail themselves of this marketplace, sharing their thoughts through various means such as star ratings, user comments, suggestions, and feedback. These insights, in the form of comments and feedback, constitute a valuable resource for organizations, competitors, and emerging companies seeking to expand their market presence. These comments provide insights into app deficiencies, suggestions for new features, identified issues, and potential enhancements. Unlocking the potential of this repository of suggestions holds significant value.
OBJECTIVE
This study sought to gather and analyze user reviews from the Google Play store for leading game apps. The primary aim was to construct a dataset for subsequent analysis utilizing requirements engineering, machine learning, and competitive assessment.
METHODOLOGY
The authors employed a Python-based web scraping method to extract a comprehensive set of more than 429,000 reviews from the Google Play pages of selected apps. The scraped data encompassed reviewer names (removed due to privacy), ratings, and the textual content of the reviews.
RESULTS
The outcome was a dataset comprising the extracted user reviews, ratings, and associated metadata. A total of 429,000+ reviews were acquired through the scraping process for popular apps such as Subway Surfers, Candy Crush Saga, and PUBG Mobile, among others. This dataset not only serves as a valuable educational resource for instructors, aiding in the training of students in data analysis, but also offers practitioners the opportunity for in-depth examination of, and insights into, the past data of top apps.
PubMed: 38770040
DOI: 10.1016/j.dib.2024.110499 -
ArXiv May 2024
Hyperpolarized (HP) 13C MRI has shown promise as a valuable modality for measurements of metabolism and is currently in human trials at 15 research sites worldwide. With this growth, it is important to adopt standardized data storage practices, as this will allow sites to meaningfully compare data. In this paper we (1) describe data that we believe should be stored and (2) demonstrate pipelines and methods that utilize the Digital Imaging and Communications in Medicine (DICOM) standard. This includes proposing a minimum set of information that is specific to HP 13C MRI studies. We then show where the majority of these can be fit into existing DICOM Attributes, primarily via the "Contrast/Bolus" module. We also demonstrate pipelines for utilizing DICOM for HP 13C MRI. DICOM is the most common standard for clinical medical image storage and provides the flexibility to accommodate the unique aspects of HP 13C MRI, including the HP agent information but also spectroscopic and metabolite dimensions. The pipelines shown include creating DICOM objects for studies on human and animal imaging systems with various pulse sequences. We also show a Python-based method to efficiently modify DICOM objects to incorporate the unique HP 13C MRI information that is not captured by existing pipelines. Moreover, we propose best practices for HP 13C MRI data storage that will support future multi-site trials, research studies and technical developments of this imaging technique.
PubMed: 38764595
DOI: No ID Found -
American Journal of Obstetrics and... May 2024
BACKGROUND
Gestational diabetes mellitus affects up to 10% of pregnancies and is classified into subtypes gestational diabetes subtype A1 (GDMA1) (managed by lifestyle modifications) and gestational diabetes subtype A2 (GDMA2) (requiring medication). However, whether these subtypes are distinct clinical entities or more reflective of an extended spectrum of normal pregnancy endocrine physiology remains unclear.
OBJECTIVE
Integrated bulk RNA-sequencing (RNA-seq), single-cell RNA-sequencing (scRNA-seq), and spatial transcriptomics harbors the potential to reveal disease gene signatures in subsets of cells and tissue microenvironments. We aimed to combine these high-resolution technologies with rigorous classification of diabetes subtypes in pregnancy. We hypothesized that differences between preexisting type 2 and gestational diabetes subtypes would be associated with altered gene expression profiles in specific placental cell populations.
STUDY DESIGN
In a large case-cohort design, we compared validated cases of GDMA1, GDMA2, and type 2 diabetes mellitus (T2DM) to healthy controls by bulk RNA-seq (n=54). Quantitative analyses with reverse transcription and quantitative PCR of presumptive genes of significant interest were undertaken in an independent and nonoverlapping validation cohort of similarly well-characterized cases and controls (n=122). Additional integrated analyses of term placental single-cell, single-nuclei, and spatial transcriptomics data enabled us to determine the cellular subpopulations and niches that aligned with the GDMA1, GDMA2, and T2DM gene expression signatures at higher resolution and with greater confidence.
RESULTS
Dimensional reduction of the bulk RNA-seq data revealed that the most common source of placental gene expression variation was the diabetic disease subtype. Relative to controls, we found 2052 unique and significantly differentially expressed genes (log2 fold-change thresholds of -2 and 2; q<0.05 Wald Test) among GDMA1 placental specimens, 267 among GDMA2, and 1520 among T2DM. Several candidate marker genes (chorionic somatomammotropin hormone 1 [CSH1], period circadian regulator 1 [PER1], phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta [PIK3CB], forkhead box O1 [FOXO1], epidermal growth factor receptor [EGFR], interleukin 2 receptor subunit beta [IL2RB], superoxide dismutase 3 [SOD3], dedicator of cytokinesis 5 [DOCK5], and suppressor of glucose, autophagy associated 1 [SOGA1]) were validated in an independent and nonoverlapping validation cohort (q<0.05 Tukey). Functional enrichment revealed the pathways and genes most impacted for each diabetes subtype, and the degree of proximal similarity to other subclassifications. Surprisingly, GDMA1 and T2DM placental signatures were more alike by virtue of increased expression of chromatin remodeling and epigenetic regulation genes, while albumin was the top marker for GDMA2 with increased expression of placental genes in the wound healing pathway. Assessment of these gene signatures in single-cell, single-nuclei, and spatial transcriptomics data revealed high specificity and variability by placental cell and microarchitecture types. For example, at the cellular and spatial (eg, microarchitectural) levels, distinguishing features were observed in extravillous trophoblasts (GDMA1) and macrophages (GDMA2). Lastly, we utilized these data to train and evaluate 4 machine learning models to estimate our confidence in predicting the control or diabetes status of placental transcriptome specimens with no available clinical metadata.
CONCLUSION
Consistent with the distinct association of perinatal outcome risk, placentae from GDMA1, GDMA2, and T2DM-affected pregnancies harbor unique gene signatures that can be further distinguished by altered placental cellular subtypes and microarchitectural niches.
PubMed: 38763341
DOI: 10.1016/j.ajog.2024.05.014 -
The Journal of Adolescent Health :... Jun 2024
PURPOSE
To assess the relevance of the Sustainable Development Goals (SDGs) framework for adolescent health measurement, both in terms of age disaggregation and different health domains captured, and how the adolescent health indicators recommended by the Global Action for Measurement of Adolescent Health (GAMA) can complement the SDG framework.
METHODS
We conducted a desk review to systematically map all 248 SDG indicators using the UN metadata repository in three steps: 1) age-related mandates for SDG reporting; 2) linkages between the SDG indicators and priority areas for adolescent health measurement; 3) comparison between the GAMA indicators and the SDG framework.
RESULTS
Of the 248 SDG indicators, 35 (14%) targeted an age range overlapping with adolescence (10-19 years) and 33 (13%) called for age disaggregation. Only one indicator (3.7.2 "adolescent birth rate") covered the entire 10-19 age range. Almost half (41%) of the SDG indicators were directly related to adolescent health, but only 33 of those (13% of all SDG indicators) overlapped with the ages 10-19, and 15 (6% of all SDG indicators) explicitly mandated age disaggregation. Among the 47 GAMA indicators, five corresponded to existing SDG indicators, and eight were adolescent-specific age adaptations. Several GAMA indicators shed light on aspects not tracked in the SDG framework, such as obesity, mental health, physical activity, and bullying among 10-19-year-olds.
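The shares quoted above follow directly from the reported counts out of 248 indicators. The short check below recomputes them; the counts come from the text, while the dictionary labels are shorthand introduced here for illustration.

```python
TOTAL_SDG_INDICATORS = 248

# Counts as reported in the results; labels are illustrative shorthand.
counts = {
    "age range overlaps 10-19": 35,
    "calls for age disaggregation": 33,
    "health-related and overlaps 10-19": 33,
    "health-related and mandates age disaggregation": 15,
}

# Percentage of all SDG indicators, rounded to the nearest whole number.
shares = {label: round(100 * n / TOTAL_SDG_INDICATORS) for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The rounded values come out as 14%, 13%, 13%, and 6%, matching the percentages given in the abstract.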
DISCUSSION
Adolescent health cannot be monitored comprehensively with the SDG framework alone. The GAMA indicators complement this framework via age-disaggregated adaptations and by tracking aspects of adolescent health currently absent from the SDGs.
Topics: Humans; Adolescent; Sustainable Development; Adolescent Health; Global Health; Child; Health Status Indicators; Goals; Female; Young Adult; Male
PubMed: 38762262
DOI: 10.1016/j.jadohealth.2024.01.004