-
BioRxiv : the Preprint Server For... May 2024
Cryo-electron tomography (cryo-ET) and subtomogram averaging (STA) are becoming the preferred methodologies for investigating subcellular and macromolecular structures in native or near-native environments. While cryo-ET is amenable to a wide range of biological problems, these problems often have data processing requirements that need to be individually optimized, precluding the notion of a one-size-fits-all processing pipeline. Cryo-ET data processing is also becoming progressively more complex due to an increasing number of packages for each processing step. Though each package has its own strengths and weaknesses, independent development and different data formats make them difficult to interface with one another. TOMOMAN (TOMOgram MANager) is an extensible package for streamlining the interoperability of packages, enabling users to develop project-specific processing workflows. TOMOMAN does this by maintaining an internal metadata format and wrapping external packages to manage and perform preprocessing, from raw tilt-series data to reconstructed tomograms. TOMOMAN can also export this metadata between various STA packages. TOMOMAN also includes tools for archiving projects to data repositories, allowing subsequent users to download TOMOMAN projects and directly resume processing where it was previously left off. By tracking essential metadata, TOMOMAN streamlines data sharing, which improves reproducibility of published results, reduces computational costs by minimizing reprocessing, and enables distributed cryo-ET projects between multiple groups and institutions. TOMOMAN provides a way for users to test different software packages to develop processing workflows that meet the specific needs of their biological questions and to share their results with the broader scientific community.
PubMed: 38746401
DOI: 10.1101/2024.05.02.589639
-
Research Square Apr 2024
In the big data era, integrating diverse data modalities poses significant challenges, particularly in complex fields like healthcare. This paper introduces a new process model for multimodal Data Fusion for Data Mining, integrating embeddings and the Cross-Industry Standard Process for Data Mining with the existing Data Fusion Information Group model. Our model aims to decrease computational costs, complexity, and bias while improving efficiency and reliability. We also propose "disentangled dense fusion," a novel embedding fusion method designed to optimize mutual information and facilitate dense inter-modality feature interaction, thereby minimizing redundant information. We demonstrate the model's efficacy through three use cases: predicting diabetic retinopathy using retinal images and patient metadata, domestic violence prediction employing satellite imagery, internet, and census data, and identifying clinical and demographic features from radiography images and clinical notes. The model achieved a Macro F1 score of 0.92 in diabetic retinopathy prediction, an R-squared of 0.854 and sMAPE of 24.868 in domestic violence prediction, and a macro AUC of 0.92 and 0.99 for disease prediction and sex classification, respectively, in radiological analysis. These results underscore the Data Fusion for Data Mining model's potential to significantly impact multimodal data processing, promoting its adoption in diverse, resource-constrained settings.
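For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how a macro F1 score and sMAPE are commonly computed. It is not the authors' code. The abstract does not say which sMAPE variant was used; the percentage form with the mean of the absolute values in the denominator is assumed here, and the function names are illustrative.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        # A pair where both values are zero contributes no error.
        terms.append(abs(p - t) / denom if denom else 0.0)
    return 100 * sum(terms) / len(terms)
```

Under this assumed definition sMAPE ranges from 0 to 200, so the reported value of 24.868 would be read on a percentage scale.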
PubMed: 38746100
DOI: 10.21203/rs.3.rs-4277992/v1
-
Frontiers in Endocrinology 2024
Decidualisation, the process whereby endometrial stromal cells undergo morphological and functional transformation in preparation for trophoblast invasion, is often disrupted in women with polycystic ovary syndrome (PCOS), resulting in complications with pregnancy and/or infertility. The transcription factor Wilms tumour suppressor 1 (WT1) is a key regulator of the decidualisation process, which is reduced in patients with PCOS, a complex condition characterized by increased expression of androgen receptor (AR) in endometrial cells and high levels of circulating androgens. Using genome-wide chromatin immunoprecipitation approaches on primary human endometrial stromal cells, we identify key genes regulated by WT1 during decidualisation, including homeobox transcription factors which are important for regulating cell differentiation. Furthermore, we found that AR in PCOS patients binds to the same DNA regions as WT1 in samples from healthy endometrium, suggesting dysregulation of genes important to decidualisation pathways in PCOS endometrium due to competitive binding between WT1 and AR. Integrating RNA-seq and H3K4me3 and H3K27ac ChIP-seq metadata with our WT1/AR data, we identified a number of key genes involved in immune response and angiogenesis pathways that are dysregulated in PCOS patients. This is likely due to epigenetic alterations at distal enhancer regions allowing AR to recruit cofactors such as MAGEA11, and demonstrates the consequences of AR disruption of WT1 in PCOS endometrium.
Topics: Humans; Female; Polycystic Ovary Syndrome; Endometrium; WT1 Proteins; Receptors, Androgen; Stromal Cells; Adult; Regulatory Sequences, Nucleic Acid
PubMed: 38745948
DOI: 10.3389/fendo.2024.1368494
-
Methods of Information in Medicine May 2024
BACKGROUND
Structural metadata from the majority of clinical studies and routine health care systems is currently not yet available to the scientific community.
OBJECTIVE
To provide an overview of available contents in the Portal of Medical Data Models (MDM Portal).
METHODS
The MDM Portal is a registered European information infrastructure for research and health care, and its contents are curated and semantically annotated by medical experts. It enables users to search, view, discuss, and download existing medical data models.
RESULTS
The most frequent keyword is "clinical trial" (n = 18,777), and the most frequent disease-specific keyword is "breast neoplasms" (n = 1,943). Most data items are available in English (n = 545,749) and German (n = 109,267). Manually curated semantic annotations are available for 805,308 elements (554,352 items, 58,101 item groups, and 192,855 code list items), which were derived from 25,257 data models. In total, 1,609,225 Unified Medical Language System (UMLS) codes have been assigned, with 66,373 unique UMLS codes.
CONCLUSION
To our knowledge, the MDM Portal constitutes Europe's largest collection of medical data models with semantically annotated elements. As such, it can be used to increase compatibility of medical datasets and can be utilized as a large expert-annotated medical text corpus for natural language processing.
PubMed: 38740374
DOI: 10.1055/s-0044-1786839
-
Translational Cancer Research Apr 2024
BACKGROUND
The carcinogenesis and progression of colon adenocarcinoma (COAD) are closely related to the abnormal expression of the zinc finger (ZNF) protein genes. We aimed to employ these genes to provide a reliable prognosis and treatment stratification tool for COAD patients.
METHODS
Cox regression and least absolute shrinkage and selection operator (LASSO) regression analyses were applied to The Cancer Genome Atlas (TCGA) metadata to build a ZNF protein gene-based prognostic model. Using this model, patients in the training cohort and testing cohort (GSE17537) were labelled as either high or low risk. Kaplan-Meier (KM) survival analysis and time-dependent receiver operating characteristic (ROC) curve analysis were performed in the patients with opposite risk status to assess the predictive ability in each cohort. The potential mechanism was explored using estimation of stromal and immune cells in malignant tumor tissues using expression data (ESTIMATE), single-sample gene set enrichment analysis (ssGSEA), gene set enrichment analysis (GSEA), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG). Finally, the expression levels of the model genes were validated by immunohistochemistry (IHC).
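The stratification step in this kind of model can be sketched as follows. This is an illustration only, not the authors' implementation: the gene names and coefficients are invented placeholders, and the median cutoff is an assumption, since the abstract does not state how the high/low threshold was chosen.

```python
from statistics import median

# Hypothetical coefficients standing in for those a LASSO-penalised Cox fit
# would produce; the real values are not given in the abstract.
COEFS = {"gene_a": 0.30, "gene_b": 0.20, "gene_c": -0.25}


def risk_score(expr: dict[str, float], coefs: dict[str, float] = COEFS) -> float:
    """Linear predictor: sum of coefficient * expression over the model genes."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def stratify(scores: dict[str, float]) -> dict[str, str]:
    """Label each patient 'high' or 'low' risk relative to the cohort median
    (assumed cutoff; scores equal to the median are labelled 'low')."""
    cutoff = median(scores.values())
    return {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
```

In practice the coefficients would come from the fitted model, and the resulting groups would then be compared with Kaplan-Meier and ROC analyses as the abstract describes.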
RESULTS
The prognostic model, consisting of INSM1, PHF21B, RNF138, SYTL4, WRNIP1, ZNF585B, and ZNF514, classified patients into high- and low-risk groups. Patients in the high-risk subset had considerably lower survival than those in the low-risk subset. The model genes were likely associated with immune-related biological processes, as supported by the results of the mechanistic analyses above. Moreover, patients in the low-risk subset scored significantly higher than those in the high-risk subset on immune cell and immune function scores. Drug sensitivity and tumor immune dysfunction and exclusion (TIDE) analyses showed a clear difference in the immunological and chemotherapeutic efficacy predictions between the two risk groups. Additionally, the expression levels of the model genes differed markedly between the high-risk and low-risk subsets.
CONCLUSIONS
The signature may be applied as a predictive classifier to guide tailored medication for COAD patients.
PubMed: 38737696
DOI: 10.21037/tcr-23-2158
-
Diagnostics (Basel, Switzerland) Apr 2024
Cardiovascular diseases (CVDs) are a leading cause of mortality worldwide. Early detection and effective risk assessment are crucial for implementing preventive measures and improving patient outcomes for CVDs. This work presents a novel approach to CVD risk assessment using fundus images, leveraging the inherent connection between retinal microvascular changes and systemic vascular health. This study aims to develop a predictive model for the early detection of CVDs by evaluating retinal vascular parameters. This methodology integrates both handcrafted features derived through mathematical computation and retinal vascular patterns extracted by artificial intelligence (AI) models. By combining these approaches, we seek to enhance the accuracy and reliability of CVD risk prediction in individuals. The methodology integrates state-of-the-art computer vision algorithms and AI techniques in a multi-stage architecture to extract relevant features from retinal fundus images. These features encompass a range of vascular parameters, including vessel caliber, tortuosity, and branching patterns. Additionally, a deep learning (DL)-based binary classification model is incorporated to enhance predictive accuracy. A dataset comprising fundus images and comprehensive metadata from the clinical trials conducted is utilized for training and validation. The proposed approach demonstrates promising results in the early prediction of CVD risk factors. The interpretability of the approach is enhanced through visualization techniques that highlight the regions of interest within the fundus images that are contributing to the risk predictions. Furthermore, the validation conducted in the clinical trials and the performance analysis of the proposed approach show the potential to provide early and accurate predictions.
The proposed system not only aids in risk stratification but also serves as a valuable tool for identifying vascular abnormalities that may precede overt cardiovascular events. The approach has achieved an accuracy of 85% and the findings of this study underscore the feasibility and efficacy of leveraging fundus images for cardiovascular risk assessment. As a non-invasive and cost-effective modality, fundus image analysis presents a scalable solution for population-wide screening programs. This research contributes to the evolving landscape of precision medicine by providing an innovative tool for proactive cardiovascular health management. Future work will focus on refining the solution's robustness, exploring additional risk factors, and validating its performance in additional and diverse clinical settings.
PubMed: 38732342
DOI: 10.3390/diagnostics14090928
-
Diagnostics (Basel, Switzerland) Apr 2024
Circulating tumor DNA (ctDNA) holds promise as a biomarker for predicting clinical responses to therapy in solid tumors, and multiple ctDNA assays are in development. However, the heterogeneity in ctDNA levels prior to treatment (baseline) across different cancer types and stages and across ctDNA assays has not been widely studied. Friends of Cancer Research formed a collaboration across multiple commercial ctDNA assay developers to assess baseline ctDNA levels across five cancer types in early- and late-stage disease. This retrospective study included eight commercial ctDNA assay developers providing summary-level de-identified data for patients with non-small cell lung cancer (NSCLC), bladder, breast, prostate, and head and neck squamous cell carcinoma following a common analysis protocol. Baseline ctDNA levels across late-stage cancer types were similarly detected, highlighting the potential use of ctDNA as a biomarker in these cancer types. Variability was observed in ctDNA levels across assays in early-stage NSCLC, indicative of the contribution of assay analytical performance and methodology on variability. We identified key data elements, including assay characteristics and clinicopathological metadata, that need to be standardized for future meta-analyses across multiple assays. This work facilitates evidence generation opportunities to support the use of ctDNA as a biomarker for clinical response.
PubMed: 38732326
DOI: 10.3390/diagnostics14090912
-
Scientific Data May 2024
This work presents a maturity model for assessing catalogues of semantic artefacts, one of the keystones that permit semantic interoperability of systems. We defined the dimensions and related features to include in the maturity model by analysing the current literature and existing catalogues of semantic artefacts provided by experts. In addition, we assessed 26 different catalogues to demonstrate the effectiveness of the maturity model, which includes 12 different dimensions (Metadata, Openness, Quality, Availability, Statistics, PID, Governance, Community, Sustainability, Technology, Transparency, and Assessment) and 43 related features (or sub-criteria) associated with these dimensions. Such a maturity model is one of the first attempts to provide recommendations for governance and processes for preserving and maintaining semantic artefacts and helps assess/address interoperability challenges.
PubMed: 38730252
DOI: 10.1038/s41597-024-03185-4
-
PloS One 2024
INTRODUCTION
Efficient strategies for eliminating neglected tropical diseases (NTDs) require effective surveillance and targeted interventions. Traditional methods are costly and time-consuming, often failing to cover entire populations when movement is restricted. To address these challenges, a morbidity image-based surveillance system is being developed. This innovative approach, which leverages smartphone technology, aims at simultaneous surveillance of multiple NTDs, enhancing cost-efficiency, reliability, and community involvement, particularly in areas with movement constraints. Moreover, it holds promise for post-elimination surveillance.
METHODOLOGY
The pilot of this method will be conducted across three states in southern Nigeria. It will target people affected by neglected tropical diseases and members of their communities. The new surveillance method will be introduced to target communities in the selected states through community stakeholder advocacy meetings and awareness campaigns. The pilot, which is set to span eighteen months, entails sensitizing NTD-affected individuals and community members, using signposts, posters, and handbills, to capture photos of NTD manifestations with smartphones as soon as they are noticed. These images, along with pertinent demographic information, will be transmitted to a dedicated server through WhatsApp or Telegram accounts. The received images will be reviewed and organized at the backend and then forwarded to a panel of experts for identification and annotation with the specific NTDs. Data generated, along with geocoordinate information, will be used to create NTD morbidity hotspot maps using ArcGIS. Accompanying metadata will be used to generate geographic and demographic distributions of the various NTDs identified. To protect privacy, people will be encouraged to send manifestation photos of the affected body part only, without any identifiable features.
EVALUATION PROTOCOL
NTDs prevalence data obtained using conventional surveillance methods from both the pilot and selected control states during the pilot period will be compared with data from the CIMS-NTDs method to determine its effectiveness.
EXPECTED RESULTS AND CONCLUSION
It is expected that an effective, privacy-conscious, population-inclusive new method for NTD surveillance, with the potential to yield real-time data for the identification of morbidity hotspots and distribution patterns of NTDs, will be established. The results will provide insights into the effectiveness of the new surveillance method in comparison to traditional approaches, potentially advancing NTD elimination strategies.
Topics: Neglected Diseases; Humans; Nigeria; Crowdsourcing; Smartphone; Pilot Projects; Tropical Medicine; Population Surveillance; Morbidity
PubMed: 38728272
DOI: 10.1371/journal.pone.0303179
-
BMC Bioinformatics May 2024
BACKGROUND
Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with and especially curating public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these present limitations that often lead to the need for further in-house curation and processing.
RESULTS
Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to accompany and guide the researcher during the curation process of metadata and fastq files of public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from different sources.
CONCLUSIONS
Thus, it offers valuable tools for the in-house curation previously needed to re-use public omics data. Due to its workflow structure and capabilities, it can be easily used and benefit investigators in developing novel omics meta-analyses based on sequencing data.
Topics: Workflow; Software; Data Curation; Metadata; Databases, Genetic; Genomics; Computational Biology
PubMed: 38724907
DOI: 10.1186/s12859-024-05803-9