Briefings in Bioinformatics Jan 2024
Enhancers play an important role in regulating gene expression. In DNA sequences, the abundance or absence of enhancers and irregularities in their strength affect gene expression, which leads to the initiation and propagation of diverse genetic diseases such as hemophilia, bladder cancer, diabetes and congenital disorders. Identifying enhancers and predicting their strength through experimental approaches is expensive, time-consuming and error-prone. To accelerate research on enhancer identification and strength prediction, around 19 computational frameworks have been proposed. These frameworks use machine and deep learning methods that take raw DNA sequences and predict the presence and strength of enhancers. However, their performance remains limited, and they are not suitable for real-time analysis. This paper presents a novel deep learning framework that uses language modeling strategies to transform DNA sequences into a statistical feature space. It applies transfer learning by training a language model in an unsupervised fashion to predict groups of nucleotides, known as k-mers, from the context of the surrounding k-mers in a sequence. At the classification stage, it presents a novel classifier that combines the benefits of two architectures: a convolutional neural network and an attention mechanism. On the enhancer identification benchmark dataset, the proposed framework outperforms the best existing framework by 5% in accuracy and 9% in MCC. Similarly, on the enhancer strength prediction benchmark dataset, it outperforms the best existing framework by 4% in accuracy and 7% in MCC.
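The unsupervised pretraining step can be pictured with a minimal sketch. It assumes overlapping k-mers with stride 1 and a BERT-style masking objective. The abstract does not state k, the stride, or the masking scheme. The function names `kmerize` and `mask_kmers` are hypothetical.

```python
import random


def kmerize(seq: str, k: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_kmers(tokens: list[str], mask_rate: float = 0.15,
               mask_token: str = "[MASK]", seed: int = 0):
    """Hide a random subset of k-mers (assumed masked-LM objective).

    Returns the masked token list and a {position: original k-mer} map;
    a language model would be trained to recover each hidden k-mer
    from the unmasked k-mers around it.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


tokens = kmerize("ACGTACGGTCA", k=3)
masked, targets = mask_kmers(tokens, mask_rate=0.3)
print(tokens[:4])  # ['ACG', 'CGT', 'GTA', 'TAC']
```

The pretrained model's internal representations of these k-mer sequences would then serve as the statistical feature space passed to the downstream classifier.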
Topics: Benchmarking; Medicine; Neural Networks, Computer; Nucleotides; Regulatory Sequences, Nucleic Acid
PubMed: 38385876
DOI: 10.1093/bib/bbae030
Heliyon Feb 2024
OBJECTIVE
Allergic asthma is driven by an antigen-specific immune response. This study aimed to identify immune-related differentially expressed genes in childhood asthma and establish a classification diagnostic model based on these genes.
METHODS
GSE65204 and GSE19187 were downloaded and used as the training set and validation set, respectively. Immune cell composition was evaluated with the ssGSEA algorithm based on an immune-related gene set. Modules significantly related to asthma were selected using the WGCNA algorithm. Immune-related differentially expressed genes (DE-IRGs) were screened, and a protein-protein interaction network and a diagnostic model of DE-IRGs were constructed. Pathway and immune correlation analyses of the hub DE-IRGs were performed.
RESULTS
Eight immune cell types differed in abundance between the asthma and control groups. A total of 112 differentially expressed immune-related genes (DE-IRGs) were identified. Using four ranking methods (MCC, MNC, DEGREE, and EPC), 17 hub DE-IRGs shared across the methods were further selected. Subsequently, 8 optimized genes were identified using univariate logistic regression analysis and the LASSO regression algorithm, and a robust diagnostic model was constructed based on them. Notably, TNF and CD40LG emerged as direct participants in asthma-related signaling pathways and were positively correlated with immature B cells, activated B cells, activated CD8 T cells, activated CD4 T cells, and myeloid-derived suppressor cells.
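One simple reading of "hub genes with overlapping significance" is the intersection of the top-N lists produced by each ranking method. The sketch below assumes that reading. `top_n`, the function name, and the toy scores are hypothetical, and the abstract does not say how many top genes were taken from each method. MCC, MNC, DEGREE, and EPC are among the ranking methods offered by the cytoHubba plugin for Cytoscape.

```python
def consensus_hubs(rankings: dict[str, dict[str, float]], top_n: int) -> set[str]:
    """Return genes that fall within the top_n of every ranking method.

    rankings maps a method name (e.g. "MCC", "DEGREE") to {gene: score},
    where a higher score means a more central node.
    """
    top_sets = []
    for scores in rankings.values():
        # Sort by descending score; break ties alphabetically for determinism.
        ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
        top_sets.append(set(ordered[:top_n]))
    return set.intersection(*top_sets) if top_sets else set()


# Toy scores for illustration only (not data from the study).
toy = {
    "MCC": {"G1": 120, "G2": 80, "G3": 60, "G4": 5},
    "DEGREE": {"G1": 20, "G2": 15, "G4": 14, "G3": 3},
}
print(sorted(consensus_hubs(toy, top_n=3)))  # ['G1', 'G2']
```

A stricter or looser consensus (for example, requiring agreement between only three of the four methods) would be an equally plausible alternative. The abstract does not settle which was used.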
CONCLUSION
The diagnostic model constructed using the DE-IRGs (CCL5, CCR5, CD40LG, CD8A, IL2RB, PDCD1, TNF, and ZAP70) exhibited high and specific diagnostic value for childhood asthma. The diagnostic model may contribute to the diagnosis of childhood asthma.
PubMed: 38375253
DOI: 10.1016/j.heliyon.2024.e25735
Frontiers in Immunology 2024
INTRODUCTION
Idiopathic pulmonary fibrosis (IPF) is characterized by progressive lung dysfunction due to excessive collagen production and tissue scarring. Despite recent advancements, the molecular mechanisms remain unclear.
METHODS
RNA sequencing identified 475 differentially expressed genes (DEGs) in the TGF-β1-induced primary lung fibrosis model. Gene expression microarray datasets GSE101286 and GSE110147 from the NCBI Gene Expression Omnibus (GEO) database were analyzed using GEO2R, revealing 94 DEGs in IPF lung tissue samples. Gene ontology (GO) and pathway enrichment analyses, protein-protein interaction (PPI) network construction, and Maximal Clique Centrality (MCC) scoring were performed. Experimental validation included RT-qPCR, immunohistochemistry (IHC), and Western blot, with siRNA used for gene knockdown. A co-expression network was constructed using GeneMANIA.
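Maximal Clique Centrality, as defined for the cytoHubba plugin, scores a node v as the sum of (|C| - 1)! over every maximal clique C that contains v. Below is a minimal pure-Python sketch. It assumes an undirected, unweighted PPI network given as an edge list, and the function names are hypothetical. The Bron-Kerbosch enumeration it uses is exponential in the worst case. That should be tolerable for a network built from a few hundred DEGs.

```python
from math import factorial


def maximal_cliques(adj: dict[str, set[str]]) -> list[set[str]]:
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting."""
    cliques = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(edges: list[tuple[str, str]]) -> dict[str, int]:
    """MCC(v) = sum of (|C| - 1)! over the maximal cliques C containing v."""
    adj: dict[str, set[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


# Triangle A-B-C plus a pendant edge C-D: maximal cliques {A,B,C} and {C,D}.
print(mcc_scores([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]))
# {'A': 2, 'B': 2, 'C': 3, 'D': 1}
```

When no two neighbours of a node are connected, every incident edge is its own maximal clique of size 2. In that case the score reduces to the node's degree.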
RESULTS
GO enrichment highlighted significant enrichment of DEGs in TGF-β cellular response, connective tissue development, extracellular matrix components, and signaling pathways such as the AGE-RAGE signaling pathway and ECM-receptor interaction. PPI network analysis identified hub genes, including FN1, COL1A1, POSTN, KIF11, and ECT2. CALD1 (Caldesmon 1), CDH2 (Cadherin 2), and POSTN (Periostin) were identified as dysregulated hub genes in both the RNA sequencing and GEO datasets. Validation experiments confirmed the upregulation of CALD1, CDH2, and POSTN in TGF-β1-treated fibroblasts and IPF lung tissue samples. IHC experiments probed tissue-level expression patterns of these three molecules. Knockdown of CALD1, CDH2, and POSTN attenuated the expression of fibrotic markers (collagen I and α-SMA) in response to TGF-β1 stimulation in primary fibroblasts. Co-expression analysis revealed interactions between hub genes and predicted genes involved in actin cytoskeleton regulation and cell-cell junction organization.
CONCLUSIONS
CALD1, CDH2, and POSTN, identified as potential contributors to pulmonary fibrosis, present promising therapeutic targets for IPF patients.
Topics: Humans; Antigens, CD; Cadherins; Calmodulin-Binding Proteins; Cell Adhesion Molecules; Collagen; Fibroblasts; Gene Expression; Idiopathic Pulmonary Fibrosis; Transforming Growth Factor beta1
PubMed: 38370408
DOI: 10.3389/fimmu.2024.1275064
Current Computer-aided Drug Design Feb 2024
INTRODUCTION
Transcription factors are vital biological components that control gene expression, and their primary biological function is to recognize DNA sequences. As research has progressed, the specificity of DNA-protein binding has been found to play a significant role in gene expression, regulation, and especially gene therapy. Convolutional neural networks (CNNs) have become increasingly popular for predicting specific DNA-protein binding sites, but their prediction accuracy still needs to be improved.
METHODS
We proposed WSHNN, a framework that combines multi-instance learning (MIL) with a hybrid neural network. First, a sliding window was used to split each DNA sequence into multiple overlapping instances, so that each sequence forms a bag containing multiple instances. The instances were then encoded using K-mer encoding. Afterward, the scores of all instances in the same bag were calculated separately by the hybrid neural network.
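Here is a minimal sketch of the bag construction and K-mer encoding. It assumes each sequence becomes one bag of fixed-length overlapping windows. The window length, stride, k, and the max-pooling aggregator are assumptions. WSHNN itself aggregates instance scores with a fully connected network, and its hybrid network is not reproduced here. All function names are hypothetical.

```python
def make_bag(seq: str, window: int, stride: int) -> list[str]:
    """Split one sequence into overlapping windows; each window is an instance of the bag."""
    if len(seq) <= window:
        return [seq]
    bag = [seq[s:s + window] for s in range(0, len(seq) - window + 1, stride)]
    if (len(seq) - window) % stride:
        bag.append(seq[-window:])  # make sure the end of the sequence is covered
    return bag


def kmer_ids(instance: str, k: int = 3) -> list[int]:
    """Encode an instance as integer ids of its overlapping k-mers (vocabulary size 4**k)."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    ids = []
    for i in range(len(instance) - k + 1):
        code = 0
        for base in instance[i:i + k]:
            code = code * 4 + index[base]  # read the k-mer as a base-4 number
        ids.append(code)
    return ids


def bag_score(instance_scores: list[float]) -> float:
    """Max pooling (swapped in for WSHNN's FC layer): a bag is as positive as its top instance."""
    return max(instance_scores)


bag = make_bag("ACGTACGTACGT", window=6, stride=3)
print(bag)                  # ['ACGTAC', 'TACGTA', 'GTACGT']
print(kmer_ids("ACGT", 2))  # [1, 6, 11]
```

Max pooling encodes the classic MIL assumption that a bag is positive if at least one instance (here, one window) contains a binding site. A learned fully connected aggregator can relax that assumption.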
RESULTS
Finally, a fully connected network was used to produce the final prediction for each bag. The framework achieved a precision of 90.73%, a recall of 82.77%, an accuracy of 87.17%, an F1-score of 0.8657, and an MCC of 0.7462. We also discussed the performance of K-mer encoding. Compared with other state-of-the-art methods, the model performs better when using sequence information.
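Precision, recall, F1-score, and the Matthews correlation coefficient recur throughout these abstracts. In the hub-gene abstracts, however, "MCC" instead denotes Maximal Clique Centrality. As a consistency check, the F1-score is the harmonic mean of precision and recall. The reported precision and recall should therefore reproduce the reported F1-score. The sketch below (function names hypothetical) performs that check and defines the Matthews correlation coefficient from confusion-matrix counts.

```python
from math import sqrt


def f1_from_pr(precision: float, recall: float) -> float:
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# The reported precision and recall reproduce the reported F1-score.
print(round(f1_from_pr(0.9073, 0.8277), 4))  # 0.8657
```

Unlike accuracy, MCC ranges from -1 to 1 and stays informative when the positive and negative classes are imbalanced. This is one reason the enhancer and binding-site papers above report it alongside accuracy.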
CONCLUSION
From the experimental results, it can be concluded that bidirectional long short-term memory (Bi-LSTM) can better capture long-range relationships within DNA sequences (the code and data are available at https://github.com/baowz12345/Weak_Super_Network).
PubMed: 38347788
DOI: 10.2174/0115734099277249240129114123
International Journal of Molecular... Feb 2024
Mitochondrial unfolded protein stress response (mtUPR) plays a critical role in regulating cellular and metabolic stress responses and helps maintain protein homeostasis. Caseinolytic peptidase P (CLPP) is one of the key regulators of mtUPR and promotes unfolded protein degradation. Previous studies demonstrated that global deletion of Clpp resulted in female infertility, whereas no impairment was found in a mouse model with targeted deletion of Clpp in cumulus/granulosa cells. These results suggest the need to delineate the function of CLPP in oocytes. In this study, we aimed to further explore the role of mtUPR in female reproductive competence and senescence using a mouse model. Oocyte-specific targeted deletion of Clpp in mice resulted in female subfertility associated with metabolic and functional abnormalities in oocytes, highlighting the importance of CLPP-mediated protein homeostasis in oocyte competence and reproductive function.
Topics: Female; Fertility; Infertility, Female; Mitochondria; Oocytes; Unfolded Protein Response; Endopeptidase Clp; Animals; Mice
PubMed: 38339144
DOI: 10.3390/ijms25031866
CNS Neuroscience & Therapeutics Feb 2024
INTRODUCTION
Genetic factors play a major part in mediating intracranial aneurysm (IA) rupture. However, research on the role of transcription factors (TFs) in IA rupture is rare.
AIMS
Bioinformatics analysis was performed to explore the TFs and related functional pathways involved in IA rupture.
RESULTS
A total of 63 differentially expressed transcription factors (DETFs) were obtained. The biological processes significantly enriched among these DETFs were related to the regulation of myeloid leukocyte differentiation. The top 10 DETFs were screened from the protein-protein interaction network using the MCC algorithm. After screening and validation, CEBPB was finally identified as a potential hub gene for aneurysm rupture. Gene set enrichment analysis (GSEA) of CEBPB showed associations mainly with the inflammatory response, which was also verified in an in vitro cellular inflammation model.
CONCLUSION
The inflammatory and immune responses may be closely associated with aneurysm rupture. CEBPB may be a hub gene for aneurysm rupture and may have diagnostic value. Therefore, CEBPB may serve as a diagnostic signature for ruptured intracranial aneurysms (RIAs) and a potential target for intervention.
Topics: Humans; Intracranial Aneurysm; Gene Expression Regulation; Aneurysm, Ruptured; Immunity; Transcription Factors; CCAAT-Enhancer-Binding Protein-beta
PubMed: 38332649
DOI: 10.1111/cns.14603
Science Translational Medicine Feb 2024
Recombination activating genes (RAGs) are tightly regulated during lymphoid differentiation, and their mutations cause a spectrum of severe immunological disorders. Hematopoietic stem and progenitor cell (HSPC) transplantation is the treatment of choice but is limited by donor availability and toxicity. To overcome these issues, we developed gene editing strategies targeting a corrective sequence into the human RAG1 gene by homology-directed repair (HDR). We validated them using tailored two-dimensional, three-dimensional, and in vivo xenotransplant platforms to assess rescue of RAG1 expression and function. Whereas integration into intron 1 of RAG1 achieved suboptimal correction, in-frame insertion into exon 2 drove physiologic human RAG1 expression and activity, allowing disruption of the dominant-negative effects of unrepaired hypomorphic alleles. Enhanced HDR-mediated gene editing enabled the correction of human RAG1 in HSPCs from patients with hypomorphic RAG1 mutations, overcoming T and B cell differentiation blocks. Gene correction efficiency exceeded the minimal proportion of functional HSPCs required to rescue immunodeficiency in mice, supporting the clinical translation of HSPC gene editing for the treatment of RAG1 deficiency.
Topics: Animals; Humans; Mice; Exons; Gene Editing; Hematopoietic Stem Cell Transplantation; Hematopoietic Stem Cells; Homeodomain Proteins
PubMed: 38324638
DOI: 10.1126/scitranslmed.adh8162
Heliyon Jan 2024
Diabetic nephropathy (DN) is one of the most common microvascular complications of diabetes mellitus. Periodontitis (PD) is a microbially induced chronic inflammatory disease that is thought to have a bidirectional relationship with diabetes mellitus. DN and PD are recognized as models associated with accelerated aging. This study is divided into two parts. The first part explores the bidirectional causal relationship using Mendelian randomization (MR). The second part uses bioinformatic methods to investigate the relationship between PD and DN in terms of potential crosstalk genes, aging-related genes, biological pathways, and processes. MR analysis showed no evidence supporting a causal relationship between DN and PD (P = 0.34) or between PD and DN (P = 0.77). Using the GEO database, we identified 83 crosstalk genes shared by the two diseases. Twelve paired genes identified by Pearson correlation and the four hub genes in the key cluster were jointly evaluated as candidate crosstalk-aging genes. Using support vector machine recursive feature elimination (SVM-RFE) and maximal clique centrality (MCC) algorithms, feature selection established five genes as the key crosstalk-aging genes. Based on these five key genes, an artificial neural network (ANN) diagnostic model that reliably diagnoses both diseases was developed. Gene enrichment analysis indicates that AGE-RAGE signaling, the complement system, and multiple immune-inflammatory pathways may be involved in features common to both diseases. Immune infiltration analysis reveals that most immune cell types differ in abundance in PD and DN, with dendritic cells and T cells playing vital roles in both diseases. Overall, although there is no causal link, CSF1R, CXCL6, VCAM1, JUN and IL1B may be potential crosstalk-aging genes linking PD and DN.
The common pathways and markers explored in this study could contribute to a deeper understanding of the common pathogenesis of both diseases in the context of aging and provide a theoretical basis for future research.
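The abstract does not name the MR estimator it used. The sketch below assumes the fixed-effect inverse-variance weighted (IVW) estimator, which is commonly used as the primary estimator in two-sample MR. It combines per-SNP effects on the exposure (beta_x) and on the outcome (beta_y), weighting each SNP by beta_x² / se_y². The function name and inputs are hypothetical. A large two-sided p-value is what "no evidence to support a causal relationship" corresponds to.

```python
from math import sqrt
from statistics import NormalDist


def ivw_estimate(beta_x: list[float], beta_y: list[float], se_y: list[float]):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    beta_x: per-SNP effects on the exposure
    beta_y: per-SNP effects on the outcome
    se_y:   standard errors of beta_y
    Returns (causal estimate, standard error, two-sided p-value).
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    beta = num / den
    se = sqrt(1 / den)
    p = 2 * (1 - NormalDist().cdf(abs(beta / se)))  # normal approximation
    return beta, se, p


# Toy instruments (not study data): outcome effects are exactly twice the exposure effects.
print(ivw_estimate([0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.05, 0.05, 0.1]))
```

Running the analysis in both directions (DN as exposure, then PD as exposure, with separate instruments) is what makes the MR design bidirectional.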
PubMed: 38304805
DOI: 10.1016/j.heliyon.2024.e24872
Computers in Biology and Medicine Mar 2024
DNA-binding and RNA-binding proteins are essential to an organism's normal life cycle. These proteins have diverse functions in various biological processes. DNA-binding proteins are crucial for DNA replication, transcription, repair, packaging, and gene expression. Likewise, RNA-binding proteins are essential for the post-transcriptional control of RNAs and RNA metabolism. Identifying DNA- and RNA-binding residues is essential for biological research and for understanding the pathogenesis of many diseases. However, most DNA-binding and RNA-binding proteins have yet to be discovered. This research explored various properties of protein sequences, such as amino acid composition type, position-specific scoring matrix (PSSM) values of amino acids, hidden Markov model (HMM) profiles, physicochemical properties, structural properties, torsion angles, and disordered regions. We used a sliding window technique to extract more information from a target residue's neighbors. We propose an optimized Light Gradient Boosting Machine (LightGBM) method, named DRBpred, to predict DNA-binding and RNA-binding residues from the protein sequence. Compared with the state-of-the-art method, DRBpred improves sensitivity, Matthews correlation coefficient (MCC), and AUC by 112.00%, 33.33%, and 6.49%, respectively, on the DNA-binding test set, and by 112.50%, 16.67%, and 7.46%, respectively, on the RNA-binding test set.
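Here is a minimal sketch of the sliding-window idea. It assumes zero padding at the sequence ends and simple concatenation of neighbour features. The window half-width `w`, the function name, and the padding scheme are hypothetical. The resulting rows would then be passed to a classifier such as LightGBM.

```python
def window_features(per_residue: list[list[float]], w: int = 3) -> list[list[float]]:
    """Build one feature row per residue from a window of 2*w + 1 neighbours.

    per_residue[i] is the feature vector of residue i (e.g. PSSM, HMM, and
    physicochemical values concatenated). Positions outside the sequence
    are zero-padded, so every row has (2*w + 1) * dim values.
    """
    n = len(per_residue)
    dim = len(per_residue[0]) if n else 0
    pad = [0.0] * dim
    rows = []
    for i in range(n):
        row: list[float] = []
        for j in range(i - w, i + w + 1):
            row.extend(per_residue[j] if 0 <= j < n else pad)
        rows.append(row)
    return rows


rows = window_features([[1.0], [2.0], [3.0]], w=1)
print(rows)  # [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 0.0]]
```

Because every row has the same fixed length, a per-residue task becomes ordinary tabular classification. This is the input shape gradient-boosting models such as LightGBM expect.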
Topics: Algorithms; Machine Learning; Amino Acids; DNA-Binding Proteins; DNA; RNA; RNA-Binding Proteins; Computational Biology; Databases, Protein
PubMed: 38295475
DOI: 10.1016/j.compbiomed.2024.108081
Environmental Health Perspectives Jan 2024
BACKGROUND
The organochlorine dichlorodiphenyltrichloroethane (DDT) is banned worldwide owing to its negative health effects, but it is still used, by exemption, as an insecticide for malaria control. Exposure occurs in regions where DDT is applied. It also occurs in the Arctic, where its endocrine-disrupting metabolite dichlorodiphenyldichloroethylene (DDE) accumulates in marine mammals and fish. DDT and DDE exposures are linked to birth defects, infertility, cancer, and neurodevelopmental delays. Of particular concern is the potential for DDT use to affect the health of future generations via the heritable sperm epigenome.
OBJECTIVES
The objective of this study was to assess the sperm epigenome in relation to DDE serum levels between geographically diverse populations.
METHODS
In the Limpopo Province of South Africa, we recruited 247 VhaVenda South African men and selected 50 paired blood serum and semen samples. In addition, 47 paired blood and semen samples from Greenlandic Inuit men were selected from a total of 193 samples in the biobank of the INUENDO cohort, an EU Fifth Framework Programme Research and Development project. Sample selection was based on obtaining a range of p,p′-DDE serum levels. We assessed the sperm epigenome in relation to serum p,p′-DDE levels using MethylC-Capture-sequencing (MCC-seq) and chromatin immunoprecipitation followed by sequencing (ChIP-seq). We identified genomic regions with altered DNA methylation (DNAme) and differential enrichment of histone H3 lysine 4 trimethylation (H3K4me3) in sperm.
RESULTS
Differences in DNAme and H3K4me3 enrichment were identified at transposable elements and regulatory regions involved in fertility, disease, development, and neurofunction. A subset of regions with sperm DNAme and H3K4me3 that differed between exposure groups was predicted to persist in the preimplantation embryo and to be associated with embryonic gene expression.
DISCUSSION
These findings suggest that DDT and DDE exposure impacts the sperm epigenome in a dose-response-like manner and may negatively impact the health of future generations through epigenetic mechanisms. Confounding factors, such as other environmental exposures, genetic diversity, and selection bias, cannot be ruled out. https://doi.org/10.1289/EHP12013.
Topics: Humans; Male; Cross-Sectional Studies; DDT; Dichlorodiphenyl Dichloroethylene; Epigenome; Inuit; Semen; South Africa; Spermatozoa; Black People
PubMed: 38294233
DOI: 10.1289/EHP12013