Purinergic Signalling, Sep 2023 (Review)
Efforts to fully understand pharmacological differences between G protein-coupled receptor (GPCR) species homologues are generally not pursued in detail during the drug development process. To date, many GPCRs that have been successfully targeted are relatively well conserved across species in amino acid sequence and display minimal variability of biological effects. However, the A3 adenosine receptor (A3AR) displays as little as 70% sequence identity among mammalian species commonly used in drug development (e.g., rodent vs. primate). This receptor is an exciting drug target for a multitude of diseases associated with tissue injury, ischemia, and inflammation. Consequently, the pharmacological properties of synthetic A3AR ligands vary widely. The variation extends beyond binding affinity, selectivity, and signaling efficacy: some ligands function as agonists in some species and antagonists in others. Numerous heterocyclic antagonists that have nM affinity at the human A3AR are inactive or weakly active at the rat and mouse A3ARs. Positive allosteric modulators, including the imidazo[4,5-c]quinolin-4-amine derivative LUF6000, are active only at the human A3AR and at those of some larger animal species evaluated so far (rabbit and dog), but not at rodent A3ARs. A3AR agonists evoke systemic degranulation of rodent, but not human, mast cells. The rat A3AR undergoes desensitization faster than the human A3AR, but the human homologue can be completely re-sensitized and recycled back to the cell surface. Thus, comprehensive pharmacological evaluation and awareness of potential A3AR species differences are critical in studies that seek to further understand the basic biological functions of this unique adenosine receptor subtype. Recombinant A3ARs from eight different species have been pharmacologically characterized thus far. In this review, we describe in detail current knowledge of species differences in genetic identity, G protein coupling, receptor regulation, and both orthosteric and allosteric A3AR pharmacology.
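The roughly 70% figure above is a pairwise percent identity. Its value depends on how the sequences are aligned and on which columns count toward the denominator. The minimal Python sketch below assumes the two sequences are already aligned to equal length. It counts only columns where neither sequence has a gap, which is one common convention among several. The toy fragments in the usage note are hypothetical, not A3AR sequences.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes seq_a and seq_b are already aligned (same length, gaps marked with `gap`).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (one convention; others count them)
        compared += 1
        matches += a == b  # bool adds as 0 or 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

For example, `percent_identity("MKT-LV", "MKSALV")` compares five ungapped columns, four of which match, and returns 80.0.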
Topics: Rats; Mice; Humans; Rabbits; Animals; Dogs; Receptor, Adenosine A3; Mast Cells; Amino Acid Sequence; Protein Binding; Signal Transduction; Mammals
PubMed: 36538251
DOI: 10.1007/s11302-022-09910-1

Nucleic Acids Research, Jan 2024
The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology. It now holds over 214 million predicted protein structures, up from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB. These range from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since the initial release. These include enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements to the AlphaFold DB search engine.
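Programmatic access can be sketched in a few lines. The Python below builds request URLs and reads per-residue confidence from a downloaded model file. The URL patterns are assumptions to verify against the current AlphaFold DB documentation, since file versions change between releases. They assume a JSON prediction endpoint under `/api/prediction/` and model files named `AF-<accession>-F1-model_v4.pdb`. AlphaFold model files store the per-residue pLDDT confidence in the PDB B-factor column, which the parser reads from the Cα atom records.

```python
from typing import Dict, Tuple

# Assumed URL patterns; check them against the current AlphaFold DB documentation,
# since file versions (the "_v4" default here) change between releases.
AFDB_BASE = "https://alphafold.ebi.ac.uk"


def prediction_api_url(uniprot_accession: str) -> str:
    """URL of the (assumed) JSON prediction endpoint for one UniProt accession."""
    return f"{AFDB_BASE}/api/prediction/{uniprot_accession}"


def model_pdb_url(uniprot_accession: str, version: int = 4, fragment: int = 1) -> str:
    """URL of the (assumed) PDB-format model file for one accession."""
    return f"{AFDB_BASE}/files/AF-{uniprot_accession}-F{fragment}-model_v{version}.pdb"


def plddt_from_pdb(pdb_text: str) -> Dict[Tuple[str, int], float]:
    """Per-residue pLDDT read from the B-factor field (columns 61-66) of CA ATOM records."""
    scores: Dict[Tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, chain in 22, residue number in 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain, resseq = line[21], int(line[22:26])
            scores[(chain, resseq)] = float(line[60:66])
    return scores
```

A file could be fetched with `urllib.request.urlopen(model_pdb_url(accession))` and its decoded text passed to `plddt_from_pdb`. The functions themselves make no network requests.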
Topics: Amino Acid Sequence; Artificial Intelligence; Databases, Protein; Proteome; Search Engine; Proteins; Protein Structure, Secondary
PubMed: 37933859
DOI: 10.1093/nar/gkad1011

Bioinformatics (Oxford, England), Oct 2023
MOTIVATION
In recent years, there has been a breakthrough in protein structure prediction: the DeepMind team's AlphaFold2 model has improved the accuracy of protein structure prediction to the atomic level. Current deep learning-based protein function prediction models usually extract features from protein sequences and combine them with protein-protein interaction networks, achieving good results. However, such models cannot make effective predictions for newly sequenced proteins that are not yet in the protein-protein interaction network. To address this limitation, this article proposes the Struct2GO model, which combines protein structure and sequence data to improve the precision of protein function prediction and the generality of the model.
RESULTS
We obtain amino acid residue embeddings from protein structures through graph representation learning. A self-attention-based graph pooling algorithm then produces whole-graph structural features, which we fuse with sequence features obtained from a protein language model. The results demonstrate that the Struct2GO model outperforms traditional sequence-based protein function prediction models.
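The fusion step can be illustrated with a simplified attention readout. The Python sketch below is an assumption-laden stand-in, not Struct2GO's code. Each residue embedding h receives a score tanh(w · h), and the scores are softmax-normalized into weights. The weighted sum becomes a single structure vector, which is then concatenated with a sequence-level embedding. Self-attention graph pooling as usually described also selects a top-k subset of nodes using scores computed by a graph neural network. That step, the learned weight vector `w`, and concatenation as the fusion operator are all simplifications assumed here.

```python
import math
from typing import List

Vector = List[float]


def attention_readout(node_embs: List[Vector], w: Vector) -> Vector:
    """Softmax-weighted sum of residue embeddings (a simplified attention readout)."""
    # One scalar score per residue: tanh of the dot product with the weight vector.
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h))) for h in node_embs]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_embs[0])
    return [sum(wt * h[d] for wt, h in zip(weights, node_embs)) for d in range(dim)]


def fuse(structure_vec: Vector, sequence_vec: Vector) -> Vector:
    """Late fusion by concatenation (an assumed fusion scheme)."""
    return structure_vec + sequence_vec
```

With `w` set to all zeros, every residue gets equal weight, so the readout reduces to the mean embedding. Training would instead learn `w` so that informative residues dominate.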
AVAILABILITY AND IMPLEMENTATION
The data underlying this article are available at https://github.com/lyjps/Struct2GO.
Topics: Neural Networks, Computer; Proteins; Algorithms; Amino Acid Sequence; Amino Acids
PubMed: 37847755
DOI: 10.1093/bioinformatics/btad637

Briefings in Bioinformatics, Sep 2023
Transmembrane proteins, including receptors, enzymes, transporters and ion channels, are instrumental in regulating a variety of cellular activities, such as signal transduction and cell communication. Despite tremendous progress in computational capacity to support protein research, there is still a significant gap in the availability of specialized computational analysis toolkits for transmembrane protein research. Here, we introduce TMKit, an open-source Python programming interface that is modular, scalable and specifically designed for processing transmembrane protein data. TMKit is a one-stop computational analysis tool for transmembrane proteins. It enables users to perform database wrangling, engineer features at the mutational, domain and topological levels, and visualize protein-protein interaction interfaces. In addition, TMKit includes seqNetRR, a high-performance computing library for customized construction of large numbers of residue connections. This library is particularly well suited for rapidly assigning correlation matrix-based features. TMKit should serve as a useful tool for researchers studying transmembrane protein sequences and structures. TMKit is publicly available through https://github.com/2003100127/tmkit and https://tmkit-guide.herokuapp.com/doc/overview.
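"Residue connections" with correlation matrix-based features can be pictured in a simple way. The approach enumerates residue pairs within a sequence window. It then looks up a value for each pair in a precomputed residue-by-residue matrix, for example a coevolution or correlation matrix. The function names below are hypothetical and do not reflect the TMKit or seqNetRR API. The window-based pair definition is also an assumption made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def residue_pairs(seq_len: int, window: int) -> List[Pair]:
    """Hypothetical helper: all pairs (i, j), i < j, at most `window` positions apart."""
    return [
        (i, j)
        for i in range(seq_len)
        for j in range(i + 1, min(seq_len, i + window + 1))
    ]


def pair_features(pairs: List[Pair], matrix: Sequence[Sequence[float]]) -> Dict[Pair, float]:
    """Hypothetical helper: attach one matrix value (e.g. a correlation score) to each pair."""
    return {(i, j): matrix[i][j] for i, j in pairs}
```

Swapping the window rule for a distance cutoff or a full all-pairs enumeration changes only `residue_pairs`. The lookup step stays the same.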
Topics: Software; Computational Biology; Membrane Proteins; Amino Acid Sequence; Gene Library
PubMed: 37594311
DOI: 10.1093/bib/bbad288

Protein Science : a Publication of the..., Nov 2023
Predicting the effects of mutations on protein function and stability is an outstanding challenge. Here, we assess the performance of RFjoint, a variant of RoseTTAFold jointly trained for sequence and structure recovery, for mutation effect prediction. Without any further training, RFjoint predicts mutation effects across a diverse set of protein families with accuracy comparable to two other models. One is another zero-shot model (MSA Transformer); the other requires training on a specific protein family (DeepSequence). The RFjoint architecture was developed to address the protein design problem of scaffolding functional motifs. Nevertheless, RFjoint acquired an understanding of the mutational landscapes of proteins during model training that is equivalent to that of recently developed large protein language models. The ability to reason simultaneously over protein structure and sequence could enable even more precise mutation effect predictions following supervised training on the task. These results suggest that RFjoint has a broad understanding of protein sequence-structure landscapes. It can be viewed as a joint model of protein sequence and structure that could be broadly useful for protein modeling.
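Zero-shot mutation effect prediction with a model that outputs per-position amino acid probabilities is often scored with a log-odds ratio. The score is log p(mutant residue) minus log p(wild-type residue) at the mutated position, and negative values suggest a deleterious substitution. The Python sketch below assumes such probabilities are already available. The toy probability table is hand-written, and this scoring rule is a common heuristic rather than necessarily the exact protocol used for the RoseTTAFold variant evaluated here.

```python
import math
from typing import Dict, List


def mutation_score(probs: List[Dict[str, float]], wt_seq: str, mutation: str) -> float:
    """log p(mut) - log p(wt) at one position; `mutation` uses 1-based notation like 'A2G'.

    `probs[i]` maps amino acids to the model's probability at position i (0-based).
    """
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if wt_seq[pos - 1] != wt:
        raise ValueError(f"wild-type mismatch at position {pos}")
    p = probs[pos - 1]
    return math.log(p[mt]) - math.log(p[wt])
```

For a toy table in which position 2 assigns 0.5 to the wild-type A and 0.25 to G, the substitution A2G scores log(0.25) − log(0.5) ≈ −0.693.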
Topics: Proteins; Mutation; Amino Acid Sequence; Protein Stability
PubMed: 37695922
DOI: 10.1002/pro.4780

Sensors (Basel, Switzerland), Nov 2023
Proteins are among the primary biochemical macromolecular regulators of compartmentalized cellular structure. The subcellular locations of proteins can therefore provide information on the function of subcellular structures and on physiological environments. Recently, data-driven systems have been developed to predict the subcellular location of proteins based on protein sequences, immunohistochemistry (IHC) images, or immunofluorescence (IF) images. However, the fusion of multiple protein signals has received little research attention. In this study, we developed a dual-signal computational protocol that combines IHC images with protein sequences to learn protein subcellular localization. The protocol has three major steps. First, a benchmark database was built of 281 proteins, selected from 4722 proteins in the Human Protein Atlas (HPA) and Swiss-Prot databases. These proteins are localized to the endoplasmic reticulum (ER), Golgi apparatus, cytosol, and nucleoplasm. Second, discriminative feature operators were employed to quantify protein image-sequence samples comprising IHC images and protein sequences. Finally, the feature subspaces of the different protein signals were used to construct multiple sub-classifiers via dimensionality reduction and binary relevance (BR). The confidences derived from these sub-classifiers were then combined by a centralized voting mechanism at the decision layer to assign subcellular location. The experimental results indicated that the dual-signal model combining IHC images and protein sequences outperformed single-signal models. It achieved accuracy, precision, and recall of 75.41%, 80.38%, and 74.38%, respectively. These results may inform further research on protein subcellular location prediction using multi-signal fusion.
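The decision layer can be sketched as follows. Under binary relevance, each subcellular location gets its own yes/no classifier, so a protein may be assigned more than one location. The Python sketch below assumes each signal (IHC image features, sequence features) yields one confidence per location. It averages those confidences as a simple form of voting and keeps every location whose mean confidence reaches a 0.5 threshold. The averaging rule, threshold and fallback are assumptions, not the paper's exact mechanism.

```python
from typing import Dict, List, Set


def vote_locations(confidences: List[Dict[str, float]], threshold: float = 0.5) -> Set[str]:
    """Average each location's confidence across sub-classifiers; keep labels >= threshold.

    Each dict maps location -> confidence from one set of binary-relevance classifiers
    (e.g. one trained on IHC-image features, one on sequence features).
    """
    labels = confidences[0].keys()
    mean = {lab: sum(c[lab] for c in confidences) / len(confidences) for lab in labels}
    chosen = {lab for lab, v in mean.items() if v >= threshold}
    # Fall back to the single most confident location so every protein gets a label.
    return chosen or {max(mean, key=mean.get)}
```

Keeping every location above the threshold, rather than only the top-scoring one, is what allows multi-location proteins to receive more than one label.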
Topics: Humans; Immunohistochemistry; Proteins; Amino Acid Sequence; Cell Nucleus; Databases, Protein; Subcellular Fractions
PubMed: 38005402
DOI: 10.3390/s23229014

Scientific Reports, Sep 2023
Various approaches have used neural networks as probabilistic models for the design of protein sequences. These "inverse folding" models employ different objective functions, which come with trade-offs that have not previously been assessed in detail. This study introduces probabilistic definitions of protein stability and conformational specificity. It then demonstrates the relationship between these chemical properties and the p(structure | sequence) Boltzmann probability objective. This links the Boltzmann probability objective function to experimentally verifiable outcomes. We propose a novel sequence decoding algorithm, referred to as "BayesDesign", that leverages Bayes' rule to maximize the p(structure | sequence) objective. Inverse folding models commonly use the p(sequence | structure) objective instead. The efficacy of BayesDesign is evaluated in two protein model systems, the NanoLuc enzyme and the WW structural motif. Both BayesDesign and the baseline ProteinMPNN algorithm increase the thermostability of NanoLuc and the conformational specificity of WW. Possible sources of error in the model are analyzed.
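Bayes' rule connects the two objectives. For a fixed target backbone, p(structure) is a constant. Maximizing p(structure | sequence) over sequences is therefore equivalent to maximizing the inverse-folding likelihood penalized by a sequence prior:

```latex
p(\mathrm{struct} \mid \mathrm{seq})
  = \frac{p(\mathrm{seq} \mid \mathrm{struct})\, p(\mathrm{struct})}{p(\mathrm{seq})}
\quad\Longrightarrow\quad
\arg\max_{\mathrm{seq}} \log p(\mathrm{struct} \mid \mathrm{seq})
  = \arg\max_{\mathrm{seq}} \Bigl[ \log p(\mathrm{seq} \mid \mathrm{struct}) - \log p(\mathrm{seq}) \Bigr]
```

The prior p(sequence) has to come from a separate model of sequences, for instance an unconditional protein language model. Which prior BayesDesign uses, and how it decodes, is not specified here. This block shows only the identity the approach rests on.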
Topics: Bayes Theorem; Protein Stability; Algorithms; Amino Acid Sequence; Likelihood Functions
PubMed: 37726313
DOI: 10.1038/s41598-023-42032-1

Mathematical Biosciences and..., Jul 2023
Protein interactions underlie many cellular activities, such as apoptosis, the immune response, and metabolic pathways. To improve the performance of protein interaction prediction, a coding method based on normalized difference sequence characteristics (NDSF) of amino acid sequences is proposed. NDSF jointly encodes the positional relationships between amino acids within each sequence and the correlation characteristics between sequence pairs. Features are then extracted from the resulting 174-dimensional encoded human protein sequence vectors. Two dimensionality reduction methods are used for this step: principal component analysis (PCA) and locally linear embedding (LLE). This study compares the classification performance of four ensemble learning methods (AdaBoost, Extra Trees, LightGBM, XGBoost) applied to the PCA and LLE features. Cross-validation and grid search are used to find the best combination of parameters. The results show that the accuracy of NDSF is generally higher than that of the sequence matrix-based coding method (MOS). NDSF can also greatly reduce loss and coding time. The bar chart of feature extraction shows that classification accuracy is significantly higher with the linear dimensionality reduction method, PCA, than with the nonlinear method, LLE. With XGBoost classification, model accuracy reaches 99.2%, the best performance among all models. This study suggests that NDSF combined with PCA and XGBoost may be an effective strategy for classifying different human protein interactions.
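The parameter search described above pairs grid search with k-fold cross-validation. Every combination of hyperparameter values is scored by its mean held-out accuracy across folds, and the best-scoring combination is kept. The dependency-free Python sketch below assumes unshuffled, contiguous folds. It uses a toy k-nearest-neighbour classifier as a stand-in for XGBoost. In practice a library routine such as scikit-learn's `GridSearchCV` would normally be used instead.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds (no shuffling, for simplicity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(X, y, param_grid: Dict[str, Sequence], fit_predict: Callable,
                   k: int = 3) -> Tuple[Dict, float]:
    """Return (best_params, best_mean_accuracy) over every parameter combination."""
    folds = k_fold_indices(len(X), k)
    best_params, best_score = None, -1.0
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                [X[i] for i in test_idx], **params)
            correct = sum(p == y[i] for p, i in zip(preds, test_idx))
            scores.append(correct / len(test_idx))
        mean = sum(scores) / len(scores)
        if mean > best_score:  # strict: ties keep the earlier combination
            best_params, best_score = params, mean
    return best_params, best_score


def knn_fit_predict(X_train, y_train, X_test, n_neighbors: int = 1):
    """Toy k-nearest-neighbour classifier standing in for XGBoost."""
    preds = []
    for x in X_test:
        dists = sorted((sum((a - b) ** 2 for a, b in zip(x, xt)), yt)
                       for xt, yt in zip(X_train, y_train))
        votes = [label for _, label in dists[:n_neighbors]]
        preds.append(max(set(votes), key=votes.count))
    return preds
```

Because `best_score` is replaced only on a strict improvement, ties go to the first combination in grid order.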
Topics: Humans; Amino Acid Sequence; Research Design; Amino Acids; Apoptosis; Computer Systems
PubMed: 37679156
DOI: 10.3934/mbe.2023659

Briefings in Bioinformatics, Mar 2024
Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most deep learning-based protein sequence design methods focus on optimizing network architecture while ignoring protein-specific physicochemical features. Structure templates and pre-trained models have been applied successfully in protein structure prediction. Inspired by this, we explored whether structural sequence profiles can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profiles obtained with ultrafast shape recognition. Given an input backbone structure, SPDesign uses ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database. It then extracts a sequence profile through structure alignment. This profile, combined with pre-trained structural knowledge and geometric features, is fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms state-of-the-art methods such as ProteinMPNN, PiFold and LM-Design. On the CATH 4.2 benchmark, it improves sequence recovery rate by 21.89%, 15.54% and 11.4%, respectively. Encouraging results were also achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis with the PDBench tool suggests that SPDesign performs well on subdivided structures. Notably, SPDesign can accurately reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, a structural modeling verification experiment indicates that sequences designed by SPDesign fold into the native structures more accurately.
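Ultrafast shape recognition (USR) was originally developed for fast small-molecule shape comparison. It summarizes a 3D point set by the distributions of its distances to four reference points. These are the centroid, the point closest to the centroid, the point farthest from the centroid, and the point farthest from that one. Keeping three moments per distribution yields a 12-number descriptor. The Python sketch below assumes the points are backbone Cα coordinates. It uses mean, standard deviation and the signed cube root of the third central moment, which is one common choice of moments. How SPDesign builds its vectors may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(dists: List[float]) -> List[float]:
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1 / 3), third)]


def usr_descriptor(coords: Sequence[Point]) -> List[float]:
    """12-number USR shape descriptor built from four reference points."""
    ctd = tuple(sum(c[i] for c in coords) / len(coords) for i in range(3))
    cst = min(coords, key=lambda p: math.dist(p, ctd))  # closest point to centroid
    fct = max(coords, key=lambda p: math.dist(p, ctd))  # farthest point from centroid
    ftf = max(coords, key=lambda p: math.dist(p, fct))  # farthest point from fct
    desc: List[float] = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([math.dist(p, ref) for p in coords]))
    return desc


def usr_similarity(a: List[float], b: List[float]) -> float:
    """1 / (1 + mean absolute difference); equals 1.0 for identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Because the descriptor uses only distances, it is invariant to translation and rotation. Comparing two structures therefore needs no prior superposition, which is what makes the method fast enough to prescreen a large structure database.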
Topics: Sequence Alignment; Amino Acid Sequence; Proteins; Neural Networks, Computer; Sequence Analysis, Protein
PubMed: 38600663
DOI: 10.1093/bib/bbae146

Nature, Apr 2024 (Comparative Study)
Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high levels of expertise and labour-intensive manual intervention in three-dimensional computer graphics programs. Here we present ModelAngelo, a machine-learning approach for automated atomic model building in cryo-EM maps. By combining information from the cryo-EM map with information from protein sequence and structure in a single graph neural network, ModelAngelo builds atomic models for proteins that are of similar quality to those generated by human experts. For nucleotides, ModelAngelo builds backbones with similar accuracy to those built by humans. By using its predicted amino acid probabilities for each residue in hidden Markov model sequence searches, ModelAngelo outperforms human experts in the identification of proteins with unknown sequences. ModelAngelo will therefore remove bottlenecks and increase objectivity in cryo-EM structure determination.
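ModelAngelo's sequence search turns predicted per-residue amino acid probabilities into a query for hidden Markov model searches. A much simpler relative of that idea is an ungapped profile scan. Each position's probabilities are converted into log-odds scores against a background distribution, and every window of a candidate sequence is then scored. The Python sketch below is that simplification, not an HMM, since it allows no insertions or deletions. The uniform background and toy three-letter alphabet in the usage note are assumptions.

```python
import math
from typing import Dict, List

Profile = List[Dict[str, float]]


def profile_scores(residue_probs: List[Dict[str, float]], background: Dict[str, float]) -> Profile:
    """Per-position log-odds scores log(p_i(aa) / bg(aa)) for every letter in `background`."""
    return [{aa: math.log(p[aa] / background[aa]) for aa in background} for p in residue_probs]


def best_ungapped_hit(profile: Profile, sequence: str) -> float:
    """Highest ungapped window score; -inf if the sequence is shorter than the profile."""
    length = len(profile)
    best = float("-inf")
    for start in range(len(sequence) - length + 1):
        score = sum(profile[i][sequence[start + i]] for i in range(length))
        best = max(best, score)
    return best


def rank_candidates(profile: Profile, candidates: Dict[str, str]) -> List[str]:
    """Candidate names sorted from best to worst profile match."""
    return sorted(candidates, key=lambda name: best_ungapped_hit(profile, candidates[name]),
                  reverse=True)
```

With a uniform background over A, G and L and a two-position profile favouring A then L, the candidate "GGAL" outranks "GGGG". This is because its window "AL" scores 2·log(2.4) ≈ 1.75.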
Topics: Amino Acid Sequence; Cryoelectron Microscopy; Machine Learning; Markov Chains; Models, Molecular; Neural Networks, Computer; Protein Conformation; Proteins; Computer Graphics
PubMed: 38408488
DOI: 10.1038/s41586-024-07215-4