-
The Journal of Biological Chemistry Nov 2023
Most immunoglobulin (Ig) domains bear only a single highly conserved canonical intradomain, inter-β-sheet disulfide linkage formed between Cys23-Cys104, and incorporation of rare noncanonical disulfide linkages at other locations can enhance Ig domain stability. Here, we exhaustively surveyed the sequence tolerance of Ig variable (V) domain framework regions (FRs) to noncanonical disulfide linkages. Starting from a destabilized V domain lacking a Cys23-Cys104 disulfide linkage, we generated and screened phage-displayed libraries of engineered Vs, bearing all possible pairwise combinations of Cys residues in neighboring β-strands of the Ig fold FRs. This approach identified seven novel Cys pairs in V FRs (Cys4-Cys25, Cys4-Cys118, Cys5-Cys120, Cys6-Cys119, Cys22-Cys88, Cys24-Cys86, and Cys45-Cys100; the international ImMunoGeneTics information system numbering), whose presence rescued domain folding and stability. Introduction of a subset of these noncanonical disulfide linkages (three intra-β-sheet: Cys4-Cys25, Cys22-Cys88, and Cys24-Cys86, and one inter-β-sheet: Cys6-Cys119) into a diverse panel of V, V, and VH domains enhanced their thermostability and protease resistance without significantly impacting expression, solubility, or binding to cognate antigens. None of the noncanonical disulfide linkages identified were present in the natural human V repertoire. These data reveal an unexpected permissiveness of Ig V domains to noncanonical disulfide linkages at diverse locations in FRs, absent in the human repertoire, whose presence is compatible with antigen recognition and improves domain stability. Our work represents the most complete assessment to date of the role of engineered noncanonical disulfide bonding within FRs in Ig V domain structure and function.
Topics: Humans; Amino Acid Sequence; Cell Surface Display Techniques; Immunoglobulin Variable Region; Protein Domains; Escherichia coli; Protein Folding
PubMed: 37742917
DOI: 10.1016/j.jbc.2023.105278
-
Human Genetics Aug 2023
Fatty acid elongase ELOVL5 is part of a protein family of multipass transmembrane proteins that reside in the endoplasmic reticulum where they regulate long-chain fatty acid elongation. A missense variant (c.689G>T p.Gly230Val) in ELOVL5 causes Spinocerebellar Ataxia subtype 38 (SCA38), a neurodegenerative disorder characterized by autosomal dominant inheritance, cerebellar Purkinje cell demise and adult-onset ataxia. Having previously shown aberrant accumulation of p.G230V in the Golgi complex, here we further investigated the pathogenic mechanisms triggered by p.G230V, integrating functional studies with bioinformatic analyses of protein sequence and structure. Biochemical analysis showed that p.G230V enzymatic activity was normal. In contrast, SCA38-derived fibroblasts showed reduced expression of ELOVL5, Golgi complex enlargement and increased proteasomal degradation with respect to controls. By heterologous overexpression, p.G230V was significantly more active than wild-type ELOVL5 in triggering the unfolded protein response and in decreasing viability in mouse cortical neurons. By homology modelling, we generated native and p.G230V protein structures whose superposition revealed a shift in Loop 6 in p.G230V that altered a highly conserved intramolecular disulphide bond. The conformation of this bond, connecting Loop 2 and Loop 6, appears to be elongase-specific. Alteration of this intramolecular interaction was also observed when comparing wild-type ELOVL4 and the p.W246G variant which causes SCA34. We demonstrate by sequence and structure analyses that ELOVL5 p.G230V and ELOVL4 p.W246G are position-equivalent missense variants. We conclude that SCA38 is a conformational disease and propose combined loss of function by mislocalization and gain of toxic function by ER/Golgi stress as early events in SCA38 pathogenesis.
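The variant notation c.689G>T p.Gly230Val can be checked for internal consistency with simple arithmetic. The sketch below is not taken from the paper. It assumes only standard 1-based coding-DNA numbering and the standard genetic code, and the function name is illustrative.

```python
# Codons of the standard genetic code needed for this check only.
GLY_CODONS = {"GGT", "GGC", "GGA", "GGG"}
VAL_CODONS = {"GTT", "GTC", "GTA", "GTG"}


def codon_position(cdna_pos):
    """Map a 1-based coding-DNA position to (codon number, 1-based offset in codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


# Position 689 falls on the second base of codon 230.
codon, offset = codon_position(689)

# Apply the G>T change at that offset to every glycine codon.
mutated = {c[:offset - 1] + "T" + c[offset:] for c in GLY_CODONS}

print(codon, offset, sorted(mutated))
```

Because glycine is encoded by GGN and valine by GTN, a G>T change at the second base of codon 230 gives valine whatever the third base is, which agrees with p.Gly230Val.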
Topics: Animals; Mice; Spinocerebellar Ataxias; Ataxia; Fatty Acid Elongases; Amino Acid Sequence; Mutation
PubMed: 37199746
DOI: 10.1007/s00439-023-02572-y
-
Frontiers in Endocrinology 2023
Review
Neuropeptides are involved in almost all physiological activities of insects. Their classification is based on physiological function and the primary amino acid sequence. The pyrokinin (PK)/pheromone biosynthesis activating neuropeptides (PBAN) are one of the largest neuropeptide families in insects, with a conserved C-terminal domain of FXPRLamide. The peptide family is divided into two groups, PK1/diapause hormone (DH) with a WFGPRLa C-terminal ending and PK2/PBAN with FXPRLamide C-terminal ending. Since the development of cutting-edge technology, an increasing number of peptides have been sequenced primarily through genomics, transcriptomics, and proteomics, and their functions discovered using gene editing tools. In this review, we discussed newly discovered functions, and analyzed the distribution of genes encoding these peptides throughout different insect orders. In addition, the location of the peptides that were confirmed by PCR or immunocytochemistry is also described. A phylogenetic tree was constructed according to the sequences of the receptors of most insect orders. This review offers an understanding of the significance of this conserved peptide family in insects.
Topics: Humans; Animals; Phylogeny; Amino Acid Sequence; Insecta; Neuropeptides; Pheromones
PubMed: 38161974
DOI: 10.3389/fendo.2023.1274750
-
Viruses Nov 2023
Pepino mosaic virus (PepMV) causes significant economic losses in tomato crops worldwide. Since its first detection infecting tomato in 1999, aggressive PepMV variants have emerged. This study aimed to characterize two aggressive PepMV isolates, PepMV-H30 and PepMV-KLP2. Both isolates were identified in South-Eastern Spain infecting tomato plants, which showed severe symptoms, including bright yellow mosaics. Full-length infectious clones were generated, and phylogenetic relationships were inferred using their nucleotide sequences and another 35 full-length sequences from isolates representing the five known PepMV strains. Our analysis revealed that PepMV-H30 and PepMV-KLP2 belong to the EU and CH2 strains, respectively. Amino acid sequence comparisons between these and mild isolates identified 8 and 15 amino acid substitutions for PepMV-H30 and PepMV-KLP2, respectively, potentially involved in severe symptom induction. None of the substitutions identified in PepMV-H30 have previously been described as symptom determinants. The E236K substitution, originally present in the PepMV-H30 CP, was introduced into a mild PepMV-EU isolate, resulting in a virus that causes symptoms similar to those induced by the parental PepMV-H30 in plants. In silico analyses revealed that this residue is located at the C-terminus of the CP and is solvent-accessible, suggesting its potential involvement in CP-host protein interactions. We also examined the subcellular localization of PepGFPm2 in comparison to that of PepGFPm2, focusing on chloroplast affection, but no differences were observed in the GFP subcellular distribution between the two viruses in epidermal cells of plants. Due to the easily visible symptoms that PepMV-H30 and PepMV-KLP2 induce, these isolates represent valuable tools in programs designed to breed resistance to PepMV in tomato.
Topics: Phylogeny; Plant Breeding; Amino Acid Sequence; Potexvirus; Solanum lycopersicum; Plant Diseases
PubMed: 38005907
DOI: 10.3390/v15112230
-
Proceedings of the National Academy of... Mar 2024
The design of protein-protein interfaces using physics-based design methods such as Rosetta requires substantial computational resources and manual refinement by expert structural biologists. Deep learning methods promise to simplify protein-protein interface design and enable its application to a wide variety of problems by researchers from various scientific disciplines. Here, we test the ability of a deep learning method for protein sequence design, ProteinMPNN, to design two-component tetrahedral protein nanomaterials and benchmark its performance against Rosetta. ProteinMPNN had a similar success rate to Rosetta, yielding 13 new experimentally confirmed assemblies, but required orders of magnitude less computation and no manual refinement. The interfaces designed by ProteinMPNN were substantially more polar than those designed by Rosetta, which facilitated in vitro assembly of the designed nanomaterials from independently purified components. Crystal structures of several of the assemblies confirmed the accuracy of the design method at high resolution. Our results showcase the potential of deep learning-based methods to unlock the widespread application of designed protein-protein interfaces and self-assembling protein nanomaterials in biotechnology.
Topics: Models, Molecular; Proteins; Amino Acid Sequence; Nanostructures; Biotechnology; Protein Conformation
PubMed: 38502697
DOI: 10.1073/pnas.2314646121
-
SaLT&PepPr is an interface-predicting language model for designing peptide-guided protein degraders.
Communications Biology Oct 2023
Protein-protein interactions (PPIs) are critical for biological processes and predicting the sites of these interactions is useful for both computational and experimental applications. We present a Structure-agnostic Language Transformer and Peptide Prioritization (SaLT&PepPr) pipeline to predict interaction interfaces from a protein sequence alone for the subsequent generation of peptidic binding motifs. Our model fine-tunes the ESM-2 protein language model (pLM) with a per-position prediction task to identify PPI sites using data from the PDB, and prioritizes motifs which are most likely to be involved within inter-chain binding. By only using amino acid sequence as input, our model is competitive with structural homology-based methods, but exhibits reduced performance compared with deep learning models that input both structural and sequence features. Inspired by our previous results using co-crystals to engineer target-binding "guide" peptides, we curate PPI databases to identify partners for subsequent peptide derivation. Fusing guide peptides to an E3 ubiquitin ligase domain, we demonstrate degradation of endogenous β-catenin, 4E-BP2, and TRIM8, and highlight the nanomolar binding affinity, low off-targeting propensity, and function-altering capability of our best-performing degraders in cancer cells. In total, our study suggests that prioritizing binders from natural interactions via pLMs can enable programmable protein targeting and modulation.
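The idea of turning per-position interface predictions into a candidate peptide motif can be illustrated with a sliding window. This is a minimal sketch under stated assumptions, not the SaLT&PepPr pipeline. The abstract does not describe the prioritization rule, so the mean-score criterion, the function name `best_window`, the toy sequence, and the scores are all hypothetical.

```python
def best_window(sequence, scores, length):
    """Return (start, motif, mean score) of the highest-scoring contiguous window.

    `scores` holds one hypothetical per-residue interface probability per position.
    """
    if len(sequence) != len(scores) or not 0 < length <= len(sequence):
        raise ValueError("sequence and scores must match; length must fit the sequence")
    window_sum = sum(scores[:length])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(sequence) - length + 1):
        # Slide the window by one residue: add the new score, drop the old one.
        window_sum += scores[start + length - 1] - scores[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, sequence[best_start:best_start + length], best_sum / length


# Toy input: made-up sequence and made-up per-position scores.
sequence = "MKTAYIAKQR"
scores = [0.1, 0.2, 0.1, 0.7, 0.9, 0.8, 0.6, 0.2, 0.1, 0.1]
start, motif, mean_score = best_window(sequence, scores, 4)
print(start, motif, round(mean_score, 2))
```

A real model would supply the scores from its per-position prediction head. The window length and the ranking criterion would need to be tuned to the peptide lengths of interest.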
Topics: Proteins; Peptides; Amino Acid Sequence; Ubiquitin-Protein Ligases
PubMed: 37875551
DOI: 10.1038/s42003-023-05464-z
-
Bioinformatics (Oxford, England) Nov 2023
MOTIVATION
Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSA), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships.
RESULTS
Here we examined four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence identity, fraction of superimposed residues, and contact overlap), finding that they are intimately correlated but none of them provides a complete and unbiased picture of conservation in proteins. Therefore, we propose the new hybrid protein sequence and structure similarity score PC_sim based on their main principal component. The corresponding divergence measure PC_div shows the strongest correlation with divergences obtained from individual similarities, suggesting that it infers accurate evolutionary divergences. We developed the program PC_ali that constructs protein MSAs either de novo or modifying an input MSA, using a similarity matrix based on PC_sim. The program constructs a starting MSA based on the maximal cliques of the graph of pairwise alignments (PAs) and refines it through progressive alignments along the tree reconstructed with PC_div. Compared with eight state-of-the-art multiple structure or sequence alignment tools, PC_ali achieves higher or equal aligned fraction and structural scores, sequence identity higher than structure aligners although lower than sequence aligners, highest score PC_sim, and highest similarity with the MSAs produced by other tools and with the reference MSA Balibase.
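One way to combine several correlated similarity measures into a single score through their first principal component can be sketched as follows. This is only an illustration of the general technique, not the definition of PC_sim or the PC_ali code. The numbers are invented, and standardizing each measure before the analysis is an assumption made here.

```python
import math

# Hypothetical similarity values for five protein pairs. Columns: aligned
# fraction, sequence identity, superimposed fraction, contact overlap.
pairs = [
    [0.95, 0.80, 0.92, 0.90],
    [0.85, 0.45, 0.80, 0.75],
    [0.70, 0.25, 0.62, 0.58],
    [0.55, 0.15, 0.45, 0.40],
    [0.40, 0.08, 0.30, 0.28],
]


def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def first_principal_axis(z, iterations=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    n, d = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
            for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


z = standardize(pairs)
axis = first_principal_axis(z)
# Hybrid score of each pair: projection onto the first principal axis.
scores = [sum(a * x for a, x in zip(axis, row)) for row in z]
print([round(a, 3) for a in axis])
print([round(s, 3) for s in scores])
```

Because the four toy measures rise and fall together, every weight on the principal axis is positive and the combined score ranks the pairs in the same order as each individual measure.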
AVAILABILITY AND IMPLEMENTATION
https://github.com/ugobas/PC_ali.
Topics: Software; Algorithms; Amino Acid Sequence; Proteins; Biological Evolution
PubMed: 37847775
DOI: 10.1093/bioinformatics/btad630
-
European Biophysics Journal : EBJ Oct 2023
Review
Peptide nucleic acid (PNA) is a nucleic acid mimic with high specificity and binding affinity to natural DNA or RNA, as well as resistance to enzymatic degradation. PNA sequences can be designed to selectively silence gene expression, which makes PNA a promising tool for antimicrobial applications. However, the poor membrane permeability of PNA remains the main limiting factor for its applications in cells. To overcome this obstacle, PNA conjugates with different molecules have been developed. This mini-review focuses on covalently linked conjugates of PNA with cell-penetrating peptides, aminosugars, aminoglycoside antibiotics, and non-peptidic molecules that were tested, primarily as PNA carriers, in antibacterial and antiviral applications. The chemistries of the conjugation and the applied linkers are also discussed.
Topics: Peptide Nucleic Acids; Anti-Bacterial Agents; Amino Acid Sequence; Cell-Penetrating Peptides
PubMed: 37610696
DOI: 10.1007/s00249-023-01673-w
-
Journal of the American Chemical Society Aug 2023
Review
Discovering new bioactive molecules is crucial for drug development. Finding a hit compound for a new drug target usually requires screening of millions of molecules. Affinity selection based technologies have revolutionized early hit discovery by enabling the rapid screening of libraries with millions or billions of compounds in short timeframes. In this Perspective, we describe recent technology breakthroughs that enable the screening of ultralarge synthetic peptidomimetic libraries with a barcode-free tandem mass spectrometry decoding strategy. A combination of combinatorial synthesis, affinity selection, automated peptide sequencing algorithms, and advances in mass spectrometry instrumentation now enables hit discovery from synthetic libraries with over 100 million members. We provide a perspective on this powerful technology and showcase success stories featuring the discovery of high affinity binders for a number of drug targets including proteins, nucleic acids, and specific cell types. Further, we show the usage of the technology to discover synthetic peptidomimetics with specific functions and reactivity. We predict that affinity selection coupled with tandem mass spectrometry and automated decoding will rapidly evolve further and become a broadly used drug discovery technology.
Topics: Tandem Mass Spectrometry; Small Molecule Libraries; Drug Discovery; Amino Acid Sequence
PubMed: 37556835
DOI: 10.1021/jacs.3c04899
-
Nucleic Acids Research Jul 2023
SH2 domains are key mediators of phosphotyrosine-based signalling, and therapeutic targets for diverse, mostly oncological, disease indications. They have a highly conserved structure with a central beta sheet that divides the binding surface of the protein into two main pockets, responsible for phosphotyrosine binding (pY pocket) and substrate specificity (pY + 3 pocket). In recent years, structural databases have proven to be invaluable resources for the drug discovery community, as they contain highly relevant and up-to-date information on important protein classes. Here, we present SH2db, a comprehensive structural database and webserver for SH2 domain structures. To organize these protein structures efficiently, we introduce (i) a generic residue numbering scheme to enhance the comparability of different SH2 domains, and (ii) a structure-based multiple sequence alignment of all 120 human wild-type SH2 domain sequences and their PDB and AlphaFold structures. The aligned sequences and structures can be searched, browsed and downloaded from the online interface of SH2db (http://sh2db.ttk.hu), with functions to conveniently prepare multiple structures into a Pymol session, and to export simple charts on the contents of the database. Our hope is that SH2db can assist researchers in their day-to-day work by becoming a one-stop shop for SH2 domain-related research.
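The benefit of a generic residue numbering derived from a multiple sequence alignment can be shown with a toy example. The sketch below is an assumption-laden illustration, not the SH2db scheme. It simply uses the alignment column as the shared label, and the function name and the two gapped rows are invented rather than real SH2 sequences.

```python
def generic_numbering(aligned):
    """Map each 1-based residue index of a gapped row to its 1-based alignment column."""
    mapping = {}
    residue_index = 0
    for column, symbol in enumerate(aligned, start=1):
        if symbol != "-":
            residue_index += 1
            mapping[residue_index] = column
    return mapping


# Toy alignment rows ("-" marks a gap); not real SH2 domain sequences.
row_a = "WY-HGKI"
row_b = "WFFHG-L"
num_a = generic_numbering(row_a)
num_b = generic_numbering(row_b)

# The histidine is residue 3 in row_a and residue 4 in row_b,
# yet both map to the same generic position (column 4).
print(num_a[3], num_b[4])
```

With a shared label of this kind, equivalent positions in different domains can be compared directly even when insertions shift their native residue numbers.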
Topics: Humans; Amino Acid Sequence; Binding Sites; Information Systems; Phosphotyrosine; Protein Binding; Proteins; src Homology Domains; Internet; Databases, Protein
PubMed: 37207333
DOI: 10.1093/nar/gkad420