-
Current Opinion in Chemical Biology Aug 2023
Review
The phenomenon of protein phase separation, which underlies the formation of biomolecular condensates, has been associated with numerous cellular functions. Recent studies indicate that the amino acid sequences of most proteins may harbour not only the code for folding into the native state but also for condensing into the liquid-like droplet state and the solid-like amyloid state. Here we review the current understanding of the principles for sequence-based methods for predicting the propensity of proteins for phase separation. A guiding concept is that entropic contributions are generally more important to stabilise the droplet state than they are for the native and amyloid states. Although estimating these entropic contributions has proven difficult, we describe some progress that has been recently made in this direction. To conclude, we discuss the challenges ahead to extend sequence-based prediction methods of protein phase separation to include quantitative in vivo characterisations of this process.
Topics: Amyloid; Amino Acid Sequence; Cell Physiological Phenomena
PubMed: 37207400
DOI: 10.1016/j.cbpa.2023.102317 -
Frontiers in Endocrinology 2024
Topics: Animals; Arthropods; Neuropeptides; Amino Acid Sequence; Biology
PubMed: 38481439
DOI: 10.3389/fendo.2024.1387176 -
Briefings in Bioinformatics Sep 2023
Protein function annotation is one of the most important research topics for revealing the essence of life at the molecular level in the post-genome era. Current research shows that integrating multisource data can effectively improve the performance of protein function prediction models. However, the heavy reliance on complex feature engineering and model integration methods limits the development of existing methods. Moreover, models based on deep learning use only the labeled data in a given dataset to extract sequence features, thus ignoring a large amount of existing unlabeled sequence data. Here, we propose an end-to-end protein function annotation model named HNetGO, which uses a heterogeneous network to integrate protein sequence similarity and protein-protein interaction network information and combines the pretraining model to extract the semantic features of the protein sequence. In addition, we design an attention-based graph neural network model, which can effectively extract node-level features from heterogeneous networks and predict protein function by measuring the similarity between protein nodes and gene ontology term nodes. Comparative experiments on the human dataset show that HNetGO achieves state-of-the-art performance on the cellular component and molecular function branches.
Topics: Humans; Amino Acid Sequence; Gene Ontology; Molecular Sequence Annotation; Neural Networks, Computer; Protein Interaction Maps
PubMed: 37861172
DOI: 10.1093/bib/bbab556 -
Current Opinion in Structural Biology Oct 2023
Review
Ancestral sequence reconstruction (ASR) provides insight into the changes within a protein sequence across evolution. More specifically, it can illustrate how specific amino acid changes give rise to different phenotypes within a protein family. Over the last few decades it has established itself as a powerful technique for revealing molecular common denominators that govern enzyme function. Here, we describe the strength of ASR in unveiling catalytic mechanisms and emerging phenotypes for a range of different proteins, also highlighting biotechnological applications the methodology can provide.
Topics: Phylogeny; Evolution, Molecular; Proteins; Amino Acid Sequence; Phenotype
PubMed: 37544113
DOI: 10.1016/j.sbi.2023.102669 -
Journal of Structural Biology Sep 2023
Biomaterials for tissue regeneration must mimic the biophysical properties of the native physiological environment. A protein engineering approach allows the generation of protein hydrogels with specific and customised biophysical properties designed to suit a particular physiological environment. Herein, repetitive engineered proteins were successfully designed to form covalent molecular networks with defined physical characteristics able to sustain cell phenotype. Our hydrogel design was made possible by the incorporation of the SpyTag (ST) peptide and multiple repetitive units of the SpyCatcher (SC) protein that spontaneously formed covalent crosslinks upon mixing. Changing the ratios of the protein building blocks (ST:SC) allowed the viscoelastic properties and gelation speeds of the hydrogels to be altered and controlled. The physical properties of the hydrogels could readily be altered further to suit different environments by tuning the key features in the repetitive protein sequence. The resulting hydrogels were designed with a view to allowing cell attachment and encapsulation of liver-derived cells. Biocompatibility of the hydrogels was assayed using a HepG2 cell line constitutively expressing GFP. The cells remained viable and continued to express GFP whilst attached or encapsulated within the hydrogel. Our results demonstrate how this genetically encoded approach using repetitive proteins could be applied to bridge engineering biology with nanotechnology, creating a level of biomaterial customisation previously inaccessible.
Topics: Protein Array Analysis; Hydrogels; Proteins; Biocompatible Materials; Amino Acid Sequence
PubMed: 37245604
DOI: 10.1016/j.jsb.2023.107981 -
Biomolecules Aug 2023
With the development of accurate protein structure prediction algorithms, artificial intelligence (AI) has emerged as a powerful tool in the field of structural biology. AI-based algorithms have been used to analyze large amounts of protein sequence data including the human proteome, complementing experimental structure data found in resources such as the Protein Data Bank. The EBI AlphaFold Protein Structure Database (for example) contains over 230 million structures. In this study, these data have been analyzed to find all human proteins containing (or predicted to contain) the cytosolic glutathione transferase (cGST) fold. A total of 39 proteins were found, including the alpha-, mu-, pi-, sigma-, zeta- and omega-class GSTs, intracellular chloride channels, metaxins, multisynthetase complex components, elongation factor 1 complex components and others. Three broad themes emerge: cGST domains as enzymes, as chloride ion channels and as protein-protein interaction mediators. As the majority of cGSTs are dimers, the AI-based structure prediction algorithm AlphaFold-multimer was used to predict structures of all pairwise combinations of these cGST domains. Potential homo- and heterodimers are described. Experimental biochemical and structure data are used to highlight the strengths and limitations of AI-predicted structures.
Topics: Humans; Glutathione Transferase; Genome, Human; Artificial Intelligence; Algorithms; Amino Acid Sequence
PubMed: 37627305
DOI: 10.3390/biom13081240 -
Bioinformatics (Oxford, England) Aug 2023
MOTIVATION
Protein thermostability is of great interest, both in theory and in practice.
RESULTS
This study compared orthologous proteins with different cellular thermostability. A large number of physicochemical properties of proteins were calculated and used to develop a series of machine learning models for predicting cellular thermostability differences between orthologous proteins. Most of the important features in these models are also highly correlated with relative cellular thermostability. A comparison of the present study with a previous comparison of orthologous proteins from thermophilic and mesophilic organisms found that the most highly correlated features are consistent across these studies, suggesting they may be important to protein thermostability.
AVAILABILITY AND IMPLEMENTATION
Data freely available for download at https://github.com/fangj3/cellular-protein-thermostability-dataset.
Topics: Amino Acid Sequence; Proteins
PubMed: 37572303
DOI: 10.1093/bioinformatics/btad504 -
Microbial Cell Factories Sep 2023
Review
In the post-genomic era, the demand for faster and more efficient protein production has increased, both in public laboratories and industry. In addition, with the expansion of protein sequences in databases, the range of possible enzymes of interest for a given application is also increasing. Faced with peer competition and with budgetary and time constraints, companies and laboratories must find ways to develop a robust manufacturing process for recombinant protein production. In this review, we explore high-throughput technologies for recombinant protein expression and present a holistic high-throughput process development strategy that spans from genes to proteins. We discuss the challenges that come with this task, the limitations of previous studies, and future research directions.
Topics: Cloning, Molecular; Amino Acid Sequence; Genomics; Laboratories; Recombinant Proteins
PubMed: 37715258
DOI: 10.1186/s12934-023-02184-1 -
Proceedings of the National Academy of... Aug 2023
Metabolite levels shape cellular physiology and disease susceptibility, yet the general principles governing metabolome evolution are largely unknown. Here, we introduce a measure of conservation of individual metabolite levels among related species. By analyzing multispecies tissue metabolome datasets in phylogenetically diverse mammals and fruit flies, we show that conservation varies extensively across metabolites. Three major functional properties (metabolite abundance, essentiality, and association with human diseases) predict conservation, highlighting a striking parallel between the evolutionary forces driving metabolome and protein sequence conservation. Metabolic network simulations recapitulated these general patterns and revealed that abundant metabolites are highly conserved due to their strong coupling to key metabolic fluxes in the network. Finally, we show that biomarkers of metabolic diseases can be distinguished from other metabolites simply based on evolutionary conservation, without requiring any prior clinical knowledge. Overall, this study uncovers simple rules that govern metabolic evolution in animals and implies that most tissue metabolome differences between species are permitted, rather than favored, by natural selection. More broadly, our work paves the way toward using evolutionary information to identify biomarkers, as well as to detect pathogenic metabolome alterations in individual patients.
Topics: Animals; Humans; Metabolome; Amino Acid Sequence; Drosophila; Knowledge; Mammals
PubMed: 37603743
DOI: 10.1073/pnas.2302147120 -
Bioinformatics (Oxford, England) Mar 2024
MOTIVATION
Protein sequence database search and multiple sequence alignment generation are fundamental tasks in many bioinformatics analyses. As the data volume of sequences continues to grow rapidly, there is an increasing need for efficient and scalable multiple sequence query algorithms for super-large databases without expensive time and computational costs.
RESULTS
We introduce Chorus, a novel protein sequence query system that leverages a parallel model and a heterogeneous computation architecture to enable users to query thousands of protein sequences concurrently against large protein databases on a desktop workstation. Chorus achieves over 100× speedup over BLASTP without sacrificing sensitivity. We demonstrate the utility of Chorus through a case study analyzing a ∼1.5-TB large-scale metagenomic dataset for novel CRISPR-Cas protein discovery within 30 min.
AVAILABILITY AND IMPLEMENTATION
Chorus is open-source and its code repository is available at https://github.com/Bio-Acc/Chorus.
Topics: Software; Algorithms; Amino Acid Sequence; Proteins; Databases, Protein
PubMed: 38547405
DOI: 10.1093/bioinformatics/btae151