protein sequence - OpenMD.com Journal Search

Natural and pathogenic protein sequence variation affecting prion-like domains within and across human proteomes.

BMC Genomics Jan 2020

Authors: Sean M Cascarina, Eric D Ross

BACKGROUND

Impaired proteostatic regulation of proteins with prion-like domains (PrLDs) is associated with a variety of human diseases including neurodegenerative disorders, myopathies, and certain forms of cancer. For many of these disorders, current models suggest a prion-like molecular mechanism of disease, whereby proteins aggregate and spread to neighboring cells in an infectious manner. The development of prion prediction algorithms has facilitated the large-scale identification of PrLDs among "reference" proteomes for various organisms. However, the degree to which intraspecies protein sequence diversity influences predicted prion propensity has not been systematically examined.

RESULTS

Here, we explore protein sequence variation introduced at genetic, post-transcriptional, and post-translational levels, and its influence on predicted aggregation propensity for human PrLDs. We find that sequence variation is relatively common among PrLDs and in some cases can result in relatively large differences in predicted prion propensity. Sequence variation introduced at the post-transcriptional level (via alternative splicing) also commonly affects predicted aggregation propensity, often by direct inclusion or exclusion of a PrLD. Finally, analysis of a database of sequence variants associated with human disease reveals a number of mutations within PrLDs that are predicted to increase prion propensity.

CONCLUSIONS

Our analyses expand the list of candidate human PrLDs, quantitatively estimate the effects of sequence variation on the aggregation propensity of PrLDs, and suggest the involvement of prion-like mechanisms in additional human diseases.
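
The abstract does not spell out how an individual variant is scored, but the basic bookkeeping is easy to sketch. The Python below is a minimal, hypothetical helper. The names `apply_missense`, `in_domain`, and `score_delta` are illustrative and do not come from the paper. It applies a single missense substitution written as `<ref><position><alt>`, checks whether the substitution falls inside a given PrLD interval, and reports the change in a caller-supplied propensity score. Any scoring function passed in is an assumption; none of the published prion prediction algorithms is reproduced here.

```python
import re
from typing import Callable

# Hypothetical notation: <reference residue><1-based position><alternate residue>, e.g. "G294V".
VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def _parse(variant: str):
    """Split a variant string into (ref, position, alt)."""
    match = VARIANT_RE.match(variant)
    if match is None:
        raise ValueError(f"unrecognized variant notation: {variant!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_missense(seq: str, variant: str) -> str:
    """Return a copy of seq with one substitution applied.

    Raises ValueError if the position is out of range or the reference residue
    does not match, which often signals an isoform mismatch.
    """
    ref, pos, alt = _parse(variant)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} outside sequence of length {len(seq)}")
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + alt + seq[pos:]


def in_domain(variant: str, start: int, end: int) -> bool:
    """True if the variant lies in the 1-based, inclusive interval [start, end]."""
    _, pos, _ = _parse(variant)
    return start <= pos <= end


def score_delta(seq: str, variant: str, score: Callable[[str], float]) -> float:
    """Variant score minus reference score for a caller-supplied scoring function."""
    return score(apply_missense(seq, variant)) - score(seq)
```

For example, with a toy placeholder score `toy = lambda s: sum(c in "QN" for c in s) / len(s)` (not a real prion predictor), `score_delta("MKTAY", "T3N", toy)` gives approximately 0.2, because the substitution adds one N to a five-residue sequence. Checking the reference residue before substituting is deliberate: a mismatch usually means the variant was called against a different isoform, which matters here because alternative splicing can include or exclude a PrLD.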

Topics: Algorithms; Alternative Splicing; Amino Acid Sequence; Humans; Mutation; Neurodegenerative Diseases; Prion Proteins; Prions; Protein Aggregates; Protein Domains; Proteome

PubMed: 31914925
DOI: 10.1186/s12864-019-6425-3

SPEACH_AF: Sampling protein ensembles and conformational heterogeneity with Alphafold2.

PLoS Computational Biology Aug 2022

Authors: Richard A Stein, Hassane S Mchaourab

The unprecedented performance of DeepMind's Alphafold2 in predicting protein structure in CASP XIV, and the creation of a database of structures for multiple proteomes and protein sequence repositories, are reshaping structural biology. However, because this database returns a single structure, it has raised questions about Alphafold's ability to capture the intrinsic conformational flexibility of proteins. Here we present a general approach to drive Alphafold2 to model alternate protein conformations through simple manipulation of the multiple sequence alignment via in silico mutagenesis. The approach is grounded in the hypothesis that the multiple sequence alignment must also encode protein structural heterogeneity; thus, its rational manipulation will enable Alphafold2 to sample alternate conformations. A systematic modeling pipeline is benchmarked against canonical examples of protein conformational flexibility and applied to interrogate the conformational landscape of membrane proteins. This work broadens the applicability of Alphafold2 by generating multiple protein conformations to be tested biologically, biochemically, biophysically, and for use in structure-based drug design.
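
The MSA manipulation is described here only in general terms. As one hedged illustration of in silico mutagenesis applied to an alignment, the sketch below substitutes alanine for every non-gap residue within a window of columns, across all rows, and then slides that window along the alignment to produce one perturbed MSA per position. The window width, the step, the choice of alanine, and mutating the query row along with its homologs are all assumptions for illustration. The function names are hypothetical, and passing the resulting alignments to Alphafold2 is left to whatever pipeline is in use.

```python
from typing import Iterator, List

GAP = "-"


def alanine_window(msa: List[str], start: int, width: int) -> List[str]:
    """Replace every non-gap residue in columns [start, start + width) with 'A'.

    msa is a list of equal-length aligned sequences (0-based columns). Gaps are
    left untouched so the alignment geometry is preserved.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    end = min(start + width, length)
    mutated = []
    for row in msa:
        window = "".join(GAP if c == GAP else "A" for c in row[start:end])
        mutated.append(row[:start] + window + row[end:])
    return mutated


def sliding_windows(msa: List[str], width: int, step: int = 1) -> Iterator[List[str]]:
    """Yield one mutated MSA per window position, scanning left to right."""
    length = len(msa[0])
    for start in range(0, length - width + 1, step):
        yield alanine_window(msa, start, width)
```

For a toy two-row alignment `["MKV-L", "MRVAL"]`, `alanine_window(msa, 2, 3)` returns `["MKA-A", "MRAAA"]`, and a width-2 scan yields four perturbed alignments. Keeping gaps fixed means every perturbed MSA has the same column count as the original, so downstream tools that expect an aligned input are unaffected.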

Topics: Amino Acid Sequence; Drug Design; Protein Conformation; Proteins; Sequence Alignment

PubMed: 35994486
DOI: 10.1371/journal.pcbi.1010483

Principles of metabolome conservation in animals.

Proceedings of the National Academy of... Aug 2023

Authors: Orsolya Liska, Gábor Boross, Charles Rocabert...

Metabolite levels shape cellular physiology and disease susceptibility, yet the general principles governing metabolome evolution are largely unknown. Here, we introduce a measure of conservation of individual metabolite levels among related species. By analyzing multispecies tissue metabolome datasets in phylogenetically diverse mammals and fruit flies, we show that conservation varies extensively across metabolites. Three major functional properties (metabolite abundance, essentiality, and association with human diseases) predict conservation, highlighting a striking parallel between the evolutionary forces driving metabolome and protein sequence conservation. Metabolic network simulations recapitulated these general patterns and revealed that abundant metabolites are highly conserved due to their strong coupling to key metabolic fluxes in the network. Finally, we show that biomarkers of metabolic diseases can be distinguished from other metabolites simply based on evolutionary conservation, without requiring any prior clinical knowledge. Overall, this study uncovers simple rules that govern metabolic evolution in animals and implies that most tissue metabolome differences between species are permitted, rather than favored, by natural selection. More broadly, our work paves the way toward using evolutionary information to identify biomarkers, as well as to detect pathogenic metabolome alterations in individual patients.

Topics: Animals; Humans; Metabolome; Amino Acid Sequence; Drosophila; Knowledge; Mammals

PubMed: 37603743
DOI: 10.1073/pnas.2302147120

Decoding an Amino Acid Sequence to Extract Information on Protein Folding.

Molecules (Basel, Switzerland) May 2022

Review

Authors: Takeshi Kikuchi

Protein folding is a complicated phenomenon spanning various time scales (μs to several s), and various structural indices are required to analyze it. The methodologies used to study this phenomenon are also diverse and employ various experimental and computational techniques. Thus, simple speculation is not sufficient to understand the folding mechanism of a protein. In the present review, we discuss recent studies conducted by the author and colleagues to decode amino acid sequences to obtain information on protein folding. We investigate globin-like proteins, ferredoxin-like fold proteins, IgG-like β-sandwich fold proteins, lysozyme-like fold proteins, and β-trefoil-like fold proteins. Our techniques are based on statistics relating to the inter-residue average distance, and our studies so far indicate that the information obtained from these analyses includes data on the protein folding mechanism. The relationships between our results and actual protein folding phenomena are also discussed.

Topics: Amino Acid Sequence; Models, Molecular; Protein Folding; Proteins; Staphylococcal Protein A

PubMed: 35566370
DOI: 10.3390/molecules27093020

Rapid multiple protein sequence search by parallel and heterogeneous computation.

Bioinformatics (Oxford, England) Mar 2024

Authors: Jiefu Li, Ziyuan Wang, Xuwei Fan...

MOTIVATION

Protein sequence database search and multiple sequence alignment generation are fundamental tasks in many bioinformatics analyses. As the volume of sequence data continues to grow rapidly, there is an increasing need for efficient and scalable multiple-sequence query algorithms that can search super-large databases without expensive time and computational costs.

RESULTS

We introduce Chorus, a novel protein sequence query system that leverages a parallel model and a heterogeneous computing architecture to let users query thousands of protein sequences concurrently against large protein databases on a desktop workstation. Chorus achieves over 100× speedup over BLASTP without sacrificing sensitivity. We demonstrate the utility of Chorus through a case study that analyzes a ∼1.5-TB metagenomic dataset for novel CRISPR-Cas protein discovery within 30 min.

AVAILABILITY AND IMPLEMENTATION

Chorus is open-source and its code repository is available at https://github.com/Bio-Acc/Chorus.

Topics: Software; Algorithms; Amino Acid Sequence; Proteins; Databases, Protein

PubMed: 38547405
DOI: 10.1093/bioinformatics/btae151

A quest for cytosolic sequons and their functions.

Scientific Reports Apr 2024

Authors: Manthan Desai, Syed Rafid Chowdhury, Bingyun Sun...

Evolution shapes protein sequences for their functions. Here, we studied the moonlighting functions of the N-linked sequon NXS/T, where X is not P, in human nucleocytosolic proteins. By comparison with membrane and secreted proteins, in which sequons are well known as sites of N-glycosylation, we discovered that cyto-sequons can participate in nucleic acid binding, particularly in zinc finger proteins. Our global studies further discovered that sequon occurrence is largely proportional to protein length. The contribution of sequons to protein functions, including both N-glycosylation and nucleic acid binding, can be regulated through their density as well as through the biased usage of NXS versus NXT. In proteins rich in other PTMs or structural features, such as phosphorylation, transmembrane α-helices, and disulfide bridges, sequons are scarce. The information acquired here should help clarify the relationship between protein sequence and function and assist future protein design and engineering.
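
The sequon definition in this abstract (N, then any residue except P, then S or T) maps directly onto a regular expression. The minimal Python sketch below finds every sequon, including overlapping ones, and summarizes the NXS/NXT split and the density per 100 residues, the quantities the abstract discusses. The function names are hypothetical, and the sketch deliberately ignores any additional context rules that some glycosylation predictors apply beyond the NXS/T motif.

```python
import re
from typing import Dict, List, Tuple

# N, then any residue except proline, then S or T. The zero-width lookahead
# lets finditer report overlapping motifs, such as both sequons in "NNST".
SEQUON_RE = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the N, motif) for every sequon in seq."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON_RE.finditer(seq.upper())]


def sequon_summary(seq: str) -> Dict[str, float]:
    """Count NXS and NXT sequons and report their density per 100 residues."""
    hits = find_sequons(seq)
    nxs = sum(1 for _, motif in hits if motif.endswith("S"))
    nxt = len(hits) - nxs
    density = 100.0 * len(hits) / len(seq) if seq else 0.0
    return {"NXS": nxs, "NXT": nxt, "per_100_residues": density}
```

For example, `find_sequons("ANASTNPTNGT")` returns `[(2, 'NAS'), (9, 'NGT')]`; the NPT starting at position 6 is skipped because X is proline. Reporting density rather than raw counts follows the abstract's observation that sequon occurrence scales with protein length.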

Topics: Humans; Proteins; Glycosylation; Amino Acid Sequence; Phosphorylation; Nucleic Acids

PubMed: 38565583
DOI: 10.1038/s41598-024-57334-1

Improving sequence-based modeling of protein families using secondary-structure quality assessment.

Bioinformatics (Oxford, England) Nov 2021

Authors: Cyril Malbranke, David Bikard, Simona Cocco...

MOTIVATION

Modeling the sequence distribution of protein families from homologous sequence data has recently received considerable attention, particularly for structure and function prediction as well as for protein design. Notably, direct coupling analysis, a method to infer effective pairwise interactions between residues, was shown to capture important structural constraints and to successfully generate functional protein sequences. Building on this and other graphical models, we introduce a new framework to assess the quality of the secondary structures of generated sequences with respect to reference structures for the family.

RESULTS

We introduce two scoring functions, called Dot Product and Pattern Matching, that characterize how likely the secondary structure of a protein sequence is to match a reference structure. We test these scores on published experimental protein mutagenesis and design datasets and show improved detection of nonfunctional sequences. We also show that these scores help reject nonfunctional sequences generated by graphical models (Restricted Boltzmann Machines) learned from homologous sequence alignments.

AVAILABILITY AND IMPLEMENTATION

Data and code available at https://github.com/CyrilMa/ssqa.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
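
The abstract names the Dot Product score without defining it. One plausible reading, sketched here purely as an assumption, compares a per-residue 3-state (helix/strand/coil) probability profile predicted for a sequence against a reference secondary-structure string of the same length, averaging the probability assigned to the reference state at each position. The 3-state alphabet, the position-by-position correspondence, and the function name are illustrative only; the code at https://github.com/CyrilMa/ssqa is the place to check the actual definition.

```python
from typing import Sequence

STATES = "HEC"  # assumed 3-state alphabet: helix, strand, coil


def dot_product_score(reference: str, probs: Sequence[Sequence[float]]) -> float:
    """Average over positions of p_i . onehot(reference_i).

    reference: secondary-structure string over STATES, one character per residue.
    probs: per-residue probabilities ordered as in STATES (each row sums to ~1).
    """
    if not reference or len(reference) != len(probs):
        raise ValueError("reference must be non-empty and match the prediction length")
    total = 0.0
    for state, p in zip(reference, probs):
        if state not in STATES:
            raise ValueError(f"unknown state {state!r}")
        # One-hot vector of the reference state; its dot product with p picks
        # out the predicted probability of that state.
        onehot = [1.0 if s == state else 0.0 for s in STATES]
        total += sum(a * b for a, b in zip(p, onehot))
    return total / len(reference)
```

Under this reading the score lies between 0 and 1, and a generated sequence whose score falls well below that of natural family members would be a candidate for rejection, in the spirit of the filtering the abstract describes.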

Topics: Proteins; Amino Acid Sequence; Sequence Alignment; Protein Structure, Secondary; Mutagenesis

PubMed: 34117879
DOI: 10.1093/bioinformatics/btab442

Design and characterization of a protein fold switching network.

Nature Communications Jan 2023

Authors: Biao Ruan, Yanan He, Yingwei Chen...

To better understand how amino acid sequence encodes protein structure, we engineered mutational pathways that connect three common folds (3α, β-grasp, and α/β-plait). The structures of proteins at high sequence-identity intersections in the pathways (nodes) were determined using NMR spectroscopy and analyzed for stability and function. To generate nodes, the amino acid sequence encoding a smaller fold is embedded in the structure of an ~50% larger fold and a new sequence compatible with two sets of native interactions is designed. This generates protein pairs with a 3α or β-grasp fold in the smaller form but an α/β-plait fold in the larger form. Further, embedding smaller antagonistic folds creates critical states in the larger folds such that single amino acid substitutions can switch both their fold and function. The results help explain the underlying ambiguity in the protein folding code and show that new protein structures can evolve via abrupt fold switching.
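
Sequence identity between designed node sequences is central to this abstract, as is the idea that a single substitution can separate two folds. The helpers below (hypothetical names) compute percent identity and list the substitutions between two sequences, assuming both are already aligned to equal length. Excluding gapped columns from the identity denominator is only one of several conventions in use and is chosen here for illustration.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    same = sum(1 for x, y in columns if x == y)
    return 100.0 * same / len(columns)


def substitutions(a: str, b: str) -> List[str]:
    """List differences as <residue in a><1-based column><residue in b>, skipping gaps.

    Column numbers equal residue numbers only when a contains no gaps.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{x}{i}{y}"
        for i, (x, y) in enumerate(zip(a, b), start=1)
        if x != y and x != "-" and y != "-"
    ]
```

For example, `percent_identity("ACDE", "ACDF")` is 75.0 and `substitutions("ACDE", "ACDF")` is `['E4F']`. Because different denominators give different identity values for the same pair, it is worth stating which convention is used whenever identities are compared across studies.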

Topics: Proteins; Amino Acid Sequence; Protein Folding; Staphylococcal Protein A; Mutation

PubMed: 36702827
DOI: 10.1038/s41467-023-36065-3

Predicting subcellular location of protein with evolution information and sequence-based deep learning.

BMC Bioinformatics Oct 2021

Authors: Zhijun Liao, Gaofeng Pan, Chao Sun...

BACKGROUND

Protein subcellular localization prediction plays an important role in biology research. Since traditional methods are laborious and time-consuming, many machine learning-based prediction methods have been proposed. However, most of the proposed methods ignore the evolutionary information of proteins. To improve prediction accuracy, we present a deep learning-based method to predict protein subcellular locations.

RESULTS

Our method uses not only the amino acid composition of protein sequences but also their evolutionary matrices. It combines a bidirectional long short-term memory network that processes the entire protein sequence with a convolutional neural network that extracts features from protein sequences. The position-specific scoring matrix is used as a supplement to protein sequences. Our method was trained and tested on two benchmark datasets. The experimental results show that our method yields accurate results on both datasets, with an average precision of 0.7901, a ranking loss of 0.0758, and a coverage of 1.2848.

CONCLUSION

The experimental results show that our method outperforms five currently available methods, indicating that it is an acceptable alternative for predicting protein subcellular location.
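
The reported ranking loss and coverage are standard multi-label metrics, which apply because a protein can reside in more than one location. The sketch below implements the textbook definitions: ranking loss is the fraction of relevant/irrelevant label pairs ordered incorrectly, with ties counted as errors, and coverage is the worst rank of a true label minus one. Conventions differ between sources. scikit-learn's `coverage_error`, for instance, does not subtract one, and the abstract does not say which convention produced 1.2848. Letting samples with no relevant or no irrelevant labels contribute zero is an additional assumption.

```python
from typing import List, Sequence


def _rank(scores: Sequence[float], j: int) -> int:
    """1-based rank of label j by descending score; ties are resolved pessimistically."""
    return sum(1 for s in scores if s >= scores[j])


def coverage(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Mean over samples of (worst rank among true labels - 1)."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        ranks = [_rank(scores, j) for j, t in enumerate(truth) if t]
        total += (max(ranks) - 1) if ranks else 0.0
    return total / len(y_true)


def ranking_loss(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Mean over samples of the fraction of (relevant, irrelevant) label pairs
    in which the irrelevant label scores at least as high as the relevant one."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        rel = [s for s, t in zip(scores, truth) if t]
        irr = [s for s, t in zip(scores, truth) if not t]
        if rel and irr:
            bad = sum(1 for r in rel for i in irr if i >= r)
            total += bad / (len(rel) * len(irr))
    return total / len(y_true)
```

With `y_true = [[1, 0, 0], [0, 1, 0]]` and both rows scored `[0.9, 0.1, 0.5]`, coverage is 1.0 and ranking loss is 0.5: the first sample is ranked perfectly, while the second has its only true label ranked last. Lower is better for both metrics.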

Topics: Amino Acid Sequence; Computational Biology; Databases, Protein; Deep Learning; Position-Specific Scoring Matrices; Proteins

PubMed: 34686152
DOI: 10.1186/s12859-021-04404-0

Galectins: Their Network and Roles in Infection/Immunity/Tumor Growth Control 2021.

Biomolecules Sep 2022

Authors: Toshio Hattori

Galectins constitute a protein family of soluble and non-glycosylated animal lectins that show a β-galactoside-binding activity via a conserved sequence of approximately 130-140 amino acids located in the carbohydrate recognition domain (CRD) [...].

Topics: Amino Acid Sequence; Amino Acids; Animals; Carbohydrates; Galectins; Neoplasms

PubMed: 36139094
DOI: 10.3390/biom12091255