-
BMC Bioinformatics Dec 2020
BACKGROUND
Protein-protein interactions (PPIs) are of great importance in the cellular systems of organisms, since they underlie cellular structure and function and many essential cellular processes depend on them. Most proteins perform their functions by interacting with other proteins, so predicting PPIs accurately is crucial for understanding cell physiology.
RESULTS
Graph convolutional networks (GCNs) have recently been proposed to capture graph structure and generate representations for the nodes of a graph. In this paper, we use GCNs to learn the position information of proteins in the PPI network graph, which reflects the properties of proteins to some extent. Combining amino acid sequence information with position information yields a stronger protein representation, which improves the accuracy of PPI prediction.
CONCLUSION
Most previous methods used only the amino acid sequence of proteins as input for prediction, without considering the structural information of the PPI network graph. We are the first to combine amino acid sequence information and position information to build protein representations. The experimental results indicate that our method is highly competitive with several sequence-based methods.
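The idea of learning a position representation from the PPI graph and joining it with sequence features can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly used symmetric-normalised graph convolution, and the toy graph, the weight matrix and the sequence features (`adj`, `weight`, `seq_feats`) are invented for the example.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adj)
    # Add self-loops so every protein keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate the (normalised) features of each node's neighbourhood.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(len(feats[0]))]
           for i in range(n)]
    # Linear map followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]

# Hypothetical toy PPI graph: protein 1 interacts with proteins 0 and 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
node_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot node ids
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                      # made-up weights

position = gcn_layer(adj, node_feats, weight)

# Hypothetical sequence-derived features, concatenated with the position features.
seq_feats = [[0.2, 0.7], [0.9, 0.1], [0.4, 0.4]]
combined = [s + p for s, p in zip(seq_feats, position)]
```

In a real model the weights would be trained and several layers stacked; the sketch only shows how graph neighbourhoods enter the representation.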
Topics: Amino Acid Sequence; Databases, Protein; Humans; Protein Interaction Mapping; Proteins; Saccharomyces cerevisiae; Saccharomyces cerevisiae Proteins
PubMed: 33323120
DOI: 10.1186/s12859-020-03896-6 -
PloS One 2023
The structure and sequence of proteins strongly influence their biological functions. New models and algorithms can help researchers understand how the evolution of sequences and structures is related to changes in function. Recently, SARS-CoV-2 Spike (S) protein structures have been studied to predict binding receptors and infection activity in COVID-19, which has raised scientific interest in the effects of virus mutations related to sequence, structure and vaccination. However, models and tools are needed to study the links between the evolution of S protein sequence, structure and function, virus transmissibility, and the effects of vaccination. As studies on the S protein have generated a large amount of relevant information, we propose in this work to use Protein Contact Networks (PCNs) to relate protein structures to biological properties by means of network topology properties. Topological properties are used to compare structural changes with sequence changes. We find that both node centrality and community extraction analysis can be used to relate protein stability and functionality to sequence mutations. Starting from this, we compare structural evolution to sequence changes and study mutations from a temporal perspective, focusing on virus variants. Finally, by applying our model to the Omicron variant, we report a timeline correlation between Omicron and the vaccination campaign.
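A protein contact network of the kind described above can be sketched as follows: residues are nodes, and two residues are linked when their representative atoms lie within a distance cutoff. The 8 Å cutoff, the toy coordinates and the choice of degree centrality are assumptions made for illustration; the abstract does not state which cutoff or centrality measure the authors use.

```python
import math

def contact_network(coords, cutoff=8.0):
    """Return the set of residue pairs (i, j), i < j, closer than `cutoff` angstroms."""
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.add((i, j))
    return edges

def degree_centrality(n, edges):
    """Fraction of the other n - 1 residues that each residue is in contact with."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return [d / (n - 1) for d in degree]

# Hypothetical coordinates: four residues spaced 3.8 angstroms apart on a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
edges = contact_network(coords)
centrality = degree_centrality(len(coords), edges)
```

Comparing such centrality profiles between a reference structure and a mutated one is one simple way to relate sequence changes to structural changes.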
Topics: Humans; SARS-CoV-2; COVID-19; Amino Acid Sequence; Mutation; Spike Glycoprotein, Coronavirus
PubMed: 37471335
DOI: 10.1371/journal.pone.0283400 -
BMC Bioinformatics Oct 2021
BACKGROUND
Protein-protein interactions (PPIs) are essential to most biological processes. Predicting PPIs helps in understanding protein functions and thus supports pathological analysis, disease diagnosis, drug design, etc. As the amount of protein data grows rapidly in the post-genomic era, high-throughput experimental methods are too expensive and time-consuming for identifying PPIs. Thus, computational methods have attracted researchers' attention in recent years. A large number of computational methods have been proposed based on different protein sequence encoders.
RESULTS
Notably, the confidence score of a protein sequence pair can be regarded as a measure of PPI likelihood: the higher the confidence score of a protein pair, the more likely the pair is to interact. In this paper, a deep learning framework called the ordinal regression and recurrent convolutional neural network (OR-RCNN) method is therefore introduced to predict PPIs from the perspective of the confidence score. It consists of two parts: an encoder for the protein sequence pair and a confidence-score-based PPI predictor. In the first part, two recurrent convolutional neural networks (RCNNs) with shared parameters construct two protein sequence embedding vectors, automatically extracting robust local features and sequential information from the protein pair. The two embedding vectors are then combined into a single embedding vector by element-wise multiplication. In the second part, ordinal regression is used to construct multiple sub-classifiers, taking the ordinal information behind the confidence score into consideration. The results of the sub-classifiers are aggregated to obtain the final confidence score, which then determines whether a PPI exists. We set a threshold [Formula: see text], and say that an interaction exists between the protein pair if its confidence score is greater than [Formula: see text].
CONCLUSIONS
We applied our method to predict PPIs on the S. cerevisiae and Homo sapiens data sets. In experimental evaluation, our method outperforms state-of-the-art PPI prediction models.
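The prediction stage described in this abstract (element-wise product of two embeddings, several ordinal sub-classifiers, aggregation, threshold) might look roughly like the sketch below. It is not the OR-RCNN code: the embeddings stand in for RCNN outputs, each sub-classifier is assumed to be a logistic unit estimating whether the score exceeds one ordinal level, aggregation is assumed to be a simple mean, and the threshold value 0.4 is hypothetical because the source elides it.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_embedding(u, v):
    """Combine two protein embeddings by element-wise multiplication."""
    return [a * b for a, b in zip(u, v)]

def ordinal_score(z, sub_classifiers):
    """Average the sub-classifier outputs; sub-classifier k estimates P(score > level k)."""
    probs = [sigmoid(sum(w * x for w, x in zip(ws, z)) + b) for ws, b in sub_classifiers]
    return sum(probs) / len(probs)

def interacts(score, threshold):
    """Declare an interaction when the confidence score exceeds the threshold."""
    return score > threshold

# Hypothetical embeddings, standing in for the outputs of the shared-parameter RCNNs.
u, v = [1.0, 2.0], [3.0, 0.5]
z = pair_embedding(u, v)

# Hypothetical sub-classifiers: shared weights, one bias per ordinal level.
sub_classifiers = [([1.0, 0.0], -1.0), ([1.0, 0.0], -3.0), ([1.0, 0.0], -5.0)]
score = ordinal_score(z, sub_classifiers)
predicted = interacts(score, threshold=0.4)  # threshold value is an assumption
```

Sharing weights across levels and varying only the bias is one common way to keep the sub-classifier outputs ordered; the paper may use a different construction.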
Topics: Amino Acid Sequence; Humans; Neural Networks, Computer; Proteins; Saccharomyces cerevisiae
PubMed: 34625020
DOI: 10.1186/s12859-021-04369-0 -
International Journal of Molecular... Jan 2023
Review
This review explains the origin of the LIV-1 family of zinc transporters, paying attention to how this family of nine human proteins was originally discovered. Structural and functional differences between these nine human LIV-1 family members and the five other ZIP transporters are examined. These differences relate to aspects of the protein sequence, the conservation of important motifs, and the effect these may have on overall function. The LIV-1 family members depend on various post-translational modifications, such as phosphorylation and cleavage, which play an important role in their ability to transport zinc. These modifications and their implications are discussed in detail. Some of these proteins have been implicated in cancer, and this is also examined. Furthermore, some additional areas of potentially fruitful discovery are discussed and suggested as worthy of future examination.
Topics: Humans; Carrier Proteins; Membrane Transport Proteins; Zinc; Amino Acid Sequence
PubMed: 36674777
DOI: 10.3390/ijms24021255 -
BMC Bioinformatics Feb 2024
BACKGROUND
To explore the evolutionary history of sequences, a sequence alignment is a first and necessary step, and its quality is crucial. In the context of the study of the proximal origins of SARS-CoV-2 coronavirus, we wanted to construct an alignment of genomes closely related to SARS-CoV-2 using both coding and non-coding sequences. To our knowledge, there is no tool that can be used to construct this type of alignment, which motivated the creation of CNCA.
RESULTS
CNCA is a web tool that aligns annotated genomes from GenBank files. It generates a nucleotide alignment that is then updated based on the protein sequence alignment. The final output nucleotide alignment matches the protein alignment and is guaranteed to contain no frameshift. CNCA was designed to align closely related small genome sequences of up to 50 kb (typically viruses) for which the gene order is conserved.
CONCLUSIONS
CNCA constructs multiple alignments of small genomes by integrating both coding and non-coding sequences. This preserves regions traditionally ignored in conventional back-translation methods, such as non-coding regions.
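Why a nucleotide alignment that follows the protein alignment cannot contain a frameshift can be seen from a small sketch of conventional codon threading for a single coding sequence. This illustrates only the coding-region principle and is not CNCA's algorithm, which also handles non-coding regions; the function name and the toy sequences are invented.

```python
def thread_codons(aligned_protein, cds):
    """Lay the codons of `cds` onto an aligned protein sequence.

    Every protein gap becomes a three-nucleotide gap, so all gaps in the
    resulting nucleotide alignment are multiples of three (no frameshift).
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if residues != len(codons):
        raise ValueError("protein and coding sequence lengths do not match")
    remaining = iter(codons)
    return "".join("---" if aa == "-" else next(remaining) for aa in aligned_protein)

# Hypothetical two-sequence protein alignment with its coding sequences.
row_a = thread_codons("MKL", "ATGAAACTG")
row_b = thread_codons("M-L", "ATGCTT")
```

Both rows have the same length, and the gap in the second row covers exactly one codon.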
Topics: Genome; Sequence Alignment; Proteins; Amino Acid Sequence; Nucleotides
PubMed: 38424511
DOI: 10.1186/s12859-024-05700-1 -
Bioinformatics (Oxford, England) Mar 2023
MOTIVATION
Computational protein sequence design has been widely applied in rational protein engineering and increasing the design accuracy and efficiency is highly desired.
RESULTS
Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of each residue's local environment and trains a transformer to learn the correlation between the local environment of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type to each position based on its local environment within the structure, eventually yielding a designed sequence in which all residues fit well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins, taking less than 20 s per protein on average. The predicted structures of the designed proteins closely resemble the target structures, with a state-of-the-art average TM-score exceeding 0.80. We further validated ProDESIGN-LE experimentally by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in Escherichia coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with the natural CAT III protein.
AVAILABILITY AND IMPLEMENTATION
The source code of ProDESIGN-LE is available at https://github.com/bigict/ProDESIGN-LE.
Topics: Amino Acid Sequence; Proteins; Software
PubMed: 36916746
DOI: 10.1093/bioinformatics/btad122 -
Bioinformatics (Oxford, England) Jul 2022
MOTIVATION
Metal-binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that serve a variety of needs, such as catalysis, DNA/RNA binding, protein structure stability, etc. Identifying metal-binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal-binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability.
RESULTS
We developed a novel machine learning-based method, mebipred, for identifying metal-binding proteins from sequence-derived features. This method is over 80% accurate in recognizing proteins that bind metal ion-containing ligands; the specific identity of 11 ubiquitously present metal ions can also be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and is thus faster than alignment-based methods; it is also more accurate than other sequence-based prediction methods. Additionally, mebipred can identify protein metal-binding capabilities from short sequence stretches, e.g. translated sequencing reads, and thus may be useful for annotating the metal requirements of metagenomic samples. We performed an analysis of available microbiome data and found that ocean, hot spring sediment and soil microbiomes use a more diverse set of metals than human host-related ones. For human microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal-binding proteins. These results highlight mebipred's utility in analyzing microbiome metal requirements.
AVAILABILITY AND IMPLEMENTATION
mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Humans; Amino Acid Sequence; Proteins; Protein Binding; Sequence Alignment; Metals; Ions
PubMed: 35639953
DOI: 10.1093/bioinformatics/btac358 -
Biomolecules Jan 2022
Protein-peptide interactions (PpIs) are a subset of the overall protein-protein interaction (PPI) network in the living cell and are pivotal for the majority of cell processes and functions. High-throughput methods to detect PpIs and PPIs usually require time and costs that are not always affordable. Therefore, reliable in silico predictions represent a valid and effective alternative. In this work, a new algorithm is described and implemented in a freely available tool, "PepThreader", to carry out PPI and PpI prediction and analysis. PepThreader threads multiple fragments derived from a full-length protein sequence (or from a peptide library) onto a second template peptide in complex with a protein target, "spotting" the potential binding peptides and ranking them according to a sequence-based and structure-based threading score. The threading algorithm first uses a scoring function based on peptide sequence similarity. The initial hits are then re-ranked according to structure-based scoring functions. PepThreader has been benchmarked on a dataset of 292 protein-peptide complexes collected from existing databases of experimentally determined protein-peptide interactions. An accuracy of 80% was achieved when considering the top 25 predicted hits, which is comparable to other state-of-the-art tools for PPI and PpI modeling. Nonetheless, PepThreader is unique in that it can both spot a binding peptide within a full-length sequence involved in a PPI and model its structure within the receptor. Therefore, PepThreader adds to the already-available tools supporting the experimental identification and characterization of PPIs and PpIs.
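The first, sequence-based stage described above (cutting a full-length sequence into fragments and ranking them against a template peptide) can be sketched as follows. The sketch assumes a plain fraction-of-identical-positions score, which is a stand-in for PepThreader's actual similarity function; the structure-based re-ranking stage is not shown, and the sequences are invented.

```python
def rank_fragments(protein, template, top=3):
    """Slide a window of the template's length along `protein` and rank the fragments.

    Returns (score, start, fragment) tuples, best first; ties are broken by position.
    """
    k = len(template)
    scored = []
    for start in range(len(protein) - k + 1):
        fragment = protein[start:start + k]
        # Assumed score: fraction of positions identical to the template peptide.
        score = sum(a == b for a, b in zip(fragment, template)) / k
        scored.append((score, start, fragment))
    scored.sort(key=lambda hit: (-hit[0], hit[1]))
    return scored[:top]

# Hypothetical full-length sequence and template peptide.
hits = rank_fragments("MAPPLPKRAAPPLPR", "PPLPKR", top=3)
```

In the tool itself, the hits retained at this stage would then be re-scored using structural information from the template complex.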
Topics: Amino Acid Sequence; Peptide Library; Peptides; Protein Interaction Mapping; Software
PubMed: 35204702
DOI: 10.3390/biom12020201 -
Scientific Reports Jul 2022
Bio-sequence comparators are among the most basic and significant methods for assessing biological data, and, given the importance of proteins, protein sequence comparators are particularly crucial. At the same time, the complexity of the problem, the growing number of extracted protein sequences, and the growth of studies and data analysis applications addressing protein sequences have necessitated the development of a rapid and accurate approach that accounts for the complexities in this field. We therefore propose a protein sequence comparison approach, called PCV, which improves comparison accuracy by producing vectors that encode sequence data as well as the physicochemical properties of the amino acids. In addition, by partitioning long protein sequences into fixed-length blocks and providing an encoding vector for each block, this method allows for a parallel and fast implementation. To evaluate the performance of PCV, as with other alignment-free methods, we used 12 benchmark datasets including classes with homologous sequences, which may require a simple preprocessing search tool to select the homologous data. We then compared the protein sequence comparison outcomes to those of alternative alignment-based and alignment-free methods, using various evaluation criteria. These results indicate that our method provides a significant improvement in sequence classification accuracy compared to the alternative alignment-free methods and has an average correlation of about 94% with the ClustalW method, our reference method, while considerably reducing the processing time.
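The block-wise encoding idea can be illustrated with a small sketch. It is not the PCV encoding itself: the two per-block features (mean Kyte-Doolittle hydropathy and fraction of charged residues), the block size and the example sequence are assumptions chosen only to show how fixed-length blocks yield independent vectors.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def split_blocks(seq, size):
    """Partition a sequence into consecutive blocks of `size` residues (last may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def encode_block(block):
    """Assumed per-block vector: (mean hydropathy, fraction of charged residues)."""
    hydropathy = sum(KD[aa] for aa in block) / len(block)
    charged = sum(aa in "DEKR" for aa in block) / len(block)
    return (hydropathy, charged)

def encode(seq, size=4):
    """Encode each block independently, so blocks could be processed in parallel."""
    return [encode_block(block) for block in split_blocks(seq, size)]

vectors = encode("AAAAKKKK", size=4)
```

Because each block is encoded without reference to the others, the per-block work can be distributed across workers, which is the property the abstract credits for the method's speed.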
Topics: Algorithms; Amino Acid Sequence; Amino Acids; Proteins; Sequence Alignment
PubMed: 35778592
DOI: 10.1038/s41598-022-15266-8 -
Physical Review. E Nov 2022
In this paper, a geometrical and thermodynamical analysis of the global properties of the potential energy landscape of a minimalistic model of a polypeptide is presented. The global geometry of the potential energy landscape is expected to contain relevant information about the properties of a given sequence of amino acids, that is, to discriminate between a random heteropolymer and a protein. By considering the SH3 and PYP protein sequences and their randomized versions, it turns out that, in addition to the standard signatures of the folding transition (which discriminate between protein sequences of amino acids and random heteropolymer sequences), peculiar geometric signatures of the equipotential hypersurfaces in configuration space can also discriminate between proteins and random heteropolymers. Interestingly, these geometric signatures are the "shadows" of deeper topological changes that take place in correspondence with the protein folding transition. The protein folding transition takes place in systems with a small number of degrees of freedom (very far from the Avogadro number) and in the absence of a symmetry-breaking phenomenon. Nevertheless, seen from the deepest level of topology changes of equipotential submanifolds of phase space, the protein folding transition fully qualifies as a phase transition.
Topics: Protein Folding; Proteins; Amino Acid Sequence; Peptides; Polymers; Amino Acids; Thermodynamics; Protein Conformation
PubMed: 36559453
DOI: 10.1103/PhysRevE.106.054134