- Briefings in Bioinformatics Sep 2023
Transmembrane proteins are receptors, enzymes, transporters and ion channels that are instrumental in regulating a variety of cellular activities, such as signal transduction and cell communication. Despite tremendous progress in computational capacities to support protein research, there is still a significant gap in the availability of specialized computational analysis toolkits for transmembrane protein research. Here, we introduce TMKit, an open-source Python programming interface that is modular, scalable and specifically designed for processing transmembrane protein data. TMKit is a one-stop computational analysis tool for transmembrane proteins, enabling users to perform database wrangling, engineer features at the mutational, domain and topological levels, and visualize protein-protein interaction interfaces. In addition, TMKit includes seqNetRR, a high-performance computing library that allows customized construction of a large number of residue connections. This library is particularly well suited for assigning correlation matrix-based features at a fast speed. TMKit should serve as a useful tool for researchers in assisting the study of transmembrane protein sequences and structures. TMKit is publicly available through https://github.com/2003100127/tmkit and https://tmkit-guide.herokuapp.com/doc/overview.
Topics: Software; Computational Biology; Membrane Proteins; Amino Acid Sequence; Gene Library
PubMed: 37594311
DOI: 10.1093/bib/bbad288
- Protein Science : a Publication of the... Nov 2023
Predicting the effects of mutations on protein function and stability is an outstanding challenge. Here, we assess the performance of a variant of RoseTTAFold jointly trained for sequence and structure recovery, RFjoint, for mutation effect prediction. Without any further training, RFjoint predicts mutation effects for a diverse set of protein families with accuracy comparable to both another zero-shot model (MSA Transformer) and a model that requires specific training on a particular protein family for mutation effect prediction (DeepSequence). Thus, although the architecture of RFjoint was developed to address the protein design problem of scaffolding functional motifs, RFjoint acquired an understanding of the mutational landscapes of proteins during model training that is equivalent to that of recently developed large protein language models. The ability to simultaneously reason over protein structure and sequence could enable even more precise mutation effect predictions following supervised training on the task. These results suggest that RFjoint has a quite broad understanding of protein sequence-structure landscapes, and can be viewed as a joint model for protein sequence and structure which could be broadly useful for protein modeling.
Topics: Proteins; Mutation; Amino Acid Sequence; Protein Stability
PubMed: 37695922
DOI: 10.1002/pro.4780
- Food Research International (Ottawa,... Nov 2023 (Review)
Processed fish by-products are valuable sources of peptides due to their high protein content. However, the bitterness of these peptides can limit their use. This review outlines the most recent advancements and information regarding the reduction of bitterness in fish by-product-derived peptides. The sources and factors influencing bitterness, the transduction mechanisms involved, and strategies for reducing bitterness are highlighted. Bitterness in peptides is mainly influenced by the source, preparation method, presence of hydrophobic amino acid groups, binding to bitter receptors, and amino acid sequence. The most widely utilized techniques for eliminating bitterness or enhancing taste include the Maillard reaction, encapsulation, separating undesirable components, and bitter-blockers. Finally, a summary of the current challenges and future prospects in the domain of fish by-product-derived peptides is given. Because limitations such as residual bitterness and limited industrial application remain, further research is needed to reduce the bitterness of fish by-product-derived peptides. To achieve this goal, future studies should focus on debittering technologies for fish by-product-derived peptides, with the aim of producing high-quality products that meet consumer expectations.
Topics: Animals; Taste; Peptides; Amino Acid Sequence; Maillard Reaction
PubMed: 37803554
DOI: 10.1016/j.foodres.2023.113241
- The Protein Journal Feb 2024
Protein sequence comparison remains a challenging task for researchers because of the computational complexity introduced by 20 amino acids, compared with only four nucleotides in genome sequences. Further, protein sequences of different species have different lengths, which poses additional challenges for developing methods, especially alignment-free methods, to compare protein sequences. In this work, an efficient technique to compare protein sequences is developed using a graphical representation. First, the 20 amino acids are grouped into four classes based on polarity, narrowing the representational range from 20 to 4. Then a unit vector technique based on a two-quadrant Cartesian system is proposed to provide a new two-dimensional graphical representation of the protein sequence. Two approaches are then proposed to cope with the varying lengths of protein sequences from various species: one uses Dynamic Time Warping (DTW), while the other uses a two-dimensional Fast Fourier Transform (2D FFT). Next, the effectiveness of these two techniques is analyzed using two evaluation criteria: quantitative measures based on symmetric distance (SD) and computational speed. An analysis is performed on five data sets of 9 ND4, 9 ND5, 9 ND6, 12 Baculovirus, and 24 TF proteins under the two methods. The FFT-based method produces the same results as DTW but in less computational time. The results of the proposed method agree with the known biological reference. Further, the present method produces better clustering than existing ones.
Topics: Amino Acid Sequence; Proteins; Amino Acids; Algorithms
PubMed: 37848727
DOI: 10.1007/s10930-023-10160-2
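As an illustration of the alignment-free comparison described in the abstract above (The Protein Journal, Feb 2024), the following is a minimal Python sketch of the DTW route only. The four-class grouping in `CLASSES`, the angles in `ANGLES_DEG`, and the function names `to_curve` and `dtw_distance` are assumptions made for this example; the abstract does not specify the paper's actual grouping or unit vectors, and the 2D FFT variant is not shown.

```python
import math

# Hypothetical four-class grouping by side-chain polarity (an assumption,
# not necessarily the grouping used in the paper).
CLASSES = {
    "nonpolar": "GAVLIPFMW",
    "polar": "STCYNQ",
    "basic": "KRH",
    "acidic": "DE",
}

# Hypothetical unit-vector directions, all in the upper half-plane
# (quadrants I and II), one per class.
ANGLES_DEG = {"nonpolar": 30, "polar": 60, "basic": 120, "acidic": 150}


def to_curve(seq):
    """Map a protein sequence to a 2D walk by adding one unit vector per residue."""
    lookup = {aa: name for name, group in CLASSES.items() for aa in group}
    x = y = 0.0
    curve = []
    for aa in seq.upper():
        # Raises KeyError for letters outside the 20 standard amino acids.
        angle = math.radians(ANGLES_DEG[lookup[aa]])
        x += math.cos(angle)
        y += math.sin(angle)
        curve.append((x, y))
    return curve


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping between two 2D curves."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative-cost table
    for p in a:
        cur = [inf]
        for j, q in enumerate(b, start=1):
            cost = math.dist(p, q)
            # Extend the cheapest of: insertion, match, deletion.
            cur.append(cost + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[len(b)]
```

Because DTW warps the index axis, `dtw_distance(to_curve(s1), to_curve(s2))` is defined for sequences of different lengths, which is the property the abstract relies on.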
- Journal of Biochemistry Dec 2023
Akanes are fluorescent proteins that have several fluorescence maxima. In this report, Akane1 and Akane3 from Scleronephthya gracillima were selected, successfully overexpressed in Escherichia coli and purified by affinity chromatography. Fluorescence spectra of the recombinant Akanes matured in darkness or ambient light were found to have several fluorescence peaks. SDS-PAGE analysis revealed that Akanes matured in ambient light have two fragments. MS/MS analysis of Akanes digested with trypsin showed that the cleavage site is the same as observed for the photoconvertible fluorescent protein Kaede. The differences between the calculated masses from the amino acid sequence of Akane1 and the measured masses of Akane1 fragments obtained under ambient light coincided with those of Kaede. In contrast, a mass difference between the measured N-terminal Akane3 fragment and the calculated mass indicated that Akane3 is modified in the N-terminal region. These results indicate that numerous peaks in the fluorescence spectra of Akanes partly arise from isoproteins of Akanes and photoconversion. Photoconversion of Akane1 caused a fluorescence change from green to red, which was also observed for Akane3; however, the fluorescence intensity decreased dramatically when compared with that of Akane3.
Topics: Luminescent Proteins; Light; Tandem Mass Spectrometry; Amino Acid Sequence; Green Fluorescent Proteins
PubMed: 37812399
DOI: 10.1093/jb/mvad078
- Mathematical Biosciences and... Jul 2023
Protein interactions are the foundation of all metabolic activities of cells, such as apoptosis, the immune response, and metabolic pathways. In order to optimize the performance of protein interaction prediction, a coding method based on normalized difference sequence characteristics (NDSF) of amino acid sequences is proposed. NDSF jointly encodes the positional relationships between amino acids in the sequences and the correlation characteristics between sequence pairs. Features are then extracted from the encoded 174-dimensional human protein sequence vectors using two dimensionality reduction methods, principal component analysis (PCA) and locally linear embedding (LLE). This study compares the classification performance of four ensemble learning methods (AdaBoost, Extra trees, LightGBM, XGBoost) applied to PCA and LLE features. Cross-validation and grid search methods are used to find the best combination of parameters. The results show that the accuracy of NDSF is generally higher than that of the sequence matrix-based coding method (MOS), and the loss and coding time can be greatly reduced. The bar chart of feature extraction shows that the classification accuracy is significantly higher when using the linear dimensionality reduction method, PCA, compared to the nonlinear dimensionality reduction method, LLE. After classification with XGBoost, the model accuracy reaches 99.2%, which provides the best performance among all models. This study suggests that NDSF combined with PCA and XGBoost may be an effective strategy for classifying different human protein interactions.
Topics: Humans; Amino Acid Sequence; Research Design; Amino Acids; Apoptosis; Computer Systems
PubMed: 37679156
DOI: 10.3934/mbe.2023659
- Briefings in Bioinformatics Mar 2024
Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in protein structure prediction, we explored whether the representation of structural sequence profile can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign utilizes ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. The profile is then combined with structural pre-trained knowledge and geometric features and fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms the state-of-the-art methods, such as ProteinMPNN, Pifold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on the CATH 4.2 benchmark, respectively. Encouraging results have also been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis conducted by the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can accurately reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.
Topics: Sequence Alignment; Amino Acid Sequence; Proteins; Neural Networks, Computer; Sequence Analysis, Protein
PubMed: 38600663
DOI: 10.1093/bib/bbae146
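The abstract above (Briefings in Bioinformatics, Mar 2024) relies on ultrafast shape recognition (USR) vectors to speed up structure search. The following Python sketch shows the generic USR descriptor as it is commonly described: 12 numbers built from the first three moments of the distance distributions to four reference points. It is an illustration under assumptions, not SPDesign's implementation: the function names `usr_descriptor` and `usr_similarity` are hypothetical, the third moment is taken here as a signed cube root of the third central moment, and the coordinates could be, for example, C-alpha positions of a backbone.

```python
import math


def _moments(dists):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    # The signed cube root keeps the third moment in distance units.
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew]


def usr_descriptor(coords):
    """Return the 12-element USR vector for a list of (x, y, z) points."""
    n = len(coords)
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # point closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # point farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # point farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor += _moments([math.dist(c, ref) for c in coords])
    return descriptor


def usr_similarity(a, b):
    """Similarity in (0, 1]: inverse of 1 plus the mean absolute difference."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Since the descriptor depends only on internal distances, it is unchanged by translation and rotation, so two structures can be compared by a cheap vector comparison before any explicit structure alignment is attempted.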
- Nature Apr 2024 (Comparative Study)
Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high levels of expertise and labour-intensive manual intervention in three-dimensional computer graphics programs. Here we present ModelAngelo, a machine-learning approach for automated atomic model building in cryo-EM maps. By combining information from the cryo-EM map with information from protein sequence and structure in a single graph neural network, ModelAngelo builds atomic models for proteins that are of similar quality to those generated by human experts. For nucleotides, ModelAngelo builds backbones with similar accuracy to those built by humans. By using its predicted amino acid probabilities for each residue in hidden Markov model sequence searches, ModelAngelo outperforms human experts in the identification of proteins with unknown sequences. ModelAngelo will therefore remove bottlenecks and increase objectivity in cryo-EM structure determination.
Topics: Amino Acid Sequence; Cryoelectron Microscopy; Machine Learning; Markov Chains; Models, Molecular; Neural Networks, Computer; Protein Conformation; Proteins; Computer Graphics
PubMed: 38408488
DOI: 10.1038/s41586-024-07215-4
- Protein Expression and Purification Dec 2023
Neuritin is a vital neurotrophin that plays an essential role in recovery from nerve injury and neurodegenerative diseases and may become a new target for treating these conditions. However, improving neuritin protein stability is an urgent problem. In this study, to obtain active and stable neuritin proteins, we added a carboxyl-terminal peptide (CTP) sequence containing four O-linked glycosylation sites to the C-terminus of neuritin and cloned it into the Chinese hamster ovary (CHO) expression system. The neuritin-CTP protein was purified using a His-Tag purification strategy after G418 screening of stable high-expression cell lines. Ultimately, we obtained neuritin-CTP protein with a purity >90%. Functional analyses showed that the purified neuritin-CTP protein promoted the neurite outgrowth of PC12 cells, and stability experiments showed that neuritin stability was increased by adding CTP. These results indicate that neuritin protein-CTP fusion effectively increases stability without affecting secretion and activity. This study offers a sound strategy for improving the stability of neuritin protein and provides material conditions for further study of the function of neuritin.
Topics: Rats; Cricetinae; Animals; CHO Cells; Cricetulus; Amino Acid Sequence; Glycosylation; GPI-Linked Proteins
PubMed: 37567400
DOI: 10.1016/j.pep.2023.106344
- Scientific Reports Oct 2023
High-throughput proteomic analysis of archaeological skeletal remains provides information about past fauna community compositions and species dispersals in time and space. Archaeological skeletal remains are a finite resource, however, and therefore it becomes relevant to optimize methods of skeletal proteome extraction. Ancient proteins in bone specimens can be highly degraded and consequently, extraction methods for well-preserved or modern bone might be unsuitable for the processing of highly degraded skeletal proteomes. In this study, we compared six proteomic extraction methods on Late Pleistocene remains with variable levels of proteome preservation. We tested the accuracy of species identification, protein sequence coverage, deamidation, and the number of post-translational modifications per method. We find striking differences in obtained proteome complexity and sequence coverage, highlighting that simple acid-insoluble proteome extraction methods perform better in highly degraded contexts. For well-preserved specimens, the approach using EDTA demineralization and protease-mix proteolysis yielded a higher number of identified peptides. The protocols presented here allowed protein extraction from ancient bone with a minimum number of working steps and equipment and yielded protein extracts within three working days. We expect further development along this route to benefit large-scale screening applications of relevance to archaeological and human evolution research.
Topics: Humans; Proteome; Proteomics; Body Remains; Peptides; Amino Acid Sequence
PubMed: 37884544
DOI: 10.1038/s41598-023-44885-y
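Protein sequence coverage, one of the metrics compared in the abstract above (Scientific Reports, Oct 2023), can be illustrated with a short Python sketch. It assumes the simplest possible definition, the fraction of residues in a reference sequence that fall inside at least one exactly matching identified peptide. The function name `sequence_coverage` is hypothetical, and real proteomics pipelines also handle modifications and isobaric residues such as I/L, which this sketch ignores.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one exact peptide match."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # skip empty strings, which would match everywhere
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            # Continue searching so repeated occurrences are also marked.
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein) if protein else 0.0
```

For example, `sequence_coverage("MKTAYIAKQR", ["MKTA", "AKQR"])` marks residues 0 to 3 and 6 to 9, giving a coverage of 0.8.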