Nature Aug 2021
Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been determined, but this represents a small fraction of the billions of known protein sequences. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the 'protein folding problem') has been an important open research problem for more than 50 years. Despite recent progress, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.
Topics: Amino Acid Sequence; Computational Biology; Databases, Protein; Deep Learning; Models, Molecular; Neural Networks, Computer; Protein Conformation; Protein Folding; Proteins; Reproducibility of Results; Sequence Alignment
PubMed: 34265844
DOI: 10.1038/s41586-021-03819-2

Experimental & Molecular Medicine Sep 2020
Review
Advances in single-cell isolation and barcoding technologies offer unprecedented opportunities to profile DNA, mRNA, and proteins at a single-cell resolution. Recently, bulk multiomics analyses, such as multidimensional genomic and proteogenomic analyses, have proven beneficial for obtaining a comprehensive understanding of cellular events. This benefit has facilitated the development of single-cell multiomics analysis, which enables cell type-specific gene regulation to be examined. The cardinal features of single-cell multiomics analysis include (1) technologies for single-cell isolation, barcoding, and sequencing to measure multiple types of molecules from individual cells and (2) the integrative analysis of molecules to characterize cell types and their functions regarding pathophysiological processes based on molecular signatures. Here, we summarize the technologies for single-cell multiomics analyses (mRNA-genome, mRNA-DNA methylation, mRNA-chromatin accessibility, and mRNA-protein) as well as the methods for the integrative analysis of single-cell multiomics data.
Topics: Animals; Biotechnology; Computational Biology; Epigenomics; Gene Expression Profiling; Genomics; Humans; Organ Specificity; Proteomics; Single-Cell Analysis; Transcriptome
PubMed: 32929225
DOI: 10.1038/s12276-020-0420-2
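Barcoding is what lets reads from many cells be sequenced together and then assigned back to their cell of origin. The sketch below illustrates that assignment step under loudly stated assumptions: a read-1 layout of a 16-nt cell barcode followed by a 12-nt UMI (a layout used by some droplet kits, not one specified in the review), exact matching against a barcode whitelist, and no error correction. The helper `count_umis_per_cell` is hypothetical; real pipelines also correct barcode/UMI errors and count UMIs per gene.

```python
from collections import defaultdict

# Assumed read-1 layout (hypothetical, kit-dependent): [cell barcode][UMI][rest]
BARCODE_LEN = 16
UMI_LEN = 12


def count_umis_per_cell(read1_seqs, whitelist):
    """Hypothetical helper: return {cell_barcode: number of distinct UMIs}.

    Reads that are too short or carry a barcode outside the whitelist are
    discarded. Identical UMIs within one cell are collapsed, so PCR copies of
    the same molecule are counted once. No sequencing-error correction is done.
    """
    umis = defaultdict(set)
    for seq in read1_seqs:
        if len(seq) < BARCODE_LEN + UMI_LEN:
            continue
        barcode = seq[:BARCODE_LEN]
        if barcode not in whitelist:
            continue
        umis[barcode].add(seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN])
    return {barcode: len(found) for barcode, found in umis.items()}


if __name__ == "__main__":
    cell_a, cell_b = "A" * 16, "C" * 16
    reads = [
        cell_a + "G" * 12 + "TTTT",  # cell A, first molecule
        cell_a + "G" * 12,           # PCR duplicate of that molecule
        cell_a + "T" * 12,           # cell A, second molecule
        cell_b + "G" * 12,           # cell B, first molecule
        "N" * 16 + "G" * 12,         # barcode not in the whitelist
        "ACGT",                      # read too short to hold barcode + UMI
    ]
    print(count_umis_per_cell(reads, {cell_a, cell_b}))
```

Collapsing on the UMI rather than counting raw reads is the design choice that removes amplification bias from the per-cell molecule counts.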

BMC Bioinformatics Dec 2020
This is an editorial report on the supplement to BMC Bioinformatics that includes 6 papers selected from BIOCOMP'19, the 2019 International Conference on Bioinformatics and Computational Biology. These articles reflect current trends and developments in bioinformatics research.
Topics: Computational Biology; Genomics; Humans; Magnetic Resonance Spectroscopy; Neoplasm Proteins; Neoplasms; Research
PubMed: 33272214
DOI: 10.1186/s12859-020-03874-y

International Journal of Molecular... Apr 2020
Review
Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
Topics: Computational Biology; Data Analysis; Humans; Machine Learning; Mass Spectrometry; Protein Interaction Mapping; Protein Interaction Maps; Proteomics; Workflow
PubMed: 32326049
DOI: 10.3390/ijms21082873
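Database-search tools for MS-based protein identification match observed spectra against peptides predicted from protein sequences. As a minimal illustration of that first step (not a method taken from the review), the sketch below digests a sequence with the common trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages) and computes the monoisotopic mass of each singly protonated peptide, [M+H]+. Residue masses are standard monoisotopic values for unmodified amino acids; modifications are ignored, and the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # terminal H + OH, added once per peptide
PROTON = 1.007276   # charge carrier for the [M+H]+ ion


def tryptic_digest(protein):
    """Hypothetical helper: cleave after K/R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide without a K/R end
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide):
    """Hypothetical helper: monoisotopic mass of the singly protonated peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


if __name__ == "__main__":
    for pep in tryptic_digest("AKPLRGK"):
        print(pep, round(mh_plus(pep), 4))
```

In a real search engine these theoretical masses would be compared with observed precursor m/z values within a ppm tolerance, and fragment ions would then be scored against the spectrum.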

International Journal of Molecular... Sep 2019
Review
Recent advances in omics technologies have led to unprecedented efforts characterizing the molecular changes that underlie the development and progression of a wide array of complex human diseases, including cancer. As a result, multi-omics analyses (which take advantage of these technologies in genomics, transcriptomics, epigenomics, proteomics, metabolomics, and other omics areas) have been proposed and heralded as the key to advancing precision medicine in the clinic. In the field of precision oncology, genomics approaches and, more recently, other omics analyses have helped reveal several key mechanisms in cancer development, treatment resistance, and recurrence risk, and several of these findings have been implemented in clinical oncology to help guide treatment decisions. However, truly integrated multi-omics analyses have not been applied widely, preventing further advances in precision medicine. Additional efforts are needed to develop the analytical infrastructure necessary to generate, analyze, and annotate multi-omics data effectively to inform precision medicine-based decision-making.
Topics: Biomarkers; Computational Biology; Epigenomics; Genomics; Humans; Metabolomics; Neoplasms; Precision Medicine; Proteomics
PubMed: 31561483
DOI: 10.3390/ijms20194781

PLoS Biology Mar 2021
Why would a computational biologist with 40 years of research experience say bioinformatics is dead? The short answer is, in being the Founding Dean of a new School of Data Science, what we do suddenly looks different.
Topics: Computational Biology; Curriculum; Data Science; Humans; Information Dissemination; Schools; Students
PubMed: 33735179
DOI: 10.1371/journal.pbio.3001165

Nature Communications Mar 2022
Predicting the structure of interacting protein chains is a fundamental step towards understanding protein function. Unfortunately, no computational method can produce accurate structures of protein complexes. AlphaFold2 has shown unprecedented levels of accuracy in modelling single-chain protein structures. Here, we apply AlphaFold2 to the prediction of heterodimeric protein complexes. We find that the AlphaFold2 protocol, together with optimised multiple sequence alignments, generates models of acceptable quality (DockQ ≥ 0.23) for 63% of the dimers. From the predicted interfaces, we create a simple function to predict the DockQ score, which distinguishes acceptable from incorrect models, as well as interacting from non-interacting proteins, with state-of-the-art accuracy. We find that, using the predicted DockQ scores, we can identify 51% of all interacting pairs at a 1% false positive rate (FPR).
Topics: Computational Biology; Protein Conformation; Proteins
PubMed: 35273146
DOI: 10.1038/s41467-022-28865-w
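The abstract above relies on two thresholds: DockQ ≥ 0.23 to call a model acceptable, and a score cutoff chosen so that only 1% of non-interacting pairs are wrongly called interacting. The sketch below shows one way such a cutoff and the resulting recall could be computed from any per-pair score (for example a predicted DockQ). The toy scores, the function names, and the conservative handling of ties are illustrative assumptions, not the authors' code.

```python
import math

ACCEPTABLE_DOCKQ = 0.23  # acceptable-quality cutoff quoted in the abstract


def is_acceptable(dockq):
    """Hypothetical helper: True if a model reaches acceptable quality."""
    return dockq >= ACCEPTABLE_DOCKQ


def recall_at_fpr(positive_scores, negative_scores, max_fpr=0.01):
    """Hypothetical helper: return (recall, cutoff); a pair is called when score > cutoff.

    The cutoff is set to a negative-pair score such that at most
    floor(max_fpr * n_negatives) negatives score strictly above it. Without
    ties this is the least strict cutoff meeting the FPR limit; ties make it
    more conservative. positive_scores must be non-empty.
    """
    negatives = sorted(negative_scores, reverse=True)
    allowed_fp = math.floor(max_fpr * len(negatives))
    if allowed_fp >= len(negatives):
        cutoff = -math.inf  # FPR limit cannot be exceeded; call every pair
    else:
        cutoff = negatives[allowed_fp]
    true_positives = sum(1 for s in positive_scores if s > cutoff)
    return true_positives / len(positive_scores), cutoff


if __name__ == "__main__":
    negatives = [i / 100 for i in range(100)]  # toy scores, 100 non-interacting pairs
    positives = [0.5, 0.985, 0.99, 1.0]         # toy scores, 4 interacting pairs
    recall, cutoff = recall_at_fpr(positives, negatives)
    print(f"cutoff={cutoff}, recall={recall}")
    print(is_acceptable(0.23), is_acceptable(0.2))
```

With these toy scores, one of the 100 negatives lies above the cutoff (an FPR of exactly 1%), and three of the four positives are recovered.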

Analytica Chimica Acta Jan 2021
Review
Recent advances in high-throughput technologies have enabled the profiling of multiple layers of a biological system, including DNA sequence data (genomics), RNA expression levels (transcriptomics), and metabolite levels (metabolomics). This has led to the generation of vast amounts of biological data that can be integrated in so-called multi-omics studies to examine the complex molecular underpinnings of health and disease. Integrative analysis of such datasets is not straightforward and is particularly complicated by the high dimensionality and heterogeneity of the data and by the lack of universal analysis protocols. Previous reviews have discussed various strategies to address the challenges of data integration, elaborating on specific aspects, such as network inference or feature selection techniques. In these reviews, the main focus has been on the integration of two omics layers in their relation to a phenotype of interest. In this review we provide an overview of a typical multi-omics workflow, focusing on integration methods that have the potential to combine metabolomics data with two or more omics. We discuss multiple integration concepts including data-driven, knowledge-based, simultaneous and step-wise approaches. We highlight the application of these methods in recent multi-omics studies, including large-scale integration efforts aiming at a global depiction of the complex relationships within and between different biological layers without focusing on a particular phenotype.
Topics: Biomedical Research; Computational Biology; Genomics; Metabolomics; Phenotype
PubMed: 33248648
DOI: 10.1016/j.aca.2020.10.038
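Among the data-driven, simultaneous strategies the abstract mentions, arguably the simplest baseline is to concatenate omics layers measured on the same samples into one matrix before downstream modelling. The sketch below is one such baseline, offered as an assumption rather than a method from the review: each feature is z-scored within its layer, and each layer is then weighted by 1/sqrt(number of features) so that a layer with many features (e.g. transcripts) does not dominate a smaller one (e.g. metabolites) purely by size. Layer names, data, and function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore_features(block):
    """Hypothetical helper: z-score each column (feature) of a samples x features block."""
    params = []
    for col in zip(*block):
        sd = pstdev(col)
        params.append((mean(col), sd if sd > 0 else 1.0))  # constant features become 0
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, params)] for row in block]


def concatenate_layers(layers):
    """Hypothetical early-integration helper: scale each layer, then join features.

    `layers` maps a layer name to a samples x features list of lists; every layer
    must list the same samples in the same order. Each z-scored layer is multiplied
    by 1/sqrt(n_features), so non-constant layers contribute equal total variance.
    """
    n_samples = {len(block) for block in layers.values()}
    if len(n_samples) != 1:
        raise ValueError("all layers must contain the same samples")
    scaled = []
    for block in layers.values():
        weight = 1.0 / math.sqrt(len(block[0]))
        scaled.append([[v * weight for v in row] for row in zscore_features(block)])
    return [[value for block in scaled for value in block[i]]
            for i in range(n_samples.pop())]


if __name__ == "__main__":
    combined = concatenate_layers({
        "transcriptomics": [[1.0, 10.0], [3.0, 10.0]],  # 2 samples x 2 features
        "metabolomics": [[2.0], [4.0]],                 # 2 samples x 1 feature
    })
    for row in combined:
        print([round(v, 3) for v in row])
```

The combined matrix could then feed a single model (e.g. a regularized regression or a factor analysis); step-wise and knowledge-based approaches would instead model the layers separately or through prior networks.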

Nucleic Acids Research Jan 2020
GenBank® (www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains over 6.25 trillion base pairs from over 1.6 billion nucleotide sequences for 450 000 formally described species. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. Recent updates include a new version of Genome Workbench that supports GenBank submissions, new submission wizards for viral genomes, enhancements to BankIt and improved handling of taxonomy for sequences from pathogens.
Topics: Computational Biology; Databases, Nucleic Acid; Genomics; Molecular Sequence Annotation; National Institutes of Health (U.S.); Software; United States; Web Browser
PubMed: 31665464
DOI: 10.1093/nar/gkz956
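Besides the web interface, GenBank records can be retrieved programmatically. The sketch below assumes NCBI's E-utilities `efetch` endpoint, which returns a GenBank flat file for a nucleotide accession when called with `db=nuccore`, `rettype=gb` and `retmode=text`. The accession is only an example, the helper names are hypothetical, and NCBI's usage guidelines (request-rate limits, optional API key) still apply.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_url(accession):
    """Hypothetical helper: build an efetch URL that returns a GenBank flat file."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_genbank(accession, timeout=30):
    """Hypothetical helper: download the record as text (needs network access)."""
    with urlopen(genbank_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(genbank_url("MN908947.3"))  # example accession
    # record = fetch_genbank("MN908947.3")  # uncomment to download
    # print(record.splitlines()[0])          # a GenBank flat file starts with LOCUS
```

Building the URL separately from the download keeps the request easy to inspect and to reuse with other `rettype` values.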

International Journal of Molecular... Aug 2020
Medical genomics relies on next-generation sequencing methods to decipher the molecular mechanisms underlying gene expression. This special issue collects materials originally presented at the "Centenary of Human Population Genetics" Conference-2019 in Moscow. Here we present some recent developments in computational methods tested on actual medical genetics problems dissected through genomics, transcriptomics and proteomics data analysis, gene networks, protein-protein interactions and biomedical literature mining. We have selected materials based on systems biology approaches and database mining. These methods and algorithms were discussed at the Digital Medical Forum-2019, organized by I.M. Sechenov First Moscow State Medical University, which presented bioinformatics approaches for drug target discovery in cancer, its computational support, and the digitalization of medical research, as well as at the "Systems Biology and Bioinformatics"-2019 (SBB-2019) Young Scientists School in Novosibirsk, Russia. The selected recent advances in medical genomics and genetics discussed at these events are based on novel bioinformatics tools.
Topics: Algorithms; Computational Biology; Data Mining; Genetics, Medical; High-Throughput Nucleotide Sequencing; Humans; Systems Biology
PubMed: 32872128
DOI: 10.3390/ijms21176224