-
Briefings in Bioinformatics Mar 2022
Review
A transcriptome constructed from short-read RNA sequencing (RNA-seq) is an easily attainable proxy catalog of protein-coding genes when genome assembly is unnecessary, expensive or difficult. In the absence of a sequenced genome to guide the reconstruction process, the transcriptome must be assembled de novo using only the information available in the RNA-seq reads. Subsequently, the sequences must be annotated in order to identify sequence-intrinsic and evolutionary features in them (for example, protein-coding regions). Although straightforward at first glance, de novo transcriptome assembly and annotation can quickly prove to be challenging undertakings. In addition to familiarizing themselves with the conceptual and technical intricacies of the tasks at hand and the numerous pre- and post-processing steps involved, those interested must also grapple with an overwhelmingly large choice of tools. The lack of standardized workflows, fast pace of development of new tools and techniques and paucity of authoritative literature have served to exacerbate the difficulty of the task even further. Here, we present a comprehensive overview of de novo transcriptome assembly and annotation. We discuss the procedures involved, including pre- and post-processing steps, and present a compendium of corresponding tools.
Topics: Genome; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Sequence Analysis, RNA; Transcriptome; Workflow
PubMed: 35076693
DOI: 10.1093/bib/bbab563 -
Cell Feb 2024
Review
Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges remain unsolved.
Topics: Artificial Intelligence; Protein Conformation; Proteins; Protein Engineering; Deep Learning
PubMed: 38306980
DOI: 10.1016/j.cell.2023.12.028 -
Aging Cell Sep 2023
Recent advances highlight the pivotal role of nicotinamide adenine dinucleotide (NAD+) in ovarian aging. However, the roles of de novo NAD+ biosynthesis in ovarian aging are still unknown. Here, we found that genetic ablation of Ido1 (indoleamine-2,3-dioxygenase 1) or Qprt (quinolinate phosphoribosyl transferase), two critical genes in de novo NAD+ biosynthesis, resulted in decreased ovarian NAD+ levels in middle-aged mice, leading to subfertility, irregular estrous cycles, reduced ovarian reserve, and accelerated aging. Moreover, we observed impaired oocyte quality, characterized by increased reactive oxygen species and spindle anomalies, which ultimately led to reduced fertilization ability and impaired early embryonic development. A transcriptomic analysis of ovaries in both mutant and wild-type mice revealed alterations in gene expression related to mitochondrial metabolism. Our findings were further supported by the observation of impaired mitochondrial distribution and decreased mitochondrial membrane potential in the oocytes of knockout mice. Supplementation with nicotinamide riboside (NR), an NAD+ booster, in mutant mice increased ovarian reserve and improved oocyte quality. Our study highlights the importance of the NAD+ de novo pathway in middle-aged female fertility.
Topics: Female; Mice; Animals; NAD; Ovary; Mice, Knockout
PubMed: 37332134
DOI: 10.1111/acel.13904 -
Genome Medicine Apr 2022
BACKGROUND
Previous large-scale studies of de novo variants identified a number of genes associated with neurodevelopmental disorders (NDDs); however, it was also predicted that many NDD-associated genes await discovery. Such genes can be discovered by integrating copy number variants (CNVs), which have not been fully considered in previous studies, and increasing the sample size.
METHODS
We first constructed a model estimating the rates of de novo CNVs per gene from several factors such as gene length and number of exons. Second, we compiled a comprehensive list of de novo single-nucleotide variants (SNVs) in 41,165 individuals and de novo CNVs in 3675 individuals with NDDs by aggregating our own and publicly available datasets, including denovo-db and the Deciphering Developmental Disorders study data. Third, summing up the de novo CNV rates that we estimated and SNV rates previously established, gene-based enrichment of de novo deleterious SNVs and CNVs was assessed in the 41,165 cases. Significantly enriched genes were further prioritized according to their similarity to known NDD genes using a deep learning model that considers functional characteristics (e.g., gene ontology and expression patterns).
RESULTS
We identified a total of 380 genes achieving statistical significance (5% false discovery rate), including 31 genes affected by de novo CNVs. Of the 380 genes, 52 have not previously been reported as NDD genes, and the data of de novo CNVs contributed to the significance of three genes (GLTSCR1, MARK2, and UBR3). Among the 52 genes, we reasonably excluded 18 genes [a number almost identical to the theoretically expected false positives (i.e., 380 × 0.05 = 19)] given their constraints against deleterious variants and extracted 34 "plausible" candidate genes. Their validity as NDD genes was consistently supported by their similarity in function and gene expression patterns to known NDD genes. Quantifying the overall similarity using deep learning, we identified 11 high-confidence (> 90% true-positive probabilities) candidate genes: HDAC2, SUPT16H, HECTD4, CHD5, XPO1, GSK3B, NLGN2, ADGRB1, CTR9, BRD3, and MARK2.
CONCLUSIONS
We identified dozens of new candidates for NDD genes. Both the methods and the resources developed here will contribute to the further identification of novel NDD-associated genes.
Topics: Cell Cycle Proteins; DNA Copy Number Variations; DNA Helicases; Exons; Humans; Nerve Tissue Proteins; Neurodevelopmental Disorders; Nucleotides; Transcription Factors
PubMed: 35468861
DOI: 10.1186/s13073-022-01042-w -
Nature Reviews. Chemistry Jan 2022
Natural metalloproteins perform many functions, ranging from sensing to electron transfer and catalysis, in which the position and property of each ligand and metal is dictated by protein structure. De novo protein design aims to define an amino acid sequence that encodes a specific structure and function, providing a critical test of the hypothetical inner workings of (metallo)proteins. To date, de novo metalloproteins have used simple, symmetric tertiary structures (uncomplicated by the large size and evolutionary marks of natural proteins) to interrogate structure-function hypotheses. In this Review, we discuss de novo design applications, such as proteins that induce complex, increasingly asymmetric ligand geometries to achieve function, as well as the use of more canonical ligand geometries to achieve stability. De novo design has been used to explore how proteins fine-tune redox potentials and catalyse both oxidative and hydrolytic reactions. With an increased understanding of structure-function relationships, functional proteins including O2-dependent oxidases, fast hydrolases, and multi-proton/multi-electron reductases have been created. In addition, proteins can now be designed using xeno-biological metals or cofactors and principles from inorganic chemistry to derive new-to-nature functions. These results and the advances in computational protein design suggest a bright future for the de novo design of diverse, functional metalloproteins.
PubMed: 35811759
DOI: 10.1038/s41570-021-00339-5 -
Genes & Diseases Nov 2023
Review
The de novo nucleotide biosynthetic pathway is a highly conserved and essential biochemical pathway in almost all organisms. Both purine nucleotides and pyrimidine nucleotides are necessary for cell metabolism and proliferation. Thus, the dysregulation of the nucleotide biosynthetic pathway contributes to the development of many human diseases, such as cancer. It has been shown that many enzymes in this pathway are overactivated in different cancers. In this review, we summarize and update the current knowledge on the nucleotide biosynthetic pathway, regulatory mechanisms, its role in tumorigenesis, and potential targeting opportunities.
PubMed: 37554216
DOI: 10.1016/j.gendis.2022.04.018 -
Cancer Science Feb 2021
Review
Recent studies of the cancer genome have identified numerous patients harboring multiple mutations (MM) within individual oncogenes. These MM (de novo MM) in cis synergistically activate the mutated oncogene and promote tumorigenesis, indicating a positive epistatic interaction between mutations. The relatively frequent de novo MM suggest that intramolecular positive epistasis is widespread in oncogenes. Studies also suggest that negative and higher-order epistasis affects de novo MM. Comparison of de novo MM and MM associated with drug-resistant secondary mutations (secondary MM) revealed several similarities with respect to allelic configuration, mutational selection and functionality of individual mutations. Conversely, they have several differences, most notably the difference in drug sensitivities. Secondary MM usually confer resistance to molecularly targeted therapies, whereas several de novo MM are associated with increased sensitivity, implying that both can be useful as therapeutic biomarkers. Unlike secondary MM in which specific secondary resistant mutations are selected, minor (infrequent) functionally weak mutations are convergently selected in de novo MM, which may provide an explanation as to why such mutations accumulate in cancer. The third type of MM is MM from different subclones. This type of MM is associated with parallel evolution, which may contribute to relapse and treatment failure. Collectively, MM within individual oncogenes are diverse, but all types of MM are associated with cancer evolution and therapeutic response. Further evaluation of oncogenic MM is warranted to gain a deeper understanding of cancer genetics and evolution.
Topics: Carcinogenesis; Humans; Mutation; Neoplasms; Oncogenes
PubMed: 33073435
DOI: 10.1111/cas.14699 -
Drug Discovery Today Nov 2021
Review
Molecular design strategies are integral to therapeutic progress in drug discovery. Computational approaches for de novo molecular design have been developed over the past three decades and, recently, thanks in part to advances in machine learning (ML) and artificial intelligence (AI), the drug discovery field has gained practical experience. Here, we review these learnings and present de novo approaches according to the coarseness of their molecular representation: that is, whether molecular design is modeled on an atom-based, fragment-based, or reaction-based paradigm. Furthermore, we emphasize the value of strong benchmarks, describe the main challenges to using these methods in practice, and provide a viewpoint on further opportunities for exploration and challenges to be tackled in the upcoming years.
Topics: Artificial Intelligence; Computer Simulation; Drug Design; Drug Discovery; Drug Evaluation, Preclinical; Humans; Machine Learning; Workflow
PubMed: 34082136
DOI: 10.1016/j.drudis.2021.05.019 -
RSC Medicinal Chemistry Aug 2021
Review
De novo molecular design for drug discovery is a growing field. Deep neural networks (DNNs) are becoming more widespread in their use for machine learning models. As more DNN models are proposed for molecular design, benchmarking methods are crucial for the comparison and validation of these models. This review looks at three recently proposed benchmarking methods, Fréchet ChemNet Distance, GuacaMol and Molecular Sets (MOSES), and provides a commentary on their future potential applications in molecular drug design and possible next steps for further validation of these benchmarking methods.
PubMed: 34458735
DOI: 10.1039/d1md00074h -
Cancers Mar 2021
Review
The activation of de novo serine/glycine biosynthesis in a subset of tumors has been described as a major contributor to tumor pathogenesis, poor outcome, and treatment resistance. Amplifications and mutations of de novo serine/glycine biosynthesis enzymes can trigger pathway activation; however, a large group of cancers displays serine/glycine pathway overexpression induced by oncogenic drivers and unknown regulatory mechanisms. A better understanding of the regulatory network of de novo serine/glycine biosynthesis activation in cancer might be essential to unveil opportunities to target tumor heterogeneity and therapy resistance. In the current review, we describe how the activation of de novo serine/glycine biosynthesis in cancer is linked to treatment resistance and its implications in the clinic. To our knowledge, only a few studies have identified this pathway as metabolic reprogramming of cancer cells in response to radiation therapy. We propose an important contribution of de novo serine/glycine biosynthesis pathway activation to radioresistance by being involved in cancer cell viability and proliferation, maintenance of cancer stem cells (CSCs), and redox homeostasis under hypoxia and nutrient-deprived conditions. Current approaches for inhibition of the de novo serine/glycine biosynthesis pathway provide new opportunities for therapeutic intervention, which in combination with radiotherapy might be a promising strategy for tumor control and ultimately eradication. Further research is needed to gain molecular and mechanistic insight into the activation of this pathway in response to radiation therapy and to design sophisticated stratification methods to select patients that might benefit from serine/glycine metabolism-targeted therapies in combination with radiotherapy.
PubMed: 33801846
DOI: 10.3390/cancers13061191