Nature, Oct 2023 (Review)
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.
Topics: Humans; Genome, Human; Molecular Sequence Annotation; Protein Isoforms; Human Genome Project; Genes; Pseudogenes; RNA
PubMed: 37794265
DOI: 10.1038/s41586-023-06490-x
Nature Communications, Mar 2020
Tumor cells often reprogram their metabolism for rapid proliferation. The roles of long noncoding RNAs (lncRNAs) in metabolism remodeling and the underlying mechanisms remain elusive. Through screening, we found that the lncRNA Actin Gamma 1 Pseudogene (AGPG) is required for increased glycolysis activity and cell proliferation in esophageal squamous cell carcinoma (ESCC). Mechanistically, AGPG binds to and stabilizes 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3 (PFKFB3). By preventing APC/C-mediated ubiquitination, AGPG protects PFKFB3 from proteasomal degradation, leading to the accumulation of PFKFB3 in cancer cells, which subsequently activates glycolytic flux and promotes cell cycle progression. AGPG is also a transcriptional target of p53; loss or mutation of TP53 triggers the marked upregulation of AGPG. Notably, inhibiting AGPG dramatically impaired tumor growth in patient-derived xenograft (PDX) models. Clinically, AGPG is highly expressed in many cancers, and high AGPG expression levels are correlated with poor prognosis, suggesting that AGPG is a potential biomarker and cancer therapeutic target.
Topics: Animals; Cell Line, Tumor; Cell Proliferation; Cellular Reprogramming; Esophageal Squamous Cell Carcinoma; Female; Gene Knockout Techniques; Glycolysis; Humans; Mice, Inbred BALB C; Mice, Nude; Phosphofructokinase-2; Pseudogenes; RNA, Long Noncoding; Up-Regulation; Xenograft Model Antitumor Assays
PubMed: 32198345
DOI: 10.1038/s41467-020-15112-3
Nucleic Acids Research, Jan 2021
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Topics: Animals; COVID-19; Computational Biology; Databases, Genetic; Epidemics; Genomics; Humans; Internet; Mice; Molecular Sequence Annotation; Pseudogenes; RNA, Long Noncoding; SARS-CoV-2; Transcription, Genetic
PubMed: 33270111
DOI: 10.1093/nar/gkaa1087
Nucleic Acids Research, Jan 2023
Ferroptosis is a mode of regulated cell death characterized by iron-dependent accumulation of lipid peroxidation. It is closely linked to the pathophysiological processes in many diseases. Since our publication of the first ferroptosis database in 2020 (FerrDb V1), many new findings have been published. To keep up with the rapid progress in ferroptosis research and to provide timely and high-quality data, here we present the successor, FerrDb V2. It contains 1001 ferroptosis regulators and 143 ferroptosis-disease associations manually curated from 3288 articles. Specifically, there are 621 gene regulators, of which 264 are drivers, 238 are suppressors, 9 are markers, and 110 are unclassified genes; and there are 380 substance regulators, with 201 inducers and 179 inhibitors. Compared to FerrDb V1, curated articles increase by >300%, ferroptosis regulators increase by 175%, and ferroptosis-disease associations increase by 50.5%. Circular RNA and pseudogene are novel regulators in FerrDb V2, and the percentage of non-coding RNA increases from 7.3% to 13.6%. External gene-related data were integrated, enabling thought-provoking and gene-oriented analysis in FerrDb V2. In conclusion, FerrDb V2 will help to acquire deeper insights into ferroptosis. FerrDb V2 is freely accessible at http://www.zhounan.org/ferrdb/.
Topics: Ferroptosis; Data Accuracy; Databases, Factual; Lipid Peroxidation; Pseudogenes
PubMed: 36305834
DOI: 10.1093/nar/gkac935
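The regulator counts reported in the FerrDb V2 abstract can be cross-checked with simple arithmetic. The short sketch below only re-adds the figures quoted in the abstract; the variable names are illustrative and are not taken from the database itself.

```python
# Counts copied from the FerrDb V2 abstract; all names here are illustrative.
gene_regulators = {
    "drivers": 264,
    "suppressors": 238,
    "markers": 9,
    "unclassified": 110,
}
substance_regulators = {
    "inducers": 201,
    "inhibitors": 179,
}

# Totals as stated in the abstract.
reported_gene_total = 621
reported_substance_total = 380
reported_regulator_total = 1001

# Recompute the totals from the per-category counts.
gene_total = sum(gene_regulators.values())
substance_total = sum(substance_regulators.values())
regulator_total = gene_total + substance_total

print("gene regulators:", gene_total, gene_total == reported_gene_total)
print("substance regulators:", substance_total, substance_total == reported_substance_total)
print("all regulators:", regulator_total, regulator_total == reported_regulator_total)
```

The per-category counts add up to the stated subtotals (621 and 380), and the subtotals add up to the stated 1001 regulators.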
Nature, Sep 2023
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.
Topics: Humans; Male; Chromosomes, Human, Y; Genome, Human; Genomics; Mutation Rate; Phenotype; Evolution, Molecular; Euchromatin; Pseudogenes; Genetic Variation; Chromosomes, Human, X; Pseudoautosomal Regions
PubMed: 37612510
DOI: 10.1038/s41586-023-06425-6
Nature Methods, Apr 2015
HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ∼64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.
Topics: Humans; Limit of Detection; Pseudogenes; Sequence Alignment; Sequence Analysis, DNA; Sequence Analysis, RNA
PubMed: 25751142
DOI: 10.1038/nmeth.3317
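The HISAT abstract rests on the Burrows-Wheeler transform and the FM index. As a rough illustration of the idea, the sketch below builds a single toy FM index over a short string and performs exact matching by backward search. It is not HISAT's hierarchical scheme of one global plus many local indexes, and it uses a naive suffix sort and an uncompressed occurrence table that a real aligner would not. The function names `build_fm_index` and `find` are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM index: suffix array, C table and Occ table."""
    # Sentinel: assumed absent from text and smaller than every other symbol.
    text += "$"
    # Naive suffix array by sorting suffixes; acceptable only for tiny inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text strictly smaller than c.
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, c_table, occ


def find(pattern, sa, c_table, occ):
    """Return sorted start positions of pattern, using backward search."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


sa, c_table, occ = build_fm_index("GATTACAGATTACA")
print(find("ATTA", sa, c_table, occ))  # [1, 8]
print(find("TTT", sa, c_table, occ))   # []
```

Backward search consumes the pattern from its last character to its first, narrowing a range of suffix-array rows at each step, so the number of steps depends on the pattern length rather than on the size of the indexed text.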
Nature, Nov 2016
Pseudogenes are generally considered to be non-functional DNA sequences that arise through nonsense or frame-shift mutations of protein-coding genes. Although certain pseudogene-derived RNAs have regulatory roles, and some pseudogene fragments are translated, no clear functions for pseudogene-derived proteins are known. Olfactory receptor families contain many pseudogenes, which reflect low selection pressures on loci no longer relevant to the fitness of a species. Here we report the characterization of a pseudogene in the chemosensory variant ionotropic glutamate receptor repertoire of Drosophila sechellia, an insect endemic to the Seychelles that feeds almost exclusively on the ripe fruit of Morinda citrifolia. This locus, D. sechellia Ir75a, bears a premature termination codon (PTC) that appears to be fixed in the population. However, D. sechellia Ir75a encodes a functional receptor, owing to efficient translational read-through of the PTC. Read-through is detected only in neurons and is independent of the type of termination codon, but depends on the sequence downstream of the PTC. Furthermore, although the intact Drosophila melanogaster Ir75a orthologue detects acetic acid (a chemical cue important for locating fermenting food, found only at trace levels in Morinda fruit), D. sechellia Ir75a has evolved distinct odour-tuning properties through amino-acid changes in its ligand-binding domain. We identify functional PTC-containing loci within different olfactory receptor repertoires and species, suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon.
Topics: Acetic Acid; Animals; Base Sequence; Codon, Terminator; Drosophila; Drosophila melanogaster; Ligands; Molecular Sequence Annotation; Neurons; Organ Specificity; Peptide Chain Elongation, Translational; Pseudogenes; Receptors, Odorant; Reproducibility of Results
PubMed: 27776356
DOI: 10.1038/nature19824
Oncotarget, May 2016 (Review)
Pseudogenes are DNA sequences with high homology to the corresponding functional gene, but, because of the accumulation of various mutations, they have lost their original ability to code for proteins. Consequently, until a few years ago pseudogenes were considered dysfunctional relatives of the corresponding ancestral genes, and therefore useless in the course of genome evolution. However, several studies have recently established that pseudogenes have key biological functions. Indeed, some pseudogenes control the expression of functional genes by competitively binding miRNAs, some generate small interfering RNAs that negatively modulate the expression of functional genes, and some even encode functional mutated proteins. Here, we focus on the pseudogenes of the HMGA1 gene, which codes for the HMGA1a and HMGA1b proteins that have a critical role in development and cancer progression. In this review, we analyze the family of HMGA1 pseudogenes through three aspects: classification, characterization, and their possible function and involvement in cancer.
Topics: Disease Progression; Gene Expression Regulation, Neoplastic; HMGA1a Protein; HMGA1b Protein; Humans; Models, Genetic; Mutation; Neoplasms; Pseudogenes; RNA, Messenger
PubMed: 26895108
DOI: 10.18632/oncotarget.7427
Cell Research, Jun 2016
Topics: Animals; High-Throughput Nucleotide Sequencing; Humans; Mice; Nucleic Acid Conformation; Primates; Pseudogenes; RNA; Rats; Retroelements; Sequence Analysis, RNA
PubMed: 27021280
DOI: 10.1038/cr.2016.42
Communications Biology, Oct 2023
Glioma is the most common primary malignancy of the central nervous system. Glioblastoma (GBM) has the highest degree of malignancy among the gliomas and the strongest resistance to chemotherapy and radiotherapy. Vasculogenic mimicry (VM) provides tumor cells with a blood supply independent of endothelial cells and greatly restricts the therapeutic effect of anti-angiogenic tumor therapy for glioma patients. Vascular endothelial growth factor receptor 2 (VEGFR2) and vascular endothelial cadherin (VE-cadherin) are currently recognized molecular markers of VM in tumors. In the present study, we show that deficiency of the pseudogene MAPK6P4 represses VEGFR2 and VE-cadherin protein expression levels and inhibits the proliferation, migration, invasion, and VM development of GBM cells. The MAPK6P4-encoded functional peptide P4-135aa phosphorylates KLF15 at the S238 site, promoting KLF15 protein stability and nuclear entry to promote GBM VM formation. KLF15 was further confirmed as a transcriptional activator of LDHA, where LDHA binds and promotes VEGFR2 and VE-cadherin lactylation, thereby increasing their protein expression. Finally, we used orthotopic and subcutaneous xenografted nude mouse models of GBM to verify the inhibitory effect of the above factors on GBM VM development. In summary, this study may reveal new targets for the comprehensive treatment of glioma.
Topics: Animals; Humans; Mice; Cell Line, Tumor; Endothelial Cells; Glioblastoma; Glioma; Neovascularization, Pathologic; Pseudogenes; Vascular Endothelial Growth Factor Receptor-2
PubMed: 37853052
DOI: 10.1038/s42003-023-05438-1