-
The Plant Journal : For Cell and... Feb 2018
Pseudogenes have a reputation of being 'evolutionary relics' or 'junk DNA'. While they are well characterized in mammals, studies in more complex plant genomes have so far been hampered by the absence of reference genome sequences. Barley is one of the most economically important cereals and has a genome size of 5.1 Gb. With the first high-quality genome reference assembly available for a Triticeae crop, we conducted a whole-genome assessment of pseudogenes in the barley genome. We identified, characterized and classified 89 440 gene fragments and pseudogenes scattered along the chromosomes, with occasional hotspots and higher densities at the chromosome ends. Full-length pseudogenes (11 015) have preferentially retained their exon-intron structure. Retrotransposition of processed mRNAs only plays a marginal role in their creation. However, the distribution of retroposed pseudogenes reflects the Rabl configuration of barley chromosomes and thus hints at founding mechanisms. While parent genes related to the defense-response were found to be under-represented in cultivated barley, we detected several defense-related pseudogenes in wild barley accessions. The percentage of transcriptionally active pseudogenes is 7.2%, and these may potentially adopt new regulatory roles. The barley genome is rich in pseudogenes and small gene fragments mainly located towards chromosome tips or as tandemly repeated units. Our results indicate non-random duplication and pseudogenization preferences and improve our understanding of the dynamics of gene birth and death in large plant genomes and the mechanisms that lead to evolutionary innovations.
Topics: Chromosome Mapping; Chromosomes, Plant; Gene Duplication; Genes, Plant; Hordeum; Multigene Family; Pseudogenes; Selection, Genetic; Synteny
PubMed: 29205595
DOI: 10.1111/tpj.13794
-
Molecular Biology and Evolution Jul 2022
Prokaryotic genomes are usually densely packed with intact and functional genes. However, in certain contexts, such as after recent ecological shifts or extreme population bottlenecks, broken and nonfunctional gene fragments can quickly accumulate and form a substantial fraction of the genome. Identification of these broken genes, called pseudogenes, is a critical step for understanding the evolutionary forces acting upon, and the functional potential encoded within, prokaryotic genomes. Here, we present Pseudofinder, open-source software dedicated to pseudogene identification and analysis in bacterial and archaeal genomes. We demonstrate that Pseudofinder's multi-pronged, reference-based approach can detect a wide variety of pseudogenes, including those that are highly degraded and typically missed by gene-calling pipelines, as well as newly formed pseudogenes containing only one or a few inactivating mutations. Additionally, Pseudofinder can detect genes that lack inactivating substitutions but are experiencing relaxed selection. Implementation of Pseudofinder in annotation pipelines will allow more precise estimations of the functional potential of sequenced microbes, while also generating new hypotheses related to the evolutionary dynamics of bacterial and archaeal genomes.
Topics: Bacteria; Genome, Archaeal; Prokaryotic Cells; Pseudogenes; Software
PubMed: 35801562
DOI: 10.1093/molbev/msac153
-
Microbial Genomics Oct 2022
Whole-genome sequence analyses have significantly contributed to the understanding of virulence and evolution of the Mycobacterium tuberculosis complex (MTBC), the causative pathogens of tuberculosis. Most MTBC evolutionary studies are focused on single nucleotide polymorphisms and deletions, but few studies have evaluated gene content, and none has comprehensively evaluated pseudogenes. Accordingly, we describe an extensive study focused on quantifying and predicting possible functions of MTBC and pseudogenes. Using NCBI's PGAP-detected pseudogenes, we analysed 25 837 pseudogenes from 158 MTBC and strains and combined transcriptomics and proteomics of M. tuberculosis H37Rv to gain insights about pseudogenes' expression. Our results indicate significant variability in the rate and conservation of predicted pseudogenes among different ecotypes and lineages of tuberculous mycobacteria, as well as pseudogenization of important virulence factors and of genes involved in metabolism and antimicrobial resistance/tolerance. We show that predicted pseudogenes contribute considerably to MTBC genetic diversity at the population level. Moreover, the transcription machinery of M. tuberculosis can fully transcribe most pseudogenes, indicating intact promoters and recent pseudogene evolutionary emergence. Proteomics of M. tuberculosis and close evaluation of mutational lesions driving pseudogenization suggest that a few predicted pseudogenes are likely capable of neofunctionalization, nonsense mutation reversal, or phase variation, contradicting the classical definition of pseudogenes. Such findings indicate that genome annotation should be accompanied by proteomics and protein function assays to improve its accuracy. While indels and insertion sequences are the main drivers of the observed mutational lesions in these species, population bottlenecks and genetic drift are likely the evolutionary processes acting on pseudogenes' emergence over time. Our findings unveil a new perspective on MTBC's evolution and genetic diversity.
Topics: Anti-Infective Agents; Codon, Nonsense; DNA Transposable Elements; Mycobacterium tuberculosis; Pseudogenes; Virulence Factors; Drug Resistance, Bacterial
PubMed: 36250787
DOI: 10.1099/mgen.0.000876
-
Theranostics 2020
Review
Pseudogenes were initially regarded as "nonfunctional" genomic elements that did not have protein-coding abilities due to several endogenous inactivating mutations. Although pseudogenes are widely expressed in prokaryotes and eukaryotes, for decades, they have been largely ignored and classified as gene "junk" or "relics". With the widespread availability of high-throughput sequencing analysis, especially omics technologies, knowledge concerning pseudogenes has substantially increased. Pseudogenes are evolutionarily conserved and derive primarily from a mutation or retrotransposon, conferring the pseudogene with a "gene repository" role to store and expand genetic information. In contrast to previous notions, pseudogenes have a variety of functions at the DNA, RNA and protein levels for broadly participating in gene regulation to influence the development and progression of certain diseases, especially cancer. Indeed, some pseudogenes have been proven to encode proteins, strongly contradicting their "trash" identification, and have been confirmed to have tissue-specific and disease subtype-specific expression, indicating their own value in disease diagnosis. Moreover, pseudogenes have been correlated with the life expectancy of patients and exhibit great potential for future use in disease treatment, suggesting that they are promising biomarkers and therapeutic targets for clinical applications. In this review, we summarize the natural properties, functions, disease involvement and clinical value of pseudogenes. Although our knowledge of pseudogenes remains nascent, this field deserves more attention and deeper exploration.
Topics: Biomarkers; Diagnostic Techniques and Procedures; Evolution, Molecular; Gene Expression Regulation; Humans; Life Expectancy; Mutation; Neoplasms; Prognosis; Pseudogenes; Therapeutics
PubMed: 32042317
DOI: 10.7150/thno.40659
-
Genome Biology Aug 2021
BACKGROUND
The human genome encodes over 14,000 pseudogenes that are evolutionary relics of protein-coding genes and commonly considered as nonfunctional. Emerging evidence suggests that some pseudogenes may exert important functions. However, to what extent human pseudogenes are functionally relevant remains unclear. There has been no large-scale characterization of pseudogene function because of technical challenges, including high sequence similarity between pseudogene and parent genes, and poor annotation of transcription start sites.
RESULTS
To overcome these technical obstacles, we develop an integrated computational pipeline to design the first genome-wide library of CRISPR interference (CRISPRi) single-guide RNAs (sgRNAs) that target human pseudogene promoter-proximal regions. We perform the first pseudogene-focused CRISPRi screen in luminal A breast cancer cells and reveal approximately 70 pseudogenes that affect breast cancer cell fitness. Among the top hits, we identify a cancer-testis unitary pseudogene, MGAT4EP, that is predominantly localized in the nucleus and interacts with FOXA1, a key regulator in luminal A breast cancer. By enhancing the promoter binding of FOXA1, MGAT4EP upregulates the expression of oncogenic transcription factor FOXM1. Integrative analyses of multi-omic data from the Cancer Genome Atlas (TCGA) reveal many unitary pseudogenes whose expressions are significantly dysregulated and/or associated with overall/relapse-free survival of patients in diverse cancer types.
CONCLUSIONS
Our study represents the first large-scale study characterizing pseudogene function. Our findings suggest the importance of nuclear function of unitary pseudogenes and underscore their underappreciated roles in human diseases. The functional genomic resources developed here will greatly facilitate the study of human pseudogene function.
Topics: Breast Neoplasms; Cell Nucleus; Cell Proliferation; Clustered Regularly Interspaced Short Palindromic Repeats; Computational Biology; Forkhead Box Protein M1; Gene Expression Regulation, Neoplastic; Hepatocyte Nuclear Factor 3-alpha; Humans; MCF-7 Cells; Promoter Regions, Genetic; Protein Binding; Pseudogenes; RNA, Guide, CRISPR-Cas Systems; Reproducibility of Results; Up-Regulation
PubMed: 34425866
DOI: 10.1186/s13059-021-02464-2
-
Genome Biology and Evolution Oct 2022
Trypanosomatids belong to a remarkable group of unicellular, parasitic organisms of the order Kinetoplastida, an early diverging branch of the phylogenetic tree of eukaryotes, exhibiting intriguing biological characteristics affecting gene expression (intronless polycistronic transcription, trans-splicing, and RNA editing), metabolism, surface molecules, and organelles (compartmentalization of glycolysis, variation of the surface molecules, and unique mitochondrial DNA), cell biology and life cycle (phagocytic vacuoles evasion and intricate patterns of cell morphogenesis). With numerous genomic-scale data of several trypanosomatids becoming available since 2005 (genomes, transcriptomes, and proteomes), the scientific community can further investigate the mechanisms underlying these unusual features and address other unexplored phenomena possibly revealing biological aspects of the early evolution of eukaryotes. One fundamental aspect comprises the processes and mechanisms involved in the acquisition and loss of genes throughout the evolutionary history of these primitive microorganisms. Here, we present a comprehensive in silico analysis of pseudogenes in three major representatives of this group: Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi. Pseudogenes, DNA segments originating from altered genes that lost their original function, are genomic relics that can offer an essential record of the evolutionary history of functional genes, as well as clues about the dynamics and evolution of hosting genomes. 
Scanning these genomes with functional proteins as proxies to reveal intergenic regions with protein-coding features, relying on a customized threshold to distinguish statistically and biologically significant sequence similarities, and reassembling remnant sequences from their debris, we found thousands of pseudogenes and hundreds of open reading frames, with particular characteristics in each trypanosomatid: mutation profile, number, content, density, codon bias, average size, single- or multi-copy gene origin, number and type of mutations, putative primitive function, and transcriptional activity. These features suggest a common process of pseudogene formation, different patterns of pseudogene evolution and extant biological functions, and/or distinct genome organization undertaken by those parasites during evolution, as well as different evolutionary and/or selective pressures acting on distinct lineages.
Topics: Animals; Pseudogenes; Phylogeny; Open Reading Frames; Genome; Trypanosoma brucei brucei; Parasites
PubMed: 36208292
DOI: 10.1093/gbe/evac142
-
World Journal of Surgical Oncology Apr 2021
BACKGROUND
Bladder cancer (BLCA) is a common cancer worldwide, and it is both aggressive and fatal. Immunotherapy (ICT) has achieved an excellent curative effect in BLCA; however, only some BLCA patients benefit from ICT. MT1L is a pseudogene, and a previous study suggested that MT1L can be used as an indicator of prognosis in colorectal cancer. However, the role of MT1L in BLCA has not yet been determined.
METHODS
Data were collected from TCGA, and logistic regression, Kaplan-Meier plotter, and multivariate Cox analysis were performed to demonstrate the correlation between the pseudogene MT1L and the prognosis of BLCA. To identify the association of MT1L with tumor-infiltrating immune cells, TIMER and TISIDB were utilized. Additionally, GSEA was performed to elucidate the potential biological function.
RESULTS
The expression of MT1L was decreased in BLCA. Additionally, MT1L was positively correlated with immune cells, such as Tregs (ρ = 0.708) and MDSCs (ρ = 0.664). We also confirmed that MT1L is related to typical markers of immune cells, such as PD-1 and CTLA-4. In addition, a high MT1L expression level was associated with advanced T and N stages and high grade in BLCA. Increased expression of MT1L was significantly associated with shorter OS times in BLCA patients (p < 0.05). Multivariate Cox analysis revealed that MT1L expression could be an independent prognostic factor in BLCA.
CONCLUSION
Collectively, our findings demonstrated that the pseudogene MT1L regulates the immune microenvironment, correlates with poor survival, and is an independent prognostic biomarker in BLCA.
Topics: Colonic Neoplasms; Gene Expression Regulation, Neoplastic; Humans; Prognosis; Pseudogenes; Tumor Microenvironment; Urinary Bladder Neoplasms
PubMed: 33888142
DOI: 10.1186/s12957-021-02231-4
-
International Journal of Molecular... Nov 2016
Review
Pseudogenes are paralogs generated from ancestral functional genes (parents) during genome evolution, which contain critical defects in their sequences, such as a missing promoter, a premature stop codon or frameshift mutations. Generally, pseudogenes are functionless, but recent evidence demonstrates that some of them have potential roles in regulation. The majority of pseudogenes are generated from functional progenitor genes either by gene duplication (duplicated pseudogenes) or retro-transposition (processed pseudogenes). Pseudogenes are primarily identified by comparison to their parent genes. Bioinformatics tools for pseudogene prediction have been developed, among which PseudoPipe, PSF and Shiu's pipeline are publicly available. We compared these three tools using the well-annotated Arabidopsis thaliana genome and its known 924 pseudogenes as a test data set. PseudoPipe and Shiu's pipeline identified ~80% of pseudogenes, of which 94% were shared, while PSF failed to generate adequate results. A need for improvement of the bioinformatics tools for pseudogene prediction accuracy in plant genomes was thus identified, with the ultimate goal of improving the quality of genome annotation in plants.
Topics: Computational Biology; Gene Duplication; Genome, Plant; Pseudogenes
PubMed: 27916797
DOI: 10.3390/ijms17121991
-
BMC Genomics Jan 2009
Comparative Study
BACKGROUND
Of the > 2000 serovars of Salmonella enterica subspecies I, most cause self-limiting gastrointestinal disease in a wide range of mammalian hosts. However, S. enterica serovars Typhi and Paratyphi A are restricted to the human host and cause the similar systemic diseases typhoid and paratyphoid fever. Genome sequence similarity between Paratyphi A and Typhi has been attributed to convergent evolution via relatively recent recombination of a quarter of their genomes. The accumulation of pseudogenes is a key feature of these and other host-adapted pathogens, and overlapping pseudogene complements are evident in Paratyphi A and Typhi.
RESULTS
We report the 4.5 Mbp genome of a clinical isolate of Paratyphi A, strain AKU_12601, completely sequenced using capillary techniques and subsequently checked using Illumina/Solexa resequencing. Comparison with the published genome of Paratyphi A ATCC9150 revealed the two are collinear and highly similar, with 188 single nucleotide polymorphisms and 39 insertions/deletions. A comparative analysis of pseudogene complements of these and two finished Typhi genomes (CT18, Ty2) identified several pseudogenes that had been overlooked in prior genome annotations of one or both serovars, and identified 66 pseudogenes shared between serovars. By determining whether each shared and serovar-specific pseudogene had been recombined between Paratyphi A and Typhi, we found evidence that most pseudogenes have accumulated after the recombination between serovars. We also divided pseudogenes into relative-time groups: ancestral pseudogenes inherited from a common ancestor, pseudogenes recombined between serovars which likely arose between initial divergence and later recombination, serovar-specific pseudogenes arising after recombination but prior to the last evolutionary bottlenecks in each population, and more recent strain-specific pseudogenes.
CONCLUSION
Recombination and pseudogene-formation have been important mechanisms of genetic convergence between Paratyphi A and Typhi, with most pseudogenes arising independently after extensive recombination between the serovars. The recombination events, along with divergence of and within each serovar, provide a relative time scale for pseudogene-forming mutations, affording rare insights into the progression of functional gene loss associated with host adaptation in Salmonella.
Topics: DNA, Bacterial; Evolution, Molecular; Genes, Bacterial; Genome, Bacterial; Phylogeny; Pseudogenes; Recombination, Genetic; Salmonella paratyphi A; Salmonella typhi; Sequence Analysis, DNA
PubMed: 19159446
DOI: 10.1186/1471-2164-10-36
-
Genome Medicine May 2017
Review
The Human Genome Project and advances in DNA sequencing technologies have revolutionized the identification of genetic disorders through the use of clinical exome sequencing. However, in a considerable number of patients, the genetic basis remains unclear. As clinicians begin to consider whole-genome sequencing, an understanding of the processes and tools involved and the factors to consider in the annotation of the structure and function of genomic elements that might influence variant identification is crucial. Here, we discuss and illustrate the strengths and weaknesses of approaches for the annotation and classification of important elements of protein-coding genes, other genomic elements such as pseudogenes and the non-coding genome, comparative-genomic approaches for inferring gene function, and new technologies for aiding genome annotation, as a practical guide for clinicians when considering pathogenic sequence variation. Complete and accurate annotation of structure and function of genome features has the potential to reduce both false-negative (from missing annotation) and false-positive (from incorrect annotation) errors in causal variant identification in exome and genome sequences. Re-analysis of unsolved cases will be necessary as newer technology improves genome annotation, potentially improving the rate of diagnosis.
Topics: Diagnostic Techniques and Procedures; Genetic Variation; Humans; Molecular Sequence Annotation; Pseudogenes; Sequence Analysis, DNA
PubMed: 28558813
DOI: 10.1186/s13073-017-0441-1