- Genome Research Nov 2020
Bacterial genomes can contain traces of a complex evolutionary history, including extensive homologous recombination, gene loss, gene duplications, and horizontal gene transfer. To reconstruct the phylogenetic and population history of a set of multiple bacteria, it is necessary to examine their pangenome, the composite of all the genes in the set. Here we introduce PEPPAN, a novel pipeline that can reliably construct pangenomes from thousands of genetically diverse bacterial genomes that represent the diversity of an entire genus. PEPPAN outperforms existing pangenome methods by providing consistent gene and pseudogene annotations extended by similarity-based gene predictions, and identifying and excluding paralogs by combining tree- and synteny-based approaches. The PEPPAN package additionally includes PEPPAN_parser, which implements additional downstream analyses, including the calculation of trees based on accessory gene content or allelic differences between core genes. To test the accuracy of PEPPAN, we implemented SimPan, a novel pipeline for simulating the evolution of bacterial pangenomes. We compared the accuracy and speed of PEPPAN with four state-of-the-art pangenome pipelines using both empirical and simulated data sets. PEPPAN was more accurate and more specific than any of the other pipelines and was almost as fast as any of them. As a case study, we used PEPPAN to construct a pangenome of approximately 40,000 genes from 3052 representative genomes spanning at least 80 species of Streptococcus. The resulting gene and allelic trees provide an unprecedented overview of the genomic diversity of the entire genus.
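The idea of comparing genomes by accessory gene content can be illustrated with a minimal sketch. This is not PEPPAN's or PEPPAN_parser's implementation: the genome names, gene families, and the choice of Jaccard distance are all assumptions made for illustration only. The resulting pairwise distances are the kind of input a tree-building method could take.

```python
from itertools import combinations

# Hypothetical presence/absence data: genome -> set of gene families it carries.
genomes = {
    "g1": {"geneA", "geneB", "geneC"},
    "g2": {"geneA", "geneB", "geneD"},
    "g3": {"geneA", "geneC", "geneD", "geneE"},
}


def split_pangenome(genomes):
    """Return (core, accessory): genes found in every genome vs. only in some."""
    pan = set().union(*genomes.values())
    core = set.intersection(*genomes.values())
    return core, pan - core


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def accessory_distances(genomes):
    """Pairwise Jaccard distances computed on accessory genes only."""
    _, accessory = split_pangenome(genomes)
    return {
        (x, y): jaccard_distance(genomes[x] & accessory, genomes[y] & accessory)
        for x, y in combinations(sorted(genomes), 2)
    }


core, accessory = split_pangenome(genomes)
distances = accessory_distances(genomes)
```

Restricting the comparison to accessory genes keeps the shared core from dominating the distances; any other set-based distance could be substituted for the Jaccard measure used here.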
Topics: Algorithms; Bacteria; Genes, Bacterial; Genome, Bacterial; Genomics; Phylogeny; Pseudogenes; Software; Streptococcus
PubMed: 33055096
DOI: 10.1101/gr.260828.120
- PloS One 2020
For rodents, olfaction is essential for locating food, recognizing mates and competitors, avoiding predators, and navigating their environment. It is thought that rodents may have expanded olfactory receptor repertoires in order to specialize in olfactory behavior. Despite rodents being the largest clade of mammals and depending on olfaction, relatively little work has documented olfactory repertoires outside of conventional laboratory species. Here we report the olfactory receptor repertoire of the African giant pouched rat (Cricetomys ansorgei), a Muroid rodent distantly related to mice and rats. The African giant pouched rat is notable for its large cortex and olfactory bulbs relative to its body size compared to other sympatric rodents, which suggests anatomical elaboration of olfactory capabilities. We hypothesized that in addition to anatomical elaboration for olfaction, these pouched rats might also have an expanded olfactory receptor repertoire to enable their olfactory behavior. We examined the composition of the olfactory receptor repertoire to better understand how their sensory capabilities have evolved. We identified 1145 intact olfactory genes and 260 additional pseudogenes within 301 subfamilies from the African giant pouched rat genome. This repertoire is similar to those of mice and rats in terms of size, pseudogene percentage and number of subfamilies. Analyses of olfactory receptor gene trees revealed that the pouched rat has 6 expansions in different subfamilies compared to mice, rats and squirrels. We identified 81 orthologous genes conserved among 4 rodent species and an additional 147 conserved genes within the Muroid rodents. The orthologous genes shared within Muroidea suggest that there may be a conserved Muroid-specific olfactory receptor repertoire. We also note that the description of this repertoire can serve as a complement to other studies of rodent olfaction, as the pouched rat is an outgroup within Muroidea.
Thus, our data suggest that African giant pouched rats are capable of both natural and trained olfactory behaviors with a typical Muroid olfactory receptor repertoire.
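The pseudogene percentage mentioned above can be worked out from the reported counts. The sketch below assumes the percentage is defined as pseudogenes divided by all identified loci (intact genes plus pseudogenes); the abstract does not state the definition, so treat this as one plausible reading.

```python
def pseudogene_fraction(intact, pseudogenes):
    """Fraction of loci that are pseudogenes (assumed definition: pseudo / total)."""
    total = intact + pseudogenes
    if total == 0:
        raise ValueError("repertoire is empty")
    return pseudogenes / total


# Counts reported in the abstract for the African giant pouched rat.
intact, pseudo = 1145, 260
total = intact + pseudo
fraction = pseudogene_fraction(intact, pseudo)
print(f"{total} loci, {fraction:.1%} pseudogenes")
```

Under this definition the repertoire comprises 1405 loci, of which roughly 18.5% are pseudogenes.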
Topics: Animals; Genome; Mice; Olfactory Bulb; Olfactory Receptor Neurons; Phylogeny; Pseudogenes; Rats; Receptors, Odorant; Smell
PubMed: 32240170
DOI: 10.1371/journal.pone.0221981
- Aging Feb 2022
Pseudogenes have been reported to play oncogenic or tumor-suppressive roles in cancer progression. However, the molecular mechanism of most pseudogenes in pancreatic ductal adenocarcinoma (PDAC) remains unknown. Herein, we characterized a novel pseudogene-miRNA-mRNA network associated with PDAC progression using bioinformatics analysis. After screening by dreamBase and GEPIA, 12 up-regulated and 7 down-regulated differentially expressed pseudogenes (DEPs) were identified. According to survival analysis, only elevated AK4P1 indicated a poor prognosis for PDAC patients. Moreover, we found that AK4 acts as a cognate gene of AK4P1 and also predicts worse survival for PDAC patients. Furthermore, 32 miRNAs were predicted to bind to AK4P1 by starBase, among which miR-375 was identified as the most likely binding miRNA of AK4P1. A total of 477 potential target genes of miR-375 were obtained by miRNet, among which 49 hub genes with node degree ≥ 20 were identified by STRING. Subsequent analysis of the hub genes demonstrated that YAP1 may be a functional downstream target of AK4P1. To confirm the above findings, microarray and qRT-PCR assays were performed, which revealed that YAP1 was dramatically upregulated in both PDAC cells and tissues. Functional experiments showed that knockdown of YAP1 significantly suppressed PDAC cell growth, increased apoptosis, and decreased invasive ability. In conclusion, amplification of AK4P1 may fuel the onset and development of PDAC by targeting YAP1 through competitively binding to miR-375, and may serve as a promising biomarker and therapeutic target for PDAC.
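The hub-gene criterion (node degree ≥ 20 in an interaction network) can be sketched as a simple degree filter. This is an illustration only: it does not call STRING or miRNet, and the edge list below uses placeholder node names rather than real interaction data.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count distinct neighbours per node in an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def hub_genes(edges, min_degree=20):
    """Return nodes whose degree is at least min_degree, sorted by name."""
    degrees = node_degrees(edges)
    return sorted(node for node, deg in degrees.items() if deg >= min_degree)


# Hypothetical toy network: one node linked to 20 others, plus one extra edge.
toy_edges = [("HUB", f"G{i}") for i in range(20)] + [("G0", "G1")]
hubs = hub_genes(toy_edges)
```

Using sets of neighbours means duplicate edges are counted once, which matters when an edge list is assembled from several sources.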
Topics: Carcinoma, Pancreatic Ductal; Cell Line, Tumor; Cell Movement; Cell Proliferation; Gene Expression Regulation, Neoplastic; Humans; MicroRNAs; Pancreatic Neoplasms; Prognosis; Pseudogenes; YAP-Signaling Proteins
PubMed: 35220277
DOI: 10.18632/aging.203921
- Oncotarget Jan 2017
Pseudogenes have long been considered non-functional transcriptional relics of the human genome. However, recent studies have revealed that they play a plethora of roles in diverse physiological and pathological processes, especially in cancer, and that many pseudogenes are transcribed into long noncoding RNAs and are emerging as a novel class of lncRNAs. However, the biological roles and underlying mechanisms of pseudogenes in the pathogenesis of non-small cell lung cancer (NSCLC) are still incompletely elucidated. This study identifies a putative oncogenic pseudogene, DUXAP10, in NSCLC, which is located in 14q11.2 and is 2398 nt in length. Firstly, we found that DUXAP10 was significantly up-regulated in 93 human NSCLC tissues and cell lines, and increased DUXAP10 was associated with poorer prognosis and shorter survival time in patients. Furthermore, loss- and gain-of-function studies, including growth curves, migration and invasion assays, and in vivo studies, verify the oncogenic roles of DUXAP10 in NSCLC. Finally, the mechanistic experiments indicate that DUXAP10 could interact with the histone demethylase lysine-specific demethylase 1 (LSD1) and repress transcription of the tumor suppressors large tumor suppressor 2 (LATS2) and Ras-related associated with diabetes (RRAD) in NSCLC cells. Taken together, these findings demonstrate that DUXAP10 exerts its oncogenic roles by binding LSD1 and epigenetically silencing LATS2 and RRAD expression. Our investigation reveals novel roles of a pseudogene in NSCLC, which may serve as a new target for NSCLC diagnosis and therapy.
Topics: A549 Cells; Animals; Carcinoma, Non-Small-Cell Lung; Cell Line, Tumor; Cell Movement; Cell Proliferation; Epigenesis, Genetic; Female; Gene Expression Regulation, Neoplastic; Histone Demethylases; Humans; Lung Neoplasms; Male; Mice; Prognosis; Protein Serine-Threonine Kinases; Pseudogenes; RNA, Long Noncoding; Survival Analysis; Tumor Suppressor Proteins; Up-Regulation; ras Proteins
PubMed: 28029651
DOI: 10.18632/oncotarget.14125
- Nucleic Acids Research Sep 2020
Alternative splicing (AS) and alternative polyadenylation (APA) generate diverse transcripts in mammalian genomes during development and differentiation. Epigenetic marks such as trimethylation of histone H3 lysine 36 (H3K36me3) and DNA methylation play a role in generating transcriptome diversity. Intragenic CpG islands (iCGIs) and their corresponding host genes exhibit dynamic epigenetic and gene expression patterns during development and between different tissues. We hypothesise that iCGI-associated H3K36me3, DNA methylation and transcription can influence host gene AS and/or APA. We investigate H3K36me3 and find that this histone mark is not a major regulator of AS or APA in our model system. Genomewide, we identify over 4000 host genes that harbour an iCGI in the mammalian genome, including both previously annotated and novel iCGI/host gene pairs. The transcriptional activity of these iCGIs is tissue- and developmental stage-specific and, for the first time, we demonstrate that the premature termination of host gene transcripts upstream of iCGIs is closely correlated with the level of iCGI transcription in a DNA methylation-independent manner. These studies suggest that iCGI transcription, rather than H3K36me3 or DNA methylation, interferes with host gene transcription and pre-mRNA processing genomewide and contributes to the spatiotemporal diversification of both the transcriptome and proteome.
Topics: Animals; Cell Differentiation; Chromatin; CpG Islands; DNA Methylation; Epigenesis, Genetic; Genome; Histone Code; Humans; Promoter Regions, Genetic; Protein Processing, Post-Translational; Pseudogenes; RNA Precursors; Transcription, Genetic
PubMed: 32621610
DOI: 10.1093/nar/gkaa556
- Neoplasia (New York, N.Y.) Oct 2019
We present the functional characterization of a pseudogene-associated recurrent gene fusion in prostate cancer. The fusion gene KLK4-KLKP1 is formed by the fusion of the protein-coding gene KLK4 with the noncoding pseudogene KLKP1. Screening of a cohort of 659 patients (380 Caucasian American, 250 African American, and 29 patients from other races) revealed that KLK4-KLKP1 is expressed in about 32% of prostate cancer patients. Correlative analysis with other ETS gene fusions and SPINK1 revealed a concomitant expression pattern of KLK4-KLKP1 with ERG and a mutually exclusive expression pattern with SPINK1, ETV1, ETV4, and ETV5. Development of an antibody specific to the KLK4-KLKP1 fusion protein confirmed the expression of the full-length KLK4-KLKP1 protein in prostate tissues. The in vitro and in vivo functional assays to study the oncogenic properties of KLK4-KLKP1 confirmed its role in cell proliferation, cell invasion, intravasation, and tumor formation. The presence of strong ERG and AR binding sites at the fusion junction in KLK4-KLKP1 suggests that the fusion gene is regulated by ERG and AR. Correlative analysis of clinical data showed an association of KLK4-KLKP1 with lower preoperative PSA values and with young age (<50 years) in men with prostate cancer. Screening of patient urine samples showed that KLK4-KLKP1 can be detected noninvasively in urine. Taken together, we present KLK4-KLKP1 as a class of pseudogene-associated fusion transcript in cancer with potential applications as a biomarker for routine screening of prostate cancer.
Topics: Amino Acid Sequence; Animals; Cell Line, Tumor; Chick Embryo; Gene Expression Regulation, Neoplastic; Gene Fusion; Genetic Loci; Humans; Kallikreins; Male; Neoplasm Grading; Oncogene Proteins, Fusion; Prostatic Neoplasms; Pseudogenes; Tissue Kallikreins
PubMed: 31446281
DOI: 10.1016/j.neo.2019.07.010
- Scientific Reports Jul 2019
Previously, through a TILLING (Targeting Induced Local Lesions in Genomes) approach applied to barley chloroplast mutator (cpm) seedlings, a high frequency of polymorphisms in the rpl23 gene was detected. All the polymorphisms corresponded to five differences already known to exist in nature between the rpl23 gene located in the inverted repeats (IRs) and the rpl23 pseudogene located in the large single copy region (LSC). In this investigation, polymorphisms in the rpl23 gene were verified and, in addition, a similar situation was found for the pseudogene in cpm seedlings. On the other hand, no polymorphisms were found in any of those loci in 40 wild type barley seedlings. Those facts and the independent occurrence of polymorphisms in the gene and pseudogene in individual seedlings suggest that the detected polymorphisms initially arose from gene conversion between gene and pseudogene. Moreover, an additional recombination process involving small recombinant segments seems to occur between the two gene copies as a consequence of their location in the IRs. These and previous results support the hypothesis that the CPM protein is a component of the plastome mismatch repair (MMR) system, and that failure of its anti-recombination activity results in increased illegitimate recombination between the rpl23 gene and pseudogene.
Topics: Chloroplasts; Genes, Chloroplast; Genes, Plant; Genome, Chloroplast; Hordeum; Plant Proteins; Polymorphism, Genetic; Pseudogenes; Ribosomal Proteins; Seedlings
PubMed: 31292475
DOI: 10.1038/s41598-019-46321-6
- Nucleic Acids Research Jan 2019
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
Topics: Animals; Computational Biology; Databases, Genetic; Genome, Human; Genomics; Humans; Internet; Mice; Molecular Sequence Annotation; Pseudogenes; Software
PubMed: 30357393
DOI: 10.1093/nar/gky955
- Genes Dec 2022
Inherited copy number variations (CNVs) can provide valuable information for cancer susceptibility and prognosis. However, their association with oropharynx squamous cell carcinoma (OPSCC) is still poorly studied. Using microarray analysis, we identified three inherited CNVs associated with OPSCC risk, of which one was validated in 152 OPSCC patients and 155 controls and related to pseudogene-microRNA-mRNA interaction. Individuals with three or more copies of and pseudogenes (8p11.22 chromosome region) had a 6.49-fold increased risk of OPSCC. shared a highly homologous sequence with the 3'-UTR, predicted to be a binding site for miR-122b-5p. Individuals carrying more than three copies of and presented higher expression levels. Moreover, patients with total deletion or one copy of pseudogenes and with higher expression of miR-122b-5p presented worse prognoses. Our data suggest, for the first time, that and pseudogene-inherited CNV could modulate OPSCC occurrence and prognosis, possibly through the interaction of pseudogene transcript, miR-122b-5p, and .
Topics: Humans; DNA Copy Number Variations; Pseudogenes; MicroRNAs; Squamous Cell Carcinoma of Head and Neck; Oropharyngeal Neoplasms; Head and Neck Neoplasms; Membrane Proteins; ADAM Proteins
PubMed: 36553675
DOI: 10.3390/genes13122408
- Annals of Botany Jan 2021
BACKGROUND AND AIMS
The ribosomal DNA (rDNA) gene family, encoding ribosomal RNA (rRNA), has long been regarded as an archetypal example illustrating the model of concerted evolution. However, controversy is arising, as rDNA in many eukaryotic species has been proved to be polymorphic. Here, a metagenomic strategy was applied to detect the intragenomic polymorphism as well as the evolutionary patterns of 26S rDNA across the genus Camellia.
METHODS
Degenerate primer pairs were designed to amplify the 26S rDNA fragments from different Camellia species. The amplicons were then paired-end sequenced on the Illumina MiSeq platform.
KEY RESULTS
An extremely high level of rDNA polymorphism existed universally in Camellia. However, functional rDNA was still the major component of the family, and was relatively conserved among different Camellia species. Sequence variations mainly came from rRNA pseudogenes and favoured regions that are rich in GC. Specifically, some rRNA pseudogenes have existed in the genome for a long time, and have even experienced several expansion events, which has greatly enriched the abundance of rDNA polymorphism.
CONCLUSIONS
Camellia represents a group in which rDNA is subjected to a mixture of concerted and birth-and-death evolution. Some rRNA pseudogenes may still have potential functions. Conversely, when released from selection constraint, they can evolve in the direction of decreasing GC content and structural stability through a methylation-induced process, and finally be eliminated from the genome.
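GC content, the quantity said above to decrease in pseudogenes released from selection, is straightforward to compute from a sequence. The sketch below is a generic illustration, not the study's pipeline; skipping ambiguous bases such as N is an assumption made here, and the example sequences are made up.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in a sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


# Made-up sequences for illustration only.
balanced = gc_content("GGCCAATT")
with_ambiguity = gc_content("gcgcNNat")
```

Comparing this value between functional copies and putative pseudogenes is one simple way to look for the kind of GC decline the abstract describes.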
Topics: Camellia; DNA, Ribosomal; Evolution, Molecular; Phylogeny; Pseudogenes; RNA, Ribosomal
PubMed: 32939535
DOI: 10.1093/aob/mcaa169