-
PloS One 2017
RNA-seq reads containing part of the poly(A) tail of transcripts (denoted as poly(A) reads) provide the most direct evidence for the position of poly(A) sites in the genome. However, due to reduced coverage of poly(A) tails by reads, poly(A) reads are not routinely identified during RNA-seq mapping. Nevertheless, recent studies for several herpesviruses successfully employed mapping of poly(A) reads to identify herpesvirus poly(A) sites using different strategies and customized programs. To more easily allow such analyses without requiring additional programs, we integrated poly(A) read mapping and prediction of poly(A) sites into our RNA-seq mapping program ContextMap 2. The implemented approach essentially generalizes previously used poly(A) read mapping approaches and combines them with the context-based approach of ContextMap 2 to take into account information provided by other reads aligned to the same location. Poly(A) read mapping using ContextMap 2 was evaluated on real-life data from the ENCODE project and compared against a competing approach based on transcriptome assembly (KLEAT). This showed high positive predictive value for our approach, evidenced also by the presence of poly(A) signals, and considerably lower runtime than KLEAT. Although sensitivity is low for both methods, we show that this is in part due to a high extent of spurious results in the gold standard set derived from RNA-PET data. Sensitivity improves for poly(A) sites of known transcripts or determined with a more specific poly(A) sequencing protocol and increases with read coverage on transcript ends. Finally, we illustrate the usefulness of the approach in a high read coverage scenario by a re-analysis of published data for herpes simplex virus 1. Thus, with current trends towards increasing sequencing depth and read length, poly(A) read mapping will prove to be increasingly useful and can now be performed automatically during RNA-seq mapping with ContextMap 2.
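The core idea of a poly(A) read can be illustrated with a minimal sketch. It is not ContextMap 2's implementation: the function name `split_poly_a_read` and the `min_tail` threshold are hypothetical choices made for illustration. The sketch only clips a trailing run of A's so that the remaining read body could be aligned. A real mapper would also need to confirm that the clipped A's are absent from the genome at the aligned position, and to handle reads from the opposite strand, which start with a run of T's.

```python
from typing import Optional, Tuple


def split_poly_a_read(read: str, min_tail: int = 10) -> Optional[Tuple[str, int]]:
    """Split a read into (body, tail_length) if it ends in a poly(A) run.

    Returns None when the trailing run of A's is shorter than min_tail
    (a hypothetical threshold) or when nothing but A's would remain.
    """
    read = read.upper()
    body = read.rstrip("A")           # strip the trailing run of A's
    tail_len = len(read) - len(body)  # number of A's that were removed
    if tail_len < min_tail or not body:
        return None
    return body, tail_len


# Example: a 10-nt body followed by 15 A's is a poly(A) read candidate.
candidate = split_poly_a_read("ACGTTGCAGC" + "A" * 15)
```

The end of the aligned body would then mark the candidate poly(A) site, which is why such reads are described above as the most direct evidence for site position.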
Topics: Animals; Humans; Poly A; RNA, Messenger; Sequence Analysis, RNA; Software; Transcriptome
PubMed: 28135292
DOI: 10.1371/journal.pone.0170914
-
Nature Structural & Molecular Biology May 2024
Shortening of messenger RNA poly(A) tails, or deadenylation, is a rate-limiting step in mRNA decay and is highly regulated during gene expression. The incorporation of non-adenosines in poly(A) tails, or 'mixed tailing', has been observed in vertebrates and viruses. Here, to quantitate the effect of mixed tails, we mathematically modeled deadenylation reactions at single-nucleotide resolution using an in vitro deadenylation system reconstituted with the complete human CCR4-NOT complex. Applying this model, we assessed the disrupting impact of single guanosine, uridine or cytosine to be equivalent to approximately 6, 8 or 11 adenosines, respectively. CCR4-NOT stalls at the 0, -1 and -2 positions relative to the non-adenosine residue. CAF1 and CCR4 enzyme subunits commonly prefer adenosine but exhibit distinct sequence selectivities and stalling positions. Our study provides an analytical framework to monitor deadenylation and reveals the molecular basis of tail sequence-dependent regulation of mRNA stability.
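As a rough worked example of the reported equivalences, the sketch below converts a mixed tail into "adenosine equivalents". It assumes that the effects of individual residues simply add up, which is a simplification introduced here for illustration and not the kinetic model described in the abstract. The function name `effective_tail_length` is hypothetical, and the weights are the approximate values quoted above.

```python
# Approximate adenosine equivalents per residue, as quoted in the abstract.
ADENOSINE_EQUIVALENTS = {"A": 1, "G": 6, "U": 8, "C": 11}


def effective_tail_length(tail: str) -> int:
    """Sum adenosine equivalents over an RNA tail (additivity is an assumption)."""
    total = 0
    for base in tail.upper():
        if base not in ADENOSINE_EQUIVALENTS:
            raise ValueError(f"unexpected residue in tail: {base!r}")
        total += ADENOSINE_EQUIVALENTS[base]
    return total


# A 6-nt tail with one guanosine: 5 adenosines + 6 equivalents = 11.
mixed = effective_tail_length("AAAGAA")
```

Under this additive reading, a single guanosine makes a 6-nucleotide tail behave roughly like an 11-nucleotide pure poly(A) tail with respect to deadenylation.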
Topics: Humans; Kinetics; RNA Stability; Poly A; RNA, Messenger; Adenosine; Receptors, CCR4; Exoribonucleases; RNA Nucleotidyltransferases
PubMed: 38374449
DOI: 10.1038/s41594-023-01187-1
-
PLoS Computational Biology Nov 2020
In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of the poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods have been widely used to identify PAS computationally, their need for large amounts of annotated data hinders their application to species without experimental PAS data. Cross-species PAS identification, which makes it possible to predict PAS in species not seen during training, is therefore a promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolutional Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use four species, build cross-species training sets from two of them, and evaluate performance on the remaining ones. Moreover, we test our method under insufficient-data and imbalanced-data conditions and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy with a smaller or imbalanced training set.
Topics: Animals; Deep Learning; Deoxyguanosine; Humans; Neural Networks, Computer; Poly A; Signal Transduction; Species Specificity
PubMed: 33151940
DOI: 10.1371/journal.pcbi.1008297
-
BMC Genomics Aug 2017
BACKGROUND
Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3'-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge.
RESULTS
In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results.
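To make the notion of PAS variants concrete, the sketch below scans a DNA sequence for hexamer motifs. It is a naive motif search, not the Omni-PolyA model: it lists only the two best-known hexamers (AATAAA and ATTAAA) as an illustrative subset of the 12 variants, and the function name `find_pas_hexamers` is hypothetical. Every hit is merely a candidate, because most genomic occurrences of these hexamers are not functional signals, which is the reason a classifier is needed at all.

```python
from typing import List, Sequence, Tuple

# Illustrative subset only; the abstract refers to 12 variants in total.
EXAMPLE_PAS_VARIANTS = ("AATAAA", "ATTAAA")


def find_pas_hexamers(
    seq: str, variants: Sequence[str] = EXAMPLE_PAS_VARIANTS
) -> List[Tuple[int, str]]:
    """Return (0-based position, hexamer) for every candidate PAS in seq."""
    seq = seq.upper()
    wanted = set(variants)
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]  # sliding 6-nt window
        if hexamer in wanted:
            hits.append((i, hexamer))
    return hits


hits = find_pas_hexamers("GGAATAAACCATTAAAGG")
```

A recognition method such as the one described above would then score each candidate using features of the surrounding sequence to separate true signals from chance matches.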
CONCLUSIONS
The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/.
Topics: Data Mining; Genome, Human; Genomics; Humans; Poly A; Polyadenylation
PubMed: 28810905
DOI: 10.1186/s12864-017-4033-7
-
Wiley Interdisciplinary Reviews. RNA 2012 (Review)
Pre-mRNA cleavage and polyadenylation is an essential step for 3' end formation of almost all protein-coding transcripts in eukaryotes. The reaction, involving cleavage of nascent mRNA followed by addition of a polyadenylate or poly(A) tail, is controlled by cis-acting elements in the pre-mRNA surrounding the cleavage site. Experimental and bioinformatic studies in the past three decades have elucidated conserved and divergent elements across eukaryotes, from yeast to human. Here we review histories and current models of these elements in a broad range of species.
Topics: Animals; Computational Biology; Humans; Plants; Poly A; Polyadenylation; Polynucleotide Adenylyltransferase; RNA Precursors; Regulatory Sequences, Nucleic Acid; Saccharomyces cerevisiae; mRNA Cleavage and Polyadenylation Factors
PubMed: 22012871
DOI: 10.1002/wrna.116
-
Nature Structural & Molecular Biology Jun 2019
The 3' poly(A) tail of messenger RNA is fundamental to regulating eukaryotic gene expression. Shortening of the poly(A) tail, termed deadenylation, reduces transcript stability and inhibits translation. Nonetheless, the mechanism for poly(A) recognition by the conserved deadenylase complexes Pan2-Pan3 and Ccr4-Not is poorly understood. Here we provide a model for poly(A) RNA recognition by two DEDD-family deadenylase enzymes, Pan2 and the Ccr4-Not nuclease Caf1. Crystal structures of Saccharomyces cerevisiae Pan2 in complex with RNA show that, surprisingly, Pan2 does not form canonical base-specific contacts. Instead, it recognizes the intrinsic stacked, helical conformation of poly(A) RNA. Using a fully reconstituted biochemical system, we show that disruption of this structure (for example, by incorporation of guanosine into poly(A)) inhibits deadenylation by both Pan2 and Caf1. Together, these data establish a paradigm for specific recognition of the conformation of poly(A) RNA by proteins that regulate gene expression.
Topics: Crystallography, X-Ray; Exoribonucleases; Models, Molecular; Multiprotein Complexes; Poly A; RNA, Messenger; Ribonucleases; Saccharomyces cerevisiae; Saccharomyces cerevisiae Proteins
PubMed: 31110294
DOI: 10.1038/s41594-019-0227-9
-
Nature Communications Jan 2017
Hypomorphic mutations are a valuable tool for both genetic analysis of gene function and for synthetic biology applications. However, current methods to generate hypomorphic mutations are limited to a specific organism, change gene expression unpredictably, or depend on changes in spatial-temporal expression of the targeted gene. Here we present a simple and predictable method to generate hypomorphic mutations in model organisms by targeting translation elongation. Adding consecutive adenosine nucleotides, so-called polyA tracks, to the gene coding sequence of interest will decrease translation elongation efficiency, and in all tested cell cultures and model organisms, this decreases mRNA stability and protein expression. We show that protein expression is adjustable independent of promoter strength and can be further modulated by changing sequence features of the polyA tracks. These characteristics make this method highly predictable and tractable for generation of programmable allelic series with a range of expression levels.
Topics: Genetic Techniques; Mutation; Poly A; Promoter Regions, Genetic; Protein Biosynthesis; Proteins; RNA Stability
PubMed: 28106166
DOI: 10.1038/ncomms14112
-
IUBMB Life Dec 1999 (Review)
Arguments are presented in favor of the capability of poly(A) tracts of cellular RNA to form double helices in vivo. It is suggested that formation of the double helix in the mRNA poly(A) tail provides the basis for such processes as polyadenylation termination, PAB I synthesis autoregulation, and stabilization of ARE-containing mRNA by ELAV-like proteins.
Topics: Nucleic Acid Conformation; Poly A; RNA, Double-Stranded; RNA, Messenger
PubMed: 10683761
DOI: 10.1080/713803577
-
Methods in Enzymology 2021
An increasing number of investigations have established alternative polyadenylation (APA) as a key mechanism of gene regulation that alters the length of the 3' untranslated region (UTR) and generates distinct mRNA termini. Further, appreciation for the significance of APA in disease contexts propelled the development of several 3' sequencing techniques. While these RNA sequencing technologies have advanced APA analysis, the intrinsic limitation of 3' read coverage and lack of appropriate computational tools constrain precise mapping and quantification of polyadenylation sites. Notably, Poly(A)-ClickSeq (PAC-seq) overcomes limiting factors such as poly(A) enrichment and 3' linker ligation steps using click-chemistry. Here we provide an updated PolyA-miner protocol, a computational approach to analyze PAC-seq or other 3'-Seq datasets. As a key practical constraint, we also provide a detailed account of the impact of sequencing depth on the number of detected polyadenylation sites and APA changes. This protocol is also updated to handle unique molecular identifiers used to address PCR duplication potentially observed in PAC-seq.
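The role of unique molecular identifiers (UMIs) in correcting for PCR duplication can be sketched as follows. This is not PolyA-miner's implementation: the record layout `(chromosome, position, strand, umi)` and the function name `count_unique_umis` are assumptions made for illustration. Reads that share both a site and a UMI are treated as PCR copies of one original molecule and counted once.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int, str]            # (chromosome, position, strand)
ReadRecord = Tuple[str, int, str, str]  # site fields plus the UMI sequence


def count_unique_umis(reads: Iterable[ReadRecord]) -> Dict[Site, int]:
    """Count distinct UMIs per polyadenylation site, collapsing PCR duplicates."""
    umis_per_site: Dict[Site, Set[str]] = defaultdict(set)
    for chrom, pos, strand, umi in reads:
        umis_per_site[(chrom, pos, strand)].add(umi)
    return {site: len(umis) for site, umis in umis_per_site.items()}


reads = [
    ("chr1", 1000, "+", "ACGT"),
    ("chr1", 1000, "+", "ACGT"),  # same site and UMI: counted once
    ("chr1", 1000, "+", "TTGA"),
    ("chr2", 500, "-", "ACGT"),
]
counts = count_unique_umis(reads)
```

Exact matching on the UMI is the simplest possible policy; tolerating sequencing errors in the UMI itself would need an additional clustering step that is not shown here.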
Topics: 3' Untranslated Regions; Poly A; Polyadenylation; RNA, Messenger; Sequence Analysis, RNA
PubMed: 34183121
DOI: 10.1016/bs.mie.2021.04.001
-
Methods (San Diego, Calif.) Feb 2019
The use of RNA-seq as a generalized tool to measure the differential expression of genes has essentially replaced the use of the microarray. Despite the acknowledged technical advantages to this approach, RNA-seq library preparation remains mostly conducted by core facilities rather than in the laboratory due to the infrastructure, expertise and time required per sample. We have recently described two 'click-chemistry' based library construction methods termed ClickSeq and Poly(A)-ClickSeq (PAC-seq) as alternatives to conventional RNA-seq that are both cost effective and rely on straightforward reagents readily available to most labs. ClickSeq is random-primed and can sequence any (unfragmented) RNA template, while PAC-seq is targeted to poly(A) tails of mRNAs. Here, we further develop PAC-seq as a platform that allows for simultaneous mapping of poly(A) sites and the measurement of differential expression of genes. We provide a detailed protocol, descriptions of appropriate computational pipelines, and a proof-of-principle dataset to illustrate the technique. PAC-seq offers a unique advantage over other 3' end mapping protocols in that it does not require additional purification, selection, or fragmentation steps allowing sample preparation directly from crude total cellular RNA. We have shown that PAC-seq is able to accurately and sensitively count transcripts for differential gene expression analysis, as well as identify alternative poly(A) sites and determine the precise nucleotides of the poly(A) tail boundaries.
Topics: 3' Flanking Region; Animals; Cells, Cultured; Click Chemistry; Drosophila melanogaster; Gene Expression Profiling; Gene Expression Regulation; Gene Library; Genome, Insect; High-Throughput Nucleotide Sequencing; Insect Proteins; Poly A; Polyadenylation; RNA, Messenger; Sequence Analysis, RNA
PubMed: 30625385
DOI: 10.1016/j.ymeth.2019.01.002