Reports
Nascent RNA Sequencing Reveals Widespread Pausing and Divergent Initiation at Human Promoters
RNA polymerases are highly regulated molecular machines. We present a method (global run-on sequencing, GRO-seq) that maps the position, amount, and orientation of transcriptionally engaged RNA polymerases genome-wide. In this method, nuclear run-on RNA molecules are subjected to large-scale parallel sequencing and mapped to the genome. We show that peaks of promoter-proximal polymerase reside on 30% of human genes, transcription extends beyond pre-messenger RNA 3' cleavage, and antisense transcription is prevalent. Additionally, most promoters have an engaged polymerase upstream and in an orientation opposite to the annotated gene. This divergent polymerase is associated with active genes but does not elongate effectively beyond the promoter. These results imply that the interplay between polymerases and regulators over broad promoter regions dictates the orientation and efficiency of productive transcription.
Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.
* These authors contributed equally to this work.
Transcription of coding and noncoding RNA molecules by eukaryotic RNA polymerases requires their collaboration with hundreds of transcription factors to direct and control polymerase recruitment, initiation, elongation, and termination. Whole-genome microarrays and ultra-high-throughput sequencing technologies enable efficient mapping of the distribution of transcription factors, nucleosomes, and their modifications, as well as accumulated RNA transcripts throughout genomes (1, 2), thereby providing a global correlation of factors and transcription states. Studies using the chromatin immunoprecipitation assay coupled to genomic DNA microarrays (ChIP-chip) or to high-throughput sequencing (ChIP-seq) indicate that RNA polymerase II (Pol II) is present at disproportionately higher amounts near the 5' end of many eukaryotic genes relative to downstream regions (3–6). However, these techniques cannot determine whether Pol II is simply promoter-bound or engaged in transcription. Small-scale analyses using independent methods have shown that this distribution likely represents transcriptionally engaged Pol II that has accumulated between
Here, we present a global run-on-sequencing (GRO-seq) assay to map and quantify transcriptionally engaged polymerase density genome-wide. These measurements provide a snapshot of genome-wide transcription and directly evaluate promoter-proximal pausing on all genes. We used nuclear run-on assays (NRO) to extend nascent RNAs that are associated with transcriptionally engaged polymerases under conditions where new initiation is prohibited. To specifically isolate NRO-RNA, we added a ribonucleotide analog [5-bromouridine 5'-triphosphate (BrUTP)] to BrU-tag nascent RNA during the run-on step (fig. S1). The length of the polynucleotide was kept short, and the NRO-RNA was chemically hydrolyzed into short fragments (
In total,
Aligning the GRO-seq data relative to RefSeq TSSs shows that the density of reads peaks near the TSS in both sense (
To identify all genes that show a peak of engaged Pol II that is characteristic of promoter-proximal pausing, we assessed whether each gene showed significant enrichment of read density in the promoter-proximal region relative to the density in the body of each gene (8). The ratio of these densities is called the pausing index (5, 6, 8), and significant pausing indices range from 2 to 10^3 (fig. S4). Within the defined promoter region, 7057 genes have a significant enrichment of GRO-seq reads relative to the body of the gene (P < 0.01), representing 28.3% of all genes (41.7% of active genes). Comparison of paused genes to either microarray expression or GRO-seq data revealed four classes of genes: class I, not paused and active; class II, paused and active; class III, paused and not active; and class IV, inactive (not paused and not active) (Fig. 3). Class III was severely depleted when we used GRO-seq to classify gene activity because GRO-seq provides a more sensitive measure of gene activity. Given the low signal at the promoters of the few genes left within this class, they are likely to be classified as active with deeper sequencing. Therefore, the overwhelming majority of genes with a paused polymerase also produce significant transcription throughout the gene, albeit often to quantities not detectable by expression microarrays. A recent comparison of Pol II ChIP-seq data to RNA-seq also supports the view that nearly all genes that are bound by Pol II produce full-length transcripts (10).
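As a minimal sketch of the two definitions above, the pausing index can be written as the ratio of promoter-proximal to gene-body read density, and the four classes follow from two yes/no calls per gene. The names `GeneCounts`, `pausing_index`, and `gene_class` are hypothetical, the window sizes in the usage lines are arbitrary illustrations, and the significance test that decides whether a gene counts as paused or active (8) is not reproduced here; both calls are taken as inputs.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Read counts and window lengths for one gene (hypothetical container)."""
    promoter_reads: int   # reads in the promoter-proximal window
    promoter_bp: int      # length of that window in base pairs
    body_reads: int       # reads in the gene body
    body_bp: int          # length of the gene body in base pairs


def pausing_index(g: GeneCounts) -> float:
    """Ratio of promoter-proximal read density to gene-body read density."""
    promoter_density = g.promoter_reads / g.promoter_bp
    body_density = g.body_reads / g.body_bp
    if body_density == 0:
        # No reads in the body: the ratio is unbounded rather than undefined.
        return float("inf")
    return promoter_density / body_density


def gene_class(paused: bool, active: bool) -> str:
    """Map the paused/active calls onto classes I to IV as defined in the text."""
    if active:
        return "II" if paused else "I"
    return "III" if paused else "IV"


# Illustrative numbers only, not data from the study.
example = GeneCounts(promoter_reads=50, promoter_bp=250,
                     body_reads=100, body_bp=10_000)
example_index = pausing_index(example)
example_class = gene_class(paused=True, active=True)
```

A ratio alone does not establish pausing; in the analysis described above a gene is called paused only when the enrichment is statistically significant.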
The density of polymerases within the promoter-proximal region generally correlates with the level of gene activity when all genes (Fig. 4A) or only genes with a paused polymerase are considered (fig. S5). Whereas nearly all paused genes show significant full-length activity by GRO-seq, the pausing index inversely correlates with gene activity (Fig. 4B). Considering that pausing is observed when Pol II enters a pause site faster than the rate of escape from pausing (9), this inverse correlation is consistent with the hypothesis that highly transcribed but paused genes are controlled, at least in part, by increasing the rate at which Pol II escapes the pause site and enters productive elongation (8).
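The reasoning in this paragraph can be illustrated with a toy steady-state model. The model is an assumption made here for illustration and is not taken from the study: a single pause site that is either empty or occupied, with an entry rate `k_in`, an escape rate `k_esc`, and a constant elongation rate in the gene body. The function name and all parameter values are hypothetical.

```python
def pause_site_steady_state(k_in: float, k_esc: float,
                            elongation_rate: float,
                            pause_region_bp: float = 1.0):
    """Toy one-site model (an assumption, not the authors' analysis).

    k_in            -- rate at which Pol II enters an empty pause site
    k_esc           -- rate at which a paused Pol II escapes into elongation
    elongation_rate -- speed in the gene body, in bp per unit time
    pause_region_bp -- length over which the paused signal is averaged

    Returns (occupancy, flux, pausing_index).
    """
    if k_in <= 0 or k_esc <= 0 or elongation_rate <= 0 or pause_region_bp <= 0:
        raise ValueError("all rates and lengths must be positive")
    # Steady state: k_in * (1 - occupancy) == k_esc * occupancy
    occupancy = k_in / (k_in + k_esc)
    # Polymerases leaving the pause site per unit time (productive output).
    flux = k_esc * occupancy
    # Polymerases per bp in the gene body at steady state.
    body_density = flux / elongation_rate
    pause_density = occupancy / pause_region_bp
    return occupancy, flux, pause_density / body_density


# Same entry rate, tenfold faster escape (illustrative values only).
slow_escape = pause_site_steady_state(k_in=1.0, k_esc=0.1, elongation_rate=2000.0)
fast_escape = pause_site_steady_state(k_in=1.0, k_esc=1.0, elongation_rate=2000.0)
```

In this toy model the pausing index reduces to `elongation_rate / (k_esc * pause_region_bp)`, so raising the escape rate increases transcriptional output while lowering the pausing index, which is the direction of the inverse correlation described above.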
A prominent and unexpected feature of the GRO-seq profiles around TSSs is the robust signal from an upstream, divergent, engaged polymerase. RNAs generated by these divergent polymerases can be identified at low concentrations when small RNAs are isolated from whole cells (14). These divergent polymerases cannot be accounted for by the 10% of known bidirectional promoters that are less than 1 kb apart (15) (fig. S6). We found that 13,633 genes (55% of all genes, 77% of active genes) display significant divergent transcription within 1 kb upstream of sense-oriented promoter-proximal peaks (P < 0.001), indicating that the number of bidirectional promoters exceeds even the highest estimates (16, 17). However, because it appears that the majority of these promoters produce mRNAs in only one direction (see below), we refer to this class of promoters as divergent. Although the top 10% of active genes have, on average, a slightly larger promoter-proximal than divergent peak (Fig. 3D), amounts of divergent transcription generally correlate with both the promoter-proximal signal (fig. S7) and the transcription level of the associated gene (Fig. 4C). Thus, divergent transcription is a mark for most active promoters. Gene activity, pausing, and divergent transcription correlate with each other and with promoters containing a CpG island. These four characteristics co-occur significantly more often than would be expected by chance (P < 10^-52) (table S1). Previous mapping of capped mRNA transcripts has shown that at CpG island promoters initiation occurs broadly over hundreds of base pairs (18), and GRO-seq shows that polymerases initiate and accumulate on this large class of promoters in both orientations.
Do existing ChIP-chip data (3) show any indication of the divergent peak of polymerase? Manual inspection of a number of genes and comparison with composite profiles aligned to TSSs show that the Pol II ChIP peak at promoters is accounted for by the two divergent peaks uncovered by GRO-seq (Figs. 1B and 4E). Higher-resolution ChIP-seq data in different cell lines have identified Pol II molecules upstream of promoters that were proposed to be in the same orientation as the annotated gene; however, these instead are likely to represent the divergent promoters identified by GRO-seq (10). Additionally, active promoters are typically marked by histone modifications such as di- and trimethylation of H3-Lys4 (H3K4me2 and H3K4me3) as well as acetylation of histone H3 and H4 (H3ac and H4ac). These modifications show a bimodal distribution around TSSs, with the trough representing a nucleosome-free region encompassing the TSS (3, 4, 19). Comparison of available H3ac and H3K4me2 data in this cell line (3) with GRO-seq suggests that both upstream and downstream peaks of these histone modifications are associated with active transcription, with each peak of histone modifications being adjacent and downstream of an engaged polymerase (Fig. 4F) (8). Other studies have shown that histone modifications associated with transcription elongation (e.g., H3K36me3 and H3K79me3) do not associate in a bimodal fashion around TSSs (4, 19). This and the lack of divergent GRO-seq reads further upstream (fig. S8) indicate that the majority of promoters experience initiation in the upstream direction but that these divergent polymerases do not productively elongate transcripts. Thus, promoters can distinguish polymerase in the forward versus the reverse direction.
We envision several possible functions for divergent transcription. First, the act of transcription itself could be crucial for granting access of transcription factors to control elements that reside upstream of core promoters, possibly by creating a barrier that prevents nucleosomes from obstructing transcription factor binding sites (20, 21). Second, as proposed by Seila et al. (14), negative supercoiling produced in the wake of transcribing polymerases could facilitate initiation in these regions. Third, these short nascent RNAs could themselves be functional, through either Argonaute-dependent (22) or -independent (23) pathways. Upcoming challenges will be to decipher whether the widespread transcriptional activity that lies upstream but divergent from the direction of coding genes positively or negatively regulates transcription output and how promoter or unknown DNA elements are designed to distinguish between productive elongation in one direction versus the other.
Supporting Online Material
www.sciencemag.org/cgi/content/full/1162228/DC1
Materials and Methods
SOM Text
Figs. S1 to S26
Tables S1 to S3
References
Received for publication 24 June 2008. Accepted for publication 7 November 2008.
Science. ISSN 0036-8075 (print), 1095-9203 (online)