Nascent RNA Sequencing Reveals Widespread Pausing and Divergent Initiation at Human Promoters
Leighton J. Core,* Joshua J. Waterfall,* John T. Lis
RNA polymerases are highly regulated molecular machines. We present a method (global run-on sequencing, GRO-seq) that maps the position, amount, and orientation of transcriptionally engaged RNA polymerases genome-wide. In this method, nuclear run-on RNA molecules are subjected to large-scale parallel sequencing and mapped to the genome. We show that peaks of promoter-proximal polymerase reside on 30% of human genes, transcription extends beyond pre-messenger RNA 3' cleavage, and antisense transcription is prevalent. Additionally, most promoters have an engaged polymerase upstream and in an orientation opposite to the annotated gene. This divergent polymerase is associated with active genes but does not elongate effectively beyond the promoter. These results imply that the interplay between polymerases and regulators over broad promoter regions dictates the orientation and efficiency of productive transcription.
Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.
* These authors contributed equally to this work.
To whom correspondence should be addressed. E-mail: jtl10{at}cornell.edu
Transcription of coding and noncoding RNA molecules by eukaryotic RNA polymerases requires their collaboration with hundreds of transcription factors to direct and control polymerase recruitment, initiation, elongation, and termination. Whole-genome microarrays and ultra-high-throughput sequencing technologies enable efficient mapping of the distribution of transcription factors, nucleosomes, and their modifications, as well as accumulated RNA transcripts throughout genomes (1, 2), thereby providing a global correlation of factors and transcription states. Studies using the chromatin immunoprecipitation assay coupled to genomic DNA microarrays (ChIP-chip) or to high-throughput sequencing (ChIP-seq) indicate that RNA polymerase II (Pol II) is present at disproportionately higher amounts near the 5' end of many eukaryotic genes relative to downstream regions (3–6). However, these techniques cannot determine whether Pol II is simply promoter-bound or engaged in transcription. Small-scale analyses using independent methods have shown that this distribution likely represents transcriptionally engaged Pol II that has accumulated between 20 and 50 bases downstream of transcription start sites (TSSs) (5, 6), indicating that transcription can be regulated at the stage of elongation as well as the recruitment and initiation stages (7). This promoter-proximal pausing or stalling (8) is proposed to be an important post-initiation, rate-limiting target for gene regulation (7, 9).
Here, we present a global run-on-sequencing (GRO-seq) assay to map and quantify transcriptionally engaged polymerase density genome-wide. These measurements provide a snapshot of genome-wide transcription and directly evaluate promoter-proximal pausing on all genes. We used nuclear run-on assays (NRO) to extend nascent RNAs that are associated with transcriptionally engaged polymerases under conditions where new initiation is prohibited. To specifically isolate NRO-RNA, we added a ribonucleotide analog [5-bromouridine 5'-triphosphate (BrUTP)] to BrU-tag nascent RNA during the run-on step (fig. S1). The length of the polynucleotide was kept short, and the NRO-RNA was chemically hydrolyzed into short fragments (100 bases) to facilitate high-resolution mapping of the polymerase origin at the time of assay (8). BrU-containing NRO-RNA was triple-selected through immunopurification with an antibody that is specific for this nucleotide analog, resulting in a 10,000-fold enrichment of the NRO-RNA pool that was determined to be >98% pure (8). An NRO-cDNA library was then prepared for sequencing from what represents the 5' end of the fragmented, BrU-incorporated RNA molecule by using the Illumina high-throughput sequencing platform. The origin and the orientation of the RNAs, and therefore of the associated transcriptionally engaged polymerases, were documented genome-wide by mapping the reads to the reference human genome (8).
In total, 2.5 × 10^7 33-base pair (bp) reads were obtained from two independent replicates (8) prepared from primary human lung fibroblast (IMR90) nuclei, of which 1.1 × 10^7 (44%) mapped uniquely to the human genome. Most reads (85.8%) align on the coding strand within boundaries of known RefSeq genes, human mRNAs, or expressed sequence tags (fig. S2). The number of transcriptionally active genes was determined by using an experimentally and computationally determined background of 0.04 reads per kilobase (8). We found 16,882 (68%) of RefSeq genes to be active (P < 0.01) compared with 8438 active genes found by a microarray experiment performed in the same cell line (3), reflecting, in part, the added sensitivity of sequencing platforms (10). Examination of several large regions shows that GRO-seq can differentiate between transcriptionally active and inactive regions in large chromosomal domains (Fig. 1). In addition, we are able to detect a generally low, but significant (P < 0.01 relative to background) amount of antisense transcription for 14,545 genes (58.7% of genes in the genome) (fig. S3).
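The background-based call of active genes can be illustrated with a minimal sketch. The statistical procedure actually used is described in the supporting material (8) and is not reproduced here; the sketch below simply assumes that background reads follow a Poisson distribution at the reported density of 0.04 reads per kilobase. The function names `poisson_sf` and `is_active` are hypothetical.

```python
import math

# Background read density reported in the text (reads per kilobase).
BACKGROUND_PER_KB = 0.04


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_active(read_count, gene_length_bp, alpha=0.01):
    """Assumed test: a gene is called active if its read count is unlikely
    (P < alpha) under a Poisson background model. This is an illustration,
    not the authors' documented procedure."""
    expected = BACKGROUND_PER_KB * gene_length_bp / 1000.0
    return poisson_sf(read_count, expected) < alpha
```

Under this assumed model, a 10-kb gene has an expected background of 0.4 reads, so three reads are enough to pass P < 0.01 whereas two are not. Very small P values lose precision in this simple complementary-sum form, which does not matter for a threshold test.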
Fig. 1. Sample of GRO-seq data view on the University of California at Santa Cruz (UCSC) genome browser. A 2.5-Mb region on chromosome 5 showing GRO-seq reads aligned to the genome at 1-bp resolution, followed by an up-close view around the NPM1 gene. Pol II ChIP results (3) are shown in green; mappable regions, black; GRO-seq reads on the plus strand (left to right), red; GRO-seq reads on the minus strand (right to left), light blue; RefSeq gene annotations, dark blue.
Aligning the GRO-seq data relative to RefSeq TSSs shows that the density of reads peaks near the TSS in both sense (50 bp) and antisense (−250 bp) directions (see below) (Fig. 2A). Alignment of GRO-seq reads to annotated 3' ends of genes reveals a broad peak that is maximal at about +1.5 kb and can extend greater than 10 kb downstream of polyadenylation (poly-A) sites (Fig. 2B). This peak distance is consistent with previous and recent estimates (11, 12). A small peak followed by a sharp drop-off is observed at the site of polyadenylation, likely representing the known 3' cleavage before polyadenylation of the RNA (13).
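The TSS alignment described above amounts to a strand-aware composite profile. The following is a minimal sketch, assuming that each read has already been reduced to the genomic position and strand of its 5' end and that reads are binned in 10-bp windows; the function name `tss_profile`, the tuple layout, and the 1-kb flank are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def tss_profile(reads, tss_list, flank=1000, window=10):
    """Build composite sense and antisense read profiles around TSSs.

    reads:    iterable of (chrom, pos, strand); pos is the read's 5' end.
    tss_list: iterable of (chrom, pos, strand) for annotated TSSs.
    Returns two Counters keyed by the window start offset relative to the
    TSS, with positive offsets pointing in the direction of the gene.
    """
    sense, antisense = Counter(), Counter()
    by_chrom = {}
    for chrom, pos, strand in reads:
        by_chrom.setdefault(chrom, []).append((pos, strand))

    # Simple quadratic scan; adequate for illustration, not genome scale.
    for chrom, tss, gene_strand in tss_list:
        sign = 1 if gene_strand == "+" else -1
        for pos, read_strand in by_chrom.get(chrom, []):
            offset = (pos - tss) * sign  # flip coordinates for minus-strand genes
            if -flank <= offset < flank:
                bin_start = (offset // window) * window  # floor to window start
                target = sense if read_strand == gene_strand else antisense
                target[bin_start] += 1
    return sense, antisense
```

A read on the gene's strand 50 bp downstream of the TSS lands in the sense bin at +50, and a read on the opposite strand 250 bp upstream lands in the antisense bin at −250, matching the two peak positions quoted in the text.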
Fig. 2. Alignment of GRO-seq reads to TSSs and 3' ends. (A) GRO-seq reads aligned to RefSeq TSSs in 10-bp windows in both sense (red) and antisense (blue) directions relative to the direction of gene transcription. (B) GRO-seq reads flanking the 3' ends of genes. The sharp peak coincides with the new 5' end created after cleavage at the poly-A site. Polymerase density extends considerably downstream before termination.
To identify all genes that show a peak of engaged Pol II that is characteristic of promoter-proximal pausing, we assessed whether each gene showed significant enrichment of read density in the promoter-proximal region relative to the density in the body of each gene (8). The ratio of these densities is called the pausing index (5, 6, 8), and significant pausing indices range from 2 to 10^3 (fig. S4). Within the defined promoter region, 7057 genes have a significant enrichment of GRO-seq reads relative to the body of the gene (P < 0.01), representing 28.3% of all genes (41.7% of active genes). Comparison of paused genes to either microarray expression or GRO-seq data revealed four classes of genes: class I, not paused and active; class II, paused and active; class III, paused and not active; and class IV, inactive (not paused and not active) (Fig. 3). Class III was severely depleted when we used GRO-seq to classify gene activity because GRO-seq provides a more sensitive measure of gene activity. Given the low signal at the promoters of the few genes left within this class, they are likely to be classified as active with deeper sequencing. Therefore, the overwhelming majority of genes with a paused polymerase also produce significant transcription throughout the gene, albeit often to quantities not detectable by expression microarrays. A recent comparison of Pol II ChIP-seq data to RNA-seq also supports the view that nearly all genes that are bound by Pol II produce full-length transcripts (10).
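The pausing index defined above is a ratio of two read densities, which can be sketched as follows. The exact boundaries of the promoter-proximal and gene-body regions, and the significance test, are given in the supporting material (8) and are not reproduced here; the region lengths passed to the hypothetical function `pausing_index` are therefore left to the caller.

```python
def pausing_index(promoter_reads, promoter_len_bp, body_reads, body_len_bp):
    """Ratio of promoter-proximal read density to gene-body read density.

    Returns None when the gene has no reads at all (no index is defined),
    and infinity when only the promoter-proximal region has reads.
    Region lengths are caller-supplied assumptions.
    """
    if promoter_reads == 0 and body_reads == 0:
        return None
    if body_reads == 0:
        return float("inf")
    promoter_density = promoter_reads / promoter_len_bp  # reads per bp
    body_density = body_reads / body_len_bp              # reads per bp
    return promoter_density / body_density
```

For example, 41 reads in a 200-bp promoter-proximal window against 100 reads over a 20-kb gene body gives an index of about 41. A gene with no reads returns None, in line with the note in the Fig. 3 legend that a gene without reads has no pausing index.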
Fig. 3. Comparison of pausing with gene activity. Four classes of genes are found when comparing genes with a paused polymerase and transcription activity either by microarray or GRO-seq density in the downstream portions of genes. An example of each class is shown, with tracks shown in the UCSC genome browser as in Fig. 1. The gene names, pausing index, and P value, from top to bottom, respectively, are as follows: TRIO, 1.1, 0.62; FUS, 41, 2.8 × 10^−43; IZUMO1, 410, 7.6 × 10^−3; and GALP (which has no reads and therefore no pausing index). The number of genes represented in each class is shown to the right.
The density of polymerases within the promoter-proximal region generally correlates with the level of gene activity when all genes (Fig. 4A) or only genes with a paused polymerase are considered (fig. S5). Whereas nearly all paused genes show significant full-length activity by GRO-seq, the pausing index inversely correlates with gene activity (Fig. 4B). Considering that pausing is observed when Pol II enters a pause site faster than the rate of escape from pausing (9), this inverse correlation is consistent with the hypothesis that highly transcribed but paused genes appear to be controlled, at least in part, by increasing the rate at which Pol II escapes the pause site and enters productive elongation (8).
Fig. 4. Correlation of promoter-proximal transcription patterns with gene activity. (A to D) Box plots (each showing the fifth, 25th, 50th, 75th, and 95th percentiles) that show the relationship of promoter-proximal (PP) sense peaks (red), divergent peaks (DP) (blue), pausing indices (green), and PP/DP ratios (orange) to the top, middle, and bottom deciles of gene activity. All deciles are significantly different from each other: P < 10^−9 for all comparisons except between the lowest and the middle deciles in (D) (P < 10^−3). (E) ChIP profiles of Pol II and GRO-seq sense (S) and antisense (AS) strand reads aligned to TSSs. (F) ChIP profiles of H3ac and H3K4me2 and GRO-seq aligned to TSSs.
A prominent and unexpected feature of the GRO-seq profiles around TSSs is the robust signal from an upstream, divergent, engaged polymerase. RNAs generated by these divergent polymerases can be identified at low concentrations when small RNAs are isolated from whole cells (14). These divergent polymerases cannot be accounted for by the 10% of known bidirectional promoters that are less than 1 kb apart (15) (fig. S6). We found that 13,633 genes (55% of all genes, 77% of active genes) display significant divergent transcription within 1 kb upstream of sense-oriented promoter-proximal peaks (P < 0.001), indicating that the number of bidirectional promoters exceeds even the highest estimates (16, 17). However, because it appears that the majority of these promoters produce mRNAs in only one direction (see below), we refer to this class of promoters as divergent. Although the top 10% of active genes have, on average, a slightly larger promoter-proximal than divergent peak (Fig. 4D), amounts of divergent transcription generally correlate with both the promoter-proximal signal (fig. S7) and the transcription level of the associated gene (Fig. 4C). Thus, divergent transcription is a mark for most active promoters.
Gene activity, pausing, and divergent transcription correlate with each other and with promoters containing a CpG island. These four characteristics co-occur significantly more often than would be expected by chance (P < 10^−52) (table S1). Previous mapping of capped mRNA transcripts has shown that at CpG island promoters initiation occurs broadly over hundreds of base pairs (18), and GRO-seq shows that polymerases initiate and accumulate on this large class of promoters in both orientations.
Do existing ChIP-chip data (3) show any indication of the divergent peak of polymerase? Manual inspection of a number of genes and comparison with composite profiles aligned to TSSs show that the Pol II ChIP peak at promoters is accounted for by the two divergent peaks uncovered by GRO-seq (Figs. 1B and 4E). Higher-resolution ChIP-seq data in different cell lines have identified Pol II molecules upstream of promoters that were proposed to be in the same orientation as the annotated gene; however, these instead are likely to represent the divergent promoters identified by GRO-seq (10). Additionally, active promoters are typically marked by histone modifications such as di- and trimethylation of H3-Lys4 (H3K4me2 and H3K4me3) as well as acetylation of histone H3 and H4 (H3ac and H4ac). These modifications show a bimodal distribution around TSSs, with the trough representing a nucleosome-free region encompassing the TSS (3, 4, 19). Comparison of available H3ac and H3K4me2 data in this cell line (3) with GRO-seq suggests that both upstream and downstream peaks of these histone modifications are associated with active transcription, with each peak of histone modifications being adjacent and downstream of an engaged polymerase (Fig. 4F) (8). Other studies have shown that histone modifications associated with transcription elongation (e.g., H3K36me3 and H3K79me3) do not associate in a bimodal fashion around TSSs (4, 19). This and the lack of divergent GRO-seq reads further upstream (fig. S8) indicate that the majority of promoters experience initiation in the upstream direction but that these divergent polymerases do not productively elongate transcripts. Thus, promoters can distinguish polymerase in the forward versus the reverse direction.
We envision several possible functions for divergent transcription. First, the act of transcription itself could be crucial for granting access of transcription factors to control elements that reside upstream of core promoters, possibly by creating a barrier that prevents nucleosomes from obstructing transcription factor binding sites (20, 21). Second, as proposed by Seila et al. (14), negative supercoiling produced in the wake of transcribing polymerases could facilitate initiation in these regions. Third, these short nascent RNAs could themselves be functional, through either Argonaute-dependent (22) or -independent (23) pathways. Upcoming challenges will be to decipher whether the widespread transcriptional activity that lies upstream but divergent from the direction of coding genes positively or negatively regulates transcription output and how promoter or unknown DNA elements are designed to distinguish between productive elongation in one direction versus the other.
24. We gratefully thank C. Haudenschild for advice on construction of our libraries and for performing the initial alignments, Q. Sun and L. Ponnala for aligning the trimmed reads, A. Siepel for computational and statistical discussion, and the members of the Lis lab for suggestions regarding this work. The work was funded by NIH grant GM25232 to J.T.L. The data discussed in this publication have been deposited in National Center for Biotechnology Information's Gene Expression Omnibus under accession number GSE13518. The authors are filing a patent based on the work in this paper.
Received for publication 24 June 2008. Accepted for publication 7 November 2008.