Amy C. Seila,1* J. Mauro Calabrese,1,2* Stuart S. Levine,3 Gene W. Yeo,4 Peter B. Rahl,3 Ryan A. Flynn,1 Richard A. Young,2,3 Phillip A. Sharp1,2
Transcription initiation by RNA polymerase II (RNAPII) is thought to occur unidirectionally from most genes. Here, we present evidence of widespread divergent transcription at protein-encoding gene promoters. Transcription start site–associated RNAs (TSSa-RNAs) nonrandomly flank active promoters, with peaks of antisense and sense short RNAs at 250 nucleotides upstream and 50 nucleotides downstream of TSSs, respectively. Northern analysis shows that TSSa-RNAs are subsets of an RNA population 20 to 90 nucleotides in length. Promoter-associated RNAPII and H3K4-trimethylated histones, transcription initiation hallmarks, colocalize at sense and antisense TSSa-RNA positions; however, H3K79-dimethylated histones, characteristic of elongating RNAPII, are only present downstream of TSSs. These results suggest that divergent transcription over short distances is common for active promoters and may help promoter regions maintain a state poised for subsequent regulation.
1 Koch Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. 2 Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. 3 Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA. 4 Salk Institute, Crick-Jacobs Center for Theoretical and Computational Biology, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA.
* These authors contributed equally to this work.
Present address: Department of Genetics and the Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC 27599, USA.
Present address: Department of Cellular and Molecular Medicine, University of California, San Diego, CA 92037, USA.
To whom correspondence should be addressed. E-mail: sharppa{at}mit.edu
Transcription of DNA by RNAPII is an orchestrated process subject to regulation at numerous levels: binding of RNAPII to the promoter, transcription initiation, and elongation. These phases and their transitions require concerted action by many protein complexes and are accompanied by changes in local chromatin structure (1).
When examining short RNA expression in murine embryonic stem (ES) cells, we noted the presence of 20 nucleotide (nt)–long RNAs located near the transcription start site (TSS) of protein-encoding genes (2). To further investigate these low abundance RNAs, 8.4 million sequence reads were analyzed from several murine short RNA cDNA libraries: 7.3 million were derived from ES cells and 1.1 million from differentiated cell types (3, 4). About 42,000 of these reads, referred to as TSSa-RNAs, uniquely mapped within 1.5 kb of protein-encoding gene TSSs (Fig. 1 and table S1). A single TSS frequently had more than one associated TSSa-RNA (Fig. 1B). TSSa-RNAs were associated with more than half of all mouse genes and were detected in all cell types examined (fig. S1). TSSa-RNAs were also found in ES cells lacking Dicer, an RNase III enzyme necessary for microRNA processing, suggesting that they are not Dicer products (fig. S1F). Sequenced TSSa-RNAs were 16 to 30 nt long, with a mean length of 20 nt (fig. S2).
Fig. 1. The distribution of TSSa-RNAs around TSSs shows divergent transcription. (A) Histogram of the distance from each TSSa-RNA to all associated gene TSSs (4). Counts of TSSa-RNA 5' positions relative to gene TSSs are binned in 20-nt windows. Red and blue bars represent bins of sense and antisense TSSa-RNAs, respectively. (B) Percentage of annotated mouse genes with indicated number of associating TSSa-RNAs.
TSSa-RNAs surround promoters in nonrandom, divergent orientations. Sense TSSa-RNAs map downstream of the associated promoter, overlapping genic transcripts and peaking in abundance between +0 and +50 nt downstream of the TSS. Forty percent of TSSa-RNAs map upstream of the TSS and are oriented in the antisense direction relative to their associated genes, peaking between nucleotides –100 and –300 (Fig. 1A). Sense and antisense TSSa-RNAs associated with overlapping sets of 8115 and 6331 gene promoters, respectively (table S2). This distribution is not dependent on mapping to either head-to-head gene pairs or genes with multiple TSSs, nor is it seen in intergenic regions or at gene 3' ends (figs. S3 and S4).
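The binning behind a histogram such as Fig. 1A can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `(position, strand)` read representation, and the toy coordinates are all assumptions. The 1.5-kb window and 20-nt bin size are taken from the text.

```python
from collections import Counter

def signed_offset(read_5p, read_strand, tss, gene_strand):
    """Return (offset, orientation) of a read 5' end relative to a TSS.

    The offset is positive downstream of the TSS in the gene's direction
    of transcription, so minus-strand genes have their coordinates flipped.
    """
    offset = read_5p - tss if gene_strand == "+" else tss - read_5p
    orientation = "sense" if read_strand == gene_strand else "antisense"
    return offset, orientation

def bin_offsets(reads, tss, gene_strand, window=1500, bin_size=20):
    """Count read 5' ends per bin, separately for sense and antisense reads."""
    counts = {"sense": Counter(), "antisense": Counter()}
    for read_5p, read_strand in reads:
        offset, orientation = signed_offset(read_5p, read_strand, tss, gene_strand)
        if abs(offset) <= window:
            # Floor division keeps bins aligned: -20..-1 -> -20, 0..19 -> 0.
            counts[orientation][(offset // bin_size) * bin_size] += 1
    return counts

# Hypothetical reads for a plus-strand gene with its TSS at position 10000.
reads = [(10030, "+"), (10045, "+"), (9750, "-"), (9760, "-"), (20000, "+")]
hist = bin_offsets(reads, tss=10000, gene_strand="+")
```

In this toy input, the two plus-strand reads near the TSS fall in sense bins downstream, the two minus-strand reads fall in antisense bins upstream, and the read 10 kb away is discarded by the window filter.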
A majority (67%) of ES cell genes with two or more TSSa-RNAs have both sense and antisense species; thus, individual TSSs produce both RNA subtypes (fig. S3). Based on their direction and position relative to TSSs, we hypothesize that sense and antisense TSSa-RNAs arise from divergent transcription, defined as nonoverlapping transcription initiation events that proceed in opposite directions from the TSS. Divergent transcription is likely a common feature of mammalian TSSs, given the presence of TSSa-RNAs in all cell types examined in this study.
TSSa-RNAs associate with genes expressed at varying levels in ES cells but are biased toward higher levels of gene expression. TSSa-RNAs were found at the majority of highly and moderately expressed genes (Fig. 2 and fig. S5), and 80% associate with promoters having high CpG dinucleotide frequency (CpG islands) (table S2). Additionally, the number of TSSa-RNA observations per gene correlated positively with gene expression levels, with a notable increase in the sense:antisense ratio found at the highest levels of expression (Fig. 2B). This increase suggests that a fraction of these reads from the most active genes arise from mRNA turnover.
Fig. 2. In ES cells, TSSa-RNA–associated genes are primarily expressed. (A) ES cell expression data separated into four bins based on log2 signal intensity: off, 1 to 4; low, 5 to 8; med, 9 to 12; and high, ≥13 (13). Gene counts for each expression bin are shown. (B) Ratio of sense to antisense reads in each expression bin.
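The per-bin sense:antisense ratio of Fig. 2B can be sketched as below. This is only an illustration: the function names and the toy gene tuples are hypothetical, and the cutoffs of 5, 9, and 13 assume four non-overlapping log2 intensity bins, which is an assumption about how the bins were drawn.

```python
def expression_bin(log2_signal, cutoffs=(5, 9, 13)):
    """Assign a log2 signal intensity to off/low/med/high.

    The cutoffs are assumed lower bounds of the low, med, and high bins.
    """
    labels = ("off", "low", "med", "high")
    for label, cutoff in zip(labels, cutoffs):
        if log2_signal < cutoff:
            return label
    return labels[-1]

def sense_antisense_ratio(genes):
    """Pool read counts per expression bin and return sense/antisense ratios.

    genes: iterable of (log2_signal, n_sense_reads, n_antisense_reads).
    Bins with no antisense reads map to None instead of dividing by zero.
    """
    totals = {label: [0, 0] for label in ("off", "low", "med", "high")}
    for signal, n_sense, n_antisense in genes:
        pooled = totals[expression_bin(signal)]
        pooled[0] += n_sense
        pooled[1] += n_antisense
    return {label: (s / a if a else None) for label, (s, a) in totals.items()}

# Hypothetical genes: (log2 signal, sense reads, antisense reads).
genes = [(3.0, 1, 1), (7.5, 2, 2), (10.0, 3, 2), (14.0, 8, 2), (13.5, 4, 2)]
ratios = sense_antisense_ratio(genes)
```

Reads are pooled within each bin before dividing, so a few genes with zero antisense reads do not produce undefined per-gene ratios.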
Whereas typical RNAPII transcripts have a bias toward G at their 5' ends, TSSa-RNAs show a nearly random 5'-nucleotide distribution (4, 5) (table S3). This significant distribution difference suggests that the 5'-most base of the TSSa-RNAs does not represent the initial nucleotide transcribed by RNAPII.
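A comparison of this kind starts from the 5'-nucleotide composition of a read set. The sketch below tabulates that composition and computes a chi-square statistic against a uniform expectation. It is an illustration under assumptions: the function names and toy sequences are hypothetical, and the uniform null is chosen for simplicity rather than taken from the authors' analysis.

```python
from collections import Counter

BASES = "ACGT"

def five_prime_counts(sequences):
    """Count the first base of each read; U is treated as T."""
    firsts = (seq[0].upper().replace("U", "T") for seq in sequences if seq)
    counts = Counter(base for base in firsts if base in BASES)
    return {base: counts.get(base, 0) for base in BASES}

def chi_square_vs_uniform(counts):
    """Chi-square statistic of observed counts against equal base usage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to test")
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())

# Hypothetical read sets: one balanced, one biased toward G at the 5' end.
balanced = ["GATT", "ACGT", "CTTA", "TGCA"]
g_biased = ["GAA", "GCC", "GTT", "ACG"]
balanced_stat = chi_square_vs_uniform(five_prime_counts(balanced))
biased_stat = chi_square_vs_uniform(five_prime_counts(g_biased))
```

A larger statistic indicates a stronger departure from equal usage of the four bases; converting it to a P value would use a chi-square distribution with three degrees of freedom.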
Based on sequencing frequency, 20 nt TSSa-RNAs are estimated to be present at 1 molecule per 10 cells (4). Therefore, an enrichment procedure was developed to determine the nature of the short RNAs surrounding TSSa-RNA–associated genes. Sequenced 21-nt sense and antisense TSSa-RNAs associated with Ring finger protein 12 (Rnf12) or Coiled-coil domain containing 52 (Ccdc52) transcripts, respectively, were not detected as unique species in ES cells. Instead, species between 20 and 90 nt were detected at levels estimated to be greater than 10 molecules per cell (4) (Fig. 3, B and D). Similar sized fragments were not found in HeLa cell RNA samples using the same sequence probes, demonstrating specificity of the procedure (Fig. 3, B and D). Northern analysis for two other TSSa-RNA–associated genes showed similar results (figs. S6 and S7). We suggest that 20 to 90 nt transcripts are the dominant short RNA species from these promoters and that sequenced TSSa-RNAs represent no more than 10% of promoter-associated transcripts.
Fig. 3. Transcripts from TSSa-RNA–associated regions are 20 to 90 nt long. TSSa-RNAs are shown as arrowheads. (A) Map of the sense TSSa-RNA Rnf12 region. (B) Northern analysis for Rnf12 sense TSSa-RNA using probe 1 in (A). Lane 1, 10–base pair ladder. Lanes 2 to 5, detection controls with 15, 1.5, 0.75, and 0 fmol, respectively, of RNA oligo 1a+1b in (A). Lanes 6 to 8, material recovered from ES RNA (ESC), HeLa RNA (H-), and HeLa RNA + 15 fmol RNA oligo 1a+1b (H+), respectively, using DNA oligo 1 in (A). (C) Map of the antisense TSSa-RNA Ccdc52 region. (D) Northern analysis for the Ccdc52 antisense TSSa-RNA using probe 2 in (C). Lanes 1 to 7 are as lanes 2 to 8 in (B), except using RNA oligo 2 for controls and DNA oligo 2 in (C) for enrichment. Bracket marks ESC-specific transcripts; * marks background band.
To further classify promoters that produce TSSa-RNAs, we examined their local chromatin environment using chromatin immunoprecipitation coupled with DNA sequencing (ChIP-seq) (3, 4). TSSa-RNA–associated promoters are enriched in bound RNAPII and histone H3 lysine 4 trimethylated (H3K4me3) chromatin in ES cells (Fig. 4A). About 90% of TSSa-RNA–associated genes show H3K4me3-modified nucleosomes at their promoters, as compared to 60% for all mouse genes (Fig. 4A). TSSa-RNA–associated genes also show a 3-fold enrichment in promoter proximal RNAPII over all genes (Fig. 4A). In contrast, TSSa-RNA–associated genes are depleted of the Polycomb component Suz12, a known transcriptional repressor thought to help maintain pluripotency by repressing developmental regulators (Fig. 4A) (6, 7).
Fig. 4. Relationship between TSSa-RNAs, chromatin structure, and RNAPII. (A) Percentage of genes associated with H3K4me3, RNAPII, and Suz12. ALL, all mouse genes. t test gives P < 2.2 × 10^–16 for all marks. (B) Schematic of factor binding site mapping using ChIP-seq reads. (C) Chromosomal position versus enrichment ratio for H3K4me3-modified nucleosomes and RNAPII for representative gene Mknk2. TSSa-RNAs are shown as arrowheads. (D) Metagene profiles for forward (green) and reverse (yellow) reads for ChIP-seq data (first three panels) and TSSa-RNAs from the sense (red) and antisense (blue) strand (bottom panel). Panels are aligned at the TSS. The TSS is denoted by the arrow. Black numbers on the profiles define the midpoint between forward and reverse peaks. Red and blue bars above the ChIP-seq profiles represent sense and antisense TSSa-RNA peak maxima.
Composite profiles of ChIP-seq data were used to determine RNAPII and histone modification positions relative to TSS, revealing a correlation with sense and antisense TSSa-RNA peaks. In such analyses, the midpoint between the forward and reverse ChIP-seq read maxima defines the average DNA binding site for a factor (Fig. 4B) (3). At TSSa-RNA–associated genes, two distinct peaks for RNAPII are detectable with a spacing of several hundred base pairs (Fig. 4, C and D). A sharp RNAPII peak just downstream of the TSS lies directly over the sense TSSa-RNA peak (Fig. 4D). A second RNAPII peak, upstream of the first, is more diffuse but again lies directly over the antisense TSSa-RNA peak (Fig. 4D). The co-occurrence with antisense TSSa-RNAs strongly suggests that the upstream peak of RNAPII is indicative of divergent transcription rather than sense initiation upstream of the TSS, as has been proposed (8).
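The midpoint rule described above can be written down in a few lines. This is a minimal sketch, assuming composite profiles are stored as dictionaries mapping position relative to the TSS to read count; the function names and the toy profiles are hypothetical and not taken from the authors' methods.

```python
def peak_position(profile):
    """Return the position with the highest read count.

    profile: dict mapping position relative to the TSS -> read count.
    Ties are broken in favor of the smallest position.
    """
    if not profile:
        raise ValueError("empty profile")
    return min(profile, key=lambda pos: (-profile[pos], pos))

def binding_site_midpoint(forward_profile, reverse_profile):
    """Estimate a factor's average binding site as the midpoint between
    the forward-strand and reverse-strand read maxima."""
    forward_peak = peak_position(forward_profile)
    reverse_peak = peak_position(reverse_profile)
    return (forward_peak + reverse_peak) / 2

# Hypothetical composite profiles around a TSS at position 0.
forward = {-40: 2, -20: 9, 0: 4}
reverse = {60: 3, 80: 10, 100: 5}
site = binding_site_midpoint(forward, reverse)
```

Here the forward reads peak at –20 and the reverse reads at +80, so the estimated binding site is at +30. Forward and reverse reads come from opposite ends of the immunoprecipitated fragments, which is why their maxima flank the bound site.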
H3K4me3-modified nucleosome alignment with respect to the TSS shows peaks flanking the TSSa-RNA and RNAPII maxima, consistent with H3K4 methylation at the nucleosomes immediately upstream and downstream of TSSs (Fig. 4, C and D). These flanking peaks suggest that divergently paused RNAPII complexes may recruit H3K4 methyltransferase activity to mark active promoter boundaries. In contrast to the dual peaks of RNAPII and H3K4me3 surrounding TSSs, H3K79me2, a chromatin mark found over RNAPII elongation regions, is solely enriched in the direction of productive transcription (Fig. 4D). These observations suggest that although divergent transcription initiation is widespread, productive elongation by RNAPII occurs primarily unidirectionally, downstream of TSSs.
Sense and antisense TSSa-RNAs with bound RNAPII are found at a large number of mammalian promoters, suggesting that divergent initiation by RNAPII at TSSs is a general feature of transcriptional processes. Supporting this hypothesis, genome-wide nuclear run-on assays by Core et al. show that divergent transcripts arise from transcriptionally engaged RNAPII at many genes in human fibroblasts (9).
Because TSSa-RNAs do not represent the 5' end of transcripts, they likely mark regions of RNAPII pausing rather than initiation. Pausing of RNAPII 20 to 50 nt downstream of the TSS has been observed at many genes, most notably Drosophila Hsp70, and is thought to maintain a chromatin structure permissive to transcription initiation (10, 11). The results presented here suggest the presence of antisense paused RNAPII upstream of many TSSs. The position of paused, antisense RNAPII centers around 250 nt upstream of the TSS, as inferred by the presence of bound RNAPII and antisense short RNAs colocalizing at this location. Considering that chromatin marks associated with elongating RNAPII are only found downstream of TSSs, it appears that antisense RNAPII frequently does not elongate after TSSa-RNA production (Fig. 4D) (12–14). This suggests the existence of an undefined mechanism that discriminates between the sense and antisense polymerase for productive elongation.
RNAPII initiation complex polarity at promoters is thought to be established by TFIID/TBP complex binding together with TFIIB (15). RNAPII/TFIIF binding and DNA unwinding by the TFIIH helicase then give rise to the open preinitiation complex (10). The prevalence of divergently oriented RNAPII at most promoters suggests a more complex situation. We hypothesize that transcription factors first nucleate a sense-oriented preinitiation complex at the TSS. Transcription by this complex generates at least two signals that could subsequently promote upstream antisense paused polymerase. First, the RNAPII carboxy-terminal domain and other initiation complex components can activate transcription when tethered to DNA, suggesting that the sense complex may promote antisense preinitiation complex formation in the upstream region (16). Second, as RNAPII elongates the sense transcript, negative supercoiling of the DNA will occur upstream, perhaps promoting the antisense initiation process (17). This divergent transcription could structure chromatin and nascent RNA at the TSS for subsequent regulation.
9. L. J. Core, J. J. Waterfall, J. T. Lis, Science 322, 1845 (2008); published online 4 December 2008 (10.1126/science.1162228).
18. We thank G. Zheng, C. Whittaker, S. Hoersch, and A. F. Seila. A.C.S. was supported by NIH postdoctoral fellowship 5-F32-HD051190 and G.W.Y. by the Crick-Jacobs Center for Computational Biology. This work was supported by NIH grants RO1-GM34277 and HG002668, NCI grant PO1-CA42063, and the NCI Cancer Center Support (core) grant P30-CA14051. The data discussed in this publication have been deposited in National Center for Biotechnology Information's Gene Expression Omnibus under accession numbers GSE13483 and GSE12680.
Received for publication 24 June 2008. Accepted for publication 12 November 2008.