GE and Science Prize

From DNA Sequence to Chromatin State and Dynamics

Tommy Kaplan

All cells of a living organism share the same DNA, yet they manifest tremendous variability in their structure and function. These differences arise from complex regulatory systems that activate different genes in every cell and tissue during development, and in response to different environments and conditions. Although it has long been clear that this regulatory information is genetically controlled, years after sequencing the first animal genomes, we still know very little about how this information is actually encoded in DNA. My graduate career was devoted to unraveling the principles of transcriptional regulation: one of the major ways in which gene abundance is controlled. My goal was to better understand how cells work, how they respond to external stimuli, why they sometimes stop functioning normally, and what steps we can take to correct them when they do.

I began by studying the activity of transcription factors: proteins that bind to DNA in a sequence-specific manner and, in doing so, alter the expression of nearby genes. With colleagues in Nir Friedman’s lab, I developed computational tools to identify the genome-wide binding locations of transcription factors, allowing us to reconstruct the architecture of the cell’s regulatory map by identifying which genes are regulated by which transcription factors. To do this, we used complex probabilistic models to represent protein-DNA binding sites (1) and stronger statistical tools to compute their binding probabilities (2). I then developed an ab initio algorithm to predict the binding preferences of proteins lacking experimental binding data, based on their sequence and structure (3). Finally, together with colleagues from the O'Shea lab, I developed an analytical approach to dissect complex transcriptional networks (those involving several regulators). We combined in vivo genomic binding data with detailed gene expression measurements under various environmental and genetic perturbations of the regulatory system, and partitioned the yeast response to hyperosmotic shock into functional subnetworks (4). By comparing the binding data with the gene expression levels, we showed that most bound genes were indeed activated by the bound factor. Yet we also found many latent as well as nonfunctional binding sites. These results highlight the role of higher-order mechanisms in gene regulation, including the involvement of DNA packaging in altering the accessibility and physicochemical context of the DNA.
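To make the idea of a probabilistic binding-site model concrete, here is a minimal sketch of the simplest member of that family, a log-odds position weight matrix. It is not the model used in the cited work (1, 2), which is described above as more complex. The aligned sites, the pseudocount, the uniform background, and all function names are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, kmer):
    """Sum the per-position log-odds scores of one candidate site."""
    return sum(column[base] for column, base in zip(pwm, kmer))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    windows = range(len(sequence) - width + 1)
    return max((score(pwm, sequence[i:i + width]), i) for i in windows)

# Hypothetical aligned binding sites and a hypothetical promoter sequence.
example_sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
example_pwm = build_pwm(example_sites)
example_score, example_start = best_hit(example_pwm, "AAATGACTCAAA")
```

Scanning a genome with such a matrix only ranks candidate sites by sequence similarity. As the comparison with expression data above shows, a high-scoring or even a bound site is not necessarily a functional one.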

To better understand these epigenetic aspects of transcriptional regulation, I focused on the information stored in the packaging of DNA per se. This includes the position of nucleosomes along the DNA; their covalent modifications, such as acetylation or methylation; and their dynamics over time (5). With Ollie Rando and colleagues, we assayed the single-nucleosome state of histone modifications and dynamics at a genomic scale (6, 7). First, we measured the covalent modification patterns of thousands of nucleosomes in Saccharomyces cerevisiae and found that histone modifications do not occur independently but instead can be roughly split into two groups of co-occurring modifications. The first is largely transcription-independent and serves to mark underlying genomic features (for example, global hypoacetylation marks the nucleosomes directly adjacent to start sites). The other group occurs in gradients through coding regions of genes and is strongly associated with transcription (for example, a gradient of H3K4 methylation in highly expressed genes) (figure, panel A) (6). We then turned to inspecting the replication-independent rates of nucleosome exchange in order to learn more about how these modifications are maintained and dynamically altered. We measured the site-specific integration of tagged nucleosomes after induction, in replication-coupled and replication-independent manners. I developed a mathematical model to analyze these time-series data and estimate the turnover rate of each nucleosome. We showed that nucleosomes are integrated into the genome both in a replication-coupled manner (hence, every cell cycle) and in a constant, replication-independent manner. Surprisingly, nucleosome turnover over regulatory regions occurred at much higher rates than over coding regions (panel B) (7).
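As a rough illustration of how a turnover rate can be estimated from such time-series data, the sketch below assumes the simplest possible kinetics: the tagged fraction at a nucleosome rises as 1 - exp(-rate * t) after induction. This is a simplification and not the model published in (7). The time points, the rates, and the function names are hypothetical.

```python
import math

def tagged_fraction(rate, t):
    """Expected fraction of tagged histone at time t under first-order exchange."""
    return 1.0 - math.exp(-rate * t)

def estimate_turnover_rate(times, fractions, eps=1e-6):
    """Least-squares fit through the origin of -ln(1 - f) against t."""
    numerator = 0.0
    denominator = 0.0
    for t, f in zip(times, fractions):
        # Clamp noisy measurements into [0, 1) so the logarithm is defined.
        f = min(max(f, 0.0), 1.0 - eps)
        y = -math.log(1.0 - f)
        numerator += t * y
        denominator += t * t
    return numerator / denominator

# Hypothetical sampling times (minutes after induction) and two simulated loci.
example_times = [15.0, 30.0, 60.0, 90.0]
promoter_data = [tagged_fraction(0.02, t) for t in example_times]
coding_data = [tagged_fraction(0.002, t) for t in example_times]

promoter_rate = estimate_turnover_rate(example_times, promoter_data)
coding_rate = estimate_turnover_rate(example_times, coding_data)
```

With real measurements, the fit would also need to account for noise and for the delay between induction and the availability of tagged histone, both of which this sketch ignores.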


Transcription, histone modifications, and chromatin dynamics. (A) A schematic view of the yeast chromatin architecture. The transcription start site is surrounded by two nucleosomes that exhibit low levels of acetylation at H2BK16, H4K8, and H4K16. Acetylation and methylation typically occur in a gradient from 5' to 3' over actively transcribed genes. (B) Distribution of turnover rates for nucleosomes in G1-arrested yeast, color-coded from red (high turnover rate) to green (low turnover rate). (C) H3K56 acetylation levels explained by genomic replication timing (y axis) and replication-independent turnover rates (x axis). (D) H3K56ac profiles as measured during the cell cycle (left) or modeled based on the replication timing and turnover rate of each nucleosome (right).

These results alter our conception of histone modification and nucleosome positioning and occupancy. Why are "old" nucleosomes marked differently than "new" ones? What function is served by the massive histone replacement at promoters? Is it merely the result of the constant competition with transcription factors, or is it a bona fide regulatory mechanism to "refresh" the histone marks at decision-making loci, aimed at ensuring that transcriptional suppression, activation, or re-initiation occurs only in the continuing presence of an activating stimulus? In addition, we found rapid turnover near the edges of chromatin domains, suggesting that a high turnover rate may serve as a barrier to block the unwanted enzymatic spreading of chromatin states.

Finally, I turned to the newly described H3K56 acetylation mark, previously shown to occur during S phase. We assumed that H3K56ac is a general mark for newly incorporated nucleosomes and should therefore depend on the replication timing at each locus, as well as on the replication-independent turnover rate (panel C) (8). I integrated these two factors into a nonhomogeneous rate equation model, resulting in accurate predictions of H3K56ac levels in midlog and synchronized cells during the cell cycle (panel D) (8). Taken together, all these results provide a broad perspective and deepen our understanding of the cross talk between transcription, histone modifications, and chromatin dynamics.
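One way to picture a rate equation of this kind is sketched below. It is an illustrative guess at the general form, not the published model (8). It assumes that new nucleosomes arrive acetylated, that incorporation has a constant replication-independent part plus a replication-coupled part that is active only during a window starting at the locus's replication time, and that the mark is removed at a constant rate. The time-dependent incorporation term is what makes the equation nonhomogeneous. All parameter names and values are hypothetical.

```python
def simulate_k56ac(turnover_rate, replication_time, deacetylation_rate=0.05,
                   replication_rate=0.5, replication_window=5.0,
                   t_end=120.0, dt=0.01):
    """Euler-integrate dA/dt = (turnover + r(t)) * (1 - A) - deacetylation * A.

    A is the acetylated fraction at one locus; r(t) equals replication_rate
    inside the replication window and zero elsewhere. Times are in minutes.
    """
    acetylated = 0.0
    trajectory = []
    for step in range(int(round(t_end / dt)) + 1):
        t = step * dt
        trajectory.append((t, acetylated))
        replicating = replication_time <= t < replication_time + replication_window
        incorporation = turnover_rate + (replication_rate if replicating else 0.0)
        gain = incorporation * (1.0 - acetylated)
        loss = deacetylation_rate * acetylated
        acetylated += dt * (gain - loss)
    return trajectory

# Hypothetical loci: an early-replicating locus with slow turnover, and the
# same locus simulated without replication (replication time beyond t_end).
with_replication = simulate_k56ac(turnover_rate=0.01, replication_time=30.0)
without_replication = simulate_k56ac(turnover_rate=0.01, replication_time=1e9)

peak_with = max(level for _, level in with_replication)
peak_without = max(level for _, level in without_replication)
```

In this sketch a locus without replication settles at turnover / (turnover + deacetylation), so high-turnover loci carry more of the mark at steady state, while replication adds a transient pulse whose timing follows the replication time of the locus.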

References

  1. Y. Barash et al., in Proceedings of the 7th International Conference on Research in Computational Molecular Biology (2003), pp. 28-37.
  2. Y. Barash et al., Bioinformatics 21, 596 (2005).
  3. T. Kaplan, N. Friedman, H. Margalit, PLoS Comput. Biol. 1, e1 (2005).
  4. A. P. Capaldi et al., Nat. Genet. 40, 1300 (2008).
  5. O. J. Rando, H. Y. Chang, Annu. Rev. Biochem. 78, 245 (2009).
  6. C. L. Liu et al., PLoS Biol. 3, e328 (2005).
  7. M. F. Dion et al., Science 315, 1405 (2007).
  8. T. Kaplan et al., PLoS Genet. 4, e1000270 (2008).