Here’s my take on histone modifications. It’s probably a reasonably accurate snapshot of what we knew by the end of 2013. (There is a lot more to cover, and this view will surely go out of date fairly quickly. If you are reading this post in 2016, you might want to look for your cheat sheet somewhere else!)
So – this summary is mainly for Sandro, but I am pretty sure there are others who might like to use it.
Histones are proteins that package up DNA. The combination of histones and DNA is called “chromatin”, and this is the natural way one finds DNA in eukaryotic cells. Histones come in units of eight: two copies each of four different proteins, which together with the DNA wrapped around them form a nucleosome.
There are different types of histone protein. And just to be extra confusing, the same type of histone protein is sometimes made by more than one gene. (A lot of histone protein needs to be made during each cell cycle.)
Histone proteins are mainly compact, globular structures, but each one has a floppy peptide region at the start of the protein, which is described as the histone “tail” (somewhat confusingly in my view, as it’s at the start of the protein! Isn’t it more like a “trunk”?). These histone tails have many lysines (single-letter amino acid code, K) which can often be modified chemically with the addition of a methyl group (CH3) or an acetyl group (CH3CO; the core of acetic acid, i.e. vinegar, basically).
There are some strong themes that emerge in the sets of modifications, and the most important rule for recognising them is that a particular lysine can have either 1, 2 or 3 methyl groups, or one acetyl group. There are other combinations and modifications, but this rule (lysine in one of four states) is the main one.
There are many different histone proteins, and each protein has many different lysines. Because the histone modifications were first described biochemically, they had a standard naming scheme, for instance H3K4me3, or H3K27ac. The naming scheme is:
- H is for histone (H3K4me3)
- The next number is the type of histone protein. Much of the action happens on histone 3 (H3K4me3)
- The next letter is the amino acid that is modified. It is very often lysine: K (H3K4me3)
- The next number is the position of that residue in the protein. Remember, the histone tail is at the N-terminus, so it is often a small number (H3K4me3)
- The next group is the modification. It might be me1 (mono-methylation), me2 (di-methylation), me3 (tri-methylation, as in the example) or ac (acetylation). (N.B. ‘me’ by itself is ambiguous, but ‘ac’ by itself is not.)
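The naming scheme above is regular enough to pull apart mechanically. Here is a minimal sketch in Python, assuming only the four-part pattern described in the list (histone type, amino acid, residue position, modification). The names `HistoneMark` and `parse_mark` are made up for illustration, and histone variants or modifications beyond methylation and acetylation are deliberately not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HistoneMark:
    histone: str             # e.g. "H3"
    amino_acid: str          # single-letter code, e.g. "K" for lysine
    position: int            # residue number, counted from the N-terminus
    modification: str        # "me" (methylation) or "ac" (acetylation)
    n_methyl: Optional[int]  # 1, 2 or 3 for me1/me2/me3; None for "ac" or a bare "me"


# H + histone type, amino acid letter, residue number, then me/me1/me2/me3/ac.
# Assumption: only this simple pattern is covered.
_MARK_RE = re.compile(r"(H\d)([A-Z])(\d+)(me[123]?|ac)")


def parse_mark(name: str) -> HistoneMark:
    """Split a name such as 'H3K4me3' into its four parts."""
    m = _MARK_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a recognised histone mark name: {name!r}")
    histone, amino_acid, position, mod = m.groups()
    if mod == "ac":
        # 'ac' by itself is unambiguous: there is only one acetyl group.
        return HistoneMark(histone, amino_acid, int(position), "ac", None)
    # A bare 'me' is ambiguous, so the methyl count is left as None.
    n_methyl = int(mod[2]) if len(mod) == 3 else None
    return HistoneMark(histone, amino_acid, int(position), "me", n_methyl)


print(parse_mark("H3K4me3"))
print(parse_mark("H3K27ac"))
```

Leaving `n_methyl` as `None` for a bare `me` mirrors the N.B. in the list: the name alone does not say how many methyl groups are present.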
Histone modifications and chromatin behaviour
Histone modifications are observed across the genome, but are very different in different parts of the genome and in different cell types. There is a raging debate about whether histone modifications can be considered to drive chromatin behaviour, or whether chromatin behaviour is simply a consequence of things which are happening nearby on the chromatin, in particular transcription factor binding. (I am mainly a “consequence” person here.) Either way, these modifications are extremely informative about what is going on in any given cell type.
Well-known modifications: some notes
The H3K4me3 vs H3K4me1 pair: promoters vs intergenic
H3K4me1 is kind of the opposite of H3K4me3, as it is present far more in intergenic regions, including many “enhancers”, both active and inactive. H3K4me1 is more enigmatic than H3K4me3, and is a slightly less localised mark.
H3K4me2 is (normally) tightly correlated with H3K4me3, and I think of it as really being on its way to getting its third methylation to become H3K4me3.
The H3K27me3 vs H3K27ac pair: repressed vs active
H3K27me3 is also present on the inactive X chromosome. In common with most other repressive marks, H3K27me3 is far more diffuse, and there are mechanisms that take an initial H3K27me3 region and expand it “automatically”.
In contrast, H3K27ac is a strong, “active” mark, and shows activity over both active promoters and active enhancers. As this is happening in the same residue, you can see that this system is being set up nicely to be either active or repressed.
The H3K36me3 and H3K79me2 pair: transcriptional repression
When a gene is transcribed, there is this huge, hulking protein complex (RNA polymerase) that is marching through the chromatin, making it far easier for cryptic promoters in the DNA sequence to be activated. This would lead to a mess (in particular if they were going anti-sense), except that the RNA polymerase deliberately comes to the rescue with a histone modification scheme that puts down a “don’t start transcription here, because I am transcribing through this region” message: H3K36me3. This means this mark is indicative of polymerase activity, and in theory should be relatively flat across a gene body.
H3K79me2 is the same thing, but only at the start of the gene, whereas H3K36me3 picks up after the first bit.
The H3K9me3 vs H3K9ac pair: repeat repression vs active
There is also something weird going on with H3K9me3 and zinc finger genes.
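If you want these notes in a form you can query, they fit in a small lookup table. This is only a sketch of the summary in this post, not an authoritative annotation: the names `MARK_NOTES` and `describe` are made up for illustration, and the descriptions are condensed from the headings and paragraphs above.

```python
# One-line notes per mark, condensed from the sections above.
MARK_NOTES = {
    "H3K4me3": "promoters",
    "H3K4me1": "intergenic regions, including active and inactive enhancers",
    "H3K4me2": "normally tightly correlated with H3K4me3",
    "H3K27me3": "repressed; diffuse; also on the inactive X chromosome",
    "H3K27ac": "active promoters and active enhancers",
    "H3K36me3": "transcribed gene bodies, after the first bit of the gene",
    "H3K79me2": "start of transcribed genes",
    "H3K9me3": "repeat repression; something weird at zinc finger genes",
    "H3K9ac": "active",
}


def describe(mark: str) -> str:
    """Return the cheat-sheet note for a mark, or a fallback if it is not listed."""
    return MARK_NOTES.get(mark, "not covered in this cheat sheet")


for name in ("H3K27ac", "H3K36me3", "H4K20me1"):
    print(f"{name}: {describe(name)}")
```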