
Volume 587, Issue 8 p. 1247-1257
Review
Open Access

Deciphering post-translational modification codes

Adam P. Lothrop
Department of Biology, Tufts University, 200 Boston Ave. Suite 4700, Medford, MA 02155, United States

Matthew P. Torres (corresponding author)
School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, United States

Stephen M. Fuchs (corresponding author)
Department of Biology, Tufts University, 200 Boston Ave. Suite 4700, Medford, MA 02155, United States

First published: 09 February 2013
Citations: 127

Abstract

Post-translational modifications (PTMs) occur on nearly all proteins. Many domains within proteins are modified on multiple amino acid sidechains by diverse enzymes to create a myriad of possible protein species. How these combinations of PTMs lead to distinct biological outcomes is only beginning to be understood. This manuscript highlights several examples of combinatorial PTMs in proteins, and describes recent technological developments, which are driving our ability to understand how PTM patterns may “code” for biological outcomes.

1 Patterns, signatures, and codes

Most proteins are post-translationally regulated in some manner by enzymes that directly alter the chemical makeup of the protein. These enzymes can be proteases, transferases (kinases, acetyltransferases, methyltransferases, glycosyltransferases, etc.), or enzymes that remove groups (phosphatases, deacetylases, glycosidases, etc.). In all, more than 400 discrete types of modifications can occur and, to date, more than 90 000 individual PTMs have been identified through biochemical and biophysical analysis [1]. PTMs are known to act alone and in combination to regulate nearly all aspects of protein function. Thus, deciphering how PTMs are coordinately regulated is of fundamental importance to our understanding of biology.

While many proteins are known to be heavily modified, combinatorial PTMs have perhaps been best studied in the context of histones, where, more than ten years ago, Strahl and Allis proposed that PTMs on the tails of histone proteins, alone or in combination, specify downstream events [2]. The original paper talks about a language of histone modifications, but today these concepts are generally referred to as the “histone code hypothesis” [2]. In the intervening years, several significant events have occurred. Tremendous technological advances have been made, allowing us to identify numerous histone PTMs, the enzymes responsible for transferring and removing many PTMs, and a host of protein domains that recognize specific histone PTMs. Improvements in mass spectrometry and proteomics techniques have also revolutionized the rate and detail with which PTMs are identified within the proteome. Consequently, there is great interest in identifying additional codes that modulate protein function [3-5]. Likewise, interdependences between PTMs within distant regions of the same protein, or on different proteins within complexes, are now commonplace. Thus, there is a nearly constant re-examination of how we define the interrelationships of multiple PTMs.

Among researchers interested in PTM biology, there is often debate over the nomenclature used to describe how the multitude of PTMs on a given protein regulates function. Currently, our capacity to detect PTMs far exceeds our ability to understand their biological function. This unfortunate but important distinction is at the root of the controversy underlying the use of the term “code” to refer to patterns of PTMs that are “read” by the cell to drive biological outcomes. A general set of principles to decipher these codes has not yet emerged, and consequently many find the term “code” misleading. Whether they are called “codes” or not, we can clearly state that: (1) many proteins have regions within their primary sequence that are targets for extensive, and often overlapping, modification by enzymes; (2) in many cases, these PTMs can recruit other proteins or modulate their activity; (3) patterns of PTMs can be identified that correlate with differential biological states (e.g. normal or disease states, cell cycle stage, aging) (Fig. 1). This review focuses on methods developed to understand the complex biology of PTMs, and the growing evidence demonstrating that the interactions of modifications that exist across a landscape of proteins act concomitantly to orchestrate complex biological outcomes.

Fig. 1. Combinatorial PTMs can code for complex biological outcomes. (A) Modifications such as methylation (red), phosphorylation (yellow), or acetylation (blue) are commonly recognized by proteins with PTM-recognition domains (purple and cyan). Modifications such as lysine methylation can occur up to three times on a single residue, resulting in PTMs with distinct activity. (B) Neighboring PTMs have differing effects on the ability of proteins to recognize a phosphorylation site. For example, the purple protein requires dimethylation of the lysine, but is occluded by trimethyllysine and uninfluenced by the neighboring acetylation. In contrast, the cyan protein can be blocked by acetylation but is unaffected by methylation. (C) The combinatorial PTMs set up a “code” that determines which protein–protein interactions lead to distinct biological outcomes.

2 Combinatorial PTMs coordinate protein–protein interactions – lessons from histone tails

DNA is packaged around two copies each of four histones (H2A, H2B, H3, and H4) to form nucleosomes, the basic unit of chromatin structure. Nucleosomes play pivotal roles in compacting the genome and protecting it from damage. However, packaging of DNA into chromatin is repressive towards DNA-templated processes such as transcription [6]. Eukaryotic cells balance the needs to copy and read, but simultaneously protect, genetic information through a complex network of PTMs primarily directed toward the N- and C-terminal tails of the four histones. Histone PTMs can alter the charge of histones (e.g. lysine acetylation) and recruit specific binding domains (e.g. acetylation, methylation, phosphorylation) associated with proteins such as chromatin remodelers, transcriptional coactivators/repressors, and DNA repair proteins. Histone PTMs have gained prominence since the mid-1990s, when the Allis and Schreiber groups demonstrated that histone-modifying enzymes have direct roles in regulating gene expression [7, 8]. The sequencing of the human genome, development of chromatin immunoprecipitation (ChIP), and next-generation sequencing technologies have now made histone PTMs the best studied of all cellular modifications. Just as there are a large number of PTMs on the histone tails, there are also numerous protein domains that recognize and bind to particular PTMs on these tails. For example, PTM-recognition domains such as PHD (Plant homeodomain) fingers, chromodomains, and Tudor domains all recognize methylated lysine residues, whereas bromodomains and 14-3-3 domains recognize acetyllysine and phosphoserine/threonine, respectively [9].

Most PTM-recognition domains recognize a particular modification within a defined amino acid sequence, indicating that neighboring amino acid sequence is important in the context of substrate recognition by these proteins. Because histone tails are rich in PTMs, the presence of nearby modifications influences the ability of protein factors to recognize a particular PTM. For example, phosphorylation of Ser10 on histone H3 (H3S10) negatively influences HP1 (heterochromatin protein 1) recognition of methylation on neighboring Lys9 (H3K9) [10]. This phenomenon is referred to as the “phospho-methyl” switch, as H3S10 phosphorylation acts as a switch to prevent the binding of HP1 to chromatin in mitosis [10]. Recently, investigators uncovered an exception to this finding, where the tandem Tudor domain of UHRF1 binds to H3K9 methylation irrespective of the H3S10 phosphorylation state [11]. This finding suggests that multiple PTMs can act in concert to carefully orchestrate the binding of numerous factors to the same primary modification. Similar examples have been observed elsewhere on the H3 tail, where modification at either Arg2 or Thr3 can impact recognition of either neighboring Lys4 methylation or the free N-terminus of histone H3 [12].
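The context dependence described above can be written down as simple rules over a modification state. The sketch below is illustrative only: the dictionary layout and the function names `hp1_binds` and `uhrf1_binds` are assumptions made for this example, and the rules encode just the two qualitative observations cited in the text [10, 11], not measured affinities or methylation-level preferences.

```python
from itertools import product

# A tail state is a dict keyed by residue; None means unmodified.
# "me" stands for methylation in general and "ph" for phosphorylation
# (hypothetical labels chosen for this sketch).

def hp1_binds(state):
    """HP1 reads H3K9 methylation unless H3S10 is phosphorylated [10]."""
    return state.get("K9") == "me" and state.get("S10") != "ph"

def uhrf1_binds(state):
    """The UHRF1 tandem Tudor domain reads H3K9 methylation
    irrespective of the H3S10 phosphorylation state [11]."""
    return state.get("K9") == "me"

if __name__ == "__main__":
    # Enumerate all four K9/S10 combinations and report each reader.
    for k9, s10 in product([None, "me"], [None, "ph"]):
        state = {"K9": k9, "S10": s10}
        print(state, "HP1:", hp1_binds(state), "UHRF1:", uhrf1_binds(state))
```

The only state in which the two readers disagree is methylated Lys9 with phosphorylated Ser10, which is the "phospho-methyl" switch condition.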

The example of the histone tails demonstrates that the influence of neighboring PTMs on recognition likely extends far beyond the few examples we have currently identified. Several groups recently demonstrated the potentially confounding effects neighboring PTMs have on the ability of antibodies to recognize histone modifications [13-16]. In addition to illustrating important points about how combinatorial PTMs mediate interactions, they also highlight a need for great care in the development and implementation of reagents used to detect PTMs.

3 PTM codes are dynamic – the C-terminal domain (CTD) of RNA polymerase and the “CTD code”

In addition to the combinatorial nature of PTMs, the dynamics between distinct codes play a prominent role in modulating biological function. The examples of combinatorial PTMs on histone tails might lead to the assumption that patterns of modifications are established and remain stable in the cell. While some histone PTMs are quite long-lived, such as lysine methylation associated with heterochromatin formation (H3K9 and H3K27 methylation) [17], PTM patterns on histones and other proteins are often dynamic. Dynamics in PTM patterns is perhaps most apparent in PTM changes to the C-terminal domain (CTD) of eukaryotic RNA polymerase II (RNAPII) during transcription [18, 19]. The CTD in eukaryotic cells is comprised of a repeating seven amino acid sequence (Tyr-Ser-Pro-Thr-Ser-Pro-Ser) where all seven amino acids within a repeat are subject to PTMs such as phosphorylation, O-GlcNAcylation (addition of N-acetylglucosamine), and proline isomerization [20]. As RNAPII transcribes DNA, the CTD adopts well characterized changes in phosphorylation patterns. Ser5 phosphorylation is most prevalent in RNAPII enzymes near the 5′ end of genes; Ser2 phosphorylation is most prevalent on RNAPII at the 3′ end, while phosphorylation at both Ser2 and Ser5 is associated with RNAPII in the middle of genes [20]. These PTM patterns are established, and likely re-established, during every transcription cycle by a carefully orchestrated network of kinases and phosphatases – serving to recruit specific factors to the CTD at distinct times during the course of gene transcription. For example, phosphorylation of Ser5 of the CTD repeats by the Cdk7 kinase subunit of TFIIH recruits the mRNA capping complex where it can act on the nascent transcript [21]. Similarly, the histone-modifying enzyme Set2, which orchestrates histone methylation and acetylation patterns important for transcription elongation, is recruited by the combination of both Ser2 and Ser5 phosphorylation [22]. 

Recent reports suggest that interactions with the CTD may not be limited to just the combinatorial phosphorylation patterns. Indeed, many protein factors recognize the CTD in a manner that also depends on the isomerization state of the Ser–Pro peptide bond [126, 127]. While prolyl isomerases do not add a chemical moiety to the protein chain like kinases or methyltransferases, isomerization has a profound effect on local peptide structure. This is evident in the recent crystal structure of the phosphatase Ssu72, which only recognizes its substrate (CTD with Ser5 phosphorylation) when the Ser5–Pro6 bond is in the cis-conformation [126]. As the cis-conformation is the energetically disfavored state, isomerization of the CTD (either uncatalyzed or catalyzed by prolyl isomerases) may act as a timer to coordinate the recruitment of CTD-modifying and associating factors [23].
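The association between CTD phosphorylation patterns and position along a gene, as summarized in this section, can be captured in a small lookup. This is a deliberately coarse sketch: the function name `ctd_region` and the returned labels are assumptions for illustration, and the mapping reflects where each pattern is described as most prevalent [20], not an absolute rule.

```python
def ctd_region(ser2_p, ser5_p):
    """Map a CTD phosphorylation pattern to the gene region where that
    pattern is reported to be most prevalent [20]."""
    if ser2_p and ser5_p:
        return "middle of gene"
    if ser5_p:
        return "5' end"
    if ser2_p:
        return "3' end"
    # The unphosphorylated case is not assigned a region in the text.
    return "unassigned"

if __name__ == "__main__":
    for ser2_p in (False, True):
        for ser5_p in (False, True):
            print(f"Ser2-P={ser2_p}, Ser5-P={ser5_p}: {ctd_region(ser2_p, ser5_p)}")
```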

4 Patterns of PTMs are indicative of discrete functional states – regulation of p53 by overlapping PTMs

As we learn more about the role of PTMs in protein regulation and disease progression, we would like a set of rules that simplify the potential functional outcomes of particular modifications. This could lead to the identification of biomarkers for particular diseases (e.g. hyperphosphorylation of Tau associated with Alzheimer's disease [24]). However, discerning the contribution of individual modifications to protein regulation is complicated by the dynamic and overlapping nature of PTMs. For example, the C-terminal domain of the tumor suppressor protein p53 is subject to numerous modifications (e.g. acetylation, methylation, ubiquitination, neddylation, phosphorylation). To further complicate the situation, many modifications can occur at the same location (e.g. acetylation, methylation and ubiquitination of at least four lysines: 370, 372, 373, and 382). Under normal conditions p53 protein is maintained at low levels in cells, mediated in part by ubiquitination at these lysines [25]. In response to stress, these lysines are substrates of the acetyltransferase CBP/p300, which contributes to stabilization of p53 and enhancement of DNA binding. Moreover, these same residues can be methylated, and neighboring serine/threonine residues can be phosphorylated – altering the recruitment or specificity of enzymes such as CBP/p300.

5 Detecting and deciphering PTM codes

The study of combinatorial PTMs is driven by two fundamental goals: one, the detection, mapping and quantitation of combinatorial PTMs; and two, deciphering the codes in which they participate to modulate biological function (Fig. 2).

Fig. 2. Relationships between experimental approaches and how they are used to decipher the functions of PTM codes. Arrows are meant to designate the use of an approach toward identifying PTMs, defining coexisting PTMs, or deciphering function. The thickness of each arrow represents the relative extent of contributions.

Over the last ∼30 years, technological advances have dramatically improved the sensitivity and dynamic range of non-radioactive PTM detection methods. In the last two decades alone mass spectrometry-based protein analysis has driven PTM research beyond the detection of single PTMs on individual proteins to the simultaneous detection, localization and quantitation of thousands of PTMs across entire proteomes and within hours of analysis time [26]. Concomitant with advances in PTM detection there have been notable improvements in technologies that decipher the biological context of PTMs. Advancements in fluorophore chemistry, fluorescence spectrophotometry, peptide and antibody synthesis, and microarray-based technologies enable rapid analysis of PTM-dependent protein–protein interactions on a massively combinatorial scale. Further evolution of PTM research will undoubtedly benefit from integrating detection and deciphering technologies to promote a deeper understanding of the functional nature of PTMs. In the following sections, we discuss some of the fundamental technological advances aimed toward detecting and deciphering PTM codes.

6 Analysis of PTM codes by mass spectrometry

The study of PTM codes begins with the detection and quantitation of individual PTMs. Necessarily, the more PTMs that can be accurately measured in a system, the more accurately a “code” may be defined. Mass spectrometry (MS) has become the most powerful analytical technique for detecting combinatorial PTMs and relies on the integration of powerful instrumentation, sophisticated data analysis, carefully chosen analytical strategy, and applied quantitative techniques. Successfully implementing these four aspects is essential for detecting, mapping and quantifying PTMs, and therefore PTM codes. Here we briefly discuss some of the fundamental parameters underlying MS analysis of combinatorial PTMs.

6.1 Instrumentation and data analysis

Mass spectrometry (MS), the process of measuring the mass of charged particles, is accomplished by integration of three instrumental components: an ion source that generates gas phase molecular ions; a mass analyzer that separates gas phase ions by their mass-to-charge ratio (m/z); and an ion detector that registers the number of molecular ions at each m/z value. The mass spectrometer is unmatched as a bio-analytical tool – allowing one to measure the mass of any given bio-molecule with extraordinary accuracy, resolution, sensitivity, speed and reproducibility [27]. In addition to accurate mass determination, today's state-of-the-art mass spectrometers enable the fragmentation of molecular ions by tandem MS (MS/MS). MS/MS can provide valuable information about the sequence and modification state of proteins across a wide range of masses from short peptides [28], to full-length intact proteins [29-32].
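Because the mass analyzer separates ions by m/z and not by mass alone, the same molecule appears at different positions depending on its charge state. A minimal sketch of the conversion follows, assuming positive-mode ions that carry one proton per charge; the function name `mz` is chosen for this example.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

def mz(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral
    molecule of monoisotopic mass `neutral_mass` (in Da)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

if __name__ == "__main__":
    # The same 1000 Da species observed at charge states 1 to 3.
    for z in (1, 2, 3):
        print(z, round(mz(1000.0, z), 4))
```

A modification of mass shift d therefore moves a peak by d/z on the m/z axis, which is one reason the charge state has to be known before a PTM mass can be assigned.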

MS/MS fragmentation plays a particularly important role in identification and localization of combinatorial PTMs. Peptide fragmentation mechanisms commonly used in PTM analysis include collision-induced dissociation (CID) [33], electron-capture dissociation (ECD) [34] and electron-transfer dissociation (ETD) [35, 36]. A full discussion of the types and mechanisms of peptide and protein fragmentation is outside the scope of this work, but these have been reviewed elsewhere [37]. Regardless of mechanism, the challenge in all cases is to create sufficient fragmentation to accurately identify the amino acid sequence of the precursor and to localize PTMs within that sequence. A formidable hurdle for combinatorial PTM research is the fact that certain types of PTMs are more amenable to localization using certain types of fragmentation [38, 39]. If a peptide or protein is sufficiently fragmented and the PTMs are retained, the location of the PTMs can be easily mapped by comparing theoretical and observed fragment masses in silico. However, labile PTMs like phosphorylation or glycosylation can be easily “lost” during CID fragmentation [40], making their site-specific localization difficult if not impossible. Moreover, fragmentation can be affected by peptide length and charge state. Thus, alternative MS/MS fragmentation mechanisms such as ECD and ETD are often necessary to improve the detection of combinatorial PTMs [41, 42]. Today, a variety of hybrid mass spectrometers enable multiple types of MS/MS – allowing one to tailor the fragmentation method dynamically during a single experiment [43, 44].
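The in silico comparison of theoretical and observed fragment masses can be illustrated with a toy calculation. The sketch below computes singly charged b- and y-type ions for a hypothetical peptide (GSASK, invented for this example) carrying one phosphate on either of its two serines, and reports which fragments distinguish the two placements. It assumes the modification is retained on the fragments and uses standard monoisotopic masses rounded to five decimals; the function names are assumptions.

```python
# Monoisotopic residue masses (Da) for the residues used in this example.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}
PHOSPHO = 79.96633   # HPO3 mass shift
WATER = 18.01056
PROTON = 1.00728

def fragment_ions(seq, mods):
    """Singly charged b and y ion m/z values.
    `mods` maps a 1-based residue position to a mass shift in Da."""
    masses = [RESIDUE[aa] + mods.get(i, 0.0) for i, aa in enumerate(seq, start=1)]
    n = len(masses)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON              # N-terminal fragment
        ions[f"y{i}"] = sum(masses[n - i:]) + WATER + PROTON  # C-terminal fragment
    return ions

def site_determining(seq, pos_a, pos_b, shift=PHOSPHO):
    """Names of fragment ions whose m/z differs between the two placements."""
    ions_a = fragment_ions(seq, {pos_a: shift})
    ions_b = fragment_ions(seq, {pos_b: shift})
    return sorted(k for k in ions_a if abs(ions_a[k] - ions_b[k]) > 1e-6)

if __name__ == "__main__":
    print(site_determining("GSASK", 2, 4))
```

Only fragments that contain exactly one of the two candidate serines are informative; if those particular ions are missing from the spectrum, or the phosphate is lost during fragmentation, the site cannot be assigned.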

All aspects of protein MS have benefited greatly from direct coupling between the mass spectrometer and liquid chromatography (LC) instrumentation. Indeed, detection sensitivity, MS/MS fragmentation and interpretation often improve if proteins and peptides are adequately separated before MS [45]. Today, even the most basic LC–MS/MS experiments can result in the detection and quantitation of tens-of-thousands of distinct ions in the short span of a standard LC gradient [46, 47]. At the level of single proteins, this translates to greater sequence coverage and the potential for comprehensive PTM analysis. Moreover, multi-dimensional chromatographic techniques now enable high-throughput and comprehensive analysis of entire proteomes – allowing contextual, systems-level analyses of nearly all proteins in a cell [48]. LC–MS is now the predominant method for detecting and quantifying the dynamics of combinatorial PTM codes as can be seen from its extensive application to the study of the histone “code” [49-51], the tubulin “code” [52-54], and p53 PTMs [55].

Extrapolating biological significance from MS data requires software-based integration of MS instrument platforms, analysis strategies (e.g. top down versus bottom up), and online databases [56-58]. Ironically, MS data analysis in the proteomics age is only possible because of advances in genomics. Complete genome sequences for organisms across all kingdoms have enabled the prediction of theoretical protein and peptide masses that are essential for interpreting MS and MS/MS spectra. Identifying PTMs, in turn, relies on accurate mass measurement and the search for distinct mass signatures corresponding to any one of >400 different protein modifications. A PTM “code” may be defined by very small chemical or spatial differences such as acetylation versus tri-methylation of lysine (Δm/z < 2 Da), or by the presence of phosphorylation on juxtaposed serine residues (e.g. phosphorylation of serine 4 and 9 but not serine 6 and 7 within a single peptide). Accurately detecting these subtle differences in mass and PTM localization is paramount to the process of defining a PTM “code” and remains an important problem. Accurate data analysis not only relies on accurate mass and optimal MS/MS fragmentation, but also on the precise interpretation of MS and MS/MS data [59-61]. Bioinformatics software tools are readily available and offer many options to streamline the conversion of MS spectra into meaningful biological information. The purpose of these tools is to place statistical significance on the identity and location of PTMs based on the compilation of intact precursor and fragment masses in an MS/MS spectrum [57, 62]. This is not trivial because different proteins can sometimes produce similar peptides that produce identical MS/MS fragmentation patterns. Furthermore, increasing the complexity of the sample necessarily increases the stringency required to meet these challenges. Consequently, with the throughput of today's MS instrumentation, MS data analysis software is critical for assigning a value to MS data integrity [63-65].
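The acetylation versus tri-methylation case shows how small such differences can be. The following sketch uses the standard monoisotopic mass shifts for the two modifications (values not given in the text) and expresses their difference as a fraction of a peptide's mass; the 1500 Da peptide is an arbitrary example and the function name is an assumption.

```python
ACETYL = 42.010565     # Da, C2H2O added to lysine
TRIMETHYL = 42.046950  # Da, C3H6 added to lysine

def ppm_separation(peptide_mass):
    """Mass difference between the acetylated and tri-methylated forms
    of a peptide, in parts per million of the peptide mass."""
    delta = TRIMETHYL - ACETYL
    return delta / peptide_mass * 1e6

if __name__ == "__main__":
    print(f"delta = {TRIMETHYL - ACETYL:.6f} Da")
    print(f"at 1500 Da: {ppm_separation(1500.0):.1f} ppm")
```

The two forms differ by roughly 0.036 Da, so a measurement has to be accurate to a few tens of ppm or better at this peptide size before the two assignments can be told apart by mass alone.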

6.2 Analytical strategies

Choosing an appropriate analytical strategy is paramount to the detection of PTMs and PTM codes, and is intimately coupled with instrumentation and data analysis. Whether deciphering PTM codes on histone tails, tubulin, RNA polymerase, or p53, the overarching goal in every case is to maximize protein “coverage” by monitoring every amino acid and PTM on the protein – a formidable “selective pressure” in the evolution of analytical methodologies [66]. Three complementary analytical strategies have proven successful: bottom-up, top-down and middle-down. In the bottom-up strategy, proteins of interest are first digested with site-specific proteases and the resulting peptides are analyzed by MS. In the top-down strategy, full-length intact proteins are analyzed by MS followed by iterations of MS/MS fragmentation that yield sequence and PTM information. In the middle-down strategy, proteins are broken down enzymatically into large peptide fragments that fall between the mass range of either bottom-up or top-down strategies. Each strategy has some demonstrated success in identifying combinatorial PTMs, but is also hindered by inherent challenges.

To date, bottom-up MS is the most commonly used approach in proteomics. Fundamentally, the bottom-up strategy may be construed as counterproductive for the detection of PTM codes since the strategy relies on proteolytic digestion a priori, and therefore un-couples the combinatorial relationship between distant PTMs on whole proteins. However, the strategy is overwhelmingly popular because it can be applied with relative ease to detect peptides and interpret their fragmentation by MS/MS. In combination with 2-dimensional LC techniques, the bottom-up strategy has proven to be successful at detecting thousands of proteins present in a single sample [47]. Furthermore, comprehensive analysis of PTM-specific sub-populations of the proteome such as the “phosphoproteome”, the “ubiquitinome”, the “acetylome” and the “methylome” are facilitated by affinity-based PTM-enrichment methods [67-71]. Despite this seemingly unmatchable advantage, what the bottom-up strategy gains in breadth of analysis, it lacks in depth of analysis. Indeed, bottom-up proteome-wide LC–MS experiments often detect as few as 1–2 peptides per protein, which is sufficient for protein identification but not ideal for the detection of combinatorial PTM codes. Moreover, affinity enrichment of specific PTMs necessarily excludes non-enriched PTMs, which may be important in the definition of a PTM “code” [66]. In contrast, targeted bottom-up MS approaches, in which proteins of interest are enriched prior to MS analysis, are more effective for detecting PTM codes on single proteins. This is effectively demonstrated in the recent work of Garcia and Reinberg [72], who use bottom-up MS to quantify the co-enrichment of histone tail PTMs. By enriching nucleosomes with antibodies to distinct histone tail modifications, Voigt et al. not only detect intra-tail PTMs but also provide convincing evidence of the asymmetrical modification between the two ‘sister’ tails of a single nucleosome (discussed further in section 9). Targeted approaches such as this are not unique to histones and have also been widely used in the study of combinatorial PTMs on multi-protein complexes. Prevalent examples include the analysis of combinatorial phosphorylation patterns underlying the cell cycle-dependent activity of ubiquitin ligases [73-75], the centrosome [76] and the mitotic spindle assembly [77]. These, among many other examples, establish targeted bottom-up MS as an advantageous strategy for the detection of PTM codes.

Top-down MS is the predominant alternative to the bottom-up strategy. In top-down, MS and MS/MS are conducted directly on intact proteins (from 8E3 Da to 2E6 Da) without proteolytic digestion [78]. As a result, information on the combinatorial nature of PTMs on single proteins is retained. Top-down MS is generally performed on ion-trap spectrometers with ETD or ECD capability. Ion trapping permits iterative cycles of MS/MS (also known as MSn) fragmentation and mass analysis, consequently allowing detection of intact protein or protein fragments. A typical top-down MS experiment begins with accurate mass determination (MS), followed by primary fragmentation and fragment mass analysis (MS2), then isolation of a single MS2 fragment that is broken down even further in a subsequent MS/MS reaction (MS3) – all within the duration of a single injection [32]. Thus, sources of variation in the intact protein mass (often corresponding to PTMs) can be rapidly mapped to specific domains or amino acids within the primary structure. The top-down strategy is particularly amenable to detection of combinatorial PTM codes on single proteins like tubulin [32], p53 [79], histones [80-82] and G-protein coupled receptors [3]. The top-down strategy exhibits several advantages over bottom-up approaches for the detection of PTM codes [29, 83]. First, determining intact protein mass provides comprehensive information on the global modification state of a protein. Second, unlike in the bottom-up strategy, the presence of PTMs tends to have less effect on the ionization/detection efficiency of intact proteins. Third, intact protein analysis is not restricted to the position of enzymatic cleavage sites in a protein. In contrast, analysis of intact proteins is generally less sensitive compared to the detection of peptides. Furthermore, successful detection of intact proteins is extremely sensitive to the protein solvent as well as the protein sequence, increasing variability in the success rate from protein to protein.
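The first step of a top-down experiment, interpreting a shift in intact mass as a set of modifications, can be sketched as a small search problem. The example below is a simplification under stated assumptions: only three modification types are considered, each at most three times, and the function name `explain_shift` and the tolerance values are chosen for illustration. Mass shifts are standard monoisotopic values rounded to five decimals.

```python
from itertools import product

# Monoisotopic mass shifts (Da) for three common modifications.
PTM_MASS = {"methyl": 14.01565, "acetyl": 42.01057, "phospho": 79.96633}

def explain_shift(delta, max_each=3, tol=0.01):
    """List every combination of PTM counts whose summed mass lies
    within `tol` Da of the observed intact-mass shift `delta`."""
    names = list(PTM_MASS)
    hits = []
    for counts in product(range(max_each + 1), repeat=len(names)):
        total = sum(c * PTM_MASS[n] for c, n in zip(counts, names))
        if abs(total - delta) <= tol:
            hits.append({n: c for n, c in zip(names, counts) if c})
    return hits

if __name__ == "__main__":
    observed_shift = 121.9769  # hypothetical value: one acetyl plus one phospho
    print(explain_shift(observed_shift, tol=0.01))
    print(explain_shift(observed_shift, tol=0.05))
```

With the tighter tolerance the shift has a single explanation, whereas the looser tolerance also admits three methyl groups plus one phosphate. The intact mass alone says nothing about where the groups sit, which is what the subsequent MS2 and MS3 steps address.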

Middle-down MS is an emerging analytical strategy that focuses on the mass range between that of the top-down and bottom-up approaches (∼4000–10,000 Da). Middle-down MS capitalizes on the relative ease and reproducibility of detecting smaller proteins without losing too much of the “whole-protein” context that is afforded by top-down MS [78]. Proteins are typically digested to a limited extent or with alternative enzymes that have limited recognition sites in a target protein. The major challenge in middle-down MS is the precise and reproducible control of limited digestion, or finding compatible enzymes that have few recognition sites in the target protein. A recent breakthrough in the Kelleher lab may turn out to bring middle-down MS to the forefront of PTM “code” research [84]. In their recent report, Wu et al. use a novel protease from Escherichia coli (OmpT) that cleaves proteins at recognition sites with two consecutive basic residues (e.g. R–R, K–K, R–K, or K–R). The relative frequency of the OmpT recognition sequence is low compared to traditional proteases – yielding fewer peptides of much larger size that are ideal for middle-down MS analysis. The disadvantage may lie in the fact that lysine and arginine are common targets of post-translational modification, which would effectively prevent cleavage by OmpT. Regardless, enzymes like OmpT may catapult middle-down MS into the throughput equivalence of current bottom-up strategies for the detection of combinatorial PTMs and PTM codes.
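The effect of cleavage specificity on peptide size can be seen with an in silico digest. The sketch below compares a common trypsin rule (cleave after K or R, but not before P) with a dibasic rule in which cleavage occurs between two consecutive basic residues. The exact cut position within the dibasic site is an assumption of this example, as are the function names and the test sequence, which is invented and does not correspond to a real protein.

```python
def cleave(seq, cut_after):
    """Split `seq` after every index i for which cut_after(seq, i) is true."""
    peptides, start = [], 0
    for i in range(len(seq) - 1):
        if cut_after(seq, i):
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])
    return peptides

def trypsin_rule(seq, i):
    # Cleave C-terminal to K or R unless the next residue is P.
    return seq[i] in "KR" and seq[i + 1] != "P"

def dibasic_rule(seq, i):
    # Assumed OmpT-like rule: cleave between two consecutive basic residues.
    return seq[i] in "KR" and seq[i + 1] in "KR"

SEQ = "AGSKTLRPEKRDSGKAVTRRLMSK"  # hypothetical sequence

if __name__ == "__main__":
    print("trypsin:", cleave(SEQ, trypsin_rule))
    print("dibasic:", cleave(SEQ, dibasic_rule))
```

On this sequence the trypsin rule yields seven short peptides and the dibasic rule yields three longer ones, so modifications that are several residues apart are more likely to stay on the same peptide. The sketch ignores the point raised above that modified lysine or arginine residues may block cleavage.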

6.3 Quantitation

Simply detecting a group of PTMs reveals little information about their combinatorial function. Indeed, quantifying the dynamics of combinatorial PTMs is essential to defining a relationship between PTM codes and biological outcome [49]. Mass spectrometry is not inherently quantitative. Thus, multiple peptide and PTM quantitation strategies have emerged including stable isotope labeling (SIL), and label-free methods such as selected reaction monitoring (SRM), among others reviewed elsewhere [85, 86]. SIL capitalizes on the high resolution of mass spectrometers by using heavy isotope mass tags to distinguish the relative concentrations of a mixture of identical peptides. Importantly, stable isotope labeling alters neither the amino acid composition nor the modification state of peptides. As a result, MS signals from isotope labeled and unlabeled peptides can be directly compared and quantified from the mass spectrum. Two predominant methods of stable isotope labeling have been established: incorporation of tags in vivo using stable isotope labeling with amino acids in cell culture (SILAC) [87], or covalent attachment of tags in vitro using iTRAQ or alternative chemical tags that typically modify primary amine or free thiol reactive groups [88]. Many label-free approaches to MS quantitation are also gaining prominence and are often more economical than SIL. SRM, in particular, provides an instrumental approach to quantitation that avoids the need for additional sample processing [89, 90]. In a more diagnostic approach, stable isotope-labeled peptide or protein standards can be used to repetitively quantify pre-defined or known groups of modifications, such as would be found in a discrete signaling pathway [91-93].
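Relative quantitation by stable isotope labeling reduces to locating the labeled partner of each peptide and comparing the two signals. A minimal sketch follows, assuming a single heavy lysine label (13C6 15N2, a mass shift of about 8.0142 Da, a commonly used SILAC reagent that is not specified in the text); the function names and the intensity values are hypothetical.

```python
HEAVY_LYS_SHIFT = 8.014199  # Da, 13C6 15N2 lysine (assumed label)

def heavy_mz(light_mz, charge, n_labels=1, shift=HEAVY_LYS_SHIFT):
    """Expected m/z of the heavy-labeled partner of a light peptide ion."""
    return light_mz + n_labels * shift / charge

def heavy_to_light_ratio(light_intensity, heavy_intensity):
    """Relative abundance of the heavy sample versus the light sample."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity

if __name__ == "__main__":
    # A doubly charged light ion at m/z 500 and its heavy partner.
    print(round(heavy_mz(500.0, charge=2), 4))
    # Hypothetical peak intensities for the light and heavy forms.
    print(heavy_to_light_ratio(2.0e6, 5.0e5))
```

Because the label changes only the mass, the modified and unmodified forms of a peptide each appear as their own light/heavy pair, and the ratio can be tracked separately for each modification state.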

Despite the rapid co-evolution of MS instrumentation, data analysis, analytical and quantitative strategy, no one combination of MS methods yields all the answers. Indeed, MS-based PTM detection might best be thought of like a Venn diagram in which each combination of techniques covers both overlapping and unique discovery space. Furthermore, some of the strategies described above (especially top-down and middle-down) have not yet evolved to a state in which non-experts can access and/or easily apply the methods. In fact, financial or scientific barriers often hinder connections between MS technology and meaningful biological research. Thus, the development of unifying MS approaches as well as improving their accessibility and ease of use for biologists represents a major challenge for the future study of combinatorial PTMs and the codes that they comprise.

7 Assessing combinatorial PTMs using peptide libraries

Investigation of the biochemical function of PTMs has been greatly advanced with the use of synthetic chemistry, where types and locations of PTMs can be precisely defined. Studies of combinatorial PTMs have therefore been greatly facilitated by the creation of peptide libraries, which incorporate any desired combination of modified amino acid residues during synthesis. Many modified amino acid derivatives that are compatible with peptide synthesis are commercially available (such as phosphoserine or methyllysine derivatives). Furthermore, non-natural chemical analogs enable the biochemical and biophysical examination of PTMs that are either transient or unstable (for example, the use of isosteres to investigate cis–trans isomerization of proline [94]). Thus, through solid-phase peptide synthesis (SPPS), large libraries of peptides containing PTMs at specific locations can be synthesized. Pairing organic synthesis with high-throughput detection approaches allows one to quickly and simultaneously probe hundreds to thousands of interactions.
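The size of a combinatorial peptide library grows multiplicatively with the number of modifiable sites, which can be made concrete by enumerating one. The sketch below uses the first ten residues of histone H3 (ARTKQTARKS) and an illustrative choice of modification states at Lys4, Lys9 and Ser10; the states chosen, the bracket notation and the function name `enumerate_library` are assumptions for this example and do not describe a specific published library.

```python
from itertools import product

def enumerate_library(sequence, site_options):
    """Return every peptide obtained by choosing one state per site.
    `site_options` maps a 1-based position to its allowed states;
    the empty string denotes the unmodified residue."""
    positions = sorted(site_options)
    library = []
    for combo in product(*(site_options[p] for p in positions)):
        residues = list(sequence)
        for pos, state in zip(positions, combo):
            if state:
                residues[pos - 1] = f"{sequence[pos - 1]}({state})"
        library.append("".join(residues))
    return library

H3_1_10 = "ARTKQTARKS"
SITES = {
    4: ["", "me1", "me2", "me3"],         # Lys4 methylation states
    9: ["", "me1", "me2", "me3", "ac"],   # Lys9 methylation or acetylation
    10: ["", "ph"],                       # Ser10 phosphorylation
}

if __name__ == "__main__":
    library = enumerate_library(H3_1_10, SITES)
    print(len(library), "peptides")
    print(library[:3])
```

Three sites with four, five and two states already give 40 distinct peptides, and each additional site multiplies that number again.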

Studies with PTM-containing peptides have been bolstered by the development of surface-based microarrays. Peptide arrays involve the deposition of chemically synthesized peptides onto a medium (membrane or glass slide) in small spots, where the immobilized peptides are used in a high-throughput parallel assay format to examine many aspects of protein function including binding and enzymatic activity. Two main types of peptide arrays dominate the field – in situ arrays, such as SPOT arrays, where peptides are directly synthesized on membranes, and spotted peptide arrays, where pre-synthesized, purified peptides are deposited by a robotic microarrayer [95, 96]. Spotted peptide arrays use SPPS to create a defined quantity of a peptide, affording excellent quality control over the peptides being tested. Furthermore, each peptide in the library can be synthesized on a scale large enough for hundreds of individual arrays, yielding great reproducibility. In addition, as peptides are commonly immobilized through a prosthetic biotin group onto a streptavidin-coated surface, the display of peptides is largely uniform. In contrast to spotted peptide arrays, in situ arrays such as SPOT arrays, developed in the early 1990s, synthesize molecules directly on a cellulose membrane. SPOT arrays can be created quickly – with no separate on-resin synthesis, cleavage, peptide purification, or immobilization steps [97]. This makes a greater diversity of arrayed peptides easier to achieve with SPOT arrays. However, SPOT arrays do not allow analytical characterization of each peptide once it is synthesized, and as each array is essentially unique, there is often significant variability between experiments. Peptide microarrays have been used to great effect in studying the combinatorial effects of histone PTMs [13, 14, 98]. They have also been used to determine the specificity and activity of enzymes that add PTMs, such as kinases and methyltransferases [99, 100]. Advances in supports, such as the use of silicon wafers, suggest that the power of these technologies could be greatly expanded by integrating circuits and measuring interactions in real time [100].

Peptide library-based approaches to combinatorial PTMs are not merely limited to array-based experiments. Solution-phase, “one bead one compound” libraries enable the simultaneous study of a large number of interactions. Briefly, peptides containing PTMs of interest remain bound to individual resin beads. In this method, very large numbers of randomly sequenced peptides can be created by splitting the total resin and coupling each of the split batches to different amino acids, then mixing the variously coupled resin beads back together in one pool to generate large, diverse peptide libraries [101]. Binding studies or enzymatic reactions are then carried out on the entirety of the bead pool, ensuring that every peptide in the library is sampled. The advantage of solution-phase libraries is the ability to create and probe peptide libraries with thousands of unique members. This method has been used with great success to measure the influence of individual amino acid positions on the binding of histone PTMs [102, 103].
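The split-and-pool procedure described above can be mimicked with a small simulation. This is a minimal sketch under simplifying assumptions: beads are divided evenly between building blocks at each cycle, every coupling goes to completion, and the building-block labels are hypothetical.

```python
import random


def split_and_pool(n_beads, building_blocks_per_cycle, seed=0):
    """Simulate split-and-pool synthesis on n_beads resin beads.

    building_blocks_per_cycle is a list with one entry per coupling cycle;
    each entry lists the building blocks used in that cycle. Every bead ends
    up carrying a single sequence ("one bead one compound").
    """
    rng = random.Random(seed)
    beads = [[] for _ in range(n_beads)]
    for blocks in building_blocks_per_cycle:
        rng.shuffle(beads)                       # pool and mix the resin
        for index, bead in enumerate(beads):     # split into len(blocks) batches
            bead.append(blocks[index % len(blocks)])  # couple one block per batch
    return [tuple(bead) for bead in beads]


# Three hypothetical coupling cycles, each with a modified and unmodified option.
cycles = [["K", "K(me3)"], ["S", "S(ph)"], ["K", "K(ac)"]]
library = split_and_pool(200, cycles)

print(len(library), "beads carrying", len(set(library)), "distinct sequences")
```

The number of distinct sequences is bounded by the product of the number of building blocks used in each cycle (here 2 × 2 × 2 = 8), while the number of beads determines how many copies of each sequence are present in the pool.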

Generally, peptide library approaches rapidly provide a wealth of information about how varying types of PTMs affect binding interactions or enzymatic activities. However, these methods do not by themselves provide quantitative data. Therefore, initial results from high-throughput approaches are commonly paired with additional biophysical experiments aimed at further scrutinizing interactions detected on the arrays or bead libraries. Techniques such as surface plasmon resonance (SPR), calorimetry, or fluorescence polarization give detailed, quantitative information such as the binding constant for interactions of significance.
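As an illustration of how a binding constant might be extracted from such follow-up data, the sketch below fits a simple one-site binding model to a titration. It assumes that the signal has already been normalized to fraction bound and that the labeled peptide is present at a concentration well below the dissociation constant. The grid-search fit, the function names, and the concentrations are illustrative assumptions, not a procedure taken from the cited work.

```python
def fraction_bound(protein_conc, kd):
    """One-site binding isotherm: fraction of peptide bound at a protein concentration."""
    return protein_conc / (kd + protein_conc)


def fit_kd(concs, fractions, kd_min=1e-3, kd_max=1e3, steps=6001):
    """Estimate Kd by least squares over a logarithmically spaced grid.

    Kd is reported in the same units as the protein concentrations.
    """
    best_kd, best_sse = None, float("inf")
    for i in range(steps):
        kd = kd_min * (kd_max / kd_min) ** (i / (steps - 1))
        sse = sum((fraction_bound(c, kd) - f) ** 2
                  for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Hypothetical titration (micromolar protein), simulated here with Kd = 2 uM.
concentrations = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
observed = [fraction_bound(c, 2.0) for c in concentrations]

estimated_kd = fit_kd(concentrations, observed)
print(f"estimated Kd: {estimated_kd:.2f} uM")
```

Comparing dissociation constants fitted in this way for differently modified peptides is one route to quantifying how much a neighboring PTM strengthens or weakens an interaction first detected on an array.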

Since the description of the “histone code hypothesis” [2], many experiments have focused on determining how patterns of histone PTMs regulate histone function and modulate chromatin. Biophysical and structural studies are key to ultimately understanding the molecular nature of the interactions between protein readers and PTMs [104]. However, the vast number of potential histone modifications renders a library approach very effective for the initial investigation of histone-interacting proteins and antibodies. For example, a library of over 100 histone peptides, containing up to seven PTMs on a single peptide, was tested against domains of histone-interacting proteins and PTM-specific antibodies to determine specificity [13]. This study defined important problems with PTM-specific antibodies – notably that they are strongly influenced by neighboring PTMs and can recognize PTMs in a sequence non-specific manner. This has been reinforced by several similar studies [15, 105]. More recently, similar arrays were used to uncover an unusual combinatorial modification that controls the maintenance of DNA methylation by UHRF1, which must associate with a methylated lysine of histone H3 to carry out this activity [11]. A combinatorial aspect of this inquiry focused on the influence of a neighboring phosphoserine residue, which was found not to impact the recruitment of UHRF1 to chromatin but did evict other protein factors known to recognize the same methyllysine mark [11]. Using SPOT arrays, Ruthenburg and colleagues demonstrated the impact of combinatorial histone modifications on the binding of different domains of a NURF chromatin remodeling complex subunit [106]. This study used arrays to examine whether the PHD finger and adjacent bromodomain of the NURF subunit acted in concert to bind histone tails. SPOT arrays have also been used effectively to examine the specificity of enzymes. For example, Rathert et al. used SPOT arrays to determine the binding sequence for the G9a methyltransferase, which then allowed them to make different arrays of potential target peptides to uncover new non-histone substrates for G9a [107]. Array- and bead-based libraries have been utilized to define an unusual binding pattern for the plant homeodomain fingers of CHD5, which recognize an unmodified H3 N-terminal tail and whose binding is disrupted by several types of PTMs on the N-terminal tail [108]. In contrast, the “one bead one compound” approach was used to produce 800-member and 5000-member combinatorial libraries, which demonstrated clear preferences among N-terminal PTMs on histone H3 and H4 for the binding of specific protein factors [102, 103].

The potential of peptide library based approaches toward the study of combinatorial PTMs will only continue to expand. The value and effectiveness of arrays and bead libraries in examining combinatorial (as opposed to single) PTMs of proteins has yet to be fully realized. The technology itself is also still evolving; the silicon chip support of the recently developed “Intel arrays” [100] provides an exciting new avenue of potential real-time reporting of data from peptide arrays. Furthermore, the pool of reagents for solid-phase peptide synthesis is continually expanding, which leads to a direct increase in the diversity possible within libraries. PTM-containing peptide libraries hold great promise toward the development of detection reagents such as antibodies or to study how small molecules selectively disrupt protein–protein interactions mediated by PTMs.

8 Importance of PTM-specific antibodies

We have primarily focused on methods for the de novo detection of combinatorial PTMs or techniques to decipher their roles in protein binding or catalytic activity. Not to be overlooked is the prominent role that PTM-specific antibodies play in identifying relative changes in PTM patterns both in cells and in vitro. In fact, PTM-specific antibodies have become so commonplace that antibodies for nearly any biologically important modification are now commercially available. For example, more than 1000 PTM-specific antibodies are commercially available for the four core histones alone. PTM codes will likely have an impact on the utility, reliability, and specificity of these antibodies. Thus, steps should be taken to develop better tools for antibody verification, as well as new diagnostic tools that can distinguish or identify important instances where multiple PTMs may be coordinately regulating a single protein.

Antibodies for the study of combinatorial modifications were amongst the first PTM-specific antibodies developed. For example, antibodies were critical to identifying hyperphosphorylated forms of RNA polymerase II [109]. Similarly, Allis and coworkers used antibodies against polyacetylated histone H4 to decipher the role of histone acetylation in transcriptional regulation [110]. Today these poly-specific antibodies are still used to identify heterogeneous biomarkers such as hyperphosphorylated tau or activated (hyperphosphorylated) p53, but these tools are being slowly replaced by other, more-specific PTM-directed detection reagents.

In the early 1980s researchers began to develop antibodies to specifically identify modified forms of proteins [111]. Antibodies have high specificity for their antigen; thus, PTM-specific antibodies are widely used to detect specific modifications on proteins by Western blots or immunohistochemical methods. Because of their high affinity, PTM-specific antibodies have also been widely used to immuno-enrich or immuno-deplete modified proteins. This is often a preliminary step to mass spectrometry analysis when trying to isolate a particular protein or protein variant. PTM-specific antibodies have also played a central role in our understanding of how histone PTMs mark regions of the genome through the use of chromatin immunoprecipitation (ChIP). Briefly, protein–DNA interactions are stabilized by crosslinking with formaldehyde, chromatin is sheared into small pieces to facilitate analysis, and samples are immuno-enriched using an antibody raised against a protein (or PTM) of interest. Following enrichment, the crosslinks are reversed to free the DNA, which is then quantified to measure the relative amount of the protein associated with different regions of the genome. Through ChIP, researchers were able to identify histone PTMs associated with promoter regions, silenced regions of the genome, and even different regions within transcribing genes [112]. The advent of DNA microarray and next-generation sequencing technologies made it possible to map the relative levels of dozens of histone PTMs to every region of multiple genomes.

Several key papers in the past few years have pointed out important shortcomings of PTM-specific antibodies. Fuchs et al. showed using peptide microarrays that antibodies directed against methyllysine on histones were generally sequence specific but had difficulty distinguishing between different levels of methylation (mono-, di-, and trimethyl) [13]. This has considerable impact on studies of histone methylation. For example, di- and trimethylation at both histone H3 Lys4 and Lys36 are thought to have opposing roles in transcriptional regulation in yeast (dimethylation is thought to be repressive whereas trimethylation is activating) [113, 114]. Thus, cross-reactivity of antibodies may obfuscate meaningful changes in histone methylation in cells, or at a particular locus. This is true not only for histone PTMs but for other important signaling events as well [115]. Several studies have now shown that acetyllysine antibodies are generally promiscuous – recognizing a variety of acetyllysine-containing peptides [13, 14, 16]. Rothbart et al. took this a step further, demonstrating that this promiscuity occurs within cells, making ChIP analysis of single acetylation marks on histone H4 likely impossible [16]. A large survey conducted by the modENCODE consortium tested a number of commercially available histone antibodies and noted the failure of many in at least one common biochemical assay [15]. Highly modified peptide arrays were also able to demonstrate that antibodies are strongly influenced by neighboring PTMs. For example, recognition of phosphorylation at histone H3 Ser10 (H3S10) by an antibody raised against this mark was largely blocked by methylation or acetylation at neighboring Lys9. Thus, in a cell population, instances where these marks coexist would normally be overlooked. However, H3K9 trimethylation and H3S10 phosphorylation do coexist in cells [10], and this combination is relevant to the targeted recruitment of the DNA-methylation-associated protein UHRF1 [11]. Lastly, as most PTM-specific antibodies are polyclonal, they are an exhaustible resource and different lots show distinct profiles [13, 15, 105]. This often makes finding the proper antibody for an experiment a laborious task. All these factors together suggest a strong need for improved PTM-specific antibodies or the development of alternative detection reagents.

The next few years should bring an even greater variety of detection technologies, with a push toward monoclonal antibodies that recognize biologically important PTMs and the evolution of new, non-immunoglobulin-based scaffolds. For example, many monoclonal antibodies are now available for the study of histones. These reagents solve problems of renewability, demand, and reproducibility. However, a monoclonal antibody recognizes its target in only one way. Thus, monoclonal antibodies may be hypothesized to be more affected by neighboring PTMs than polyclonal antibodies, which bind to a peptide substrate more heterogeneously. Several protein scaffolds have been designed that recognize PTMs. New scaffolds have considerable advantages as they can be expressed recombinantly and purified in high quantity. However, these protein scaffolds do not generally bind with the same affinity afforded by antibodies, and they too will likely be strongly influenced by neighboring PTMs. In instances where multiple PTMs are known to be important for a biological process, dual-specific antibodies or alternative scaffolds might have great success. For example, dual-modification antibodies that recognize histone tails have exquisite specificity [13]. Nonetheless, thorough studies of complex neighboring PTMs may require numerous antibodies to recognize all the combinations of PTMs that may be relevant in vivo.

9 Moving beyond single protein codes

We have highlighted a few demonstrative examples of how PTMs function in concert to regulate biological function. Our current understanding stems from experiments with individual proteins, whereby direct relationships between PTMs and protein structure and function can be evaluated through traditional molecular biology and biochemical methods. These types of experiments have revealed many different ways in which PTMs can alter the structure and function of a protein by manifesting changes in conformation, recruitment of binding proteins or through direct effects on catalytic activity. Over the last few decades, increasingly sophisticated and robust MS-based methods have uncovered a plethora of PTM types. In parallel, a wide array of biochemical studies has identified a growing list of protein domains that specifically recognize some of these modifications. The examples discussed here exemplify a common theme in which PTMs on un-structured protein segments regulate protein interaction landscapes – the array of potential protein–protein interactions that can exist for a protein. Changes to this landscape directly affect the assembly of protein complexes resulting in differential functional output of the modified protein. This hypothesis has emerged in large part due to discoveries made in support of the histone code hypothesis [116], which states that differential PTM combinations coordinate the assembly of proteins that regulate gene transcription. Similar examples include the CTD of RNA polymerase II, tubulin and p53. In all cases, the overriding hypothesis is that combinations of PTMs that can be shown to create functionally distinct interaction landscapes can be defined as a “code”. Within this paradigm, biological processes are regulated in part by PTM codes that modulate protein interaction landscapes, and therefore function, of any given protein.

Emerging and longstanding evidence suggests that PTM codes can extend beyond single proteins. Protein interaction landscapes very likely emerge from the interplay between multiple layers of PTM coordination. That is to say that PTM “codes” can coordinate groups of proteins such as those found within a multi-protein complex (Fig. 3). The hypothesis in this case is that the function of a multi-protein complex can be modulated by differential combinatorial codes that exist between members (subunits) of the complex. A notable example is the observation that intra-nucleosomal histone modifications can influence each other. For example, H2B ubiquitination at Lys123 can influence the methyltransferases that modify Lys4 and Lys79 on histone H3 [117]. More recently, Ruthenburg et al. demonstrated that methylation on histone H3 and acetylation on histone H4 at Lys16 were both responsible for the proper association of the tandem PHD–bromodomain of BPTF [106]. In addition, experiments conducted by the Reinberg and Garcia labs (described in section 5) provide an excellent example of how we might begin to think about how PTM codes can be deciphered for multi-protein complexes [72]. Indeed, their work demonstrates that PTM codes extend beyond single proteins (like histone tails) and towards a combinatorial interaction between PTMs across multiple proteins or subunits. Thus, in addition to the histone code, one might also consider the possibility of a nucleosome code where combinatorial PTMs, both in cis and in trans, contribute to the recruitment of protein factors and changes in chromatin structure. Whereas the histone code dictates protein assembly onto a single histone tail, a nucleosome code might dictate the organization of histones with respect to one another inside a mono-nucleosome. The concept that combinatorial PTMs modulate the function of multi-protein complexes is not restricted to histones [118]. For instance, dynamic phosphorylation of multi-subunit ubiquitin ligases such as the anaphase-promoting complex is essential for cell cycle-regulated substrate specificity [74, 75, 119]. Deciphering PTM codes in these cases has proven difficult, due in part to the significant challenge of integrating both detection and deciphering approaches into one study. However, if technological improvements can address this challenge, dynamic signaling complexes may represent an ideal model for the study of PTM codes.

Fig. 3. PTM codes can exist at varying levels of biological complexity. Two distinct outcomes (State 1 and 2) can be defined for simple linear PTM codes (top), a protein complex (middle), or a functional network consisting of several proteins acting coordinately (bottom). In all cases, differing patterns of PTMs on one or more proteins give rise to discrete biological outcomes.

Yet another layer of PTM regulation that extends beyond protein complexes has been discovered for histone code “readers” – proteins whose assembly on histone tails is dynamically regulated by combinatorial PTMs. Recent efforts by the Cairns lab have shown that PTMs on the chromatin remodeling complex, RSC, can impact nucleosome structure and function [120]. RSC specifically recognizes histone H3K14 acetylation through tandem bromodomains, and is essential for mitotic growth in yeast. Acetylation of K25 on the RSC subunit, Rsc4, results in auto-recognition by RSC through one of its bromodomains, leading to competitive inhibition of RSC/histone association. Thus, just as PTMs on the tails of histone H3 can affect nucleosome structure and function, so can PTMs that appear on histone-associated protein readers such as RSC.

10 Perspectives and future challenges

Our understanding of how concerted PTMs coordinate biological function is in its infancy. We are beginning to understand how combinations of PTMs can affect the structure and function of single proteins through experiments with histones and other examples described in this review. However, the story is far from over. Emerging evidence suggests that PTM “codes” (inasmuch as a “code” or combination of PTMs is defined by its necessity in a functional process) extend beyond the level of a single protein. We have put forth cases in which concerted PTMs coordinate members of a protein complex (e.g. histones within mono-nucleosomes and multi-subunit ubiquitin ligases) as well as members of a concerted function (e.g. histone code readers like RSC). What is learned from simple systems will hopefully reveal fundamental concepts underlying PTM-based coordination of biological function on a larger scale. The major question within this assumption is whether sites of protein modification, which in isolation are not commonly conserved evolutionarily, might contribute to a functional PTM network that is well conserved. Recent studies in computational systems biology have begun to address these assumptions by evaluating the co-evolution of PTMs based on co-occurrence of modification sites across multiple different eukaryotes [121]. The computational results of this work suggest the existence of a global network of co-evolving PTMs that impinge on multiple functional states of many proteins. In fact, PTMs and their maintenance through natural selection may best be understood in the context of the functional networks in which they participate rather than in isolation.
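The idea of scoring co-occurrence of modification sites across species can be made concrete with a toy example. The sketch below uses the Jaccard index of species sets as the co-occurrence score, which is a simplifying assumption and not necessarily the measure used in [121]. The site names, species lists, and threshold are hypothetical.

```python
from itertools import combinations


def jaccard(species_a, species_b):
    """Shared species divided by all species in which either site is modified."""
    union = species_a | species_b
    return len(species_a & species_b) / len(union) if union else 0.0


def cooccurring_pairs(site_species, threshold=0.5):
    """Return (site1, site2, score) for site pairs scoring at or above threshold."""
    pairs = []
    for site1, site2 in combinations(sorted(site_species), 2):
        score = jaccard(site_species[site1], site_species[site2])
        if score >= threshold:
            pairs.append((site1, site2, score))
    return pairs


# Hypothetical presence of three modification sites across four eukaryotes.
observed_sites = {
    "siteA": {"human", "mouse", "fly", "yeast"},
    "siteB": {"human", "mouse", "fly"},
    "siteC": {"yeast"},
}

network_edges = cooccurring_pairs(observed_sites)
print(network_edges)
```

Pairs that pass the threshold can be treated as edges of a candidate PTM network, which could then be compared with known functional relationships between the modified proteins.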

With the seemingly boundless amounts of data emerging from MS-based PTM studies, deciphering functional PTM codes will likely benefit from systems-level analyses and network theory, which become increasingly useful with further integration of proteomic technology and combinatorial biophysical assays. Indeed, many efforts have already begun to utilize quantitative proteomic approaches to decipher functional PTM networks in the context of human disease and therapy [122-125]. Ideally, there will come a day when one can evaluate any PTM on any protein with regards to its contribution to any given PTM network and biological process.

Acknowledgement

The authors would like to thank S.B. Rothbart for meaningful discussions toward the preparation of this manuscript.