Structural insights into N‐terminal to C‐terminal interactions and implications for thermostability of a (β/α)8‐triosephosphate isomerase barrel enzyme

Although several factors have been suggested to contribute to thermostability, the stabilization strategies used by proteins are still enigmatic. Studies on a recombinant xylanase from Bacillus sp. NG‐27 (RBSX), which has the ubiquitous (β/α)8‐triosephosphate isomerase barrel fold, showed that just a single mutation, V1L, although not located in any secondary structural element, markedly enhanced the stability from 70 °C to 75 °C without loss of catalytic activity. Conversely, the V1A mutation at the same position decreased the stability of the enzyme from 70 °C to 68 °C. To gain structural insights into how a single extreme N‐terminus mutation can markedly influence the thermostability of the enzyme, we determined the crystal structure of RBSX and the two mutants. On the basis of computational analysis of their crystal structures, including residue interaction networks, we established a link between N‐terminal to C‐terminal contacts and RBSX thermostability. Our study reveals that augmenting N‐terminal to C‐terminal noncovalent interactions is associated with enhancement of the stability of the enzyme. In addition, we discuss several lines of evidence supporting a connection between N‐terminal to C‐terminal noncovalent interactions and protein stability in different proteins. We propose that the strategy of mutations at the termini could be exploited with a view to modulate stability without compromising enzymatic activity, or in general, protein function in diverse folds where N and C termini are in close proximity.


Database
The coordinates of RBSX, V1A and V1L have been deposited in the PDB database under the accession numbers 4QCE, 4QCF, and 4QDM, respectively.

Introduction
Elucidating the molecular basis of protein stability at high temperature continues to attract and fascinate researchers over a broad range of disciplines, and remains a challenge. A number of approaches have been employed to develop stable proteins for biotechnological applications [1,2]. Site-directed mutagenesis together with the comparative analysis of mutant structures is a powerful approach to provide valuable insights into the structural features that govern protein thermostability. Locating the target site of mutagenesis for stability improvement can reduce the screening effort required to find stable mutant(s) by orders of magnitude as compared with random directed-evolution methods [3].
Enzyme stability and activity often appear to trade off at the level of individual mutations. For example, whereas flexibility is required for the catalytic activity of most enzymes, higher thermostability necessitates an increase in the rigidity of the structure. As a result, mutants with increased stability often lose catalytic efficiency [4]. In addition, engineering protein thermostability at the expense of losing enzymatic activity is not biotechnologically desirable. Improving the stability of an already stable enzyme could be advantageous for high-temperature industrial applications. Also, even a modest increase in stability could lead to a > 10-fold longer lifetime [5,6].
Xylanases (EC 3.2.1.8) are glycosyl hydrolases that catalyze the hydrolysis of internal β-1,4-glycosidic bonds of xylan backbones, and have potential economic and environmentally friendly applications in the paper pulp, food, animal feed, and detergent industries, and bio-ethanol and bio-energy production systems [7][8][9][10]. A xylanase from Bacillus sp. NG-27 (BSX), which is an extracellular endoxylanase belonging to glycosyl hydrolase family 10 (GH10), shows optimum activity at a temperature of 70°C and pH 8.5 [11]. It has a (β/α)8-triosephosphate isomerase (TIM) barrel fold, which has been studied concerning its function, structural properties, design, and evolution [12][13][14][15]. BSX, apart from thermoalkalophilic features, shows resistance to SDS denaturation and protease K degradation [16]. Hence, BSX serves as an important model system with which to achieve a fundamental understanding of the structure-stability-evolution relationships of the ubiquitous TIM barrel fold.
Earlier studies of recombinant BSX (RBSX) showed that just a single extreme N-terminus mutation, V1L, although not located in any secondary structural element (SSE) (helices or β-sheets), markedly enhanced RBSX stability from 70°C to 75°C without loss of catalytic activity [16]. In contrast, the V1A mutation at the same position decreased the stability of the enzyme from 70°C to 68°C. However, structural details were not available at that time, precluding any structure-based rationalization of stability changes resulting from a single mutation.
Because of the ring-like architecture of the (β/α)8-TIM barrel (Fig. 1), the N and C termini come close together in 3D space, enabling several contacts to be formed between the termini. It has been suggested, through mutation studies, that unfolding of the N terminus is one of the first and most critical steps of BSX denaturation at high temperature [16]. In general, terminal residues are exposed to solvent with a low number of nearest-neighbor residues, and are hence considered to have little influence on thermostability [17]. Nevertheless, certain experimental and computational studies have suggested that protein termini play a role in structural stability and function [18][19][20][21][22]. An in silico analysis of a set of two-state folding proteins that took into account the interactions between the terminal SSEs revealed the presence of an N-C motif (N-terminal to C-terminal contacts), and suggested its possible role in initial protein folding, native state stability, and final turnover [23].
Fig. 1. The structure of RBSX (PDB ID: 4QCE) is shown in cartoon representation (salad-bowl view). Because of the TIM barrel fold, the N terminus (blue region) comes close to the C terminus (red region), and their proximity is implicated in stability enhancement. The mutation is located away from the active site (sphere and stick in firebrick) residues (Glu149 and Glu259). The inset shows the view of RBSX looking down the TIM barrel axis, depicting a ring-like architecture.
Here, we investigated how a seemingly unimportant extreme N-terminus single mutation, although not located in any SSE, affects the structure and interactions, resulting in a change in the thermal stability of the xylanase. To gain structural insights into how the mutation influences the thermostability of the enzyme, we determined the crystal structure of RBSX and the two mutants (V1L and V1A). On the basis of computational analysis, including residue interaction networks (RINs) of the molecular structures of RBSX and its mutants, we demonstrate that augmentation of N-terminal to C-terminal interactions is associated with enhancement of stability of the enzyme RBSX. Perhaps for the first time, we provide a network perspective of the N-terminal to C-terminal interactions, and show that the cumulative effect of a network of noncovalent interactions, which include N-terminal to C-terminal interactions, modulates the thermal stability of the enzyme. The extreme N-terminus mutation is able to induce changes in the structure consistent with protein stabilization, as assessed through contacts and parameters of RINs. Furthermore, we discuss specific examples from different protein families that provide experimental evidence for the connection between protein stability and N-terminal and C-terminal noncovalent interactions. Our observations suggest that mutagenesis at the termini could be exploited to enhance stability without compromising enzymatic activity or function. This may be effective especially in situations where the N and C termini come close together in 3D space, thereby enabling long-range interactions (interactions between residues distantly separated in the primary sequence), as demonstrated here with the example of the TIM barrel fold xylanase, and the same could be extrapolated to diverse folds in both biotechnologically and therapeutically important proteins.

Results
Overall structure of RBSX and comparison with related GH10 xylanase structures
The 3D structure of RBSX adopts the topology of a classical (β/α)8-barrel fold (TIM barrel), retaining the characteristic 'salad-bowl' shape observed in GH10 xylanases (Fig. 1). It comprises 12 α-helices, nine β-strands, and five 3₁₀-helices (residues 5-8, 26-30, 94-98, 106-109, and 302-304), as assigned by DSSP [24]. The barrel-forming secondary structures consisting of eight major parallel β-strands lie in the middle, surrounded by eight α-helices (Fig. 1). The C-terminal ends of the β-strands of the β-barrel form a long and open cleft on the protein surface. The active site cleft is exposed to the solvent, and corresponds to the catalytic/activity face of the enzyme [25].
RBSX, which carries an additional Met resulting from the start codon, belongs to GH10. RBSX differs from many other xylanases in being operational under alkaline conditions (pH 8.5) and at elevated temperature (70°C). Several GH10 xylanase structures have been solved to date (http://www.cazy.org/GH10.html). However, only one report is available on an alkaline-active (pH 10) thermostable (70°C) GH10 xylanase structure [Protein Data Bank (PDB) ID: 2UWF] produced by an alkalophilic organism, Bacillus halodurans S7 [26]. B. halodurans S7 xylanase shows 78% sequence identity with RBSX. Structures are available for a few other GH10 xylanases, which are produced by nonalkalophilic organisms and do not show significant operational stability at higher pH [11,26]. The desirable properties of xylanases in the paper and pulp industries are stability and activity at high temperature and alkaline pH [27]. It is reported that alkaline xylanases have more surface-accessible acidic residues than their nonalkaline counterparts [11,26]. The adaptation to extreme conditions such as alkalophilicity can take place through insertion or exchanges of short sequences without the requirement for gradual changes over the entire chain [11].

Comparative analysis of structures of the enzymes with stabilizing and destabilizing mutations
In an earlier study from our group, different extreme N-terminus mutants of RBSX were generated in which the first residue, Val1, was replaced with various amino acids, and each mutant was subsequently analyzed for its thermal stability at high temperatures. The thermal stabilities of RBSX and the mutant proteins were determined by CD measurements at a range of temperatures [16]. RBSX and its mutants were found to unfold in an irreversible manner, and the apparent melting temperature (T_m) was calculated for all protein samples with a constant temperature slope of 60°C·h⁻¹. It was observed that the V1G and V1A mutations decreased the stability of the protein by 12°C and 2°C, respectively, whereas the V1F and V1D mutations did not significantly change its thermostability, as compared with RBSX [16]. On the other hand, the V1L mutation markedly enhanced the thermostability of RBSX from 70°C to 75°C without compromising its catalytic activity, and resulted in higher cooperativity in the thermal unfolding transition [16]. However, at present, structural details are not available for any mutants except the V1A and V1L mutants. A brief summary of crystallization, data collection, structure solutions and refinement statistics of RBSX and the V1A and V1L mutants is given in Table 1 and Doc. S1. Comparison of the molecular structures of the V1A and V1L mutants with that of RBSX showed no significant changes in the overall 3D structure of proteins, despite the difference in their thermal stabilities. The overall Cα rmsd between RBSX and the V1L mutant is 0.393 Å, whereas that between RBSX and the V1A mutant is 0.265 Å. The question therefore arises of what may be the mechanism of thermal stabilization/destabilization, considering that there is only a minimal change in their overall 3D structures.
The mutation is located on an extended loop at the extreme N-terminal region, and makes no dominant interaction exclusively by this first residue (Leu1 in the V1L mutant and Val1 in RBSX). This observation raises the possibility of effects of a noncovalent interaction network that transmits changes near and far from the site of mutation, and changes the overall stability of RBSX. In a folded protein, a network of interactions brings residues that are distant in the sequence into close proximity in 3D space. This extensive network of interactions gives proteins structural flexibility, integrity, and thermal stability [28,29]. Therefore, we focused on residue contacts and RINs to investigate, from a network perspective, this cumulative nature of thermal stabilization-destabilization and to identify the changes in both local and nonlocal interactions between the structures. The aim of the present study was not to obtain the most stable structure by carrying out all possible mutations at the extreme N terminus, but to gain structural insights into the modulation of stability caused by mutations in the terminal region.

Changes in contacts in the vicinity of the mutation
Although there was an increase of ~5°C in the thermostability resulting from a single Leu mutation, analysis of the interactions undergone by the mutated residue and recombinant protein showed that Leu1 in the V1L mutant structure undergoes similar types of interaction (van der Waals) as Val1 in the RBSX structure. However, because of its greater bulk and better conformational accessibility than Val1 in RBSX, the Leu side chain in the V1L mutant structure forms more van der Waals contacts with side chain atoms of Arg344 (C-terminal residue) than the Ala side chain in the V1A mutant structure (Fig. 2; Table S1). Furthermore, these additional cohesive contacts made by Leu1 influenced the relative decrease in the solvent-accessible surface area of Arg344 by 19.6% in the V1L mutant in comparison with Arg344 in RBSX. The decrease in the solvent-accessible surface area of Arg344 is even more pronounced (29.8%) when the structures of the V1L and V1A mutants are compared. Evaluation of atomic packing according to the small-probe contact dot surface between the mutated residue and Arg344 with the RINERATOR module [30] showed a higher noncovalent interaction score (I_s) between Leu1 and Arg344 (2.66) for the V1L mutant structure than between Val1 and Arg344 (0.063) for the RBSX structure, and between Ala1 and Arg344 (0.00) for the V1A mutant structure. Here, the noncovalent I_s for the interacting residues is suggested to be proportional to the strength of the interaction (the higher the score, the stronger the interactions of the connecting residues), which is the weighted sum of the nonoverlapping van der Waals contacts, the volume of hydrogen bonds, and atomic overlaps (clashes) [30,31].
As the mutated residue belongs to an extended loop in the extreme N terminus of the protein, and its interacting partner Arg344 belongs to the C-terminal region of the protein structure, this result draws attention to the role of the N-terminal and C-terminal regions in contributing to the overall stability through mutual interactions.
As discussed, side chain atoms of Leu1 are in close proximity to side chain atoms of Arg344 in the folded 3D structure. Despite the introduction of such a bulky residue (Leu1), we observed that the distance between the Cβ atoms of Leu1 and Arg344 in the V1L mutant structure is smaller (6.48 Å) than that between the corresponding residues in the RBSX structure (7.12 Å) and the V1A mutant structure (7.21 Å). Thus, the distance between Cβ atoms belonging to interacting residues is shorter in the more stable mutant than in the less stable mutant, suggesting that the distances between Cβ atoms could be used to gain information about the mutation-induced structural change. Hence, we evaluated RWCβCO [32] (Doc. S1) to investigate the influence of long-range interactions in the protein structures. The analysis of RWCβCO for all three structures (V1L, RBSX, and V1A) showed that higher values of RWCβCO belong to the termini of the protein (Fig. S1). This is because of the occurrence of a number of contacts between the N-terminal and C-terminal regions, as they are in close proximity in the 3D structure. We then computed ΔRWCβCO (RWCβCO_L - RWCβCO_A), the difference in RWCβCO value between the more stable mutant (V1L) and the less stable mutant (V1A), to compare the changes in the protein structure resulting from mutation, and plotted it along the polypeptide chain. It was observed that there is a significant increase in ΔRWCβCO in the N-terminal and C-terminal regions, although small differences are present in other parts of the protein structure (Fig. 3). This is one piece of evidence that both terminal regions are substantially affected by the mutation in comparison with the other parts of the protein structure, and that it could be playing an important role in contributing to the overall stability through the enhancement of long-range interactions.
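The smoothing and thresholding used for the ΔRWCβCO profile (a five-residue moving average, with a cut-off at a Z-score of 2, as stated in the Fig. 3 legend) can be illustrated with a minimal Python sketch. The per-residue RWCβCO values are taken as given inputs, because their definition is in Doc. S1 and is not reproduced here. The function names, the truncation of the averaging window at the chain ends, and the use of the population standard deviation are assumptions made for illustration only, not the authors' implementation.

```python
from statistics import mean, pstdev


def moving_average(values, window=5):
    """Centered moving average; the window is truncated at the chain ends (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def delta_profile(more_stable, less_stable, window=5):
    """Smoothed per-residue difference (more stable minus less stable)."""
    diff = [a - b for a, b in zip(more_stable, less_stable)]
    return moving_average(diff, window)


def z_threshold(profile, z=2.0):
    """Profile value whose Z-score equals z (population standard deviation assumed)."""
    return mean(profile) + z * pstdev(profile)


def residues_above(profile, threshold):
    """1-based residue numbers whose smoothed difference exceeds the threshold."""
    return [i + 1 for i, value in enumerate(profile) if value > threshold]
```

Given two equal-length lists of per-residue values for the V1L and V1A structures, `residues_above(profile, z_threshold(profile))` would list the positions that stand out from the rest of the chain under these assumptions.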

Analysis of noncovalent interactions
To assess the contribution of chain termini to RWCβCO values and their relationship with mutant stability, we studied the differences in atom-atom contacts at the terminal regions in detail. We used a distance cut-off of 5 Å (the higher cut-off for attractive London-van der Waals forces [33]) to capture only effective physical contacts within and between the terminal atoms. Examination of atom-atom contacts between an N-terminal segment (residues 1-25; up to the second SSE from the N-terminal end) and a C-terminal segment (residues 319-354; up to the second SSE from the C-terminal end) for each structure revealed that the V1L mutant has a higher value of normalized atom-atom contacts between termini than the V1A mutant (Table S2; Fig. S2). We observed that there is an increase of ~11.3% in the contacts between the N-terminal segment and C-terminal segment in the V1L mutant as compared with the V1A mutant. Also, there is an increase of ~3.4% in the contacts within the N-terminal segment in the V1L mutant as compared with the V1A mutant. Thus, the increase in contacts is more pronounced in the N-C-terminal region than within the N-terminal region, indicating a 'cosying up' of the terminal regions, i.e. enhanced mutual interactions between the termini, in the more stable mutant. Furthermore, we observed that the noncovalent I_s between the N-terminal segment and C-terminal segment as assessed with the RINERATOR module [30] is higher in the V1L mutant structure than in the V1A mutant structure (103.5/73.5). These observations collectively suggest that the substantial increase in thermostability of the V1L mutant could be a result of better relative contributions of various types of interaction, particularly those between terminal regions.
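The contact count between the two terminal segments can be sketched in Python as follows. This is only an illustration of the idea of counting atom pairs within a 5 Å cut-off between residues 1-25 and residues 319-354. The data layout (a dictionary mapping residue number to a list of atom coordinates) and all function names are hypothetical, and the normalization applied in Table S2 is not described in the text and is therefore left out.

```python
from math import dist


def segment_atoms(structure, first, last):
    """Coordinates of all atoms in residues first..last (inclusive).

    `structure` is a hypothetical layout: {residue_number: [(x, y, z), ...]}.
    """
    return [xyz
            for resnum, atoms in structure.items()
            if first <= resnum <= last
            for xyz in atoms]


def count_contacts(atoms_a, atoms_b, cutoff=5.0):
    """Number of atom pairs (one atom from each set) within `cutoff` angstroms."""
    return sum(1 for p in atoms_a for q in atoms_b if dist(p, q) <= cutoff)


def terminal_contacts(structure, n_seg=(1, 25), c_seg=(319, 354), cutoff=5.0):
    """Raw (un-normalized) atom-atom contacts between the two terminal segments."""
    n_atoms = segment_atoms(structure, *n_seg)
    c_atoms = segment_atoms(structure, *c_seg)
    return count_contacts(n_atoms, c_atoms, cutoff)


def percent_increase(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old
```

Applied to two structures, `percent_increase(terminal_contacts(v1l), terminal_contacts(v1a))` would give a relative difference of the kind quoted above, although the published figures are based on normalized contacts.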
Fig. 3. Difference of RWCβCO between the more stable and less stable mutants: five-residue moving average of the difference in RWCβCO between more stable (V1L) and less stable (V1A) mutant structures. The dashed line corresponds to the value of ΔRWCβCO, which has a Z-score of 2 (Doc. S1).
A network perspective of protein stability
In protein structures, there are a variety of weak and strong noncovalent interactions that integrate different parts of the structure. It is the interplay of these interactions that provides structural stability. We analyzed this by using a network representation of protein structure by generating a RIN, which considers all noncovalent interactions between pairs of interacting residues (Experimental procedures). We also decomposed the network into different subnetworks based on the strength of interaction (noncovalent I_s) between interacting residues, and analyzed their global topology and network parameters. We obtained well-known network parameters, such as total number of edges or links (E), the edge/node ratio (E/N, where N is the total number of residues in the protein structure), and average number of nearest neighbors (<n>), for the more stable mutant (V1L) and the less stable mutant (V1A) at different I_s cut-off (I_smin) values (Table 2). A lower E/N ratio of a network indicates that most nodes are isolated and that the graph is largely disconnected. As the E/N ratio increases, connectivity between the nodes in the graph increases. On the other hand, a higher <n> value indicates higher average connectivity of a node in the network. We observed that the values of these network parameters are very similar to each other in all three structures. This is understandable, because all three structures have the same size (354 residues), and only one residue is mutated in their primary structures. However, it is relevant to compare their network parameters, as the structures differ in thermostability. The analysis revealed that all three network parameters (E, E/N, and <n>) have higher values for the V1L mutant than for the V1A mutant [...] [28]. We can infer that the observed difference in network parameters is a result of the combined effect of various subtle changes in interactions (hydrogen bonds, van der Waals, ion pairs, etc.) manifested throughout the structure because of a single mutation. The size of the largest cluster in the network analysis is often used to understand the nature and connectivity of the network [28,34]. Here, we compared the change in size of the largest cluster as a function of I_smin for both mutant structures. We observed that, irrespective of the difference in thermostability, the normalized size of the largest strongly connected component (LSCC) (in terms of nodes or number of residues) gradually decreases with an increase in I_smin in both mutants. The normalized size of the LSCC undergoes a sharp transition after a particular I_smin cut-off, which begins around I_smin = 1 and lies within a narrow range of I_smin (1-2), with no major change towards the side of higher interaction cut-off (Fig. 4). A similar trend has been observed in other studies [28,34]. This transition in the LSCC is attributed to the loss of different noncovalent interactions in the networks as I_smin increases, thus quickly generating a large number of small clusters. It is of note that the more stable mutant (V1L) not only has higher E values and higher E/N ratios for different I_smin values than the V1A mutant, but also has a larger LSCC in its network (Fig. 4). The cooperative nature of different stabilizing interactions in the network seems to be positively influencing other interactions, as shown by the sizes of the LSCC at different I_smin values in the more stable mutant structure.
Our result is consistent with the previous findings that thermostable proteins have improved structural attributes, such as better connectivity and higher number of nodes, at different I_smin values than their corresponding mesophilic homologs [28,34].
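The network parameters discussed in this section can be sketched in Python. The sketch assumes that a RIN is available as a list of (residue, residue, I_s) tuples and that edges are undirected, so that the LSCC reduces to the largest connected component. It also computes <n> as the mean degree over residues that retain at least one edge. These choices, and all function names, are illustrative assumptions and not the RINERATOR or RINALYZER procedure.

```python
from collections import defaultdict


def subnetwork(scored_edges, i_smin):
    """Keep only edges whose interaction score is at least i_smin."""
    return [(u, v) for u, v, score in scored_edges if score >= i_smin]


def adjacency(edges):
    """Undirected adjacency sets built from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def network_parameters(edges, n_residues):
    """E, E/N and <n> (mean degree of connected residues; an assumption)."""
    adj = adjacency(edges)
    e = len(edges)
    mean_degree = sum(len(nb) for nb in adj.values()) / len(adj) if adj else 0.0
    return {"E": e, "E/N": e / n_residues, "<n>": mean_degree}


def largest_component_size(edges):
    """Number of residues in the largest connected component."""
    adj = adjacency(edges)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def lscc_profile(scored_edges, n_residues, cutoffs):
    """Normalized largest-component size at each I_smin cut-off."""
    return {c: largest_component_size(subnetwork(scored_edges, c)) / n_residues
            for c in cutoffs}
```

Calling `lscc_profile` with a series of cut-offs for each structure would produce curves of the kind compared in Fig. 4, under the stated assumptions.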

Comparison of RINs between the V1L and V1A mutants and the importance of terminal interactions
To explore the difference between the mutant structures, we constructed a comparison network of the LSCCs of the V1L and V1A mutants at I_smin = 1 by using the RINALYZER plugin [30] of CYTOSCAPE [35]. In this comparison network, each node represents a pair of aligned residues, and an edge exists if there is a noncovalent interaction between the connecting nodes in either of the two compared RINs [31]. The combined comparison network of the V1L and V1A mutants generated on the basis of the superposition alignment of the corresponding 3D structures resulted in 535 identical edges that correspond to noncovalent interactions for both the V1L mutant and the V1A mutant. We found that the more stable mutant (V1L) has a considerably higher number of nonidentical noncovalent interactions, i.e. 192 edges, which correspond to 28 unique residues (nodes), than the less stable mutant (V1A), i.e. 125 edges, which correspond to 14 unique residues (nodes) (Table S3). This comparison network provides further information about the location of unique residues that have the largest change in local residue interactions in the protein structure. It is notable that many of these unique residues are distributed in and around the terminal regions of the protein (Fig. 5; Table S3). We observed that ~25% of these unique residues in the V1L mutant and ~14% of unique residues in the V1A mutant correspond to the N-terminal region and the C-terminal region of protein structures, implying the importance of interactions involving terminal residues (within termini and/or between termini) that might have an impact on their stability. These differences in the numbers of unique noncovalent interactions (nonidentical edges) and residues (number of unique nodes) in the comparison network indicate that there is a perturbation in the RINs brought about by a single extreme N-terminus mutation.
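The logic of such a comparison can be illustrated with plain set operations in Python. The sketch assumes that the two structures share residue numbering, so that aligned residues have the same number, and it treats a residue as unique when it appears in the edges of one network only. Both are simplifying assumptions. The function names and the default terminal segments (residues 1-25 and 319-354, taken from the contact analysis above) are illustrative, and this is not the RINALYZER algorithm.

```python
def as_edge_set(edges):
    """Undirected edges as a set of frozensets, so (1, 2) equals (2, 1)."""
    return {frozenset(edge) for edge in edges}


def compare_networks(edges_a, edges_b):
    """Identical and nonidentical edges, plus residues unique to each network."""
    a, b = as_edge_set(edges_a), as_edge_set(edges_b)
    nodes_a = set().union(*a)
    nodes_b = set().union(*b)
    return {
        "identical_edges": a & b,
        "only_a_edges": a - b,
        "only_b_edges": b - a,
        "only_a_nodes": nodes_a - nodes_b,
        "only_b_nodes": nodes_b - nodes_a,
    }


def terminal_fraction(residues, segments=((1, 25), (319, 354))):
    """Fraction of the given residues that fall in the terminal segments."""
    if not residues:
        return 0.0
    in_termini = [r for r in residues
                  if any(lo <= r <= hi for lo, hi in segments)]
    return len(in_termini) / len(residues)
```

With the edge lists of the two LSCCs as input, the sizes of `identical_edges`, `only_a_edges` and `only_b_edges` correspond to the kinds of counts reported in Table S3, and `terminal_fraction` gives the share of unique residues located in the terminal regions.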

Discussion
The present findings provide valuable insights into the role of direct noncovalent interactions between the N and C termini in protein stabilization. Figure 2 provides an example of such interactions that are enhanced in the more stable mutant. The direct N-terminal to C-terminal contacts in the V1L mutant involving Leu1 show a clear difference in the degree of packing interactions of the side chain atoms in comparison with Val1 in RBSX and Ala1 in the V1A mutant (Fig. 2; Table S1). It appears that these additional interactions might play a role in tying down the extreme N terminus during thermal unfolding at high temperature. Furthermore, we observed an enhancement in the number of overall N-terminal to C-terminal direct contacts in the more stable mutant structure (V1L), whereas the absence of many N-terminal to C-terminal contacts could increase local unfolding of the peptide chain at these weak links, and result in a lower unfolding temperature for the V1A mutant (Fig. S2; Table S2). Fraying of the terminal regions may make a protein susceptible to unfolding at high temperature. Hence, it may be advantageous if the terminal regions dock with each other and mutually stabilize, thereby reducing susceptibility to unfolding at high temperature [23]. However, one cannot neglect the interactions between the terminal residues and other parts of the protein structure, and their independent role in protein stabilization [18,19,22].
The substitution of residues with different physicochemical properties and their location in the tertiary structure can cause changes in residue-residue contacts [36]. In order for the structure to adapt to the substitution of the new side chain, there is often a rearrangement of native contacts by the neighboring residues. The extent of this rearrangement of residue contacts depends on how connected the region of substitution is with the rest of the structure. We observed a structural rearrangement of contacts throughout the structure, and more so within and between terminal regions. The cooperative nature of these stabilizing contacts indirectly or allosterically propagates to the other parts of the structure, and positively influences other interactions, as shown by network analysis, where the LSCC is larger for all I_smin values in the more stable mutant structure (V1L) than in the less stable one (V1A) (Fig. 4). Thus, it is likely that the increased stability shown by the V1L mutant results from cumulative effects of small changes rather than solely the effect of interactions involving the substituent residue. This effect is reminiscent of the concept in economics of 'comedy of the commons' in property resources [37], applied here to protein stabilization, in which a cumulative effect of many contributions leads to a desired outcome, in this case protein stability. In addition, residues at long distances from each other in the primary structure play an important role in stability of the protein, as shown by the analysis of RWCβCO and RINs (Fig. 3; Table S3). Obviously, N-terminal to C-terminal contacts are the longest-range interactions possible in terms of sequence separation in any given protein. These results suggest that the overall increase in long-range interactions (primarily through N-terminal to C-terminal contacts) in the V1L structure upon mutation is one of the primary sources of the increase in thermal stability.
Our results are consistent with earlier findings that long-range interactions, connecting different parts of the protein structure, have a major role in folding and stabilizing the tertiary structure of the protein [34,38,39]. However, what is remarkable is that all of these structural changes are elicited by just a single mutation not located in any SSE, at the extreme N-terminus of the protein.
A few other studies have shown the role of interactions between termini in modulating the stability of proteins. In a GH10 xylanase from Aspergillus niger, deletion of terminal disordered residues reinforced the contacts between the N and C termini, providing additional compactness to the structure, and thereby increasing protein stability [20]. In a recent mutational analysis of a similar xylanase, Song et al. [22] suggested that strengthening the hydrophobic interactions within the N-terminal element and between the N and C termini is responsible for the improved stability of the enzyme. A study of chimeric proteins by exchanging the N or C terminus from a thermophilic TmxAcat xylanase and a hyperthermophilic TmxB family xylanase from Thermotoga maritima MSB8 showed that interactions between the N and C termini contribute significantly to thermostability [40]. In an earlier study, we provided experimental data showing that the interaction between the N-terminal and C-terminal regions formed by Phe4, Trp6 and Tyr343 (aromatic cluster) is an important determinant of RBSX stability and folding [41]. In addition, information gained from deep sequencing of the TEM-1 b-lactamase enzyme has revealed that several terminal residues are sensitive to substitutions, suggesting a possible role of these residues in enzyme stability, solubility, or catalytic activity [42]. However, crystal structures are unavailable either for the deletion mutants or for the substitution mutants, which could help in providing a structurebased rationalization of stability changes resulting from interactions between termini.
The importance of N-terminal to C-terminal contacts in protein stability has also been reported for other proteins. One such example comes from the homologous pair of cold shock proteins from the mesophilic Bacillus subtilis (Bs-CspB, T_m = 53.9°C) and the thermophilic Bacillus caldolyticus (Bc-Csp, T_m = 76.9°C). Both Bs-CspB and Bc-Csp are small, monomeric proteins composed of 67 and 66 residues, respectively; they do not contain any disulfide linkages, and they differ in sequence by only 12 residues [43]. It was found that the additional stability of Bc-Csp largely originates from the contribution of Arg3 (N terminus) and Leu66 (C terminus). Furthermore, mutational analysis revealed that substitution of Arg3 by Glu (the equivalent residue in Bs-CspB) decreased the stability of Bc-Csp by ~4 kJ·mol⁻¹. Structural analysis of Bc-Csp and the R3E mutant showed that the van der Waals interactions between Arg3 and Leu66 in Bc-Csp are more extensive than those between Glu3 and Leu66 in the R3E mutant. The authors pointed out that the decreased number of overall N-terminal to C-terminal van der Waals interactions resulting from the shorter side chain of Glu is responsible for the loss of stability in Bc-Csp [43,44]. Interestingly, by analyzing the precomputed RIN from the RINdata web service (http://rinalyzer.de/rindata.php), we found that Arg3 in Bc-Csp (PDB ID: 1C9O) has a better noncovalent I_s (19.64) than Glu3 (7.03) in the R3E mutant crystal structure (PDB ID: 1I5F). There have been other experimental reports of the effects of terminal mutations on the native state stability of proteins (Fig. 6). For example, a cavity-creating mutant at the C-terminal end (I96A, I_s = 19.3/10.8) of barnase destabilized the structure by 3.52-4 kcal·mol⁻¹ [45], whereas mutation of another terminal residue (V2T; I_s = 5.3/4.7) of ribonuclease from Streptomyces aureofaciens destabilized it by ~0.9 kcal·mol⁻¹ [46].
We also observed that, in both mutant structures, there is a reduction in N-terminal to C-terminal interactions by the mutated residues, indicating the possible importance of interactions between termini in modulating their stability (Fig. 6). Many proteins have their N and C termini in contact with each other, although no physical principle has been stated regarding why they do so. An in silico analysis on structures of single-domain proteins taking into account the interactions between the terminal SSEs showed the presence of N-C motifs (N-terminal to C-terminal contacts), and suggested a possible role in initial protein folding and stability [23]. One computational analysis that addressed the role of N-terminal to C-terminal coupling in the folding transition of single-domain proteins from three different protein folds suggested that switching off the N-terminal to C-terminal interactions decreases folding cooperativity and substantially lowers the free energy barrier of the folding transition state [47]. Therefore, fortifying the interactions between the terminal regions might help to increase the initial activation energy barrier, resulting in enhanced resistance against global unfolding at higher temperature. The present study reiterates as well as expands on those findings, as the mutated residue that modulates (stabilizes/destabilizes) the stability of RBSX through reinforcement of N-terminal to C-terminal interactions belongs to the extreme N terminus of the protein, and is not part of any SSE. Furthermore, our study also provides a network perspective of the interactions involving terminal residues, showing that changes are not restricted to the terminal regions, and propagate to other parts of the protein structure (Fig. 5; Table S3).

Fig. 6 (caption, partially recovered). ... [31], and are proportional to the strength of interactions between connecting residues. For both proteins, the N terminus (blue) comes into close proximity to the C terminus (red) in the 3D structure. Here, PDB IDs 1BNI and 1RGG correspond to more stable proteins.
Our work suggests that augmenting N-terminal to C-terminal noncovalent interactions is associated with an enhancement of protein stability. Such stabilization presumably protects against unfolding of an already folded protein, and may aid the folding process [23]. Although it is clearly possible to stabilize proteins with other mechanisms/factors, as previously reported [48,49], we have demonstrated that proteins can be stabilized without compromising their biological functions through optimization of N-terminal to C-terminal noncovalent interactions. Because of the ring-like architecture of the (β/α)8-TIM barrel (Fig. 1), the N and C termini come into close proximity in 3D space, providing opportunities for stabilization through mutual interactions. Such apparent stabilization through N-terminal to C-terminal interactions can also be inferred from the structures of TIM (EC 5.3.1.1) and of the NAD(P)-binding Rossmann-fold domain protein glyceraldehyde-3-phosphate dehydrogenase (an example of a non-TIM barrel fold protein) isolated from different organisms: when monomers are compared, their N-terminal to C-terminal contacts differ despite their similar 3D structures (Doc. S1; Fig. S3; Table S4). In our study, based on the comparative analysis of crystal structures, we explicitly show that the more stable mutant is associated with better N-terminal to C-terminal interactions. When taken together, these observations support the connection between N-terminal to C-terminal noncovalent interactions and protein stability.

Conclusions
Sequence-based and structure-based bioinformatics analyses have delineated a methodology to identify target positions for mutagenesis that would enhance protein thermostability. In this context, our study suggests that protein termini constitute one of the regions of interest for designing an effective, reduced-size mutagenesis library in a targeted way, with a view to improving protein stability. This adds to the repertoire of approaches for increasing the thermal stability of proteins and gives our results wider applicability. It is interesting that a number of important folds and superfolds [50] have their N and C termini in contact with each other, thus offering opportunities for modulating stability through mutual interactions. 'Making the two ends meet' seems to be a feature common to all of these proteins. It is tempting to speculate that proteins might have evolved the N-terminal and C-terminal interactions as one of the strategies to stabilize their structures in a fold-specific manner. Because the terminal regions are in close proximity in diverse folds/proteins, these regions could be considered as candidates for modulating stability by mutation, in contrast to the general belief that terminal residues are very flexible and might have less effect on stability [17]. It is important to investigate more proteins from diverse organisms to decipher other biologically significant aspects of N-terminal to C-terminal contacts. Eventually, such studies should help in understanding the evolution and utilization of interactions between termini in the protein universe, and in developing effective protein engineering strategies.

RINs
A protein structure can be represented as a residue interaction network (RIN) [28,29]. In this analysis, we used the RINERATOR module [30] to generate a RIN from the 3D structure of each protein. A RIN consists of nodes that represent residues, connected by edges (links) that correspond to noncovalent interactions between those residues. RINERATOR creates an undirected weighted network with multiple edges, in which the edges are defined on the basis of the noncovalent I_s between the connecting nodes. RINs are generated from a protein structure in three steps: hydrogen atoms are added to the original protein structure by the REDUCE program [51]; PROBE [52] is then used to identify noncovalent interactions between interacting residues; and a RIN is then generated by the RINERATOR package. The edges in the RIN are labeled with different interaction types, e.g. interatomic contacts, hydrogen bonds, overlapping van der Waals radii, and generic residue interactions. The edges are weighted with the respective noncovalent I_s for the interacting residues as computed by PROBE, and the score is proportional to the strength of the interaction [31]. PROBE identifies the contacts between residues in a protein by rolling a small virtual probe (0.25 Å) around the van der Waals surface of each atom; an interaction is detected if the probe touches another noncovalently bonded atom. The contact scores are evaluated per dot, and then summed for each atom pair. The combined I_s is the weighted sum of the nonoverlapping van der Waals contacts, the volume of hydrogen bonds, and atomic overlaps (clashes). In contrast to RINs that are based on a spatial atomic distance cut-off between connecting residues, RINERATOR is capable of generating a more realistic RIN by sampling the atomic packing of each atom by using the small-probe contact dot surface after the inclusion of hydrogen atoms [30].
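The network representation described above can be illustrated with a minimal sketch. It does not reproduce RINERATOR, REDUCE or PROBE; it only shows one way to hold an undirected, weighted network with multiple typed edges per residue pair. The function names, the interaction-type labels and all scores below are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def build_rin(records):
    """Return {(res_a, res_b): [(interaction_type, score), ...]}.

    Each record is (res_a, res_b, interaction_type, score). The network is
    undirected, so (A, B) and (B, A) are stored under the same sorted key,
    and a residue pair may carry several typed edges (a multigraph).
    """
    rin = defaultdict(list)
    for res_a, res_b, interaction_type, score in records:
        pair = tuple(sorted((res_a, res_b)))
        rin[pair].append((interaction_type, score))
    return dict(rin)


def residues(rin):
    """Return the set of nodes (residues) that have at least one edge."""
    return {res for pair in rin for res in pair}


# Hypothetical records: residue labels follow the aromatic cluster named in
# the text, but the interaction types and scores are invented.
records = [
    ("PHE4", "TYR343", "contact", 3.1),
    ("TYR343", "PHE4", "hbond", 1.2),
    ("TRP6", "TYR343", "contact", 2.4),
]
rin = build_rin(records)
print(rin[("PHE4", "TYR343")])  # two typed edges between the same residue pair
print(sorted(residues(rin)))
```

Storing a list of typed edges per pair keeps the interaction types separate, so a combined score can be derived later without losing the individual contributions.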

Construction of subnetworks based on the strength of the noncovalent I_s
We constructed different subnetworks, based on the strength of I_s as described above, in which any pair of residues is connected by an edge if their I_s is higher than a threshold value (I_smin). RINs of all three structures (V1L, RBSX, and V1A) were constructed at different I_smin values, and their network topology and various network parameters were evaluated and compared. As I_smin increases, the number of edges in the RINs decreases, because fewer edges have high I_s values. Once edges with an I_s below the cut-off are removed, nodes with no edges incident on them are no longer considered. All of these networks were visualized using CYTOSCAPE [35]. The NETWORK ANALYZER [53] plug-in of CYTOSCAPE was used to calculate simple topological parameters, and the RINALYZER [30] plug-in was used for the comparison of RINs of V1L and V1A structures.
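The thresholding step can be sketched as follows, assuming that each residue pair has already been reduced to a single score. The function name `threshold_subnetwork` and the toy scores are hypothetical; the sketch only mirrors the two rules stated above (keep an edge if its score exceeds the threshold, then drop nodes left without edges).

```python
def threshold_subnetwork(scored_edges, i_smin):
    """Keep edges whose score is strictly higher than i_smin.

    scored_edges maps (res_a, res_b) -> score. Returns the retained edges
    and the set of nodes that still have at least one incident edge;
    isolated nodes are dropped implicitly because they appear in no edge.
    """
    kept = {pair: s for pair, s in scored_edges.items() if s > i_smin}
    nodes = {res for pair in kept for res in pair}
    return kept, nodes


# Toy network with invented scores.
toy_edges = {("A", "B"): 12.0, ("B", "C"): 4.0, ("C", "D"): 7.5}
for cutoff in (0.0, 5.0, 8.0):
    kept, nodes = threshold_subnetwork(toy_edges, cutoff)
    print(cutoff, len(kept), sorted(nodes))
```

Raising the cut-off removes edges monotonically, which is why the edge count in the subnetworks can only decrease as the threshold grows.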

Strongly connected components
Furthermore, we calculated strongly connected components by use of the BINOM2.5 [54] module in CYTOSCAPE for networks of more stable and less stable mutants at different I_smin values to identify their distinct clusters and cluster-forming nodes. BINOM uses the algorithm of Tarjan to decompose the network into strongly connected components [55]. The giant cluster (defined here as the LSCC) is the largest group of mutually connected nodes (in terms of number of residues) in the network. The size of the LSCC (number of nodes) depends on which edges between the nodes are retained at a given I_smin. Hence, the size of the LSCC can be considered as a function of I_smin. We then calculated the size of the LSCC by varying the I_smin, and plotted the LSCC as a function of I_smin. Here, the size of the LSCC is normalized with respect to the total number of residues in the protein.
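The dependence of the normalized LSCC size on the threshold can be sketched as below. This is not the BINOM implementation and does not use Tarjan's algorithm: because the network is undirected, strongly connected components coincide with ordinary connected components, so a plain depth-first traversal is used instead. The function names and the toy data are hypothetical.

```python
def largest_component_size(edges):
    """Return the number of nodes in the largest connected component."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = set()
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first traversal of one component.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def lscc_profile(scored_edges, n_residues, thresholds):
    """Return [(threshold, normalized LSCC size), ...].

    Edges are kept when their score is strictly higher than the threshold;
    the LSCC size is divided by the total number of residues in the protein.
    """
    profile = []
    for cutoff in thresholds:
        kept = [pair for pair, s in scored_edges.items() if s > cutoff]
        profile.append((cutoff, largest_component_size(kept) / n_residues))
    return profile


# Toy network with invented scores, for a hypothetical 10-residue protein.
toy_edges = {(1, 2): 10.0, (2, 3): 5.0, (4, 5): 8.0}
print(lscc_profile(toy_edges, 10, [0, 6, 9, 20]))
# [(0, 0.3), (6, 0.2), (9, 0.2), (20, 0.0)]
```

Plotting the second element of each pair against the first gives a curve of the kind described in the text, which can then be compared between the more stable and less stable structures.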

Supporting information
Additional supporting information may be found in the online version of this article at the publisher's web site: Doc. S1. Supporting information. Fig. S1. RWC b CO of RBSX and different mutants. Fig. S2. Unique N-terminal to C-terminal contacts for less stable (V1A) and more stable (V1L) mutants. Fig. S3. Structural superpositions of TIMs and glyceraldehyde-3-phosphate dehydrogenases (GAPDHs) from different organisms. Table S1. Van der Waals contacts of mutated residues. Table S2. Unique N-terminal to C-terminal atom-atom contacts. Table S3. Unique residues in the cluster of the LSCC. Table S4. Comparison of N-terminal to C-terminal contacts of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) structures from different organisms.