Calnexin cycle – structural features of the ER chaperone system

The endoplasmic reticulum (ER) is the major folding compartment for secreted and membrane proteins and is the site of a specific chaperone system, the calnexin cycle, for folding N‐glycosylated proteins. Recent structures of components of the calnexin cycle have deepened our understanding of quality control mechanisms and protein folding pathways in the ER. In the calnexin cycle, proteins carrying monoglucosylated glycans bind to the lectin chaperones calnexin and calreticulin, which recruit a variety of function‐specific chaperones to mediate protein disulfide formation, proline isomerization, and general protein folding. Upon trimming by glucosidase II, the glycan without an inner glucose residue is no longer able to bind to the lectin chaperones. For proteins that have not yet folded properly, the enzyme UDP‐glucose:glycoprotein glucosyltransferase (UGGT) acts as a checkpoint by adding a glucose back to the N‐glycan. This allows the misfolded proteins to re‐associate with calnexin and calreticulin for additional rounds of chaperone‐mediated refolding and prevents them from exiting the ER. Here, we review progress in structural studies of the calnexin cycle, which reveal common features of how lectin chaperones recruit function‐specific chaperones and how UGGT recognizes misfolded proteins.


Introduction
The endoplasmic reticulum (ER) contains two major folding pathways for protein substrates [1]. One is the general folding pathway, and the other is specific for glycoproteins. The general pathway is mostly mediated by BiP, the ER homolog of 70-kDa heat shock protein (Hsp70), and P4HB (PDIA1), the founding member of the protein disulfide isomerase (PDI) family. BiP acts as a general chaperone, while P4HB and other PDIs promote the formation of protein disulfides through the action of thioredoxin-like domains that catalyze oxidation and isomerization of disulfides [2-4].
The pathway dedicated to N-glycosylated proteins is named after calnexin, the first protein discovered in the pathway [5]. Upon entering the ER, N-linked glycoproteins have specific asparagines labeled with a Glc3Man9GlcNAc2 glycan. Calnexin (also called IP90, major histocompatibility complex class I antigen-binding protein p88, or p90) is one of four lectin chaperones in the ER. Calnexin and its soluble homolog, calreticulin, combine a lectin-like glycan-binding domain with a flexible arm, the P-domain, that recruits other chaperones. The other major components of the pathway are UDP-glucose:glycoprotein glucosyltransferase (UGGT), the protein disulfide isomerase ERp57, and the ER glucosidases Glu I and Glu II [6-8] (Fig. 1A).
Fig. 1. (A) The monoglucosylated form of newly synthesized glycoproteins binds to calreticulin (CRT)/calnexin (CNX), which promote protein folding with assistance from ERp57, CypB, and ERp29. Following release of the terminal glucose by glucosidase II, natively folded proteins are transported to the Golgi. Incompletely folded proteins are reglucosylated by UGGT and rebind calreticulin/calnexin for additional folding cycles. If multiple folding cycles are unsuccessful, terminally misfolded proteins are transported to the cytoplasm for degradation via the ER-associated protein degradation (ERAD) pathway. (B) Structure of the N-linked glycan. The precursor glycan is attached to the protein with three glucose residues. The first two are removed through the action of glucosidases I and II to generate the monoglucosylated form that is required for binding calnexin and calreticulin. UGGT acts on misfolded glycoproteins to add back glucose to the glycan for additional rounds of chaperone-mediated folding.
Protein folding in the calnexin cycle starts with protein synthesis and N-glycosylation as the protein enters the ER. The N-glycan is then trimmed by glucosidase I and glucosidase II to remove the outer and middle glucose residues, respectively, and generate the monoglucosylated form that specifically binds to calnexin and calreticulin (Fig. 1B). Through their P-domains, the lectin chaperones bind function-specific chaperones [11-16], which act on the bound glycoprotein to promote its folding and maturation. Glucosidase II is capable of removing the remaining glucose moiety. When this occurs, the glycoprotein is no longer able to bind calnexin/calreticulin, ending the first round of glycoprotein folding. If the protein has not yet adopted its native conformation, the glucosyltransferase UGGT adds back the last glucose residue to allow the glycoprotein to bind again to calnexin/calreticulin. In this way, UGGT acts as a quality control system by specifically recognizing misfolded Man9GlcNAc2 glycoproteins and returning them to calnexin/calreticulin for further processing. Three different function-specific ER chaperones are known to bind to calnexin/calreticulin. ERp57 is a protein disulfide isomerase and catalyzes the oxidation and isomerization of glycoprotein disulfide bonds. The two other chaperones, cyclophilin B (CypB) and ERp29, carry out the isomerization of peptide bonds and a general chaperone function, respectively.
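The sequence of trimming, binding, release, and reglucosylation described above can be sketched as a small loop. This is only a toy model for illustration: the function name `run_calnexin_cycle`, the parameter `rounds_needed` (how many chaperone-assisted rounds a hypothetical substrate needs to reach its native fold), and the cutoff `max_rounds` before the substrate is sent to ERAD are all assumptions introduced here, not quantities taken from the literature.

```python
def run_calnexin_cycle(rounds_needed: int, max_rounds: int = 5) -> tuple:
    """Toy model of one glycoprotein passing through the calnexin cycle.

    rounds_needed and max_rounds are hypothetical parameters; real folding
    is stochastic and the ERAD decision is not a simple round counter.
    Returns (fate, rounds_used), where fate is "Golgi" or "ERAD".
    """
    glucoses = 3            # precursor glycan: Glc3Man9GlcNAc2
    glucoses -= 1           # glucosidase I removes the outer glucose
    glucoses -= 1           # glucosidase II removes the middle glucose
    rounds = 0
    while True:
        # Monoglucosylated form (glucoses == 1) binds calnexin/calreticulin,
        # which recruit ERp57, CypB, or ERp29 for one round of folding.
        rounds += 1
        folded = rounds >= rounds_needed
        # Glucosidase II removes the last glucose and the lectin lets go.
        glucoses = 0
        if folded:
            return ("Golgi", rounds)
        if rounds >= max_rounds:
            return ("ERAD", rounds)
        # UGGT recognizes the misfolded protein and adds one glucose back.
        glucoses = 1


print(run_calnexin_cycle(rounds_needed=1))
print(run_calnexin_cycle(rounds_needed=3))
print(run_calnexin_cycle(rounds_needed=10, max_rounds=5))
```

The point of the sketch is the control flow: the only step that depends on the folding state is the UGGT checkpoint, whereas glucosidase II acts on every substrate.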
There are intricate relationships between the calnexin cycle and antigen presentation pathways. The calnexin cycle proteins calreticulin and ERp57 not only chaperone MHC class I heavy chains, but are also part of the peptide-loading complex (PLC), which also includes transporter associated with antigen processing (TAP), β2-microglobulin, and tapasin. The PLC is required for loading of antigenic peptides onto MHC class I heavy chains for presentation to the immune system [17]. UGGT plays a role in this pathway by surveying the loading of MHC class I complexes, including reglucosylation of empty complexes [18,19]. It also reglucosylates incorrectly assembled T-cell antigen receptor (TCR) complexes [20].
This review provides an overview of proteins involved in the calnexin cycle with an emphasis on recent structural insights (Table 1). These include calreticulin in the context of the PLC, P-domain recognition by ER chaperones, and structural characterization of UGGT [15,21-25]. As recent structures of glucosidase II were well covered in other reviews [26,27], this work will focus on developments in understanding of UGGT and calreticulin and their mechanism of action.

Structure of calnexin/calreticulin
Calnexin and calreticulin are the most abundant representatives of a small family of lectin chaperones residing in the ER. The other members are the tissue-specific homologs calmegin and calreticulin 3. Proteins in this family share a common structure consisting of a glycan-binding lectin domain and a very unusual arm-like structure, termed the P-domain due to the abundance of proline residues (Fig. 2A). The lectin domain adopts a globular fold with the P-domain inserted in the middle of the lectin domain primary sequence [28]. Two of the lectin chaperones, calnexin and calmegin, are membrane-bound through a C-terminal transmembrane helix, while the calreticulins are soluble proteins. Calnexin has a C-terminal cytosolic RKPPRRE motif involved in endoplasmic reticulum retention, while calreticulin possesses a luminal KDEL-retrieval sequence.
Structures of the lectin domains of calnexin and calreticulin (Table 1) show a jellyroll fold largely formed by a sandwich of two large β-sheets: a seven-stranded, concave β-sheet and a six-stranded, convex β-sheet. Besides an additional small β-sheet and two short α-helices (Ala32-Arg36 and Leu196-Asp199), a prominent feature of calreticulin is a long C-terminal α-helix (Glu336-Asp362) that runs along and beyond the convex β-sheet (Fig. 2B). The recent cryo-EM structure of the peptide-loading complex containing full-length calreticulin modeled this helix extending until Glu386 with ~30 missing residues due to disorder [21]. It appears likely that the crystal structures provide a more realistic view of folded boundaries in solution, as 20 C-terminal residues in the cryo-EM structure are modeled without sufficient electron density. In agreement with that, a limited proteolysis experiment readily yielded cleavage at Lys368, suggesting that the folded region ends prior to that residue [29]. It should be noted that while the C-terminal tail is unlikely to produce a stable structure in solution, it might become more ordered upon binding calcium ions [30].
The details of glycan binding were revealed by the high-resolution structure of calreticulin in complex with the Glc1Man3 tetrasaccharide, the Glc(3)-Man(D1)-Man(C)-Man(4) branch of the monoglucosylated Glc1Man9GlcNAc2 glycan [29] (Fig. 2C). The tetrasaccharide binds along the long groove formed by the curved β-sheet with all sugar moieties engaged in protein binding. Importantly, the glucose moiety lies flat in the shallow cavity, the base of which is formed by Met131 and Ile147. In addition to these hydrophobic contacts, every oxygen of the glucose Glc(3) is involved in direct or indirect hydrogen bonds with the lectin domain, thus providing the specificity for glucose. The most crucial hydrogen bond is between O2 of Glc(3) and the side chain of Lys111 [29]. Mutagenesis studies have shown that Lys111 is required for the calreticulin-carbohydrate interaction [31,32].
Man(D1) and Man(C) mainly use their O4-O6 edges for interactions with the lectin domain. In particular, O4 of Man(D1) engages in three direct hydrogen bonds with Tyr109 and both the side chain and backbone carbonyl of Asp317. Asp317 is required for binding because it also makes direct hydrogen bonds with O4 and O6 of Man(C) (Fig. 2C). The affinity of Glc1Man3 for the calreticulin lectin domain is 0.7 µM, which is very close to the reported value for intact calreticulin [33], suggesting that glycan binding is the major route for substrate recognition by lectin chaperones. The glycan-binding surface is essentially identical in calnexin and calreticulin. The residues that are involved in carbohydrate binding are highly conserved and adopt very similar conformations in both proteins. In the cell, calnexin and calreticulin display overlapping but distinct patterns of interaction with substrate glycoproteins [34-36]. Because the calnexin/calreticulin lectin sites are nearly identical, the observed differences in substrate specificity must be based on other properties. Previous studies have shown that the distinct luminal versus membrane-bound topologies of calreticulin and calnexin affect selection of substrate glycoproteins [34,37,38].
The lectin domains of both calreticulin and calnexin contain a solvent-exposed disulfide bridge on the edge of the lectin site. Previous studies showed that treatment with the reducing agents dithiothreitol and tris(2-carboxyethyl)phosphine (TCEP) abrogates carbohydrate binding by calreticulin [29,39]. These cysteines are also essential to the chaperone function of calreticulin [40]. This is because the disulfide bond is involved in contacting the Man(C) and Man(4) moieties of the glycan (Fig. 2C) [29].
The calreticulin lectin domain structures also defined the location of a high-affinity calcium-binding site [29,41]. The calcium ion is coordinated by the side chain of Asp328, and backbone carbonyls of Gln26, Lys62, and Lys64 (Fig. 2D). Besides the high-affinity site, the C-terminal tail of calreticulin contains multiple low-affinity Ca2+-binding sites [42] and is  [43]. Likewise, the highly acidic N-terminal and C-terminal regions of calnexin also contain multiple low-affinity calcium-binding sites [44]. More recent studies demonstrated that the C terminus of calreticulin has a propensity to form a helical structure [45] and that its secondary structure is enhanced in the presence of Ca2+ ions [45,46].

P-domains
Sequence identity among the lectin chaperones is highest in the P-domains. The domains are hairpin-like structures composed of multiple type I and type II motif repeats [28,47]. The calnexin and calmegin P-domains are ~140 residues long and composed of four type I motifs IxDPxxxKP(E/D)DWD followed by four type II motifs GxWxxxxIxNP. The domains from calreticulin and calreticulin-3 are smaller with only three repeats of each motif. The reason for that difference is unclear. It could reflect specificity for different protein substrates, or it could be due to fitting requirements into calnexin- and calreticulin-specific multiprotein complexes. While calreticulin is best known for its involvement in MHC class I assembly and the calnexin/calreticulin cycle in the endoplasmic reticulum, a multitude of recent studies demonstrated calreticulin expression on the cell surface, where it appears to play a role in apoptosis and phagocytosis of dying cells (for a review, see Raghavan et al. [48]).
In the folded P-domain structure, the type I motifs interact with the type II motifs in a head-to-tail fashion, forming four modules each containing a small hydrophobic core of two tryptophans and a lysine (Fig. 2E). The hairpin-like structure is additionally stabilized via interactions of conserved isoleucines producing an isoleucine zipper. In addition to being shorter, the calreticulin P-domain is missing a disulfide bond (Cys360-Cys366) found at the beginning of the tip module of calnexin and calmegin. The importance of this disulfide is unknown, but its reduction leads to local unfolding in the calnexin P-domain (G. Kozlov, unpublished observations).
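The two consensus motifs quoted above, IxDPxxxKP(E/D)DWD and GxWxxxxIxNP, translate directly into regular expressions, which gives a quick way to count repeats in a sequence. The following is a minimal sketch under stated assumptions: the helper name `count_motifs` is introduced here for illustration, and the test strings are synthetic sequences assembled from the consensus itself, not real calnexin or calreticulin sequences. Real P-domains may deviate from the strict consensus, so an exact-match scan like this could undercount.

```python
import re

# In the consensus, 'x' means any residue and (E/D) means Glu or Asp.
TYPE_I = re.compile(r"I.DP...KP[ED]DWD")   # IxDPxxxKP(E/D)DWD
TYPE_II = re.compile(r"G.W....I.NP")       # GxWxxxxIxNP


def count_motifs(sequence: str) -> dict:
    """Count non-overlapping exact matches of each motif type."""
    seq = sequence.upper()
    return {
        "type_I": len(TYPE_I.findall(seq)),
        "type_II": len(TYPE_II.findall(seq)),
    }


# Synthetic (hypothetical) repeats built from the consensus, with every
# 'x' position filled by alanine and an arbitrary "SS" linker in between.
toy_type_i = "IADPAAAKPEDWD"
toy_type_ii = "GAWAAAAIANP"
toy_calnexin_like = "SS".join([toy_type_i] * 4 + [toy_type_ii] * 4)
toy_calreticulin_like = "SS".join([toy_type_i] * 3 + [toy_type_ii] * 3)

print(count_motifs(toy_calnexin_like))
print(count_motifs(toy_calreticulin_like))
```

On the synthetic inputs the counts reproduce the four-plus-four and three-plus-three arrangements described in the text, with all type I repeats preceding all type II repeats.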

ERp57
Cooperative interactions of chaperones are crucial for efficient protein folding in the ER. Calnexin and calreticulin often serve as a scaffold bringing together N-glycosylated proteins with the ER-resident chaperones. ERp57, a protein disulfide isomerase, was one of the proteins originally identified and established as part of the calnexin cycle pathway [11,49]. More recent studies revealed and characterized interactions of lectin chaperones with cyclophilin B, a peptidyl-prolyl isomerase, and a general chaperone ERp29 [14,15].
Structurally, ERp57 consists of four thioredoxin-like (TRXL) domains termed a, b, b', and a'. The N- and C-terminal a and a' domains contain CGHC catalytic motifs, while the b and b' domains have lost the catalytic cysteines (Fig. 3A). ERp57 is similar to PDI both in its domain organization and the primary sequence. Similarity is highest in the catalytic a and a' domains (~50% identity) and lowest in the b and b' domains (~20%).
The thioredoxin-like fold TRXL = βαβαβαββα (Pfam Thioredoxin_6 family PF13848) is a very stable and common domain consisting of a central five-stranded β-sheet covered by two α-helices on each side. It is a derivative of the classical thioredoxin fold TRX = βαβαββα (Pfam Thioredoxin family PF00085). Approximately twenty proteins in the large family of protein disulfide isomerases contain at least one thioredoxin-like domain with a βαβαβαββα sequence of secondary elements. Some are catalytically (redox) active domains containing a CxxC motif at the N terminus of helix 2, while noncatalytic domains do not contain catalytic cysteines and either play a structural role or are involved in protein interactions. The bacterial thiol disulfide oxidoreductase, DsbA, which functions analogously to PDIs, displays a modified version of the thioredoxin fold, a DsbA-like thioredoxin fold DSBA = βαβ-αααα-αββα (Pfam DSBA family PF01323) with an extra four-helical subdomain that caps one side of the domain (Fig. 4).
The extensive presence and diverse functions of thioredoxin-like domains in the ER are quite remarkable (Table 2). What is the reason for the presence of this fold in so many proteins? One reason is that these domains are very robust and able to withstand a wide range of changes in environment. Another reason could be a versatility of this fold. Even within one protein (for instance PDI itself), the same fold can be utilized for oxidoreductase activity and for substrate binding.
Previous studies using NMR spectroscopy and mutagenesis revealed that the tip of the P-domain of calnexin/calreticulin binds to ERp57 [12,13,54], while a large positively charged patch of residues in the ERp57 b' domain represents the calnexin/calreticulin-binding site [55]. In particular, mutating Asp347 and Met346 of human calnexin (Asp258 and Met257 of calreticulin) completely abrogates the binding, as does the R282A mutation in ERp57. Furthermore, the K214A, K274A, and R282A mutants of full-length ERp57 are compromised in their ability to fold RNase B in an in vitro folding assay, demonstrating a requirement for the calnexin-ERp57 interaction for efficient glycoprotein folding [55].
A visualization of the calnexin/calreticulin-ERp57 interaction eluded numerous co-crystallization attempts for a long time. Recently, a low-resolution snapshot of the calreticulin-ERp57 complex in the context of the peptide-loading complex was obtained by cryo-EM [21]. In that structure, the tip module of the P-domain primarily interacts with the b' domain of ERp57 (Fig. 3B). The binding site comprises the N-terminal half of the long helix α2, the region preceding helix α4 of the b' domain, and the unusually long β4-β5 loop of the b domain. As a result of the interaction, the catalytic sites of ERp57 are facing the glycan-binding site of calreticulin.
It is important to note that the structure represents only one of the possible orientations between calreticulin and ERp57 because of the intrinsic mobility of the P-domain. The relative orientation of calreticulin and ERp57 in the peptide-loading complex is mostly constrained by tapasin, which plays the role of a pseudosubstrate of ERp57 by engaging its catalytic sites, while also interacting with the C-terminal helix of calreticulin. The ERp57-tapasin positioning in the cryo-EM structure is similar to the previously determined crystal structure of these proteins [56]. In the context of protein folding, the P-domain flexibility would result in widening or narrowing the distance between the catalytic sites of ERp57 and the lectin site of calreticulin/calnexin. One of the implications would be an ability to adjust to protein substrates of variable sizes. On the other hand, this movement could be a driving force for unfolding the bound substrate, a necessary step in disulfide reshuffling.
It should be noted that the precise ERp57:CRT-binding determinants are still to be resolved. While the structure confirms the binding sites on both proteins, its low resolution (5.8 Å) precludes us from identifying individual contacts responsible for the interaction. Moreover, the exact placement of the P-domain relative to ERp57 needs to be adjusted. This conclusion follows from steric clashes for a number of residues upon restoring their side chains missing in the model, for instance Lys274 of ERp57. In addition, the structure does not explain the role of critical residues (such as Met257 and Asp258 of calreticulin among others), which are required for the binding. Therefore, a high-resolution structure of the complex would be very informative in pinpointing structural determinants of the binding.

CypB
Cyclophilin B (CypB) is a peptidyl-prolyl cis-trans isomerase (PPIase) found in the ER [57,58] and inhibited by cyclosporin A binding to its active site with high affinity [57] (Fig. 3A). The functional relevance of cyclophilin B in the ER is demonstrated by its involvement in the folding of collagen [59] and the maturation of transferrin [60]. CypB expression is activated by ER stress, whereas its absence makes cells more sensitive to ER stress [61].
The crystal structure of CypB in complex with the P-domain from calmegin provided a mechanism for recruitment of PPIase activity to misfolded N-glycoproteins and suggested that CypB functions as part of the calnexin cycle [14]. The structure shows that the tip of the P-domain binds to a well-defined surface opposite the cyclosporin A-binding site. The single most important residues from each protein are Lys97 of CypB and Asp338 of the P-domain (corresponding to Asp347 of calnexin and Asp258 of calreticulin), as mutations of each of these residues abolish the interaction [14]. Lys97 of CypB forms salt bridges with Asp338 and Asp332 and hydrogen bonds with the carbonyl of Asp332. Among the many lysine residues of the binding site, only the side chains of Lys9, Lys97, and Lys183 of CypB are involved in the interactions with the P-domain underlying the specificity of the binding. Besides interacting with Lys97, the side chain of Asp338 forms an intermolecular hydrogen bond with the side chain of Thr36. Also, the side chain of Met337 at the very tip of the P-domain inserts between the aliphatic parts of Lys9 and Lys35 of CypB. The absence of a side chain for Gly339 allows for closer approach of the P-domain to the CypB surface [14]. The P-domains of both calnexin and calreticulin bind CypB with affinity on the order of 10 µM as estimated from NMR studies [14]. This is very similar to the affinities of ERp57 binding to calnexin (K d of 6 µM) [55] and to calreticulin (7 µM) [12].
It is very likely that calnexin/calreticulin and CypB interact in vivo. CypB and calnexin/calreticulin co-localize in the ER and are associated with multichaperone ER complexes [62,63]. The interaction between CypB and calreticulin has been proposed to contribute to ER retention of CypB, which lacks other known ER-retention signals [64]. The association of glycan-binding activity with CypB provides a mechanism for the recruitment of PPIase activity in the ER to newly synthesized glycoproteins, such as the CH antibody heavy chain. The heavy-chain CH1 domain possesses three cis-prolines in its native state, and its folding is markedly accelerated by CypB [65]. Future work is required to test whether monoglucosylation affects the rate of proline isomerization of N-glycoproteins.
NMR studies identified the P-domains of calnexin/calreticulin and the D-domain of ERp29 as the domains responsible for the ERp29-calnexin/calreticulin interactions [15]. In fact, binding of ERp29 and ERp57 involves the same residues at the tip of the P-domain from either calnexin or calreticulin [13,15]. The binding affinity between the ERp29 D-domain and calnexin P-domain, or between full-length ERp29 and calreticulin, is on the order of 13 µM as measured using NMR and surface plasmon resonance [15,16].
The ERp29 D-domain shows an unusual fold where two C-terminal antiparallel helices are partially solvent-exposed by extending out from a three-helix bundle. These solvent-exposed helices form the binding site for the P-domain (Fig. 3D). In particular, Arg223 of ERp29 is crucial for the binding as it makes salt bridges with Asp348 of calmegin (Asp258 of calreticulin) and hydrogen bonds with backbone carbonyl of Asp342 (Asp252 of calreticulin). The positively charged Lys204, Lys208, Arg226, and Lys237 of ERp29 are also engaged in polar interactions with the P-domain. Similar to CypB interactions, the side chain of Met347 at the very tip of the P-domain binds in a hydrophobic pocket on the ERp29 surface [15].
The D347K mutation in the calnexin P-domain results in no binding to ERp29. The same mutation was previously shown to abrogate calnexin binding to ERp57 and CypB [14,77]. Therefore, the same site is responsible for interactions with all three proteins. On the ERp29 side, the R223A, R223E, L227E, and L241K mutations also abolish the binding. The P-domains from calnexin, calreticulin, and calmegin are all able to specifically bind the D-domain of ERp29. The D-domain of ERp29 is unique in the human genome, but conserved in ERp29 homologs from other species with sequence conservation highest in the P-domain-binding residues. Therefore, it is likely that calreticulin/calnexin binding is a conserved ERp29 function across species.
Windbeutel, the Drosophila ortholog of ERp29, functions in embryo development through processing of a Golgi sulfotransferase, Pipe [71,78]. Two regions of Drosophila ERp29 are required for Pipe localization: one in the TRXL domain that mediates binding of denatured thyroglobulin and Pipe, and a second in the D-domain of previously unclear function [72,79]. The structural data suggest that the principal function of the D-domain is calreticulin/calnexin binding. In agreement with that, mutations in the calreticulin/calnexin-binding site of Drosophila ERp29 block processing of Pipe [79,80]. In particular, loss of Arg223 blocks both Pipe processing and P-domain binding. Interestingly, while full-length human ERp29 cannot replace Drosophila ERp29 for Pipe localization in vivo, the D-domain can be swapped, suggesting a functional conservation of that domain [75]. In another example of functional implication, the calreticulin/calnexin-binding site is required for the ER retention of the Dictyostelium ERp29 ortholog, which lacks an ER-retention signal [81].
The dimerization of ERp29 allows for the assembly of a larger chaperone complex with two lectin chaperones bound to one ERp29 dimer. While the functional implication of that is currently unclear, this may lead to tighter binding of multiglycosylated protein substrates. ERp29 dimerization may also play a role in glycosylation-independent chaperone function by promoting direct binding of nonglycosylated substrates to calreticulin and calnexin.

Common features of calnexin/calreticulin interactions with partners
Comparison of the crystal structures of P-domains from calnexin luminal domain [28], calreticulin with partially truncated P-domain [82], and the P-domain complexes [14,15] shows that the structures of P-domain modules are highly similar despite the intrinsically flexible nature of P-domains in solution. The rigidity of a module originates from a small hydrophobic core formed by side chains of two tryptophan residues along with lysine followed by a proline residue. The very tip of the P-domain forms a one-turn helix. The hydrophobic core and helical turn were observed in the solution structure of the calreticulin P-domain, confirming that the conformation is formed prior to binding [47]. Thus, the overall flexibility of P-domains likely arises in the hinge regions between the modules.
A number of residues are highly conserved in the P-domains. Some of these such as tryptophan and lysine play a structural role, while others are involved in protein binding. Among the latter, a methionine, aspartic acid, and glycine residue at the tip of the P-domain (the MDG-binding motif) are absolutely conserved in all family members and are crucial for the ERp57, CypB, and ERp29 binding. The helical turn projects the key binding residues, methionine and following aspartic acid (Met346 and Asp347 in human calnexin, and Met257 and Asp258 in human calreticulin), to their binding partners. The aspartate residue makes key salt bridges with its counterparts, Lys97 of CypB and Arg223 of ERp29. It is tempting to speculate that it interacts with Arg282 of ERp57, but a higher resolution structure is needed to confirm this. Significantly, the calnexin P-domain D347K mutation abolishes binding to ERp57 [14], CypB, and ERp29, while the homologous aspartic acid is required for calreticulin binding to ERp57 [77]. The side chain of methionine is involved in intermolecular hydrophobic interactions, while the absence of a side chain in glycine residue allows for close packing with the binding partner. It appears that no other residue could be tolerated at this position, explaining the conservation of this glycine in the calnexin/calreticulin protein family. High conservation of these residues (with leucine replacing methionine in CRT3) strongly suggests that this lectin chaperone would also interact with the same binding partners.
Remarkably, the binding sites for the P-domain are formed from strikingly different structural scaffolds (Fig. 3B-D). The ERp57 site is composed of one helix and two loops; the CypB site consists of loops, while the ERp29-binding site is all-helical. Beyond these differences, the common feature is the pronounced positive charge, accounting for the presence of multiple aspartates and glutamates in the P-domains.
The interactions of calnexin/calreticulin with ERp57, CypB, and ERp29 form a highly interconnected cluster of protein-protein interactions within the ER. The binding affinities to all three proteins are in the same range of 5-15 µM, suggesting no strong preference for any of the partners. While one lectin chaperone can only bind one other associated chaperone at a time, the dynamic nature of the interaction likely prevents folding bottlenecks or dead ends. Thus, calreticulin and calnexin appear to act as plurivalent adaptors that recruit other chaperones to assist in different aspects of protein folding, such as disulfide bond formation, proline isomerization, or general chaperone activity. The affinity of calnexin/calreticulin for monoglucosylated glycans is roughly an order of magnitude higher (0.7 µM) [29], suggesting that the lectin-glycoprotein associations are longer-lived than the lectin-chaperone associations. This opens a possibility of different chaperones sequentially acting on the same glycoproteins assisting with different aspects of folding.
The sequence conservation of the P-domains is in sharp contrast with the diversity of binding sites on ERp57, CypB, and ERp29. This suggests that these chaperones became specialized for glycoprotein folding through convergent evolution of their P-domain-binding sites. The remarkable versatility of the tip of the P-domain to interact with different structural scaffolds hints at the existence of other protein partners yet to be discovered.
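The step from a lower K d to a longer-lived complex can be made explicit with a short calculation. For simple 1:1 binding, K d = k off / k on, so if the association rate constants are comparable, the mean lifetime of a complex (1 / k off) scales inversely with K d. The sketch below assumes a common, purely illustrative k on of 10^6 M^-1 s^-1 for both interactions and takes 10 µM as a representative value within the 5-15 µM range; neither number is a measurement from the studies cited here, and the function names are introduced only for this example.

```python
def fraction_bound(free_ligand_uM: float, kd_uM: float) -> float:
    """Fractional occupancy for simple 1:1 binding: [L] / ([L] + Kd)."""
    return free_ligand_uM / (free_ligand_uM + kd_uM)


def residence_time_s(kd_uM: float, k_on_per_M_s: float = 1e6) -> float:
    """Mean complex lifetime 1 / k_off, using k_off = Kd * k_on.

    The default k_on is an assumed, illustrative value.
    """
    k_off_per_s = kd_uM * 1e-6 * k_on_per_M_s
    return 1.0 / k_off_per_s


KD_GLYCAN_UM = 0.7       # lectin domain with monoglucosylated glycan [29]
KD_CHAPERONE_UM = 10.0   # representative value within the 5-15 µM range

# Ratio of lifetimes is independent of the assumed k_on when it is shared.
lifetime_ratio = (residence_time_s(KD_GLYCAN_UM)
                  / residence_time_s(KD_CHAPERONE_UM))

print(f"glycan complex lifetime:    {residence_time_s(KD_GLYCAN_UM):.2f} s")
print(f"chaperone complex lifetime: {residence_time_s(KD_CHAPERONE_UM):.2f} s")
print(f"lifetime ratio:             {lifetime_ratio:.1f}")
```

Under the shared k on assumption the ratio of lifetimes equals the ratio of the K d values, so the absolute k on chosen does not affect the comparison. If the on-rates differ substantially, the conclusion about relative lifetimes would need to be revisited.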

Interactions of calreticulin with other ER-resident proteins
Do calnexin and calreticulin work only as a scaffold by bringing together protein substrates and other chaperones, or do they provide some chaperoning themselves? Because the substrate would be likely positioned between lectin site and another chaperone bound to the tip of P-domain, it is reasonable to expect some contacts between the protein substrate and the interior side of P-domain. Indeed, the P-domain truncation mutants of calreticulin display decreased ability to suppress protein aggregation in vitro [82,83].
Another interesting aspect is the ability of calreticulin and calnexin to bind directly to nonglycosylated hydrophobic peptides with micromolar K d [82][83][84][85] or to suppress aggregation of nonglycosylated proteins in vitro [82,[86][87][88]. This aggregation suppression was mapped to the lectin domain of both calnexin and calreticulin [82,83]. Consequently, the identification of such peptide binding is of considerable interest. The surfaces overlapping with the lectin site were previously proposed to be binding sites for nonglycosylated substrates [40,41], but this should be taken with caution. Treatment with monoglucosylated oligosaccharide, which would block the proposed site, does not affect binding of hydrophobic peptides by calreticulin [82].
More recently, a surface distant from lectin site was identified as responsible for in vitro binding of nonglycosylated substrates [89]. In particular, two double mutants P19K/I21E and Y22K/F84E of calreticulin do not efficiently suppress aggregation of firefly luciferase and do not bind hydrophobic peptides. The use of these peptide-binding-deficient and lectin-deficient mutants in calreticulin-negative cells allowed accessing the relative contributions of glycan-dependent and glycan-independent in calreticulin function in biogenesis of MHC class I molecules [89]. The conclusion is that the lectin-based interactions provide the major contribution, whereas the peptide-binding site has little affect on calreticulin function in vivo.
Experiments using T7 phage display system revealed interactions between calreticulin and protein disulfide isomerase-related (PDIR) protein [90]. The interaction was later confirmed by a mass-spectrometry study [91]. The affinity of the binding was measured as 16 µm using surface plasmon resonance [90], which would place this interaction into a similar range of affinities with other known calreticulin-binding partners such as ERp57, CypB, and ERp29. PDIR (also called PDIA5) was originally found in a human placental cDNA library [92]. It is upregulated in mucopolysaccharidoses, diseases caused by defects in degrading glycosaminoglycans [93].
PDIR consists of four thioredoxin-like domains but has a unique architecture within the PDI family, as it contains an N-terminal noncatalytic domain followed by three catalytic domains. The crystal structure of the noncatalytic domain identified a conserved positively charged surface, a prime candidate for interacting with the negatively charged P-domain [94]. Indeed, NMR titrations showed some binding between the P-domain and the noncatalytic PDIR domain, but the binding was centered on the hinge region instead of the tip of the P-domain [94]. It should be noted that the observed interactions were too weak to account for the full affinity. There is still more to learn about calreticulin-PDIR binding, and future studies may identify other domains of both proteins that contribute to this interaction.
Early studies reported interactions between calreticulin and PDI, though the binding was not observed in the presence of Ca 2+ ions [95]. This work pointed to the P-domain as a major site of this interaction, but these results may have to be re-evaluated, as the calreticulin constructs were designed in the absence of structural information at the time. More recent studies tested a panel of seven PDIs (ERp27, ERp29, ERp44, ERp46, ERp57, PDI, and PDIp) for calreticulin interactions by surface plasmon resonance and only identified ERp29 and ERp57 as calreticulin-binding proteins [16].
There are intriguing similarities in glycoprotein processing by the calnexin cycle and the ER-associated degradation (ERAD) machinery. Both rely heavily on the state of the glycan, which is recognized and captured by ER lectins: CNX/CRT and UGGT in the calnexin cycle, and the ER degradation-enhancing α-mannosidase-like proteins (EDEMs) in ERAD. Both systems also display functional and specific interactions with a number of PDIs, which are often responsible for reducing and/or reshuffling disulfides in glycoprotein clients. These include the CNX/CRT-ERp57 and UGGT-Sep15 pairings in the calnexin cycle, while EDEM1-ERdj5 and EDEM2-TXNDC11 display a reminiscent functional cooperativity in ERAD [96][97][98]. Does glucosidase II interact with calnexin/calreticulin, or is it recruited via another member of the calnexin cycle? How do they compete for monoglucosylated proteins? This has important implications for the rate at which glycoproteins escape from the calnexin cycle. One study showed that glucosidase II prefers folded over misfolded monoglucosylated substrates in the presence of calreticulin but not on its own [99], but there is still much to learn about their interplay. Future studies will likely uncover more interactions of calnexin/calreticulin with other ER-resident proteins, reflecting the comprehensive folding machinery of the ER.

Structure of UGGT
N-glycoproteins that are difficult to fold undergo multiple rounds of folding with the assistance of ER lectin chaperones. By reglucosylating misfolded proteins, UGGT acts as a checkpoint, allowing misfolded proteins to rebind to the lectin chaperones and preventing their exit from the ER. UGGT expression is elevated upon ER stress and is part of the unfolded protein response [10]. UGGT also controls the loading of peptide antigens onto major immunological molecules, T-cell receptor, and the major histocompatibility complex [17][18][19][20]. Most vertebrates possess two homologous genes, UGGT1 and UGGT2. UGGT2 shares significant sequence identity (55%) with UGGT1 but does not display comparable reglucosylation activity on certain substrates [11]. More recently, UGGT2 was shown to possess enzymatic activity using synthetic substrates [12,103]. It is very likely that UGGT1 and UGGT2 evolved to have different clients in the glycoprotein folding pathway. UGGT2 has recently been proposed to serve as a folding checkpoint for a distinct set of yet-to-be-identified misfolded glycoproteins [14].
Mammalian UGGTs are approximately 1500-residue proteins, in which the N-terminal ~1200 residues are responsible for sensing misfolded substrates and the C-terminal ~300 residues harbor a glucosyltransferase 24 family (GT24) A-type catalytic domain (Fig. 4A). For a long time, multiple efforts to structurally characterize UGGT were unsuccessful, with only the structure of one of the domains determined in 2014 [15]. Finally, there was a breakthrough in 2017 with several laboratories reporting UGGT structures by X-ray crystallography, electron microscopy, and small-angle X-ray scattering [22][23][24]. We now have a comprehensive view of the structure of UGGT. All crystal structures have been determined for UGGT from thermophilic fungi, which possess a single UGGT gene. Nevertheless, the structural conclusions should be applicable to both UGGT1 and UGGT2 in vertebrates, given the high sequence identity between UGGT1 and UGGT2. The crystal structures show that UGGT forms a saddle-like shape with a large central cavity (Fig. 4B) [22,25]. The shape is consistent with the low-resolution EM structures and with the molecular envelope obtained in solution using SAXS data [22][23][24].
The structure consists of four N-terminal αβ-sandwich domains, followed by a saddle-shaped pair of β-sandwich domains that seat the catalytic domain (Fig. 4B,C). Overall, the N-terminal domains of UGGT are very unusual and structurally more similar to DsbA than to PDIs (Fig. 4C,D). They were assigned their own families by the Pfam database of protein families: Thioredoxin_12 (PF18400), Thioredoxin_13 (PF18401), Thioredoxin_14 (PF18402), and Thioredoxin_15 (PF18403) for UGGT domains 1, 2, 3, and 4, respectively. Rather confusingly, they were termed thioredoxin-like (TRXL) domains despite their significant deviation from the canonical PDI-like TRXL fold (PF13848) and obvious similarity to the DsbA fold (PF01323). However, for consistency with the previous UGGT literature, we refer to the αβ-sandwich UGGT domains as TRXL1, TRXL2, TRXL3, and TRXL4 in this review.
While the first αβ-sandwich UGGT domain resembles DsbA (Fig. 4D), the order of secondary structure elements is different. In DsbA (βαβ-αααα-αββα), the helical subdomain arises from residues inserted into the middle of the thioredoxin fold, while in the first UGGT domain (αααα-βαβ-αββα), the helical elements precede the thioredoxin fold. Even more striking, the TRXL4 and β-sandwich domains are folded from discontinuous regions of the primary sequence (Fig. 4A). This complex topology is largely responsible for earlier difficulties in predicting UGGT structural domains.
The similarity of UGGT domains to the DsbA fold raises the question of the origin of UGGT. While it is generally assumed that many PDIs originated via gene duplication of TRXL domains, this does not appear to apply to UGGT. TRXL2 and TRXL3 are the most similar among the UGGT domains, but even they possess significant differences (Fig. 4C).
High-resolution structures of the UGGT catalytic domain were determined in complex with UDP-glucose and UDP [24]. The structure shows significant fold similarity to the GT8 family of glycosyltransferases [16]. UDP-glucose and the catalytically important calcium ion are buried in the active site, with two aspartates from the invariant DxD motif coordinating Ca 2+ (Fig. 4E). One of the helices (residues 1325-1344, corresponding to residues 1389-1408 of human UGGT1) on the edge of the active site is significantly distorted, with the place of distortion creating a flat cavity leading to the UDP-glucose hydrolysis site. This is similar to the position of the substrate in another member of the GT8 family, the galactosyltransferase LgtC [16] (Fig. 4E). Thus, it is a very likely location of the glycan entrance in UGGT. The vicinity of the active site contains a number of small patches of hydrophobic residues such as Phe1333, Gly1337, Tyr1338, and Trp1339 (Phe1397, Gly1401, Tyr1402, and Trp1403 in human UGGT1

Mechanism of action of UGGT
Despite the recent breakthrough in UGGT structural characterization, the mechanism of its action is still not clear. The full-length UGGT structures showed only a limited range of mobility, with the catalytic domain fixed to the β-sandwich domain, while the main source of mobility originates from the TRXL3 and especially the TRXL2 domain (Fig. 4F) [22]. Comparison of the full-length Ch. thermophilum UGGT structures with the catalytic domain-deleted fragment of UGGT from Th. dupontii [24] similarly shows large shifts in the positions of the TRXL2 and TRXL3 domains, while the relative positions of the TRXL1, TRXL4, and β-rich domains are preserved. This suggests that the TRXL1, TRXL4, and β-sandwich domains comprise a rigid scaffold, while the TRXL2, TRXL3, and catalytic domains account for the ability of UGGT to act on protein substrates of differing sizes and shapes. In agreement with this, UGGT activity was impaired when the mobility of its N-terminal domains was limited using engineered interdomain disulfide bonds [22]. Thus, flexibility appears to be important for UGGT activity and versatility toward numerous substrates in the cell.
Because of their influence on the size of the saddle, the TRXL2 and TRXL3 domains are expected to be partially responsible for recognizing misfolded stretches of protein substrates. Multiple TRXL domains, from protein disulfide isomerase (PDI) itself to other members of the PDI family, have been shown to mediate binding of hydrophobic stretches [2,94,108]. Most likely, the cavity-facing surfaces of the UGGT TRXL2 and TRXL3 domains participate in recognition of misfolded substrates. The important role of TRXL2 in substrate reglucosylation has recently been supported by UGGT deletion mutagenesis and molecular dynamics simulations [25]. Future mutagenesis studies should confirm the substrate-binding surfaces.
There is still a lack of clarity about how the catalytic domain is involved. Early theories proposed a great deal of mobility between the N-terminal part and the catalytic domain, while the full-length UGGT structures invariably showed the catalytic domain firmly entrenched in the β-sandwich surface [22]. At the same time, negative-stain EM and SAXS data implied significant movements of the catalytic domain in solution [24], but an alternative interpretation is also possible [25]. Notably, the ability of the catalytic domain to be stable in solution independently of the rest of UGGT is supported by crystal structures of the isolated catalytic domain [24]. Perhaps the release of the catalytic domain from the β-sandwich domain is facilitated by binding to UDP-glucose and/or protein substrate. This contradiction could be resolved by permanently tethering the catalytic domain to the β-sandwich domains via engineered disulfide bonds and testing the mutant for activity.

UGGT-Sep15 interactions
UGGT1 binds with high affinity (Kd of 20 nM) to the ER oxidoreductase Sep15 [19]. Sep15 (also called 15-kDa selenoprotein or selenoprotein F) is a member of a small family of selenoproteins found in the ER [110]. Sep15 lacks a typical ER-retrieval signal, suggesting that it is maintained in the ER via a different mechanism, most likely through high-affinity binding to UGGT1. Supporting that hypothesis, the entire pool of Sep15 was shown to be bound to UGGT1, while UGGT1 occurs in both Sep15-bound and free states [111].
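The observation that essentially all Sep15 is UGGT1-bound while UGGT1 is only partly occupied is what a tight 1:1 equilibrium predicts when UGGT1 is in excess. The sketch below solves the exact quadratic binding equation; the total concentrations are hypothetical values chosen for illustration (the ER concentrations of the two proteins are not given here), and only the 20 nM Kd comes from the text.

```python
import math


def complex_conc(total_a: float, total_b: float, kd: float) -> float:
    """Equilibrium concentration of a 1:1 complex AB (same units as inputs).

    Exact solution of Kd = [A][B]/[AB] with mass conservation; unlike the
    simple hyperbolic isotherm, it stays valid when Kd is far below the
    protein concentrations (tight binding, ligand depletion).
    """
    s = total_a + total_b + kd
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0


KD_UM = 0.020          # 20 nM, reported Kd for UGGT1-Sep15 [19]
UGGT1_TOTAL_UM = 1.0   # hypothetical total UGGT1 concentration
SEP15_TOTAL_UM = 0.2   # hypothetical total Sep15 concentration

bound = complex_conc(UGGT1_TOTAL_UM, SEP15_TOTAL_UM, KD_UM)
sep15_fraction_bound = bound / SEP15_TOTAL_UM
uggt1_fraction_bound = bound / UGGT1_TOTAL_UM
print(f"Sep15 bound: {sep15_fraction_bound:.1%}, UGGT1 bound: {uggt1_fraction_bound:.1%}")
```

With these assumed concentrations, about 98% of Sep15 is in complex while only about 20% of UGGT1 is occupied, which is qualitatively the pattern reported in [111].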
Structurally, Sep15 consists of two domains, a ~50-residue cysteine-rich N-terminal domain followed by a Sep15/SelM redox domain (Pfam Sep15_SelM family PF08806). The Sep15 redox domain contains selenocysteine (U), which is separated from cysteine by a single residue in the Sep15 catalytic motif (CxU). This is a deviation from typical oxidoreductases, including PDIs, which possess a CxxC catalytic motif. The NMR structure of this redox domain revealed significant differences from thioredoxin [112]. In particular, the structure contains a four-stranded β-sheet with α-helices on only one side (Fig. 4D). In comparison with a typical thioredoxin fold, the structure is missing two helices so that one side of the β-sheet is solvent-exposed. This surface presents several hydrophobic residues, which could potentially interact with misfolded substrates. It is also possible that this surface is used for binding UGGT1 or for intramolecular contacts with the N-terminal domain of Sep15. The fold is also missing an N-terminal β-strand that is usually found in thioredoxin-like domains in the PDI family [2]. Thus, Sep15 represents a simplified topology of the redox domain with a βαβββα organization, as compared to the more typical βαβαβαββα thioredoxin-like fold in the PDI family. Curiously, unlike in thioredoxin-like domains, the catalytic motif of Sep15 is located in a loop rather than at the N terminus of an α-helix.
Why does Sep15 contain selenocysteine in place of one of the cysteines in its active site? Selenocysteine likely modifies the redox potential, affecting the potency of its oxidoreductase activity; however, it does not appear to be a requirement for Sep15 function, as the Drosophila ortholog possesses a cysteine instead of selenocysteine. The redox potential of Drosophila Sep15 is −225 mV [112], which lies between the potentials of the protein disulfide oxidase PDIA1 (−175 mV) [113] and thioredoxin (−270 mV) [114]. This suggests that Sep15 is likely involved in the reduction or isomerization of disulfide bonds (rather than their formation).
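The meaning of these potential differences can be made concrete with the Nernst relation, K = exp(nFΔE/RT), which converts a difference in redox potential between two couples into an equilibrium constant for electron transfer between them. The sketch below is a rough estimate under stated assumptions: a two-electron thiol-disulfide exchange, 25 °C, and potentials from different studies treated as directly comparable, which may not hold if they were measured under different conditions.

```python
import math

FARADAY_C_PER_MOL = 96485.332     # Faraday constant
GAS_CONSTANT_J_PER_MOL_K = 8.314462  # molar gas constant


def equilibrium_constant(e_acceptor_mv: float, e_donor_mv: float,
                         n_electrons: int = 2, temp_k: float = 298.15) -> float:
    """Equilibrium constant for electron transfer from donor to acceptor.

    Uses K = exp(n * F * dE / (R * T)) with dE = E(acceptor) - E(donor).
    K > 1 means reduction of the acceptor by the donor is favored.
    """
    delta_e_v = (e_acceptor_mv - e_donor_mv) / 1000.0
    exponent = n_electrons * FARADAY_C_PER_MOL * delta_e_v
    return math.exp(exponent / (GAS_CONSTANT_J_PER_MOL_K * temp_k))


# Redox potentials quoted in the text (mV).
E_PDIA1 = -175.0
E_SEP15 = -225.0
E_THIOREDOXIN = -270.0

k_sep15_reduces_pdia1 = equilibrium_constant(E_PDIA1, E_SEP15)
k_trx_reduces_sep15 = equilibrium_constant(E_SEP15, E_THIOREDOXIN)
print(f"Sep15 -> PDIA1: K ~ {k_sep15_reduces_pdia1:.0f}")
print(f"thioredoxin -> Sep15: K ~ {k_trx_reduces_sep15:.0f}")
```

Under these assumptions, Sep15 is a markedly stronger reductant than PDIA1 (K of roughly 50 for the 50 mV gap) but a weaker one than thioredoxin, consistent with a role in reducing or isomerizing disulfides rather than forming them.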
Sep15 possesses a distinct cysteine-rich N-terminal domain, which is responsible for binding to UGGT1 [19]. Six invariantly conserved cysteines were shown to be critical for the interaction. As the structure of this domain is still unknown, it is currently unclear whether these cysteines actually contact UGGT1 or play a structural role. A structure of UGGT1 in complex with Sep15 will provide important mechanistic insights into Sep15-UGGT1 cooperativity in protein folding.
What is the role of Sep15 in the function of UGGT1 and the calnexin/calreticulin cycle in general? Previous studies showed enhancement of UGGT1 and UGGT2 activities upon binding to Sep15 [12,103]. Recent results suggest that Sep15 prevents secretion of disulfide-rich glycoproteins with incorrectly formed disulfides to the Golgi, providing an additional step of quality control in the ER [115]. It is plausible that Sep15 enhances UGGT activity by reducing incorrect intramolecular or intermolecular disulfides in misfolded UGGT substrates, thus giving the glycan easier access to the UGGT active site. This is reminiscent of the EDEM-ERdj5 cooperation [96].

Future directions
Recent years have seen new exciting developments in structural understanding of folding pathways of glycoproteins in the ER and brought new potential members of the calnexin cycle into the light. Despite this progress, many questions still remain unanswered. The molecular details of UGGT action are still not fully understood. What is the basis of Sep15 involvement in UGGT function? Future studies of UGGT complexes with Sep15 and substrates would clarify many of these aspects.
On the calnexin/calreticulin side, recent insights have provided an exciting view of the structural organization of these proteins and of how they recruit the helpers that assist in folding N-glycosylated substrates. Calnexin and calreticulin have traditionally been viewed as chaperones, but in light of recent studies they appear to function rather as scaffolds. We now have a much better understanding of their scaffolding function, in which the P-domain works as a long flexible arm that recruits a folding assistant and brings it to a glycosylated substrate captured via the lectin domain. It has also become apparent that this process is much more complex than originally thought and involves multiple folding assistants besides ERp57. Because glycan-based interactions are approximately 10-fold stronger than those with the recruited folding assistants, and the glycan-bound state is therefore longer-lived, calnexin/calreticulin likely shuffles through multiple chaperones that assist with different aspects of folding of any single substrate. Based on recent developments, it would not be surprising if additional chaperone partners of calnexin/calreticulin were discovered in future years. It will also be interesting to see whether there are other ways in which calnexin/calreticulin can bind chaperones (such as PDIR) and whether this could lead to the formation of multichaperone complexes that assist with folding of specific substrates.
An important unanswered question in the field is the interplay of calnexin/calreticulin and UGGT with glucosidase II. Are the activities of the lectin chaperones, glucosidase II, and UGGT coordinated in any way? How is glucosidase II recruited into the calnexin cycle, and how does it compete with calnexin/calreticulin for monoglucosylated substrates? There is still much to learn about the calnexin cycle pathway, and future years will undoubtedly bring more exciting discoveries.
This work was supported by Natural Sciences and Engineering Research Council of Canada Discovery Grant RPGIN 2014-04686 (to KG).

Conflicts of interest
The authors declare no conflict of interest.