Volume 287, Issue 13 p. 2723-2743
Original Article

Molecular basis for the preferential recognition of β1,3-1,4-glucans by the family 11 carbohydrate-binding module from Clostridium thermocellum

Diana O. Ribeiro

UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Caparica, Portugal

Aldino Viegas

UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Caparica, Portugal

Virgínia M. R. Pires

CIISA - Faculdade de Medicina Veterinária, Universidade de Lisboa, Avenida da Universidade Técnica, Lisboa, Portugal

João Medeiros-Silva

UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Caparica, Portugal

Pedro Bule

CIISA - Faculdade de Medicina Veterinária, Universidade de Lisboa, Avenida da Universidade Técnica, Lisboa, Portugal

Wengang Chai

Glycosciences Laboratory, Department of Medicine, Imperial College London, London, UK

Filipa Marcelo

UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Caparica, Portugal

Carlos M. G. A. Fontes

CIISA - Faculdade de Medicina Veterinária, Universidade de Lisboa, Avenida da Universidade Técnica, Lisboa, Portugal

NZYTech Genes & Enzymes, Campus do Lumiar, Estrada do Paço do Lumiar, Edifício E, Lisboa, Portugal

Corresponding Author

Eurico J. Cabrita

UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Caparica, Portugal

Correspondence

A. L. Carvalho and A. S. Palma, UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal

Tel: +351 212948300

E-mails: [email protected] (ALC); E-mail: [email protected] (ASP)

E. J. Cabrita, UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal

Tel: + 351 212948358

E-mail: [email protected]

Corresponding Author

Angelina S. Palma

UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Caparica, Portugal

Glycosciences Laboratory, Department of Medicine, Imperial College London, London, UK

Corresponding Author

Ana Luísa Carvalho

UCIBIO, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Caparica, Portugal

First published: 03 December 2019
Citations: 8
Diana O. Ribeiro and Aldino Viegas contributed equally

Abstract

Understanding the specific molecular interactions between proteins and β1,3-1,4-mixed-linked D-glucans is fundamental to harnessing the full biological and biotechnological potential of these carbohydrates and of proteins that specifically recognize them. The family 11 carbohydrate-binding module from Clostridium thermocellum (CtCBM11) is known for its binding preference for β1,3-1,4-mixed-linked over β1,4-linked glucans. Despite the growing industrial interest in this protein for the biotransformation of lignocellulosic biomass, the molecular determinants of its ligand specificity are not well defined. In this report, a combination of methodologies was used to unravel, at a molecular level, the ligand recognition of CtCBM11. The analysis of the interaction by carbohydrate microarrays and NMR and the crystal structures of CtCBM11 bound to β1,3-1,4-linked glucose oligosaccharides showed that both the chain length and the position of the β1,3-linkage are important for recognition, and identified the tetrasaccharide Glcβ1,4Glcβ1,4Glcβ1,3Glc sequence as a minimum epitope required for binding. The structural data, along with site-directed mutagenesis and ITC studies, demonstrated the specificity of CtCBM11 for the twisted conformation of β1,3-1,4-mixed-linked glucans. This is mediated by a conformation–selection mechanism of the ligand in the binding cleft through CH-π stacking and a hydrogen bonding network, which is dependent not only on ligand chain length, but also on the presence of a β1,3-linkage at the reducing end and at specific positions along the β1,4-linked glucan chain. The understanding of the detailed mechanism by which CtCBM11 can distinguish between linear and mixed-linked β-glucans strengthens its exploitation for the design of new biomolecules with improved capabilities and applications in health and agriculture.

Database

Structural data are available in the Protein Data Bank under the accession codes 6R3M and 6R31.

Abbreviations

  • CAZy: Carbohydrate-Active enZymes Database
  • CAZymes: Carbohydrate-Active enZymes
  • CBMs: carbohydrate-binding modules
  • Ct: Clostridium thermocellum
  • CtCBM11: C. thermocellum family 11 CBM
  • DP: degree of polymerization
  • ESI-CID-MS/MS: tandem electrospray mass spectrometry with collision-induced dissociation
  • Mixed-linked glucans: β1,3-1,4-mixed-linked D-glucans
  • NGL: neoglycolipid
  • PDB: Protein Data Bank

Introduction

    Plant cell walls are composed of structurally diverse and complex polysaccharides presenting many biological and biotechnological applications [1-4]. The β1,3-1,4-mixed-linked glucan polysaccharides (or mixed-linked glucans) are unevenly distributed across the plant kingdom but are abundant in the cell walls of most Poaceae members. These include the endosperm of cereals and grasses, which are of considerable economic importance as storage tissues [5-7]. Mixed-linked glucans are also found in the walls of algae, pathogenic fungi and lichen-forming ascomycete symbionts. These glucans have several commercial and biotechnological applications and are of particular interest for the malting and brewing processes and bioenergy production [2, 8], as well as sources of dietary fibres with major health benefits [9]. These properties of β1,3-1,4-glucans make the study of their recognition by proteins of fundamental importance.

    Mixed-linked glucans are composed of linear chains in which blocks of 2–5 β1,4-linked D-glucopyranose residues are separated by single β1,3-linkages [10]. The β1,4-linked residues form rigid regions, while the β1,3-linkages are flexible, creating kinks within the linear backbone chain [5, 11]. This results in an extended twisted conformation of the polysaccharide, which presents a unique binding surface for recognition by proteins [10, 11], including noncatalytic carbohydrate-binding modules (CBMs) [12] and glycoside hydrolases (GHs) [13]. In addition, the incorporation of β1,3-linkages in the backbone renders the polysaccharide much more soluble than cellulose.

    In recent years, enzymatic systems employed by cellulolytic microorganisms to efficiently hydrolyse plant cell-wall polysaccharides have been gaining interest as a means to reduce energy costs and avoid the use of environmentally harmful chemical processes. One of these microorganisms is the thermophilic anaerobic bacterium Clostridium thermocellum (C. thermocellum, Ct) [14]. This bacterium produces an extracellular modular multienzyme complex – the cellulosome – in which Carbohydrate-Active enZymes (CAZymes), incorporated in the complex, contact their target substrates via appended CBMs. CBMs play a pivotal role in plant cell-wall biodegradation, highly potentiating the enzymes’ catalytic efficiency by bringing the adjoining catalytic modules into contact with their target polysaccharides [14, 15]. CBMs have been grouped into families based on sequence similarities, and 84 CBM families, a number that keeps growing, are presently compiled in the Carbohydrate-Active enZymes (CAZy) database (http://www.cazy.org/) [16]. Several CBMs are involved in the recognition of β1,3-1,4-glucans and, due to their diversity in the cellulosome, are excellent case studies to rationalize molecular recognition mechanisms that determine the specificity of mixed-linked glucan recognition in general [2, 4, 17-19]. An archetypal example is the family 11 CBM (CtCBM11) of the C. thermocellum Lic26A-Cel5E, an enzyme that contains GH5 and GH26 catalytic domains that display β1,4- and β1,3-1,4-mixed-linked endoglucanase activity [12].

    In previous work, we demonstrated that CtCBM11 exhibited a preference for β1,3-1,4-mixed-linked glucans and lower affinity for β1,4-linked glucans [12, 20]. The three-dimensional structure of CtCBM11 [Protein Data Bank (PDB) 1v0a], together with mutagenesis studies, revealed a typical type B CBM with a β-sandwich fold with a concave side forming a putative single binding cleft that could accommodate β1,3-1,4- and β1,4-linked glucans [12]. Amino acid residues Tyr22, Tyr53 and Tyr129, located in the putative binding cleft, were identified as playing a central role in the recognition of the ligands [12, 21]. Interaction studies using STD-NMR with the β1,4-linked cellohexasaccharide showed that CtCBM11 interacts preferably with the central four glucose units, mainly through interactions with internal positions 2 and 6 of the glucose rings [21, 22]. Overall, these studies suggested that CtCBM11 contains four binding subsites (Fig. 1), with the carbohydrate reducing end always facing the same side of the protein (subsite 1). The approximately four times higher affinity for the mixed-linked tetrasaccharide G4G4G3G, when compared to β1,4-linked cellotetrasaccharide, suggested that CtCBM11 displays a preference for a β1,3-linked glucose in at least one of the four subsites.

    Fig. 1. Top view of the identified binding site of wild-type CtCBM11. Analysis of the crystal structure of unbound CtCBM11 (PDB 1v0a) [12], together with mutagenesis and interaction studies using ITC, NMR and molecular docking, made it possible to pinpoint the protein’s binding site (enclosed by the β-strands in orange) and identify some key residues involved in ligand recognition (e.g. Tyr22, Tyr53, Asp99, Arg126, Tyr129, Asp146 or Tyr152, here represented as sticks and depicted with yellow carbon atoms) [12, 21, 22]. The polypeptide chain of CtCBM11 is depicted in white ribbon, with stretches Tyr53-Ser59, Arg86-Ser93, Asp99-Ser106, Arg125-Tyr129, Asn144-Tyr152 coloured in orange and numbered. The individual glucose binding subsites are schematized as transparent grey circles, numbered from 1 to 4. Subsite 1 accommodates the carbohydrate reducing end [21]. Calcium atoms are represented as green spheres. Image generated using UCSF Chimera [60].

    In recent studies, the ability of CtCBM11 to bind β1,4-linked glucans and, with higher affinity, β1,3-1,4-linked glucans has been exploited as a tool for the biotransformation of lignocellulosic materials. Fonseca-Maldonado et al. [23] investigated the β1,3-1,4-glucanase activity of a chimeric Bacillus subtilis endo-β1,4-glucanase after replacing its CBM3 domain with CtCBM11, which resulted in an increase in the hydrolytic efficiency of the enzyme towards β1,3-1,4-glucans. Cattaneo et al. [24] designed a chimeric protein by adding CtCBM11 to the C terminus of a hyperthermostable endoglucanase from Dictyoglomus turgidum (Dtur CelA). The resulting chimeric enzyme displayed enhanced stability at extreme pH values, with higher affinity and activity on insoluble cellulose. Furthermore, Furtado et al. [25] combined directed protein evolution and phage display approaches to obtain engineered CtCBM11 mutants that would exhibit high affinity to xyloglucans.

    In the present work, an integrated approach combining carbohydrate microarrays, NMR, X-ray crystallography, site-directed mutagenesis and ITC was conducted to extend the knowledge on the molecular determinants that enable CtCBM11 to distinguish between linear and mixed-linked β-glucans. The results now reported demonstrate the preference of CtCBM11 for mixed-linked glucans via a conformation–selection mechanism, in which CH-π stacking and hydrogen bonding interactions contribute to the specific ligand chain conformation and orientation in the binding cleft. The optimal conformation is achieved by having a β1,3-linkage at the reducing end of the saccharide, while the central units are linked by β1,4-glycosidic linkages. Ultimately, the structural and affinity data confirmed the sequence G4G4G3G as the minimum binding epitope and evidenced that recognition by CtCBM11 is not only dependent on the ligand chain length and the β1,3-linked glucose in the reducing end, but also on its specific position between the β1,4-linked glucose units.

    Results and Discussion

    Our previous studies identified Tyr22, Tyr53, Asp99, Arg126, Asp128, Tyr129 and Asp146 as key residues in ligand recognition by CtCBM11 [12, 21, 22] and suggested that the binding cleft contained four binding subsites (Fig. 1), with a preference for a β1,3-linked glucose residue in at least one of those subsites. The unique binding properties of CtCBM11 and the increasing interest in its application as a biotechnological tool led us to carry out further studies to characterize its selectivity for the β1,3-linked glucose and to elucidate the molecular determinants of the specificity towards mixed-linked glucans. The representative sequences of β1,4- and β1,3-1,4-mixed-linked gluco-oligosaccharides investigated for the interaction studies with CtCBM11 are depicted in Table S1.

    Specificity assignment using carbohydrate microarrays

    To examine carbohydrate-binding specificity at the oligosaccharide level, CtCBM11 was first analysed using a carbohydrate microarray comprising diverse sequence-defined gluco-oligosaccharides prepared as neoglycolipid (NGL) probes [20]. The oligosaccharides encompassed different chain lengths [from degree of polymerization (DP)-2 up to DP-16] and linear or branched sequences with α- or β-configurations (Fig. 2). Carbohydrate sequence information on these probes is given in Table S1.

    Fig. 2. Analysis of carbohydrate-binding specificity using a microarray of sequence-defined gluco-oligosaccharides. (A) CtCBM11 was analysed using serial dilutions at the indicated concentrations; (B) CmCBM6-2 was analysed as a positive control. The validated microarray encompassed 153 gluco-oligosaccharide probes prepared as NGLs [20]. The DP and glucose linkages are indicated on top of the coloured panels. Some relevant carbohydrate probe sequences are highlighted for binding to CtCBM11 in panel A; G, glucose; AO, NGLs prepared from reducing oligosaccharides by oxime ligation with an aminooxy (AO) functionalized lipid DHPE (1,2-dihexadecyl-sn-glycero-3-phosphoethanolamine) [61]. The sequence information on the oligosaccharide probes is depicted in Table S1. The binding signals are means of fluorescence intensities of duplicate spots at 5 fmol of oligosaccharide probe arrayed (with error bars) and are representative of at least two independent experiments.

    CtCBM11 showed a narrow binding profile, exhibiting strong binding to barley-derived β1,3-1,4-mixed-linked oligosaccharides (DP-7 to DP-16, probes 111-120; Fig. 2A) and displaying only weak binding to β1,4-linked cellooligosaccharides (DP-9 to DP-13, probes 84-88). The binding of CtCBM11 contrasted with the broad β-glucan binding profile observed with CBM6-2 of Cellvibrio mixtus (CmCBM6-2) used as a control protein (Fig. 2B), in accord with the reported specificity attributed to its two binding clefts [20, 26]. The serial dilution of CtCBM11 highlighted its specificity for β1,3-1,4-mixed-linked glucose sequences (Fig. 2), in agreement with our previous carbohydrate microarray data [20]. The observed oligosaccharide chain length dependency for CtCBM11 binding agrees with the current knowledge on type B CBMs. These CBMs bind the carbohydrate chains internally (endotype), hence requiring a minimum chain length for the recognition event to take place.

    For immobilization on the array surface, the oligosaccharides were conjugated via the reducing end glucose to an aminooxy-functionalized lipid by oxime ligation [20]. Although the oxime-linked NGLs have a significant proportion of the lipid-linked monosaccharide core in a ring-closed form, the conjugation and presentation in the microarray may have hindered access of the CBM to the binding epitope presented in short mixed-linked oligosaccharides with a 3-linkage at the reducing end. This would explain the lack of binding to the mixed-linked tetrasaccharide G4G4G3G (probe 103; Fig. 2A), for which high affinity was previously reported [12], and the weak binding observed to the pentasaccharide G4G4G4G3G (probe 107) and to the hexasaccharide G4G3G4G4G3G (probe 109). These results, together with the binding to the cellooligosaccharides, where binding was not detected to probes shorter than DP-9, hinted that both the sequence of β1,4-linkages adjacent to a β1,3-linked glucose and the chain length are important for ligand recognition by this CBM. The higher binding intensities observed to the mixed-linked heptasaccharide G4G4G3G4G4G3G and longer chain probes (probes 111-120), where the sequence G4G4G3G is preserved for interaction, suggested this tetrasaccharide as the minimum epitope recognized by CtCBM11.

    Ligand epitope mapping by STD-NMR

    To define the binding epitope and understand the influence of the β1,3-linked glucose position at the molecular level, CtCBM11–ligand interactions were studied in solution by saturation transfer difference NMR (STD-NMR) spectroscopy. This technique provides information on the atoms of the ligand that are closer to the protein upon binding (epitope mapping) [27]. The interaction of CtCBM11 was studied with the β1,4-linked cellotetrasaccharide (G4G4G4G) and two β1,3-1,4-mixed-linked tetrasaccharides (G4G3G4G and G4G4G3G) (Fig. 3). In all cases, the overall shape of the tetrasaccharides is fairly linear [28] and, for the β1,4-linked cellotetrasaccharide, the CH2OH groups alternate sides in consecutive glucose units in the oligosaccharide chain (Fig. 3D). The introduction of a β1,3-glycosidic bond alters this disposition of the CH2OH groups, due to the relative orientation of O3 and O4 with respect to the anomeric O1 (in the β configuration). O3 and O1 point to the same face of the sugar ring in a ⁴C₁ conformation, while O4 and O1 point to opposite faces; thus, the change from a β1,4- to a β1,3-bond causes a rotation of the sugar ring of about 180° (Fig. 3D) and positions the hydroxymethylene group of the consecutive glucose unit in the same direction. This occurs without a significant change in the interglycosidic torsion angles (φ and ψ) [28].

    Fig. 3. STD-NMR spectra of the interaction of CtCBM11 with the selected ligands. (A) cellotetrasaccharide (G4G4G4G); (B) mixed-linked tetrasaccharide G4G3G4G; (C) mixed-linked tetrasaccharide G4G4G3G. Bottom and top spectra are the reference 1H and the corresponding STD-NMR spectra, respectively. Depicted above the spectra are the representative sequences of β1,4- and β1,3-1,4-mixed-linked gluco-oligosaccharides investigated, as well as the corresponding epitope maps as calculated by the ASTD value (dark orange: 76–100%; orange: 51–75%; pale orange: 31–50%; yellow: 10–30%) – see Tables S4-S6 for all measured ASTD values. The position of the β1,3-linkage is indicated by a black arrow. Glucose residues are labelled from A (reducing end) to D, as recommended by IUPAC-IUB JCBN (1983) [62]. (D) Crystal structures of bound G4G4G3G (top – PDB 6R3M, in this paper) and G4G4G4G (bottom – PDB 3AMM [63]). The presence of the β1,3-glycosidic linkage causes a rotation of the consecutive glucose unit of about 180°, positioning the hydroxymethylene group (transparent grey circles) in the opposite direction.

    Saturation transfer to the ligand was observed in all STD-NMR experiments (Fig. 3), which indicated that all oligosaccharides are bound to the protein [27]. Moreover, the peak pattern in the STD-NMR spectra for the different oligosaccharides (glucose residues affected and the intensity of their peaks) was quite different. The STD-NMR results showed that the introduction and position of the β1,3-glycosidic bond had a significant influence on the interaction of the ligand with CtCBM11, which was clearly revealed in the STD-NMR-derived epitope mapping for each ligand (Fig. 3 and Tables S3-S5).

    For the cellotetrasaccharide (G4G4G4G; Fig. 3A), the STD-NMR amplification factor (ASTD) showed the maximum relative intensity for protons H2, H3, H4 and H5 of the central glucose units, meaning that these protons were the ones closer to the protein upon binding. All other protons had ASTD values between 20% and 31% (Fig. 3A). Unfortunately, due to signal overlap it was not possible to clearly distinguish the individual proton contributions. However, in general, all glucose residues showed some degree of saturation, indicating that the whole molecule was in contact with the protein.
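    As an illustration of how relative ASTD values of this kind are typically turned into an epitope map, the sketch below uses the commonly cited definition of the STD amplification factor (fractional saturation multiplied by the ligand molar excess), normalises to the strongest proton and applies the colour bins given in the Fig. 3 legend. The function names, proton labels and intensities are hypothetical and are not taken from the paper; the exact processing used by the authors is not described in this section.

```python
def std_amplification(i_ref, i_sat, ligand_excess):
    """A_STD = (I0 - Isat) / I0 multiplied by the ligand molar excess."""
    return (i_ref - i_sat) / i_ref * ligand_excess


def epitope_map(a_std):
    """Scale A_STD values to the strongest proton (100%) and assign a colour bin."""
    strongest = max(a_std.values())
    mapped = {}
    for proton, value in a_std.items():
        relative = 100.0 * value / strongest
        # Bin edges follow the Fig. 3 legend; a value that falls between two
        # stated ranges (e.g. 75.5%) ends up in the lower bin here.
        if relative >= 76.0:
            colour = "dark orange"
        elif relative >= 51.0:
            colour = "orange"
        elif relative >= 31.0:
            colour = "pale orange"
        elif relative >= 10.0:
            colour = "yellow"
        else:
            colour = "below threshold"
        mapped[proton] = (relative, colour)
    return mapped


# Hypothetical peak intensities (reference, saturated) at a 50-fold ligand excess.
peaks = {
    "GlcB-H2": (100.0, 92.0),
    "GlcC-H4": (100.0, 95.0),
    "GlcD-H6": (100.0, 97.0),
    "GlcA-H1": (100.0, 99.0),
}
a_std_values = {p: std_amplification(ref, sat, 50.0) for p, (ref, sat) in peaks.items()}
example_map = epitope_map(a_std_values)
for proton, (relative, colour) in example_map.items():
    print(f"{proton}: {relative:.1f}% ({colour})")
```

    Normalising to the strongest signal makes the maps comparable between ligands even when the absolute saturation differs, which is why the text discusses relative rather than absolute ASTD values.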

    The rotation of the glycosidic bond, caused by the introduction of a β1,3-linkage in the second position of the tetrasaccharide (G4G3G4G), was accompanied by a change in the recognition pattern detected in the STD-NMR (Fig. 3B). For this ligand, the highest relative ASTD was observed for protons located at the nonreducing end (protons H2 and H4), followed by protons H3 and H5 of the same residue and protons H2, H4 and H5 of Glc2. The reducing end protons showed very little interaction (10% or less), indicating that they were relatively far away from the protein when the complex was formed. When compared to the cellotetrasaccharide, the introduction of the β1,3-glycosidic bond at this position and the consequent change in the glycosidic torsion angle seem to be responsible for a major shift in the position of the oligosaccharide inside the binding cleft, resulting in a decrease in the number of contacts with the protein.

    The analysis of the STD-NMR data for the G4G4G3G ligand (Fig. 3C) revealed another change in the epitope when compared to G4G3G4G due to the different position of the β1,3-glycosidic bond. In this case, the interaction pattern was very similar to that of cellotetrasaccharide (Fig. 3A), indicating that they should bind to CtCBM11 in a similar way. The main difference was observed at the reducing end of the tetrasaccharides. As before, due to the presence of the β1,3-linkage, the reducing end unit was rotated by approximately 180°, leaving the central core conformation (which seems to be the most important factor for binding) almost unmodified. Similar to the cellotetrasaccharide, the highest ASTD values were found for the central (β1,4-linked) glucose residues, showing that these were the ones closer to the protein upon binding. It is noteworthy that in the case of the G4G4G3G ligand, there was no significant signal overlapping between the signals of protons H2 of these residues, and only H2 of Glc3 displayed a significant ASTD value. When comparing the STD results for the tetrasaccharides G4G4G4G and G4G4G3G, the only observable differences occurred at the reducing end. While for the cellotetrasaccharide the most affected proton was H2, for the mixed-linked G4G4G3G, the most affected protons are the ones from the methylene group. This may be a consequence of the rotation of this glucose residue as imposed by the β1,3-glycosidic bond.

    The binding epitopes derived from the STD-NMR data are a clear indication that the CtCBM11 binding cleft is able to discriminate between different types of glycosidic bonds (hence, different carbohydrate structures) and point to a very specific recognition mechanism in which the structure of the ligand in solution is the main determinant of the binding event. This is in agreement with our previous results [21, 22], which pointed to a conformation–selection mechanism of ligand recognition and binding for CtCBM11. Taken together with our previous data [12, 21, 22], the STD-NMR data indicate that the binding cleft of CtCBM11 is tuned to bind at least 4 glucose units, displaying a preference for a β1,4-glycosidic bond at the central part of the ligand. The presence of a β1,3-glycosidic bond at the central part of the ligand led to a decrease in the number of contacts in the STD-NMR, which most probably was due to a change in the oligosaccharide’s shape. When the β1,3-glycosidic bond is present at the reducing end, the binding pose did not seem to be affected. This was probably due to a better fit of the reducing end of G4G4G3G driven by the change in conformation imposed by introduction of the β1,3-bond at the reducing end (see discussion at section: CtCBM11 binding mode). The observed results are in good agreement with the data obtained using carbohydrate microarrays and explain the higher affinity measured by ITC for this ligand when compared with the cellotetrasaccharide, supporting the view that the protein displays a preference for a β1,3-linkage in subsite 1 (see discussion at section: The CH-π stacking and hydrogen bonding network as determinants of the ligand specificity).

    Protein interaction surface mapping by 1H-15N-HSQC

    To determine which of the protein residues were more perturbed by the interaction with the ligands, 1H-15N-HSQC spectra of 15N-labelled CtCBM11 were recorded upon titrations with the tetrasaccharides G4G4G3G and G4G3G4G (Fig. 4). To better represent the distribution of affected and nonaffected residues, the combined chemical shift perturbation, Δδcomb, was calculated and a cutoff line [29] was determined to select the relevant residues. As represented in Fig. 4, several protein amide protons substantially changed their chemical shifts upon addition of increasing amounts of the ligands (Fig. 4A,D). The mapping of these residues onto the protein surface identified the location of the binding cleft (Fig. 4C,F). The observed effects on the chemical shifts indicated that the interaction with both ligands was fast on the NMR timescale and that the variations in chemical shifts can be used to determine the equilibrium dissociation constants [29, 30]. Arg126, Tyr129 and Asp146, which were identified by mutagenesis and ITC studies as key residues for ligand recognition [12], were above the cutoff line and were used for the determination of the NMR-derived dissociation constants using Eq. 4 (Fig. 4B,E). The obtained values were averaged for a more precise measure (Table S6), and for better comparison with the ITC data (Table S11), the dissociation constant (KD) obtained was converted into the association constant (Ka). The determined Ka values for G4G4G3G and G4G3G4G were 5.14 (±0.74) × 10⁵ M⁻¹ and 1.81 (±0.92) × 10⁴ M⁻¹, respectively, and were in good agreement with the ITC results (see discussion at section: The CH-π stacking and hydrogen bonding network as determinants of the ligand specificity). For the cellotetrasaccharide, previous studies by ITC and NMR yielded Ka values of 4.4 (±0.8) × 10⁴ M⁻¹ and 2.33 (±0.56) × 10⁴ M⁻¹, respectively [12, 21].
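    The fitting equation referred to above is not reproduced in this section, so the following is only a minimal sketch of how a dissociation constant is commonly extracted from a fast-exchange titration, assuming a 1:1 binding model. The 15N weighting factor of 0.14 in `combined_shift`, the search bounds and all numerical data are assumptions chosen for illustration (the data are synthetic and noise-free, not the measured shifts); only the 300 µM protein concentration and the 0 to 2.5 equivalents range echo the Fig. 4 legend.

```python
import math


def combined_shift(delta_h, delta_n, n_weight=0.14):
    """Combined 1H/15N perturbation in ppm; the 15N weight is an assumed convention."""
    return math.sqrt(delta_h ** 2 + (n_weight * delta_n) ** 2)


def bound_fraction(p_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding (all concentrations in M)."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / (2.0 * p_total)


def fit_kd(p_total, ligand, shifts, log_lo=-9.0, log_hi=-1.0, iterations=80):
    """Least-squares fit: golden-section search on log10(KD), with the maximum
    shift solved analytically because the model is linear in that parameter."""
    def solve(log_kd):
        kd = 10.0 ** log_kd
        frac = [bound_fraction(p_total, l, kd) for l in ligand]
        dmax = sum(f * y for f, y in zip(frac, shifts)) / sum(f * f for f in frac)
        sse = sum((y - dmax * f) ** 2 for f, y in zip(frac, shifts))
        return sse, dmax

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log_lo, log_hi
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if solve(c)[0] < solve(d)[0]:
            b = d
        else:
            a = c
    best = (a + b) / 2.0
    return 10.0 ** best, solve(best)[1]


# Synthetic titration: 300 uM protein, 0 to 2.5 ligand equivalents,
# hypothetical KD of 50 uM and maximum combined shift of 0.12 ppm.
protein = 300e-6
ligand_conc = [eq * protein for eq in (0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 2.5)]
observed = [0.12 * bound_fraction(protein, l, 50e-6) for l in ligand_conc]

kd_fit, dmax_fit = fit_kd(protein, ligand_conc, observed)
ka_fit = 1.0 / kd_fit  # association constant in M^-1, as used for the ITC comparison
print(f"KD = {kd_fit * 1e6:.1f} uM, Ka = {ka_fit:.3g} M^-1, dmax = {dmax_fit:.3f} ppm")
```

    In practice a fit of this kind would be run per residue and the resulting KD values averaged, as the text describes for Arg126, Tyr129 and Asp146; a library optimiser could replace the hand-written search.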

    Fig. 4. CtCBM11 backbone amide chemical shift variations upon binding and determination of the dissociation constant (KD). (A, D) 1H, 15N-HSQC NMR spectra of 15N-labelled CtCBM11 in the presence of increasing amounts of G4G3G4G and G4G4G3G, respectively. The concentration of protein was maintained at 300 µM, and the concentration of ligand varied from 0 to 2.5 and from 0 to 5.7 equivalents for G4G3G4G and G4G4G3G, respectively. (B, E) Plots of the combined chemical shifts (Δδcomb) against the ligand concentration for the three residues selected for the calculation of the dissociation constant (KD), according to eq. 3. (C, F) Interaction between CtCBM11 and G4G3G4G and G4G4G3G, respectively. (*) – residues that disappear during the titration. The residues that show significant chemical shift variations are depicted in red on the surface of the structure of CtCBM11.

    Comparison of the histograms in Fig. 4 revealed a similar profile in the type and number of residues perturbed by the two tetrasaccharides. This indicated that these oligosaccharides interacted in the same location and mostly with the same residues of the protein and that the difference in affinity must be due to differences in the network of contacts with the protein induced by the different geometry of the oligosaccharide chain.

    When compared to G4G4G3G, the tetrasaccharide G4G3G4G (Fig. 4B,E) had a lower affinity and displayed a very different profile in the STD-NMR, where a strong interaction was only detected for the glucose residue at the nonreducing end (Fig. 3B,C). However, more than 40 residues of CtCBM11 were affected upon its binding (Fig. 4C). A closer look at the STD-NMR data (Fig. 3B) showed that all glucose residues of the tetrasaccharide G4G3G4G received some degree of saturation, meaning that these were still in contact with the protein and therefore perturbed the protein residues, but in general, it did not fit as tightly as G4G4G3G in the binding cleft. Although for G4G4G3G the affected residues were essentially the same as for G4G3G4G (Fig. 4C,F), the analysis of the map of the affected residues on the surface of CtCBM11 revealed that in the former these are slightly less dispersed in relation to the centre of the binding cleft. According to the previous interpretation, this might indicate that the ligand is better accommodated and centred in the cleft, thus establishing stronger interactions with the protein, in accordance with the STD-NMR results. This would also account for the higher affinity of G4G4G3G when compared to G4G3G4G.

    Crystal structure of CtCBM11 bound to β1,3-1,4-gluco-oligosaccharides

    To obtain atomic detail on the CtCBM11–ligand interactions that promote the recognition of mixed-linked β-glucans and the preference for the β1,3-linked glucose, the crystal structures of CtCBM11 were determined in complex with mixed-linked oligosaccharides featuring a β1,3-linkage at the reducing end (tetrasaccharide G4G4G3G) and both at the reducing end and at an internal position (hexasaccharide G4G3G4G4G3G). The linkages and sequence were determined by negative-ion tandem electrospray mass spectrometry with collision-induced dissociation (ESI-CID-MS/MS) sequencing, as previously reported [20].

    The bound structures of CtCBM11 were solved at a resolution of 1.45 and 2.6 Å for complexes with G4G4G3G (PDB 6R3M) and G4G3G4G4G3G (PDB 6R31), respectively (Fig. 5). Statistics of data processing and model refinement and validation are presented in Tables S7 and S8. Both structures presented the classical distorted β-jelly roll fold previously described for the unbound CtCBM11 [12], consisting of two six-stranded antiparallel β-sheets, which form a convex side and a concave side that constitutes the binding cleft where each ligand is accommodated. In the CtCBM11-G4G4G3G structure, the anomeric carbon of Glc1 was observed in both α and β configurations, as supported by the residual mFo-DFc electron density map (Fig. 5C), and was modelled in both anomers. The overall fold of the two bound structures was similar to the fold of native CtCBM11 (PDB 1v0a) [12], with an rmsd value of 0.435 Å over 141 Cα atoms for the tetrasaccharide-bound structure and 0.509 Å over 150 Cα atoms for the hexasaccharide-bound structure (Fig. 6). The high similarity between the free and bound conformations of CtCBM11 is in good agreement with the relaxation and internal mobility data obtained previously by NMR [21], which showed only minor dynamical variations upon cellotetrasaccharide binding. This is consistent with a rigid protein backbone that selects a defined oligosaccharide conformation; that is, CtCBM11 recognizes its ligands by a conformation–selection mechanism.
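    For readers less familiar with the metric, the rmsd values quoted above are root-mean-square distances between paired Cα atoms after superposition. The sketch below shows only that final calculation; it assumes the two coordinate lists are already superposed and paired residue by residue (the superposition itself, performed by the authors with the MatchMaker tool, is not reproduced here), and the function name and toy coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g. angstrom) between paired,
    already superposed C-alpha coordinates given as (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: two atoms, each displaced by 1 angstrom along z.
free_state = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
bound_state = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = ca_rmsd(free_state, bound_state)
print(f"rmsd = {toy_rmsd:.3f}")
```

    Values well below 1 Å over some 140 to 150 Cα atoms, as reported here, indicate that the backbone is essentially unchanged by ligand binding.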

    Ribbon representation of the three-dimensional crystal structures of the CtCBM11 complexes. Representation of the overall structure of CtCBM11 in complex with the ligands, exhibiting the typical distorted β-barrel conformation. Left panel – CtCBM11-G4G4G3G (PDB 6R3M). Right panel – CtCBM11-G4G3G4G4G3G (PDB 6R31). (A, B) Cartoon and surface representations of the CtCBM11 complexes; the concave side of CtCBM11 forms the binding cleft where the ligands are accommodated. (C) Initial mFo-DFc electron density maps for the complexes of CtCBM11 with G4G4G3G and G4G3G4G4G3G, calculated in the absence of ligand and at resolutions of 1.45 and 2.60 Å, respectively. The ligands are overlaid in the picture for reference. The electron density maps are shown as green mesh, contoured at 2.5 σ; calcium ions are indicated as green spheres. Images generated using UCSF Chimera [60].
    Comparison of unbound and ligand-bound CtCBM11 structures. Superposition of the unbound CtCBM11 structure (PDB 1v0a) (orange) with the bound structures of CtCBM11-G4G4G3G (PDB 6R3M) (grey) and CtCBM11-G4G3G4G4G3G (PDB 6R31) (blue). The alignment was performed using the MatchMaker tool of UCSF Chimera [60], giving rmsd values of 0.435 and 0.509 Å, respectively.

    CtCBM11 binding mode

    The identified residues that constitute the binding cleft of CtCBM11 are solvent-exposed and interact with the ligands through hydrophobic CH-π stacking interactions, hydrogen bonds and van der Waals contacts (Fig. 7, and Tables S9 and S10). The ligand G4G4G3G interacts with the CBM through direct hydrogen bonds between the equatorial OH groups of all four glucose monomers and residues Tyr152, Arg126, Asp99 and Asp146, as well as through water-mediated contacts with residues Asp51, Glu25, Tyr22, Tyr53, Tyr129, His149, Ser147 and Ser59 (Fig. 7A and Table S9). For the hexasaccharide G4G3G4G4G3G ligand, the same direct hydrogen bonds were observed, although, due to the lower resolution of the hexasaccharide complex, no water-mediated hydrogen bonds were identified, as the water molecules could not be unequivocally modelled (Fig. 7B and Table S10). The CH-π stacking interactions of residues Tyr22, Tyr53 and Tyr129 with the glucose rings at the centre of the cleft (Glc2 and Glc3) were evident, which validated our previous computational models (molecular docking and molecular dynamics) [22] and confirmed that these residues play a key role in guiding and stabilizing the ligand chain for recognition by CtCBM11.

    CtCBM11–ligand interactions. Close-up view of the CtCBM11 binding site, showing the protein–ligand contacts between the CBM and (A) the tetrasaccharide G4G4G3G and (B) the hexasaccharide G4G3G4G4G3G, as listed in Tables S9 and S10. The carbohydrate chains and the side chains of the amino acid residues inside the binding cleft that interact with the ligands are shown as stick models. Water molecules are represented by red spheres and the calcium ion by a green sphere. Hydrogen bonding is indicated by dashed lines, and CH-π stacking interactions are represented as double arrows. Image generated using UCSF Chimera [60].

    The structures of the bound CtCBM11 provided clear evidence for a conformation–selection mechanism. While the central β1,4-linked glucose units appear to be pivotal for CtCBM11 recognition through CH-π stacking with the tyrosine residues, the flanking β1,3-linked glucose at the reducing end (Glc1) seems to impose a specific ligand chain conformation and, consequently, its orientation in the binding cleft. This is probably due to the hydrogen bond between Asp146 and the OH of the Glc1 methylene group (Fig. 7). If a β1,4-glycosidic bond were present at this position instead of the β1,3-glycosidic bond (as in the case of cellotetrasaccharide), the glucose ring would be in a different orientation, with the CH2OH group rotated by about 180° (Fig. 3D). Although in this conformation a hydrogen bond is still possible between the OH group of carbon 2 and Asp146, the OH group would sit further away from Asp146 and the hydrogen bond would be weaker, thus explaining the lower affinity for β1,4-linked oligosaccharides. This is in very good agreement with the specificity observed in carbohydrate microarrays (Fig. 2) and with the STD-NMR data (Fig. 3), which showed that for the cellotetrasaccharide the most affected proton of Glc1 was H2, whereas for the mixed-linked tetrasaccharide G4G4G3G the methylene protons showed the most saturation.

    The superposition of the bound structures highlighted that the Glc2 and Glc3 units stacked against the tyrosine residues were almost completely coincident (Fig. 6), which provides further evidence for the importance of the positioning of these two β1,4-linked monosaccharides at the central subsites 2 and 3 (Fig. 1). Comparing the two structures, Glc5 and Glc6 of the hexasaccharide were mostly exposed to the solvent and established no significant contacts with the protein residues (Fig. 6), other than the direct hydrogen bond between Glu25 and the CH2OH group of the β1,3-linked Glc5 (Fig. 7B and Table S10). This observation provides evidence for the major contribution of subsites 1-3 to CtCBM11 binding and confirms the sequence G4G4G3G as a minimum binding epitope, whereas a second β1,3-linked glucose (putative subsite 5) may affect affinity or ligand specificity. Superimposing the unbound structure as well (PDB 1v0a) [12] (which exhibited the C-terminal residues of a symmetry-related molecule in the binding cleft) showed that the residues previously identified in the binding cleft as interacting with the C-terminal tail coincided with those now identified as responsible for ligand stabilization (Fig. 6). The majority of these residues underwent only minimal changes in the bound CtCBM11 structures, in accordance with a conformation–selection mechanism.

    The CH-π stacking and hydrogen bonding network as determinants of the ligand specificity

    The structural data obtained by NMR and X-ray crystallography allowed the identification not only of the key residues involved in CtCBM11 binding, but also of the structural features of the oligosaccharide ligands that were able to modulate binding. With this structure-based rationale, alanine mutants of the residues involved in direct hydrogen bonds with the ligand (Ser59, Asp99, Arg126, Asp146) were produced to analyse the influence of hydrogen bonding on the binding affinity of CtCBM11 towards different carbohydrates (polysaccharides and oligosaccharides), for which the chain length and the presence and position of β1,3-linkages varied (Fig. 8 and Table S11).

    Representative isothermal titration calorimetry experiments for the binding of CtCBM11 and its mutants to oligosaccharides. The top portion of each panel shows the raw power data, while the bottom portion shows the integrated data corrected for the heat of dilution. The solid lines show the nonlinear curve fits to a one-site binding model with the stoichiometry fixed at 1.

    In agreement with the structural data, the comparison of the binding affinity of wild-type CtCBM11 (WTCtCBM11) for the three tetrasaccharides analysed (G4G4G4G, G4G3G4G and G4G4G3G) showed that the most marked effect on affinity occurred when the β1,3-glycosidic bond was introduced at the central part of the ligand (i.e. for ligand G4G3G4G). When compared with the cellotetrasaccharide (G4G4G4G), this modification caused a decrease in affinity of about 4.4-fold. Conversely, placing the β1,3-bond at subsite 1 (G4G4G3G) increased the affinity by about 2.4-fold, supporting the preference of CtCBM11 for a β1,3-linkage at the reducing end.

    The CtCBM11 Arg126Ala mutant bound to β-glucans with a 100-fold lower affinity than wild-type CtCBM11 (Table S11). This corroborated what was observed in the CtCBM11-G4G4G3G structure, where atoms Nη1 and Nη2 of Arg126 are hydrogen-bonded to the O3 and O2 atoms, respectively, of the glucose residue located at subsite 3. Thus, the two hydrogen bonding contacts of Arg126, together with the CH-π stacking with the tyrosines, are fundamental for holding the ligand. The affinity of the Asp99Ala and Asp146Ala mutants for β-glucan and the oligosaccharides tested was reduced by approximately 4-fold and 10-fold, respectively. While Asp146 is hydrogen-bonded to the OH group of the methylene group of the glucose residue at subsite 1, Asp99 establishes polar contacts with O6 of the glucose residue located at subsite 2 and with O4 of the glucose residues located at subsites 1 and 2 (Fig. 7). The double mutation Asp99Ala/Asp146Ala resulted in a similar trend, although with an overall lower affinity (Table S11). This suggested that these residues are equally important for binding both β1,3-1,4-mixed-linked and β1,4-linked glucans. Furthermore, the hydrogen bond interactions of Asp146 may also contribute to the higher affinity observed towards G4G4G3G. As observed in the structure of the complex (Fig. 7), the β1,3-glycosidic linkage brings the CH2OH group of Glc1 into closer proximity to the side chain of Asp146 than a β1,4-bond in the same position would, making a stronger hydrogen bond. As such, a β1,3-glycosidic linkage towards the reducing end of the oligosaccharide is preferred. Replacement of Ser59 by Ala also led to a significant loss in binding affinity, consistent with the disruption of an important hydrogen bond established with the endocyclic oxygen of Glc1. The cumulative effect of Ser59Ala/Asp146Ala was an almost complete loss of binding to both β-glucan and hydroxyethyl cellulose (HEC), highlighting the importance of subsite 1 for substrate recognition. The CtCBM11 Val57Ala mutation, which was produced to assess the influence of hydrophobic interactions at subsite 1, showed no significant effect on the binding affinity for the mixed-linked ligands (Table S11). This result highlights that the major contributions of subsite 1 to CtCBM11 binding are mediated by hydrogen bonding interactions.
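
    The fold changes in affinity quoted above can be placed on an energy scale. The following is a minimal sketch, not part of the original analysis, that converts a fold decrease in affinity into an apparent loss of binding free energy using ΔΔG = RT ln(fold). The temperature of 298.15 K (25 °C, the ITC temperature given in the Materials and methods) and the function name are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal·mol^-1·K^-1


def ddg_from_fold(fold_decrease, temperature_k=298.15):
    """Apparent loss of binding free energy (kcal/mol) for a given
    fold decrease in affinity: ddG = R * T * ln(fold)."""
    if fold_decrease <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold_decrease)


# Approximate fold decreases reported in the text for the alanine mutants
for name, fold in [("Arg126Ala", 100), ("Asp146Ala", 10), ("Asp99Ala", 4)]:
    print(f"{name}: ddG ~ {ddg_from_fold(fold):.2f} kcal/mol")
```

    On this scale, a 100-fold loss corresponds to roughly twice the free energy penalty of a 10-fold loss, since the relation is logarithmic.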

    The hexasaccharide complex showed Glu25 making a hydrogen bond to the CH2OH group of the β1,3-linked Glc5 (Fig. 7B and Table S10). As a β1,4-linked Glc5 would have its CH2OH group facing away from Glu25 and Asp51, these two residues could play an important role in the preference of CtCBM11 for β1,3-1,4-mixed-linked over β1,4-linked glucans. Glu25Ala and Asp51Ala mutants were produced to test this hypothesis. Although there was a slight decrease in the ability of both mutants to bind β-glucan, the affinity for HEC was also affected. This means that, although important for binding to the ligand, this putative subsite 5 does not seem to be key for substrate specificity.

    In summary, the structural and affinity data demonstrate that the CH-π stacking and hydrogen bonding interactions, which impose a specific ligand chain conformation and orientation in the binding cleft, are determinant for the specificity of CtCBM11 towards mixed-linked β-glucans. The conformational change in the orientation of the glucose residues caused by the introduction of a β1,3-glycosidic bond leads to key hydrogen bonds with Asp146 and Ser59 (subsite 1) and Asp99 (subsites 1 and 2), which have a direct impact on CtCBM11 specificity and on the affinity displayed towards the different ligands. The data also provide evidence that the central part of the oligosaccharide (the residues that bind at subsites 2 and 3) must be planar (β1,4-linked) in order to take full advantage of the CH-π stacking interactions with tyrosine residues 22, 53 and 129, and of the hydrogen bonding with Arg126 (subsite 3). The hydrogen bonding mediated by Glu25 and Asp51 (subsites 4 and 5) contributes to increasing the affinity for the ligands, but not to the specificity towards the mixed-linked glucans.

    CtCBM11 ligand specificity in the context of CAZy CBMs

    The analysis of conservation of the interacting protein residues by sequence alignment of six family 11 CAZy CBMs revealed that only Tyr129 was invariant, whereas Arg126 was conserved in five out of the six sequences (Fig. 9), which highlights the critical role of these residues in ligand recognition by CAZy family 11 CBMs. In turn, Tyr22, Tyr53, Tyr152 and Asp99 were conserved in only two of the six CBMs. However, the lack of conservation of other key residues involved in ligand recognition by CtCBM11 is not totally unexpected, as plasticity of specificities is often observed within type B CBM families.

    Alignment of CBM11 family members. Primary sequences aligned from Clostridium thermocellum (CtCBM11, P16218), Clostridium cellulolyticum (CcCBM11, P25472), Fibrobacter succinogenes (FsCBM11, C9RQE4), Streptomyces avermitilis (SaCBM11, Q82JP6), Kribbella flavida (KfCBM11, D2PWV9), Salinispora tropica (StCBM11, A4X7P1) and Streptomyces bingchenggensis (SbCBM11, D7BY98). Identity to CtCBM11 is indicated with blue boxes. Residue numbers refer to the corresponding CBM11 sequence. The (*) identifies the CtCBM11 residues involved in the CH-π stacking of the oligosaccharide ligands, and the (x) identifies the residues that establish hydrogen bonds with the ligand. The sequence alignment was calculated with the program Clustal Omega [64], and the picture was produced with the program Jalview [65].

    In general, the ligand specificity of type B CBMs reflects the substrate specificity of the associated catalytic modules. CtCBM11 is encoded by the celH gene, which also encodes two functional catalytic domains, a GH from family 5 (GH5, Cel5E) and a second from family 26 (GH26, Lic26A). While Cel5E is a bifunctional β-1,4-endoglucanase/xylanase [31], Lic26A has lichenase activity, specific for β1,3-1,4-mixed-linked glucans, accommodating in its binding cleft substrates that comprise the G4G4G3G sequence [32]. The observed preference of CtCBM11 for a β1,4-glycosidic bond in the central part of the ligand and a β1,3-glycosidic bond at the reducing end provides evidence that this CBM mimics the specificity of the associated GH26 mixed-linked endoglucanase.

    CBMs that bind β-glucan chains often display broad specificity recognizing β1,4-glucans, β1,3-1,4-mixed-linked glucans and xyloglucan, by targeting the β1,4-glucan backbone common to these polysaccharides. According to the information deposited in the CAZy database, besides family 11, CBMs from families 4, 6, 22, 28 [16], 46 [33] and 65 [34] have been reported to bind mixed-linked glucans. However, to our knowledge, only CtCBM11 has been described to have a more restricted binding specificity and affinity to mixed-linked glucans. It is noteworthy that CtCBM11 is the only CBM from family 11 found in C. thermocellum, which might point to a crucial role played by CtCBM11 in the metabolism of mixed-linked glucans of this cellulosome-expressing bacterium.

    In the context of the cellulosome, the structural details here revealed on the CtCBM11 ligand recognition site may influence the planning and development of efficient and low-cost mechanisms for the conversion of biomass into usable sources of energy, as well as into nutrients for animal feedstock. Additionally, the understanding, at the molecular level, of how CtCBM11 selects and binds its ligands may inspire the design of new biomolecules with improved capabilities to be explored in health and agriculture applications.

    Materials and methods

    Gene cloning, mutagenesis and protein purification

    Plasmid pAG1, a pET21a (Novagen, Darmstadt, Germany) derivative that encodes the family 11 CBM of Clostridium thermocellum (CtCBM11), was selected for these experiments [12]. Recombinant CtCBM11 generated by pAG1 contains a C-terminal hexahistidine tag. Site-directed mutants were generated using the NZYMutagenesis kit (NZYTech Ltd, Lisbon, Portugal) according to the manufacturer’s instructions using pAG1 as template. Primers used to generate the mutant DNA sequences are listed in Table S12. Recombinant sequences of all mutant plasmid derivatives were verified by sequencing to ensure that only the appropriate mutations were incorporated into the nucleic acids.

    To express CtCBM11 in Escherichia coli, the CtCBM11 encoding gene was constructed as described previously [12]. E. coli BL21 harbouring the CtCBM11 encoding gene containing a C-terminal His6 tag was cultured in Luria–Bertani (LB) medium containing 100 µg·mL−1 ampicillin at 37 °C until mid-exponential phase (A600 = 0.6), at which point isopropyl-β-D-thiogalactopyranoside was added to a final concentration of 1 mM. Cultures were then further incubated overnight at 30 °C. Cells were collected by centrifugation, and the cell pellet was resuspended in 50 mM sodium HEPES buffer, pH 7.5, containing 1 M NaCl and 10 mM imidazole. CtCBM11 was purified by immobilized metal ion affinity chromatography. Fractions containing the purified protein were buffer-exchanged into Milli-Q water containing 2 mM CaCl2 and concentrated with Amicon 10-kDa molecular-mass centrifugal membranes to a final protein concentration of 40 mg·mL−1. For the expression of 15N-CtCBM11, the bacterial culture was grown at 37 °C in M9 minimal medium containing 100 µg·mL−1 ampicillin, 15NH4Cl and regular glucose. All other steps were as described above.

    Sources of carbohydrates

    The soluble barley β-glucan, the cellooligosaccharides and the β1,3-1,4-mixed-linked tetrasaccharides were purchased from Megazyme International (Bray, Ireland). The HEC and lichenan were purchased from Sigma-Aldrich (St. Louis, MO, USA). For the NMR studies, the cellotetrasaccharide was obtained from Seikagaku Corporation (Tokyo, Japan). The barley hexasaccharide fraction used for X-ray crystallography was obtained as described [20] by enzymatic hydrolysis of barley β-glucan with a cellulase (Novozymes, Copenhagen, Denmark) and purified by repeated gel filtration chromatography on a Bio-Gel P4 column.

    Mass spectrometry analysis of β1,3-1,4-oligosaccharides

    Sequence analysis of the β1,3-1,4-mixed-linked tetrasaccharides (G4G4G3G and G4G3G4G) and of the barley-derived hexasaccharide fraction used in the cocrystallization studies was carried out by negative-ion electrospray tandem mass spectrometry with collision-induced dissociation (ESI-CID-MS/MS) on a Synapt G2-S instrument (Waters, Manchester, UK), essentially as described [20]. Cone voltage was kept at 80 eV for MS and CID-MS/MS. For pseudo-MS3, the cone voltage was increased to 180 eV to encourage in-source fragmentation. Collision gas (Ar) was at a pressure of 7.3 × 10⁻³ mbar. The collision energy was between 15 and 17 eV for optimal fragmentation. The ESI-CID-MS/MS confirmed the sequences of the barley-derived tetrasaccharides and the hexasaccharide, showing that this fraction contained mainly the sequence Glcβ1,4Glcβ1,3Glcβ1,4Glcβ1,4Glcβ1,3Glc (G4G3G4G4G3G), as reported previously [20].

    Carbohydrate microarray analysis

    The binding specificity of CtCBM11 was analysed using a carbohydrate microarray of ~ 150 gluco-oligosaccharide NGL probes prepared as previously described [20]. Carbohydrate sequence information of these probes is in Table S1.

    Microarray binding analyses were performed using Alexa Fluor 647-labelled streptavidin for readout, essentially as described [20, 35]. CtCBM11 was analysed at a final concentration of 2, 10 or 20 µg·mL−1. The Cellvibrio mixtus family 6 CBM6-2 (CmCBM6-2) was included as a protein control and analysed at a final concentration of 2 µg·mL−1. The CmCBM6-2 was provided by Harry Gilbert (University of Newcastle, UK) and contains an N-terminal His-tag [26]. Both CBMs were analysed precomplexed with mouse monoclonal anti-polyhistidine (Sigma-Aldrich, H1029) (Ab1) and biotinylated anti-mouse IgG antibodies (Sigma-Aldrich, B7264) (Ab2) at a ratio of 1 : 3 : 3 (by weight). The CBM–antibody complexes were prepared by preincubating Ab1 with Ab2 for 15 min at 20 °C, followed by addition of the CBMs and further incubation for 15 min. The mixture was then diluted to the final concentration of the CBMs in the blocking solution, made of 1% BSA (Sigma-Aldrich, A8577) and 0.02% casein (Thermo Scientific, Waltham, MA, USA, 37583) in HBS (5 mM HEPES buffer, pH 7.4, 150 mM NaCl) with 5 mM CaCl2. Microarray data analysis was performed using dedicated software [36] developed by Mark Stoll from the Glycosciences Laboratory (Imperial College London, UK). The microarray data and metadata, including details of the gluco-oligosaccharide probe library, the generation of the microarrays, imaging and data analysis, are in the Supplementary glycan microarray document (Table S2), in accordance with the Minimum Information Required for A Glycomics Experiment (MIRAGE) guidelines for reporting glycan microarray-based data [37].

    NMR spectroscopy

    All NMR spectra were acquired on a 600-MHz Bruker Avance III spectrometer (Bruker, Wissembourg, France) equipped with a 5-mm inverse detection triple-resonance z-gradient cryogenic probehead (CP TCI) and processed with the software TopSpin 3.1 (Bruker).

    Ligand 1H and 13C resonance assignment

    All samples were prepared by dissolving 1 mg of the ligands in 1.0 mL of deuterated water (Euriso-top 99.9%), for a final concentration of 2 mM.

    The temperature was set to 298 K in all experiments. The 1H and the 13C chemical shifts were expressed in the δ (p.p.m.) scale and were internally referenced to tetramethylsilane. The experimental conditions for the data acquisition were as follows: 1H NMR – a standard pulse sequence from the Bruker library was used [38] with a 90° flip angle of 8.05 µs at 3.50 dB, acquisition time 5.46 s, spectral window of 3000 Hz centred at 2824.8 Hz, 32 transients with 32 K data points, relaxation delay of 1.0 s and digital resolution of 0.09 Hz per point. Water suppression was accomplished using excitation sculpting with gradients [38] using a 180° shaped pulse (Sinc1.1000) with 2000 ms at 40.78 dB; 1H-1H-COSY – a standard pulse sequence from the Bruker library was used (cosygpppprqf) with four transients acquired in a matrix with 4096 data points in t2 in a spectral window of 3000 Hz centred at 2824.8 Hz and 512 increments in t1, with a relaxation delay of 1 s and an acquisition time of 0.68 s; 1H-13C-HSQC – a standard pulse sequence from the Bruker library was used [39-41] with four transients in a matrix with 1024 data points in t2 in a spectral window of 3000 Hz centred at 2824.8 Hz and with 256 increments in t1 in a spectral window of 21 128 Hz centred at 11 314.0 Hz, with a relaxation delay of 1.5 s and an acquisition time of 0.17 s. A delay of 1.72 ms was used for the evolution of the one-bond CH coupling, calculated for 1J(C,H) = 145 Hz; 1H-13C-HSQC-TOCSY – a standard pulse sequence from the Bruker library was used (hsqcdiedetgpsisp.1) with eight transients in a matrix with 1024 data points in t2 in a spectral window of 3000 Hz centred at 2824.8 Hz and with 256 increments in t1 in a spectral window of 25 000 Hz centred at 11 314.0 Hz, with a relaxation delay of 1.5 s and an acquisition time of 0.17 s. A delay of 1.72 ms was used for the evolution of the one-bond CH coupling, calculated for 1J(C,H) = 145 Hz. A delay of 80 ms was used as the mixing time. A delay of 3.45 ms was used for multiplicity selection (CH, CH3 positive, CH2 negative); 1H selective TOCSY – a standard pulse sequence from the Bruker library was used [42-45] with 32 transients in a spectral window of 3000 Hz with 32 K data points, a relaxation delay of 1.0 s and digital resolution of 0.18 Hz per point. A delay of 100 ms was used for the mixing time, and a delay of 2.5 ms was used for the trim pulse. The selective irradiation was performed by applying a 180° shaped pulse (Gaus1_180r.1000) with 80 ms at 69.71 dB at the frequency of each of the anomeric protons; ROESY – a standard pulse sequence from the Bruker library was used [46] with 32 transients in a matrix with 2 K data points in t2 in a spectral window of 3600 Hz centred at 2824.1 Hz and with 512 increments in t1, with a relaxation delay of 1.5 s and an acquisition time of 0.57 s. A delay of 200 ms was used as the ROESY spinlock. Data processing was carried out with the TopSpin 3.2 software (Bruker). Complete 1H and 13C NMR assignment tables of the ligands used in the NMR experiments are available in Tables S13 and S14.
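
    Several of the acquisition parameters listed above are related by simple formulas. As a consistency check (a sketch, not part of the original protocol), the acquisition time follows from the number of data points and the spectral window as AQ = TD/(2·SW), and the delays for one-bond CH coupling evolution and multiplicity selection correspond to 1/(4J) and 1/(2J). The function names are hypothetical, and the number of data points is assumed to count real plus imaginary points.

```python
def acquisition_time(td_points, sw_hz):
    """Acquisition time in seconds, AQ = TD / (2 * SW)."""
    return td_points / (2.0 * sw_hz)


def coupling_evolution_delay_ms(j_hz):
    """Delay 1/(4J) in milliseconds for one-bond coupling evolution."""
    return 1000.0 / (4.0 * j_hz)


def multiplicity_delay_ms(j_hz):
    """Delay 1/(2J) in milliseconds for multiplicity selection."""
    return 1000.0 / (2.0 * j_hz)


# Parameters taken from the text: data points, spectral window (Hz), 1J(C,H) (Hz)
print(f"1H 1D: AQ = {acquisition_time(32 * 1024, 3000):.2f} s")
print(f"COSY:  AQ = {acquisition_time(4096, 3000):.2f} s")
print(f"HSQC:  AQ = {acquisition_time(1024, 3000):.2f} s")
print(f"1/(4J) = {coupling_evolution_delay_ms(145):.2f} ms")
print(f"1/(2J) = {multiplicity_delay_ms(145):.2f} ms")
```

    With these inputs the computed values round to the acquisition times (5.46, 0.68 and 0.17 s) and delays (1.72 and 3.45 ms) stated in the text.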

    Saturation transfer difference NMR

    Identification and mapping of the ligand epitopes was achieved using the saturation transfer difference NMR experiment (STD-NMR) with the pulse sequence from the Bruker library (stddiffesgp.3) [38, 47]. The pseudo-2D spectra were acquired using a solution of 2 mM carbohydrate ligand and 20 µM protein in D2O. All the spectra were recorded with 256 scans in a matrix with 32 K points in t2 in a spectral window of 6410.26 Hz centred at 2733.30 Hz. Excitation sculpting with gradients [38] was employed to suppress the water proton signals. A spinlock filter (T1ρ) with a 2-kHz field and a length of 50 ms was applied to suppress the protein background. Selective saturation of protein resonances was performed by irradiating at 0.6 p.p.m. (on-resonance spectrum) using a series of 40 Eburp2.1000 shaped 90° pulses (50 ms, 1 ms delay between pulses), for a total saturation time of 2.0 s. For the reference spectrum (off-resonance), the sample was irradiated at 20 p.p.m. Control experiments were performed using the ligand alone under the same conditions as the experiments in the presence of the protein (data not shown).

    The results were analysed using the STD amplification factor (ASTD) [27, 48], obtained by multiplying the relative STD effect of a given hydrogen (ISTD/I0) at a given ligand concentration ([L]T) with the molar ratio of ligand in excess relative to the protein ([L]T/[P]), according to Equation 1:
    $$A_{\mathrm{STD}} = \frac{I_0 - I_{\mathrm{SAT}}}{I_0} \times \frac{[L]_T}{[P]} = \frac{I_{\mathrm{STD}}}{I_0} \times \frac{[L]_T}{[P]} \tag{1}$$
    where ASTD is the STD amplification factor, I0, ISAT and ISTD are the intensities of the reference (off-resonance), saturated (on-resonance) and difference spectra (STD-NMR), respectively.
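
    The amplification factor can be computed directly from the reference and saturated intensities. The sketch below is illustrative only: the proton labels and intensities are hypothetical, the 2 mM ligand and 20 µM protein concentrations are those given above, and the normalization of each proton to the largest amplification factor (set to 100%) is a common epitope-mapping convention assumed here rather than taken from the text.

```python
def std_amplification_factor(i_ref, i_sat, ligand_conc, protein_conc):
    """A_STD = (I0 - I_SAT) / I0 * [L]_T / [P].

    i_ref and i_sat are the off- and on-resonance intensities; both
    concentrations must be given in the same unit."""
    if i_ref == 0 or protein_conc == 0:
        raise ValueError("reference intensity and protein concentration must be non-zero")
    i_std = i_ref - i_sat  # intensity of the difference spectrum
    return (i_std / i_ref) * (ligand_conc / protein_conc)


def relative_epitope(a_std_by_proton):
    """Express each amplification factor as a percentage of the largest one."""
    top = max(a_std_by_proton.values())
    return {proton: 100.0 * a / top for proton, a in a_std_by_proton.items()}


# Hypothetical (I0, I_SAT) pairs for three ligand protons
intensities = {"H1": (1.00, 0.97), "H2": (1.00, 0.94), "H6": (1.00, 0.985)}
a_std = {
    proton: std_amplification_factor(i0, isat, ligand_conc=2000.0, protein_conc=20.0)
    for proton, (i0, isat) in intensities.items()
}
print(a_std)
print(relative_epitope(a_std))
```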

    CtCBM11 titration

    The residues of CtCBM11 responsible for binding were identified by titrating a sample of double-labelled protein with each of the ligands and acquiring a 1H-15N-HSQC spectrum at each titration point. The concentration of protein was maintained at 300 µM, and the concentration of ligand was varied from 0 to 5.7 equivalents (0, 0.7, 1.0, 1.5, 2.9, 4.3 and 5.7) in the case of G4G4G3G, and from 0 to 2.5 equivalents (0, 0.3, 0.4, 0.7, 1.3, 1.9 and 2.5) in the case of G4G3G4G. The 1H-15N-HSQC spectra were acquired with 2048 × 256 points and eight scans. Spectral widths were 9615 Hz for 1H and 2311 Hz for 15N. The central frequency for proton was set on the solvent signal (2817.40 Hz) and that for nitrogen on the centre of the amide region (7175.66 Hz).

    For the evaluation of the behaviour of individual amino acids upon addition of increasing amounts of ligand, we calculated the combined amide proton and nitrogen chemical shift differences using Equation 2 [29]:
    urn:x-wiley:1742464X:media:febs15162:febs15162-math-0002(2)
    where ΔδH and ΔδN are the variations of the chemical shifts of proton and nitrogen. In order to decide whether a given residue belongs to the class of interacting or noninteracting residues, we calculated a corrected standard deviation to zero [29].
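
    The classification step can be illustrated with a short sketch. It assumes that the combined shift is a weighted Euclidean distance of the proton and nitrogen shift changes, and that the corrected standard deviation to zero is obtained iteratively by discarding values above three times the current standard deviation to zero until no more values are removed. The nitrogen weighting factor of 0.1, the three-sigma threshold, the function names and the example data are all assumptions for illustration and are not stated in the text.

```python
import math


def combined_shift(d_h, d_n, w_n=0.1):
    """Combined amide shift change (p.p.m.); w_n is an assumed 15N weight."""
    return math.sqrt(d_h ** 2 + (w_n * d_n) ** 2)


def corrected_sigma_zero(values, n_sigma=3.0):
    """Standard deviation to zero, recomputed after excluding values above
    n_sigma times the current estimate, until no value is excluded."""
    kept = list(values)
    if not kept:
        raise ValueError("at least one value is required")
    while True:
        sigma0 = math.sqrt(sum(v * v for v in kept) / len(kept))
        remaining = [v for v in kept if v <= n_sigma * sigma0]
        if len(remaining) == len(kept):
            return sigma0
        kept = remaining


def interacting_residues(shifts_by_residue):
    """Residues whose combined shift exceeds the corrected cutoff."""
    cutoff = corrected_sigma_zero(shifts_by_residue.values())
    return sorted(r for r, v in shifts_by_residue.items() if v > cutoff)


# Hypothetical data: twenty weakly perturbed residues and one strongly perturbed
shifts = {f"res{i}": 0.01 for i in range(20)}
shifts["Tyr129"] = combined_shift(0.30, 4.0)
print(interacting_residues(shifts))
```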

    Calculation of the dissociation constant, KD

    The combined chemical shifts of the NH resonances of Arg126, Tyr129 and Asp146 were used to obtain the dissociation constant (KD) from the titration experiment according to Equation 3 [30]:
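    $$\Delta\delta_{\mathrm{comb}} = \Delta\delta_{\max} \, \frac{\left([P]_0 + [L]_0 + K_D\right) - \sqrt{\left([P]_0 + [L]_0 + K_D\right)^2 - 4[P]_0[L]_0}}{2[P]_0} \tag{3}$$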
    where Δδcomb is the combined chemical shift deviation defined by Eq. 2, Δδmax is the maximum chemical shift deviation between free and bound state of protein, KD is the dissociation constant, and [P]0 and [L]0 are the concentration of the protein and ligand, respectively. The obtained values were then averaged for a more precise measure, and the standard deviation of the three measurements was taken as the associated error of the overall value (Table S6).
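
    A fit of this kind can be sketched with the standard 1:1 binding isotherm, in which the observed shift is Δδmax times the fraction of protein bound. The code below is a minimal illustration and not the authors' fitting procedure: the fitting strategy (a log-spaced scan over KD with a closed-form Δδmax at each value, followed by golden-section refinement), the function names and the synthetic data are assumptions. The protein concentration (300 µM) and ligand equivalents are those given for G4G4G3G.

```python
import math
import statistics


def binding_isotherm(l0, p0, kd, d_max):
    """Combined shift for 1:1 binding; all concentrations in the same unit."""
    s = p0 + l0 + kd
    return d_max * (s - math.sqrt(s * s - 4.0 * p0 * l0)) / (2.0 * p0)


def _profile(kd, l0s, shifts, p0):
    """Residual sum of squares and least-squares d_max at a fixed KD."""
    f = [binding_isotherm(l0, p0, kd, 1.0) for l0 in l0s]
    d_max = sum(x * y for x, y in zip(f, shifts)) / sum(x * x for x in f)
    sse = sum((y - d_max * x) ** 2 for x, y in zip(f, shifts))
    return sse, d_max


def fit_kd(l0s, shifts, p0, kd_min=1.0, kd_max=1e5, n_grid=200, n_refine=60):
    """Return (KD, d_max) minimizing the squared residuals."""
    lo_log, hi_log = math.log(kd_min), math.log(kd_max)
    grid = [lo_log + i * (hi_log - lo_log) / (n_grid - 1) for i in range(n_grid)]
    sses = [_profile(math.exp(x), l0s, shifts, p0)[0] for x in grid]
    best = min(range(n_grid), key=sses.__getitem__)
    lo, hi = grid[max(best - 1, 0)], grid[min(best + 1, n_grid - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section ratio
    for _ in range(n_refine):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if _profile(math.exp(a), l0s, shifts, p0)[0] < _profile(math.exp(b), l0s, shifts, p0)[0]:
            hi = b
        else:
            lo = a
    kd = math.exp((lo + hi) / 2.0)
    return kd, _profile(kd, l0s, shifts, p0)[1]


P0 = 300.0  # protein concentration, µM
ligand = [eq * P0 for eq in (0, 0.7, 1.0, 1.5, 2.9, 4.3, 5.7)]
# Synthetic, noise-free shifts for three residues with hypothetical parameters
true_params = {"Arg126": (150.0, 0.25), "Tyr129": (180.0, 0.40), "Asp146": (120.0, 0.15)}
fitted = {}
for residue, (kd_true, dmax_true) in true_params.items():
    data = [binding_isotherm(l0, P0, kd_true, dmax_true) for l0 in ligand]
    fitted[residue] = fit_kd(ligand, data, P0)[0]

kd_mean = statistics.mean(fitted.values())
kd_sd = statistics.stdev(fitted.values())  # sample standard deviation (assumed)
print(f"KD = {kd_mean:.0f} ± {kd_sd:.0f} µM")
```

    Because the model is linear in Δδmax, only KD needs to be searched numerically, which keeps the sketch free of external fitting libraries.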

    Crystallization and X-ray diffraction data collection

    The complexes of CtCBM11 were produced by overnight incubation of the protein (15–20 mg·mL−1) with the β1,3-1,4-mixed-linked tetrasaccharide (G4G4G3G) or hexasaccharide (G4G3G4G4G3G) ligand at a 1:10 molar ratio. Crystals of each complex were grown in hanging drops, using the vapour diffusion method. Crystals grew from precipitant solutions containing 20–28% (m/v) polyethyleneglycol 3350 and 0.2 M potassium phosphate in 0.1 M sodium acetate buffer, pH 4.6. For the CtCBM11-G4G4G3G complex, although sea urchin-like crystals appeared in the drops in 1 or 2 days, hexagonal crystals grew later over a period of 3 weeks. Crystals of the CtCBM11-G4G3G4G4G3G complex appeared after a period of 2 weeks, although affected by significant multiplicity. All crystals were harvested using a 0.1 M sodium acetate-buffered solution (pH 4.6) containing 30% (m/v) polyethyleneglycol 3350 and 0.2 M potassium phosphate. Crystals grown in 20–24% (m/v) polyethyleneglycol 3350 were flash-cooled in liquid nitrogen using 30% (v/v) glycerol added to the harvesting solution as cryoprotectant, while crystals grown with 28% (m/v) polyethyleneglycol 3350 were flash-cooled using paratone oil.

    X-ray diffraction data from a single crystal of the CtCBM11-G4G4G3G complex were collected under a nitrogen stream at 100 K on the I02 beamline at Diamond Light Source (Oxfordshire, UK), to a maximum resolution of 1.45 Å and using radiation of 0.9763 Å wavelength. The CtCBM11-G4G4G3G crystal indexed in space group H3 (R3:H), with cell constants a = b = 103.2 Å and c = 39.6 Å, corresponding to a calculated Matthews coefficient of 2.05 Å³·Da⁻¹ and a solvent content of 40%, suggesting the presence of one molecule of CtCBM11 in the asymmetric unit. Data for the CtCBM11-G4G3G4G4G3G complex were collected from a crystal protected with paratone oil and flash-cooled in a nitrogen stream at 100 K, on the ID23-2 beamline at the ESRF (Grenoble, France), to a maximum resolution of 2.6 Å and using X-ray radiation at a fixed wavelength of 0.8729 Å. The CtCBM11-G4G3G4G4G3G crystals indexed in space group H3 (R3:H), with cell constants a = b = 104.9 Å and c = 39.5 Å. Data collection, processing, model building and validation statistics are shown in Table S7.
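
    The Matthews coefficient and solvent content quoted above can be reproduced from the cell constants. The sketch below is illustrative: it uses the hexagonal-setting cell volume a²·c·sin 120°, nine asymmetric units per cell for space group H3, and the usual approximation of solvent fraction as 1 − 1.23/VM. The protein molecular mass of 19 800 Da is an assumption chosen for illustration (it is not stated in the text), as are the function names.

```python
import math


def hexagonal_cell_volume(a, c):
    """Cell volume (Å^3) for a = b, alpha = beta = 90°, gamma = 120°."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume, n_asu, mw_da, n_mol=1):
    """V_M (Å^3/Da) = V_cell / (asymmetric units * molecules per ASU * mass)."""
    return cell_volume / (n_asu * n_mol * mw_da)


def solvent_content(vm):
    """Approximate solvent fraction, 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm


volume = hexagonal_cell_volume(103.2, 39.6)       # CtCBM11-G4G4G3G cell constants
vm = matthews_coefficient(volume, n_asu=9,        # H3 has nine asymmetric units per cell
                          mw_da=19800.0)          # assumed molecular mass
print(f"V_cell = {volume:.0f} Å^3, V_M = {vm:.2f} Å^3/Da, "
      f"solvent = {100 * solvent_content(vm):.0f}%")
```

    Under these assumptions, a second molecule in the asymmetric unit would halve VM to about 1.0 Å³·Da⁻¹, which is below the range normally observed for protein crystals and supports the choice of one molecule.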

    Phasing, model building and refinement

    Data sets were processed using MOSFLM [49] and SCALA [50] from the CCP4 suite [51]. Phasing for the CtCBM11-G4G4G3G complex was performed by molecular replacement with Phaser MR [52] from CCP4 using the CtCBM11 polypeptide chain of the PDB 1v0a structure [12] to position the protein model in the indexed H3 space group. After model building and refinement, the polypeptide chain of this new structure (PDB 6R3M) was used, in a similar procedure, to solve the structure of the CtCBM11-G4G3G4G4G3G complex (PDB 6R31). Model completion, editing and initial validation were carried out in COOT [53]. Automatic addition of water molecules and restrained refinement of the full models were done using REFMAC5 [54]. Phenix.elBOW from the PHENIX suite [55] was used to generate restraints for β-d-glucose monomers used in refinement of G4G4G3G.

    Structure validation was performed using MolProbity [56], and PDB-REDO [57] was used to generate the final models. PRIVATEER [58] was used for the validation of the stereochemistry and conformation of the carbohydrate ligands (Table S8). The final model of the CtCBM11-G4G4G3G complex, with R = 15.7 % (Rfree = 18.8 %), consists of 178 amino acid residues, two calcium ions, one acetate and four phosphate ions, 212 water molecules and one G4G4G3G ligand. The side chain of Leu172 was omitted due to disorder and the consequent absence of meaningful electron density. The final model of the CtCBM11-G4G3G4G4G3G complex, with R = 18.8 % (Rfree = 24.6 %), consists of 173 amino acid residues, two calcium ions, two phosphate ions, 57 water molecules and one G4G3G4G4G3G ligand. Residues Asp79 to Ser81 were omitted from the model due to poorly defined electron density.

    In the CtCBM11-G4G4G3G structure, the anomeric carbon of Glc1 could be observed in both α and β conformations, as supported by the mFo-DFc electron density map (Fig. 5C). The hydroxyl group was therefore modelled in both positions, with partial occupancy (Figs 5D and 6).

    Isothermal titration calorimetry

    Isothermal titration calorimetry (ITC) was performed essentially as described previously [12], using a MicroCal VP-ITC calorimeter (Northampton, MA, USA) at 25 °C. Before the experiment, purified proteins were buffer-exchanged against 50 mM phosphate buffer, pH 7.0, containing 0.1 mM CaCl2. The reaction cell contained protein at 35–50 µM, while the syringe contained either the oligosaccharides at 0.5–10 mM or the soluble polysaccharides at 1–6 mg·mL−1. The ligands were dissolved separately in the dialysis buffer to minimize heats of dilution. Titrations consisted of a first injection of 2 µL followed by 28 injections of 10 µL aliquots of either polysaccharide or oligosaccharide, at 220-s intervals, into the ITC sample cell (volume 1.4467 mL) containing the protein sample. The stirring speed and reference power were set at 307 r.p.m. and 15 µcal·s−1, respectively. The background heat was measured under the same conditions by injecting buffer without ligand into the protein at the same concentration as in the cell. The molar concentration of CBM binding sites present in polysaccharide ligands was determined as described previously [59]. Data analysis was performed by nonlinear regression using a single-site binding model (MicroCal Origin 7.0 software, Malvern Panalytical, Malvern, UK), and thermodynamic parameters, namely the association constant (Ka), the number of binding sites in the protein (n) and the binding enthalpy change (ΔH), were determined. The Gibbs free energy change (ΔG) and the entropy change (ΔS) were calculated according to Equation 4:
    $$\Delta G = -RT \ln K_\mathrm{a} = \Delta H - T\Delta S \qquad (4)$$
    where R is the gas constant, and T represents the absolute temperature.
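
    As a worked illustration of how ΔG and ΔS follow from the fitted parameters, the sketch below applies the standard relations ΔG = −RT ln Ka and ΔG = ΔH − TΔS. It is not the authors' analysis script: the function name is hypothetical, the Ka and ΔH inputs are placeholder values chosen only for the example, and kcal·mol⁻¹ units are an assumption.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def gibbs_and_entropy(ka, dh, temp_k=298.15):
    """Return (dG, dS) from an association constant and binding enthalpy.

    ka     : association constant in M^-1
    dh     : binding enthalpy change in kcal/mol
    temp_k : absolute temperature in K (298.15 K corresponds to 25 °C)
    """
    dg = -R_KCAL * temp_k * math.log(ka)   # dG = -RT ln Ka
    ds = (dh - dg) / temp_k                # rearranged from dG = dH - T dS
    return dg, ds


# Placeholder inputs for illustration only; not values from the paper.
ka_example = 1.0e5    # M^-1
dh_example = -10.0    # kcal/mol

dg, ds = gibbs_and_entropy(ka_example, dh_example)
print(f"dG  = {dg:.2f} kcal/mol")
print(f"TdS = {298.15 * ds:.2f} kcal/mol")
```

    For these placeholder inputs, ΔH is more negative than ΔG, so TΔS is negative: binding would be enthalpy-driven with an unfavourable entropic contribution.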

    Acknowledgements

    The authors wish to thank Prof. Maria João Romão for critical reading of the manuscript, for valuable discussions and for access to the Macromolecular Crystallography Facilities in FCT-NOVA. We acknowledge the Glycosciences Laboratory directed by Professor Ten Feizi for access to the Carbohydrate Microarray Facility, and the past and present members of the Laboratory, in particular Hongtao Zhang, Yibing Zhang, Lisete M. Silva and Yan Liu, for their contribution to the preparation of glucan-related NGL probes and the generation of microarrays. We are also grateful to collaborators that provided valuable samples of oligosaccharides included in the microarrays, in particular Barry V. McCleary (Megazyme International) for various barley hydrolysates. We acknowledge the European Synchrotron Radiation Facility (ESRF) and the Diamond Light Source (DLS) for access to synchrotron facilities. This work was supported by: Fundação para a Ciência e a Tecnologia (FCT-MCTES), Portugal, through grants PTDC/QUI-QUI/112537/2009 (to ASP), PTDC/BIA-MIC/5947/2014 (to CMGAF), PTDC/BBB-BEP/0869/2014 (to ALC), SFRH/BD/100569/2014 (to DOR), EXPL/IF/01621/2013 (to VMRP) and IF/00023/2012 (to ASP); Wellcome Trust Biomedical Resource grant WT108430 to Ten Feizi for the funding to the Carbohydrate Microarray Facility; and by the Applied Molecular Biosciences Unit (UCIBIO), which is financed by national funds from FCT-MCTES (UID/Multi/04378/2019). The NMR spectrometers at FCT-NOVA are part of Rede Nacional de RMN (PTNMR), supported by FCT-MCTES (ROTEIRO/0031/2013-PINFRA/22161/2016) (cofinanced by FEDER through COMPETE 2020, POCI and PORL and FCT through PIDDAC).

      Conflict of interest

      The authors declare no conflict of interest.

      Author contributions

      EC, ASP and ALC conceived the study. CMGAF, EJC, ASP and ALC directed and supervised the research. DOR, AV, VMRP, JM-S and PB conducted most of the experiments and analysed the data. DOR performed carbohydrate microarray and X-ray crystallography analysis. AV and FM performed NMR experiments. VMRP and PB performed the mutagenesis and ITC experiments. WC and ASP constructed the microarrays. WC prepared and analysed the barley hexasaccharide. DOR, AV, EC, ASP and ALC wrote the paper with contributions from all the other authors.