
Volume 276, Issue 3 p. 816-824

Expression of Helicobacter pylori CagA domains by library-based construct screening

Alessandro Angelini
ESRF, Grenoble, France
Department of Chemistry, Institute of Biomolecular Chemistry, CNR, University of Padua, Italy
These authors contributed equally to this work

Tommaso Tosi
ESRF, Grenoble, France
These authors contributed equally to this work

Philippe Mas
European Molecular Biology Laboratory, Grenoble Outstation, France

Samira Acajjaoui
ESRF, Grenoble, France

Giuseppe Zanotti
Department of Chemistry, Institute of Biomolecular Chemistry, CNR, University of Padua, Italy

Laurent Terradot
ESRF, Grenoble, France

Darren J. Hart
European Molecular Biology Laboratory, Grenoble Outstation, France
First published: 14 January 2009
D. J. Hart, European Molecular Biology Laboratory, Grenoble Outstation, 6 rue Jules Horowitz, BP181, 38042 Grenoble Cedex 9, France
Fax: +33 476 20 71 99
Tel: +33 476 20 77 68
L. Terradot, ESRF, 6 Rue Jules Horowitz, BP220, 38043 Grenoble Cedex 9, France
Fax: +33 476 20 94 00
Tel: +33 476 20 94 54

Abstract

Highly pathogenic strains of Helicobacter pylori use a type IV secretion system to inject the CagA protein into human gastric cells. There, CagA associates with the inner side of the membrane and is tyrosine-phosphorylated at EPIYA motifs by host kinases. Phosphorylation triggers a series of interactions between CagA and human proteins that result in a dramatic change in cellular morphology. Structural and functional analyses of the protein have proved difficult because the recombinant protein is highly sensitive to proteolysis. To circumvent these difficulties, we applied ESPRIT, a library-based construct screening method, to generate a comprehensive set of randomly 5′-deleted gene fragments. Screening of 18 432 constructs for soluble expression yielded a panel of 40 clones, which were further investigated by large-scale purification. Two constructs of approximately 25 and 33 kDa were particularly soluble and were purified to near homogeneity. CagA fragments larger than 40 kDa were prone to heavy proteolysis at the C-terminus, with a favoured cleavage site near the first EPIYA motif. The well-expressed recombinant constructs isolated here are therefore likely to resemble the fragments produced by natural proteolysis in human cells, and they open the way for structural and functional studies requiring large amounts of purified material.

Abbreviations

  • cag-PAI, cytotoxin-associated gene pathogenicity island
  • IPTG, isopropyl thio-β-d-galactoside
  • T4SS, type IV secretion system

    Helicobacter pylori is a human pathogen that infects more than half of the human population [1,2]. It is considered to be the main cause of most gastric pathologies, including chronic gastritis, peptic ulcer, gastric adenocarcinoma, and lymphoma of the mucosa-associated lymphoid tissue [3]. Infection with H. pylori strains carrying the cytotoxin-associated gene pathogenicity island (cag-PAI) leads to a higher risk of gastric cancer [4,5]. This gene cluster of around 40 kb codes for 29 proteins involved in the assembly of a type IV secretion system (T4SS) [6,7]. T4SSs are used by many bacteria for genetic conjugation, high-frequency recombination, and delivery of effector macromolecules [8]. The cag-encoded T4SS forms a long appendage that is necessary to deliver the effector protein, CagA, into gastric epithelial cells [9,10].

    Once injected inside target cells, CagA associates with the inner side of the cytoplasmic membrane and is tyrosine-phosphorylated by host kinases, including c-Src, Lyn, Fyn, Yes and c-Abl [11]. Phosphorylated CagA then binds several SH2 domain-containing host proteins involved in signalling pathways, including the SHP-2 phosphatase, Csk tyrosine kinase, and the Crk adapter protein [12–14]. These events ultimately result in cellular morphological changes known as the ‘hummingbird phenotype’, and enhance cellular motility, resulting in cell scattering [11,15–17]. Phosphorylated CagA has also been found to interact with PAR1/MARK kinase, promoting both cell polarity defects and the hummingbird phenotype through the inhibition of PAR1 phosphorylation [18]. Several CagA-mediated effects are independent of its phosphorylation status. CagA is known to interact with the scaffold protein ZO-1 and also the tight junctional adhesion protein JAM, leading to a redistribution of tight junction proteins and the alteration of apical–junctional complex function [19,20]. An increase in cell motility is furthermore induced by the intracellular interaction of CagA with the scatter factor receptor c-Met, which deregulates the c-Met receptor pathway and thereby induces cell invasion [21,22]. The C-terminal portion of CagA has also been found to interact with Grb2, triggering a Ras-dependent signalling pathway that results in increased cell scattering and proliferation [23].

    CagA effects exerted in vivo are mediated by protein–protein interactions with host cell components. However, the biochemistry of CagA and its interactions remains largely elusive, because the full-length protein is unstable in recombinant expression systems and is rapidly degraded [24]. The purification of a CagA fragment (residues 392–733) has been reported previously, and this fragment was shown to bind HpYbgC in vitro [25,26], but the biological relevance of this interaction has yet to be elucidated. The N-terminal 57 residues interact with ZO-1 and JAM [19] and contain a sequence important for translocation by the T4SS [27]. The C-terminus of CagA contains the tyrosine phosphorylation sites, which are major determinants of its mode of action. A single phosphorylation site comprises a stretch of 34 residues that includes the five-amino-acid EPIYA motif. EPIYA motifs occur multiple times in the C-terminal region of the protein, their number depending on the bacterial strain [11]. Whereas the first two motifs (EPIYA-A and EPIYA-B) are conserved among all strains, Western isotypes possess 0–4 copies of the EPIYA-C motif, which explains the variation in molecular mass observed among CagA proteins. Eastern CagA isotypes, by contrast, carry no EPIYA-C motif but a different sequence, EPIYA-D [28]. The EPIYA-C and EPIYA-D motifs are the major, if not the only, phosphorylation sites in CagA [29], and sequences within the EPIYA motifs also mediate attachment to the membrane [30]. In addition, CagA multimerization mediated by EPIYA motifs seems to be a prerequisite for interaction with the SHP-2 phosphatase. The CagA C-terminus also mediates translocation into human gastric cells: a lysine-rich motif located within the last 20 residues is necessary, together with the CagA N-terminus, for translocation of the protein [27]. Furthermore, CagF, another component of the cag-PAI, is known to interact with a stretch of 100 amino acids at the CagA C-terminus, just upstream of the C-terminal motif necessary for translocation [24,31]. Finally, the pilus-associated protein CagL is required to facilitate the receptor interaction prior to injection of CagA [32].

    Although these data clearly establish the C-terminal region as functionally important, little is known about the boundaries of the domains involved in protein–protein interactions, and their in vitro biochemical and structural characterization remains limited. To investigate the biochemistry and structure of CagA, we sought to identify soluble fragments of its C-terminal region for use in subsequent studies. Because CagA has no significant sequence homology with other proteins, putative domains could not be identified from multiple sequence alignments. We therefore used a recently developed high-throughput screen for soluble constructs, ESPRIT (expression of soluble proteins by random incremental truncation) [33–35], in which a comprehensive random library of 5′-deletion constructs (18 432 clones) was synthesized and screened for soluble expression. We isolated a panel of C-terminal fragments that were well expressed in Escherichia coli and purifiable by affinity chromatography. Endogenous proteolysis of several of these fragments allowed further refinement of domain boundaries. The resulting constructs should be of significant use in biochemical and structural studies of this important virulence factor.

    Results

    Library construction

    An exonuclease III and mung bean nuclease truncation protocol was applied to a precloned cagA gene to generate a comprehensive random library of constructs in which most, if not all, possible fusion points with a cleavable N-terminal hexahistidine tag were represented. Statistically, one-third of the constructs are in-frame with this tag sequence, yielding T7 promoter-driven expression constructs. All constructs were fused in-frame with a C-terminal biotin acceptor peptide, used as a marker of protein solubility and stability during colony screening. To reduce the dominance of small expression constructs, the library was divided into three insert size ranges (500–1500, 1500–2500 and 2500–3500 bp) by size selection on an agarose gel, and the resulting sublibraries were handled separately thereafter. Colony PCR and DNA sequencing of unselected clones indicated an approximately even distribution of construct sizes, with no obvious bias. The frequencies of clones carrying amplifiable DNA inserts were 41% (13/32), 56% (18/32) and 47% (15/32), representing minimum insert efficiencies for the three sublibraries. The pooled sublibraries were used to transform BL21-CodonPlus(DE3)-RIL, and 18 432 colonies (6144 for each sublibrary) were picked into 384-well plates. Combined with the length of the gene (3561 bp), these minimum insert efficiencies predict approximately threefold oversampling of constructs.
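    The approximately threefold oversampling figure can be reproduced with simple arithmetic. The Python sketch below assumes, as a simplification not stated in the text, that each sublibrary covers a 1000 bp window of possible truncation points at 1 bp resolution, and it treats the colony-PCR insert frequencies (13/32, 18/32, 15/32) as minimum insert efficiencies. Reading frame is ignored, because restricting to in-frame fusions (one-third) would scale the number of clones and the number of possible constructs equally.

```python
# Sketch: expected oversampling of 5'-truncation points per sublibrary.
# Assumption: every 1 bp position inside a sublibrary's 1000 bp size window
# counts as one distinct possible construct.
COLONIES_PER_SUBLIBRARY = 6144
PCR_SAMPLED = 32
SUBLIBRARIES = {
    # name: (clones with an amplified insert out of 32, window width in bp)
    "500-1500 bp": (13, 1000),
    "1500-2500 bp": (18, 1000),
    "2500-3500 bp": (15, 1000),
}


def oversampling(insert_hits, window_bp,
                 colonies=COLONIES_PER_SUBLIBRARY, sampled=PCR_SAMPLED):
    """Clones carrying an insert divided by the number of possible truncation points."""
    clones_with_insert = colonies * insert_hits / sampled
    return clones_with_insert / window_bp


folds = {name: oversampling(hits, width) for name, (hits, width) in SUBLIBRARIES.items()}
for name, fold in folds.items():
    print(f"{name}: {fold:.2f}-fold")
print(f"mean: {sum(folds.values()) / len(folds):.2f}-fold")
```

    With these inputs the three sublibraries come out at roughly 2.5-, 3.5- and 2.9-fold, averaging close to threefold.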

    Assessment of clones for soluble purifiable protein expression

    A three-step solubility screening process was employed (Fig. 1A): (a) robotic arraying of clones on nitrocellulose membranes; (b) colony growth; and (c) induction of protein expression by transferring the membranes to agar plates containing isopropyl thio-β-d-galactoside (IPTG). Putative soluble expression clones were identified in each sublibrary by probing the membranes simultaneously with Alexa488-conjugated streptavidin and a monoclonal anti-hexahistidine tag antibody, detected with an Alexa532-conjugated anti-mouse secondary antibody (Fig. 1B). Signal intensities for both the N-terminal and C-terminal tags were extracted from the arrays into Microsoft Excel, and an initial filter was applied to eliminate all clones with no detectable hexahistidine tag signal.


    Fig. 1. Screening for protein expression and solubility from a CagA random truncation library. (A) Flow chart for improving CagA protein expression and solubility. A library of 18 432 clones of 5′-random cagA truncations was robotically picked and gridded onto nitrocellulose membranes on agar to grow colony arrays, in which protein expression was induced and assessed by the intensities of the N-terminal hexahistidine tag and C-terminal biotin acceptor peptide signals. The 96 best-ranked clones were purified from small-scale expression cultures by Ni2+-nitrilotriacetic acid affinity purification. From SDS/PAGE analysis, 40 constructs were selected for larger-scale purification. (B) Colony-based solubility screening of the construct library. Example of detection of the N-terminal hexahistidine tag (red, top panel), C-terminal biotinylation (green, middle panel), and merged image (lower panel). (C) Testing of putative soluble CagA fragments by purification on Ni2+-nitrilotriacetic acid agarose from 4 mL expression cultures. Results are shown for eight of the 96 clones tested that gave successful outcomes. For each clone, the Coomassie blue-stained SDS polyacrylamide gel (S) is displayed alongside fluorescent western blot analysis (W) of the same sample. The overlay of the antibody and streptavidin hybridizations [as in (B)] shows the presence of both genetically encoded protein termini, indicating proteolytically stable fragments. Protein molecular mass markers are indicated (M).

    In total, 9635 colonies showed some signal for the N-terminal hexahistidine tag; these were then ranked by streptavidin fluorescence, yielding 453 clones with significant signals for both tags. This step removed clones exhibiting only one terminus because of internal translation initiation, premature translation termination or proteolysis, generating a higher-quality subset for subsequent liquid expression testing and purification trials. Indeed, many clones in the medium and large insert sublibraries exhibited strong C-terminal biotinylation signals but no detectable hexahistidine tag. This effect was not seen in the small sublibrary, and it suggests an internal translation start site common to all fragments larger than 1500 bp; these constructs were discarded because they lacked the hexahistidine tag.
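    The two-stage triage described above (discard colonies without an N-terminal tag signal, then rank the rest by C-terminal biotinylation and keep the top clones of each sublibrary) can be written as a short filter-and-rank routine. This is a minimal Python sketch: the `ColonySignal` record, its field names and the `his_threshold` cut-off are hypothetical, because the text does not specify how a "detectable" signal was defined or how the exported Excel data were organized.

```python
from dataclasses import dataclass


@dataclass
class ColonySignal:
    clone_id: str         # hypothetical plate/well identifier
    sublibrary: str       # e.g. "small", "medium" or "large" insert sublibrary
    his_signal: float     # anti-hexahistidine (N-terminal tag) intensity
    biotin_signal: float  # streptavidin (C-terminal biotin acceptor peptide) intensity


def select_clones(colonies, his_threshold, per_sublibrary=32):
    """Drop colonies without a detectable N-terminal tag, then keep the
    per_sublibrary colonies with the strongest biotinylation in each sublibrary."""
    by_sublibrary = {}
    for colony in colonies:
        if colony.his_signal > his_threshold:  # keep only colonies with an N-terminal tag signal
            by_sublibrary.setdefault(colony.sublibrary, []).append(colony)
    selected = []
    for name in sorted(by_sublibrary):
        ranked = sorted(by_sublibrary[name], key=lambda c: c.biotin_signal, reverse=True)
        selected.extend(ranked[:per_sublibrary])
    return selected


# Toy data: values are invented for illustration only
colonies = [
    ColonySignal("A1", "small", 500.0, 900.0),
    ColonySignal("A2", "small", 10.0, 5000.0),   # biotin only: e.g. internal translation start
    ColonySignal("A3", "small", 400.0, 100.0),
    ColonySignal("B1", "large", 0.0, 8000.0),    # biotin only
    ColonySignal("B2", "large", 300.0, 700.0),
]
picked = select_clones(colonies, his_threshold=50.0, per_sublibrary=1)
print([c.clone_id for c in picked])
```

    Filtering on the N-terminal tag before ranking by the C-terminal tag is what removes the biotin-only clones, which would otherwise dominate a ranking based on biotinylation alone.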

    Ninety-six clones, comprising the 32 highest expressers from each sublibrary, were assessed for insert size by PCR and, in parallel, for expression of Ni2+-nitrilotriacetic acid-purifiable protein in 4 mL cultures. Proteins eluted in imidazole-containing buffer were visualized by SDS/PAGE and by western blot with both fluorescent streptavidin and an antibody against the hexahistidine tag (Fig. 1C).

    Analysis of expression clones

    Of the 96 constructs tested, three short constructs [20.3 (not shown), 29.2 and 29.7 kDa] and six medium constructs (36.4, 37.4, 41.6, 42.1, 42.2 and 44.5 kDa) were well expressed and purified as a single band (Fig. 1C). Protein sizes (including 5 kDa of tags) were predicted from DNA sequencing of the inserts and confirmed by SDS/PAGE. Nineteen other constructs were purified but gave weak bands in the fluorescent western assay, suggesting very low levels of soluble expression. A further 19 constructs, between 45 and 60 kDa, exhibited significant degradation. Because many of the bands were weak, western blot analysis with streptavidin and an antibody against the hexahistidine tag was also performed to confirm the SDS/PAGE results. The remaining 49 constructs showed no significant protein expression in these small-scale experiments and were not studied further, either because they did not express (false positives) or because they would have required inconvenient scale-up. Further sequencing identified the expression-compatible N-termini and revealed a total of 40 unique sequences in-frame with the hexahistidine tag. Significant heterogeneity at the tag fusion position was observed (Fig. S1).
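    The statement that predicted sizes include about 5 kDa of tags can be checked from the two tag sequences given under Experimental procedures. The Python sketch below sums standard average residue masses for both tags and adds a rule-of-thumb 110 Da per CagA residue for the insert. The 110 Da average is an assumption, and any linker residues contributed by the cloning sites are ignored.

```python
# Average residue masses in Da (free amino acid minus one water)
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_DA = 18.0153            # added once for the free termini of the whole chain
AVG_CAGA_RESIDUE_DA = 110.0   # rule-of-thumb mean residue mass (assumption)

N_TAG = "MGHHHHHHDYDIPTTENLYFQG"  # TEV-cleavable N-terminal hexahistidine tag
C_TAG = "SNNGSGGGLNDIFEAQKIEWHE"  # C-terminal biotin acceptor peptide


def residue_mass_da(seq):
    """Sum of average residue masses for a peptide segment (no terminal water)."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq)


def tag_mass_kda():
    """Combined mass contribution of both fusion tags."""
    return (residue_mass_da(N_TAG) + residue_mass_da(C_TAG)) / 1000


def predicted_fusion_kda(insert_bp):
    """Approximate mass of a tagged fragment encoded by an insert of insert_bp base pairs."""
    cagA_residues = insert_bp // 3
    total_da = (residue_mass_da(N_TAG) + residue_mass_da(C_TAG)
                + cagA_residues * AVG_CAGA_RESIDUE_DA + WATER_DA)
    return total_da / 1000


print(f"tags: {tag_mass_kda():.2f} kDa")
print(f"750 bp insert: {predicted_fusion_kda(750):.1f} kDa")
```

    The two 22-residue tags together come to roughly 5 kDa, consistent with the correction applied to the predicted fragment sizes.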

    Scale-up expression and purification

    One-litre-scale expression trials were performed for all unique clones showing detectable protein expression in small-scale experiments, and Ni2+-nitrilotriacetic acid purification was performed using 1 mL His-trap columns. Two proteins of 25 and 33 kDa (Fig. 2A; clones 8 and 37) were purified as a single band on SDS/PAGE, at high yield and quality. Five larger constructs (Fig. 2A; clones 94, 90, 70, 83 and 88) yielded proteins of 81, 87, 95, 106 and 107 kDa. These were also purified, but a second band of lower molecular mass was detectable on SDS/PAGE. Western blot analysis of these five fragments with antibodies against the hexahistidine tag indicated that the subfragments correspond to truncated proteins lacking a C-terminal portion of about 33 kDa (Fig. 2A). This repeated pattern suggested that these constructs were all expressed and soluble, but contained a C-terminal domain that was particularly prone to proteolysis in E. coli cells. From the sizes of the fragments and the CagA sequence, we hypothesized that the C-terminal domain cleaved from these constructs starts at a stretch of five asparagines, corresponding to residues 885–889 (Fig. 2B,C). We therefore designed two new constructs, 90ΔC and 94ΔC, with the same N-terminal residues as clones 90 and 94, respectively, but ending at these five asparagines (Fig. 2B). Expression and purification of these new constructs yielded stable, purifiable domains, located internally within the CagA sequence, that showed little degradation (Fig. 2B).
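    The step from a lost C-terminal piece of about 33 kDa to a cleavage site near residues 885–889 can be approximated by converting mass to residue count. This Python sketch assumes a mean residue mass of 110 Da and a CagA length of 1186 residues (the 3561 bp gene minus its stop codon); it is a back-of-the-envelope estimate, not the authors' procedure, which also drew on the sequence itself.

```python
CAGA_GENE_BP = 3561                     # cagA (hp0547) length, stop codon included
CAGA_RESIDUES = CAGA_GENE_BP // 3 - 1   # 1187 codons minus the stop codon
AVG_RESIDUE_DA = 110.0                  # rule-of-thumb mean residue mass (assumption)
C_TAG_DA = 22 * AVG_RESIDUE_DA          # 22-residue biotin acceptor peptide, approximate


def estimate_cleavage_residue(lost_kda, subtract_c_tag=True):
    """Estimate the first CagA residue of a lost C-terminal piece of mass lost_kda.

    The piece cleaved from a tagged fragment also carries the C-terminal tag,
    so the tag mass is optionally subtracted before converting to residues.
    """
    lost_da = lost_kda * 1000.0
    if subtract_c_tag:
        lost_da -= C_TAG_DA
    lost_residues = round(lost_da / AVG_RESIDUE_DA)
    return CAGA_RESIDUES - lost_residues + 1


for kda in (30.0, 33.0, 36.0):
    print(f"{kda:.0f} kDa lost -> cleavage near residue "
          f"{estimate_cleavage_residue(kda)} (tag subtracted), "
          f"{estimate_cleavage_residue(kda, subtract_c_tag=False)} (tag ignored)")
```

    For 33 kDa the estimate is about residue 909 with the tag subtracted and about residue 887 without, and a ±3 kDa error in the gel estimate shifts it by roughly ±27 residues. Gel mobility alone therefore localizes the cleavage only to a region, which here spans the five-asparagine stretch (885–889).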


    Fig. 2. Expression of soluble C-terminal CagA fragments. (A) Seven CagA protein fragments (clones: 8, 37, 94, 90, 70, 88 and 83) of different sizes (25, 33, 87, 81, 95, 106 and 107 kDa, respectively) were expressed at 1 L scale with an N-terminal hexahistidine tag and purified by Ni2+-nitrilotriacetic acid affinity chromatography. All purified parental fragments are indicated with an arrow. From left: clones 8 and 37 (25 and 33 kDa, respectively) show relatively high proteolytic stability, whereas five other constructs (clones 94, 90, 70, 83 and 88) clearly show the presence of a cleaved ‘subfragment’ (indicated by an asterisk). (B) Clones 94ΔC and 90ΔC, subcloned from clones 94 and 90 with deletion of a C-terminal fragment (around 33 kDa), express proteolytically stable, purifiable protein fragments, as assessed by SDS/PAGE. (C) Summary diagram of CagA constructs yielding purifiable fragments. Above: a schematic view of the CagA protein sequence. N-terminal and C-terminal translocation signals are orange, the stretch of five asparagines (885–889) is grey, EPIYA motifs are dark red, and multimerization motifs are dark green. Below: the positions of the purifiable fragments are indicated by solid bars. Yellow bars indicate C-terminal constructs exhibiting little proteolytic degradation. Blue bars are constructs yielding mixtures of unproteolysed and proteolysed fragments, the latter corresponding to loss of the C-terminal region. Pink bars represent clones that were subsequently constructed by deleting the region corresponding to the longer stable C-terminal construct (clone 37) shown in yellow.

    Discussion

    The pathogen H. pylori uses a T4SS to inject the CagA pathogenicity factor into human gastric cells, where it perturbs host signalling pathways to provide a local environment that is more suitable for the survival of the pathogen. Biochemical studies have associated specific functions with some protein regions [11]. However, the large size of the CagA protein and its weak homology with other eukaryotic and prokaryotic proteins of known structure and function make it difficult to predict regions that constitute well-folded domains through the use of multiple sequence alignments. This has prevented both structural and most in vitro functional studies on this important virulence factor.

    To circumvent these problems, we applied ESPRIT, a random construct library screening approach, to identify C-terminal regions of CagA that can be overexpressed in soluble form in E. coli and purified in yields high enough for structural studies and for in vitro analytical methods that consume large amounts of protein. All possible 5′-unidirectional truncations of the target gene were generated and tested for soluble expression using high-throughput screening robotics [33]. The usefulness of this method for the empirical dissection of large, crystallographically intractable proteins into manageable domains has been demonstrated recently for influenza proteins [33–35], and here it is applied to a protein from a bacterial pathogen for the first time.

    In this study, CagA proved to be a challenging protein for recombinant expression because of its propensity for degradation. This may be ascribed to its high proportion of basic residues (15% overall), together with predicted unstructured regions. Systematic truncation analysis, as performed here, allows all possible N-termini to be tested, and also captures the effects of non-structural factors, such as translation-inhibiting mRNA secondary structure and compatibility with the E. coli proteolytic machinery.
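    The basic-residue content quoted above is a simple composition statistic. A minimal Python sketch follows; counting only Lys and Arg as basic is an assumption (His is sometimes included), and since the CagA sequence is not reproduced in this article, the example runs on the C-terminal biotin acceptor peptide from Experimental procedures instead.

```python
def basic_fraction(sequence, basic_residues="KR"):
    """Fraction of residues in a one-letter protein sequence that are basic.

    Whitespace and non-letter characters (e.g. a trailing '*') are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    basic = sum(1 for aa in residues if aa in basic_residues)
    return basic / len(residues)


peptide = "SNNGSGGGLNDIFEAQKIEWHE"  # biotin acceptor peptide (22 residues)
print(f"K+R:   {basic_fraction(peptide):.1%}")
print(f"K+R+H: {basic_fraction(peptide, 'KRH'):.1%}")
```

    Applied to the full CagA sequence of strain 26695, the same function would give the overall basic-residue content; the exact percentage depends on which residues are counted as basic.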

    Two types of CagA protein constructs were identified during the screening process: first, two C-terminal fragments, 25 and 33 kDa, that are well expressed and easily purifiable (Fig. 2C); and second, longer constructs of 40–100 kDa that were partially proteolysed by the cell. Analysis of the degradation products revealed a region upstream of the C-terminal fragment that can be expressed as a stable entity once subcloned. Thus, we have been able to identify two adjacent domains, one directly and the second indirectly.

    These results are in partial agreement with previous in vivo and in vitro experiments [24,31,36–38], as our larger purified protein samples (40–100 kDa) showed a degradation pattern similar to that observed previously. These earlier experiments had already suggested that full-length CagA is a fragile, proteolytically sensitive protein that breaks into subunits at defined positions. In particular, CagA is cleaved in human cells to yield a C-terminal fragment of about 35–40 kDa [39], as we have observed in E. coli. However, the conversion of such fragments into high-level expression constructs suitable for subsequent studies has not been reported.

    The 33 kDa stable fragment from clone 37 corresponds approximately to the C-terminus of CagA, starting at the first EPIYA motif. The region preceding this fragment (around residues 877–918) is predicted by order/disorder predictors to be highly disordered, and a long loop is predicted at positions 882–916 (Fig. 2). It may also be significant that a stretch of five asparagines lies just before this fragment, perhaps constituting a natural linker that is cleaved inside human cells. Interestingly, the C-terminal part of CagA contains the domain responsible for the interaction with CagF, a putative chaperone of CagA [24,31]. Binding of CagF may be important for stabilizing the entire protein and preventing its proteolysis before injection.

    The soluble, purifiable constructs of the CagA protein identified here through application of ESPRIT have not been described previously. We believe that large-scale expression and purification of these domains will enable progress towards a definition of the CagA structure and will aid functional interaction studies of CagA with CagF and host cell components.

    Experimental procedures

    Amplification and subcloning of the CagA gene

    The gene hp0547 coding for CagA was amplified by PCR from genomic DNA (H. pylori strain 26695) with primers hp0547for1 (5′-CACCA TGACT AACGA AACTA TTGAT C-3′) and hp0547rev1 (5′-TTAAG ATTTT TGGAA ACCAC CTTTT G-3′) and cloned into pET151/D-TOPO (Invitrogen, Carlsbad, CA, USA). Internal NsiI restriction sites were silenced (ATGCAT to ATGCGT) by QuikChange mutagenesis (Stratagene, La Jolla, CA, USA), and the mutated gene was used as a template to amplify hp0547 again by PCR, adding a 5′-AscI and a 3′-NsiI site, with the flanking primers hp0547for2 (5′-GATCC TAGGG CGCGC CACTA ACGAA ACTAT TGATC AAACA AGAAC ACCAG-3′) and hp0547rev2 (5′-CTAGG ATCAT GCATT AGATT TTTGG AAACC ACCTT TTGTA TTAAC ATT-3′). The PCR product was then digested with AscI and NsiI and inserted into pTAR010, a pET9a-derived vector [34] designed for 5′-truncation of the gene, in-frame with a TEV-cleavable N-terminal hexahistidine tag (MGHHHHHHDYDIPTTENLYFQG) and a C-terminal biotin acceptor peptide (SNNGSGGGLNDIFEAQKIEWHE), to generate pHAR3011-CagA.
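    The design features of these primers can be checked mechanically: the 5′-CACC sequence needed for directional cloning into pET151/D-TOPO, the AscI site (GGCGCGCC) in hp0547for2 and the NsiI site (ATGCAT) in hp0547rev2. The Python sketch below performs these substring checks on the sequences listed above; both recognition sites are palindromic, so scanning one strand is sufficient.

```python
ASCI_SITE = "GGCGCGCC"   # AscI recognition sequence (palindromic)
NSII_SITE = "ATGCAT"     # NsiI recognition sequence (palindromic)
TOPO_PREFIX = "CACC"     # 5' sequence needed for directional pET151/D-TOPO cloning

PRIMERS = {
    "hp0547for1": "CACCA TGACT AACGA AACTA TTGAT C",
    "hp0547rev1": "TTAAG ATTTT TGGAA ACCAC CTTTT G",
    "hp0547for2": "GATCC TAGGG CGCGC CACTA ACGAA ACTAT TGATC AAACA AGAAC ACCAG",
    "hp0547rev2": "CTAGG ATCAT GCATT AGATT TTTGG AAACC ACCTT TTGTA TTAAC ATT",
}


def clean(primer):
    """Remove the 5-nt grouping spaces used in the text."""
    return primer.replace(" ", "").upper()


def check_primers(primers):
    """Return a dict mapping each design check to True/False."""
    seqs = {name: clean(seq) for name, seq in primers.items()}
    return {
        "for1 has CACC + ATG start": seqs["hp0547for1"].startswith(TOPO_PREFIX + "ATG"),
        "for2 has AscI site": ASCI_SITE in seqs["hp0547for2"],
        "rev2 has NsiI site": NSII_SITE in seqs["hp0547rev2"],
        "ATGCGT mutation removes NsiI site": NSII_SITE not in "ATGCGT",
    }


for check, passed in check_primers(PRIMERS).items():
    print(f"{check}: {'ok' if passed else 'FAILED'}")
```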

    Construction of the 5′-CagA deletion library

    Plasmid pHAR3011-CagA was purified from a 200 mL saturated culture (LB medium, 50 μg·mL−1 kanamycin) of E. coli 10G Duos (Lucigen, Middleton, WI, USA). After overnight growth at 37 °C, the cells were harvested and lysed by alkaline lysis, and the DNA was extracted with phenol/chloroform/isoamyl alcohol (25 : 24 : 1), followed by isopropanol precipitation. The plasmid DNA was then further purified using a midiprep kit (Qiagen, Valencia, CA, USA) to remove contaminants.

    Plasmid (10 μg) was digested with AscI and AatII, yielding an AscI end with a 5′ overhang (sensitive to exonuclease III) and an AatII end with a 3′ overhang (insensitive). For the exonuclease III truncation reaction, 4 μg of digested plasmid was diluted in 130 μL of reaction buffer [1× buffer 1 from New England Biolabs (Ipswich, MA, USA) supplemented with 30 mM NaCl] containing 400 units of exonuclease III (New England Biolabs), and incubated at 22 °C. To ensure an even distribution of fragment sizes, 2 μL aliquots were taken every 2 min over 2 h and immediately added to an ice-cold tube containing 200 μL of 3 M NaCl. The quenched reaction was heat-denatured at 70 °C for 20 min, and the DNA was purified using a NucleoSpin Extract II kit (Macherey-Nagel, Düren, Germany). The remaining 5′-overhang was removed by incubation with 5 units of mung bean nuclease (New England Biolabs) at 30 °C for 30 min, and the DNA was repurified with the NucleoSpin Extract II kit. The library DNA (40 μL of column eluate) was then incubated with native Pfu polymerase (Stratagene) in a total volume of 50 μL (1× native Pfu polymerase buffer, 2.5 mM dNTPs and 1 unit of enzyme) at 72 °C for 20 min to generate blunt ends. The reaction was loaded onto a 0.5% agarose gel, and plasmids with inserts in the ranges 500–1500, 1500–2500 and 2500–3500 bp were excised from the gel and purified using the QIAEX II gel extraction kit (Qiagen), generating three sublibraries. The linear DNA fragments of the three sublibraries were then recircularized using T4 DNA ligase (Roche, Mannheim, Germany). One Shot OmniMAX 2 T1 cells (Invitrogen) were transformed with 2 μL of ligation reaction, recovered at 37 °C for 1 h, and plated on 22-cm-square LB agar plates with 50 μg·mL−1 kanamycin. Approximately 19 000 colonies were scraped from the plates and resuspended in NaCl/Pi, and plasmid DNA was purified with a midiprep kit (Qiagen). The supercoiled library DNA was then used to transform E. coli BL21-CodonPlus(DE3)-RIL (Stratagene) for protein expression testing.
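    The time-course sampling of the exonuclease III reaction implies a simple volume budget. This Python sketch assumes that the first aliquot is taken at 2 min and that all aliquots are pooled into the single quench tube, which the text implies but does not state explicitly; NaCl contributed by the reaction buffer is ignored. It checks that the aliquots fit within the 130 μL reaction and estimates the DNA per aliquot and the final NaCl concentration of the pooled quench.

```python
REACTION_UL = 130.0    # exonuclease III reaction volume
ALIQUOT_UL = 2.0       # volume withdrawn at each time point
INTERVAL_MIN = 2       # sampling interval
DURATION_MIN = 120     # total digestion time (2 h)
QUENCH_UL = 200.0      # ice-cold NaCl in the collection tube
QUENCH_NACL_M = 3.0
DNA_UG = 4.0           # digested plasmid in the reaction

# Assumption: first aliquot at 2 min, last at 120 min
time_points = list(range(INTERVAL_MIN, DURATION_MIN + 1, INTERVAL_MIN))
volume_removed_ul = len(time_points) * ALIQUOT_UL
dna_per_aliquot_ng = DNA_UG * 1000 * ALIQUOT_UL / REACTION_UL

# Assumption: all aliquots pooled into one quench tube; buffer NaCl ignored
pooled_ul = QUENCH_UL + volume_removed_ul
final_nacl_m = QUENCH_NACL_M * QUENCH_UL / pooled_ul

print(f"{len(time_points)} aliquots, {volume_removed_ul:.0f} of {REACTION_UL:.0f} uL used")
print(f"~{dna_per_aliquot_ng:.0f} ng DNA per aliquot")
print(f"pooled quench: {pooled_ul:.0f} uL at ~{final_nacl_m:.2f} M NaCl")
```

    Taking an additional 0 min sample would add one aliquot (122 μL in total), which still fits within the reaction volume.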

    Robotic processing of the library

    The bacteria were plated on 22-cm-square LB agar plates supplemented with kanamycin and chloramphenicol (each at 50 μg·mL−1) and incubated overnight at 37 °C. Approximately 20 000 colonies were picked into 384-well plates containing 70 μL of TB medium (kanamycin and chloramphenicol at 50 μg·mL−1) per well, using a colony-picking robot (KBiosystems, Basildon, UK). Plates were shaken at 30 °C overnight and then arrayed onto 22 cm nitrocellulose membranes on LB agar plates supplemented with antibiotics as above. Plates were incubated at 25 °C until small colonies appeared, and the membranes were then transferred to similar LB plates additionally supplemented with 1 mM IPTG and 50 μM biotin to induce protein expression at 30 °C for a further 4 h.

    Identification of putative soluble protein-expressing clones

    Colonies were lysed in situ on the nitrocellulose membrane by placing the membrane on Whatman filter paper soaked in lysis solution (500 mM NaOH, 1.5 M NaCl) for 10 min, followed by neutralization solution (1.5 M Tris, pH 7.5, 1.5 M NaCl; 2 × 10 min). Membranes were washed in 2× SSC buffer for 15 min, cellular debris was removed by gently scraping the membrane under liquid, and the membranes were then incubated overnight at 4 °C in SuperBlock (Pierce, Woburn, MA, USA). Membranes were washed with PBS-T buffer (NaCl/Pi with 0.2% Tween-20) and then incubated for 1 h at 4 °C with a monoclonal antibody against hexahistidine (Roche; 1 : 3000 in PBS-T buffer). After washing in PBS-T buffer, the membranes were incubated for a further 1 h at 4 °C with an Alexa532-conjugated anti-mouse secondary antibody (Invitrogen) and streptavidin–Alexa488 (Invitrogen), at dilutions of 1 : 1000 and 1 : 5000, respectively. The membranes were washed with PBS-T buffer and then with distilled water, and imaged with a Typhoon 9400 fluorescence imager (GE Healthcare, Piscataway, NJ, USA), using the 532 and 488 nm lasers to quantify hexahistidine tag and biotin acceptor peptide signal intensities, respectively.

    High-throughput expression and purification of CagA fragments

    Clones with clearly visible hexahistidine tag signals were ranked according to their biotinylation signal intensity, and the 96 most intense clones were analysed for insert size by PCR directly from culture using standard T7 forward and T7 reverse primers. Small-scale protein expression testing of positive clones was performed at 4 mL scale in 24-well plates with TB medium (kanamycin and chloramphenicol, 50 μg·mL−1). Cells were grown at 37 °C with shaking at 250 r.p.m. to an attenuance at 600 nm (D600) of 0.6, induced with 1 mM IPTG, and grown overnight at 25 °C with shaking at 250 r.p.m. Cells were pelleted by centrifugation (2250 g for 20 min) and resuspended in 4 mL of sphaeroplast buffer (20 mM Tris, pH 8, 250 mM NaCl, 20% sucrose, 1 mg·mL−1 lysozyme). The resulting sphaeroplasts were pelleted by centrifugation, the supernatant was discarded, and the pellet was resuspended in 700 μL of sphaeroplast lysis buffer (10 mM Tris, pH 7.5, 0.5% Brij-58, DNaseI and Roche protease inhibitor mix). After 20 min at 4 °C, the lysate was centrifuged (2250 g for 20 min), and the hexahistidine-tagged proteins were purified from the supernatants with a liquid-handling robot (Tecan) in 96-well format, using filter plates manually charged with 60 μL of Ni2+-nitrilotriacetic acid resin (Sigma, St Louis, MO, USA). The wash buffer was 50 mM phosphate buffer, pH 7, 300 mM NaCl, 5 mM imidazole, and the elution buffer was the same with 300 mM imidazole. Samples from the elution step were analysed by SDS/PAGE, and fragment sizes were compared with the sizes of the DNA bands obtained in the PCR screen. The DNA sequences of the selected clones were determined.

    Scale-up expression trials and purification of selected clones

    Plasmid DNA from the selected clones was used to transform E. coli BL21(DE3) pLysS (Invitrogen). Expression trials were performed in 1 L of TB medium (kanamycin and chloramphenicol, 50 μg·mL−1). Cultures were grown to D600 0.6 at 37 °C; the temperature was then reduced to 20 °C, and expression was induced with 1 mM IPTG for 4 h. Cells were harvested by centrifugation (7500 g for 20 min) and resuspended in 20 mL of lysis buffer (50 mM Tris, pH 8, 150 mM NaCl, 5% glycerol, 0.2% Triton X-100). Cells were lysed by sonication or with a cell disrupter, and the clarified supernatant was loaded onto a 1 mL HiTrap affinity column (GE Healthcare). All purifications were performed on an ÄKTAprime system (Amersham Biosciences, Piscataway, NJ, USA) with an imidazole gradient for elution (buffers: 50 mM Tris, pH 8, 150 mM NaCl, 5% glycerol; and the same buffer containing 500 mM imidazole). Samples were analysed by SDS/PAGE and by western blot using monoclonal antibodies against the hexahistidine tag.

    The internally located constructs 94ΔC and 90ΔC were cloned with forward primers starting at residues Glu465 (5′-CACCG AGTTT AATAA TGGGG ATTTG AGC-3′) and Lys414 (5′-CACCA AATTA GACAA CTTGA GCGAG AAAG-3′), respectively, and the reverse primer 5′-GTTTT TGAGT CCATT ATTAT TCTAA TTG-3′, in which Lys892 was replaced by a stop codon. The purified PCR products were cloned into the pET151/D-TOPO vector (Invitrogen) according to the manufacturer's instructions. Protein expression and purification were carried out as above, using the BL21(DE3) strains CodonPlus-RIL (Stratagene) or pLysS (Invitrogen).
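    One way to confirm that each forward primer starts the coding sequence at the intended residue is to strip the 5′-CACC cloning prefix and translate the remainder in frame. The Python sketch below builds the standard genetic code compactly. It checks only the primers' own sequences, not their position within cagA, which would require the full gene sequence; the primer labels are our own.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_after_prefix(primer, prefix="CACC"):
    """Translate a forward primer in frame after its 5' cloning prefix.

    An incomplete trailing codon is ignored.
    """
    seq = primer.replace(" ", "").upper()
    if not seq.startswith(prefix):
        raise ValueError(f"primer does not start with {prefix}")
    coding = seq[len(prefix):]
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, len(coding) - 2, 3))


# Forward primers as given in the text; the labels are our own
FORWARD_PRIMERS = {
    "94dC (expected start Glu465)": "CACCG AGTTT AATAA TGGGG ATTTG AGC",
    "90dC (expected start Lys414)": "CACCA AATTA GACAA CTTGA GCGAG AAAG",
}

for label, primer in FORWARD_PRIMERS.items():
    print(label, "->", translate_after_prefix(primer))
```

    The first translated residues are E and K, matching the Glu465 and Lys414 start positions stated for the two constructs.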

    Acknowledgements

    We acknowledge the Partnership for Structural Biology for an integrated structural biology environment. A. Angelini was supported by grants from the Ing. Aldo Gini Foundation and the Italian Ministry of University and Research (MIUR). This work was partly funded by the ESRF ‘In House’ Research program.