Structural basis of the strong cell‐cell junction formed by cadherin‐23

Cadherin‐23, a giant atypical cadherin, forms homophilic interactions at the cell–cell junction of epithelial cells and heterophilic interactions with protocadherin‐15 at the tip links of neuroepithelial cells. While the molecular structure of the heterodimer has been solved, the homodimer structure is yet to be resolved. The homodimers play an essential role in cell–cell adhesion, as the downregulation of cadherin‐23 in cancers loosens the intercellular junction, resulting in faster migration of cancer cells and a significant drop in patient survival. In vitro studies have measured a stronger aggregation propensity of cadherin‐23 compared to typical E‐cadherin. Here, we deciphered the unique trans‐homodimer structure of cadherin‐23 in solution and show that it consists of two electrostatic‐based interfaces extended up to two terminal domains. The interface is robust, with a low off‐rate of ~ 8 × 10−4 s−1 that supports its strong aggregation propensity. We identified a point mutation, E78K, that disrupts this binding. Interestingly, a mutation at the interface was reported in skin cancer. Overall, the structural basis of the strong cadherin‐23 adhesion may have far‐reaching applications in the fields of mechanobiology and cancer.


Introduction
Cell-cell adhesion by classical cadherins, a subfamily of the cadherin class of proteins, is well-studied [1]. Nonclassical cadherins, which comprise more than 80% of cadherins, also actively participate in cell-cell adhesion [2,3]. While the physiological significance of nonclassical cadherin-mediated cell-cell junctions is well recognized, their molecular structures remain little explored. Cadherin-23 (Cdh23) is one of the giant nonclassical cadherins that forms strong intercellular junctions in nearly 90% of healthy epithelial tissues including the brain, lymph node, kidney, gastrointestinal tract, testis, and skin (The Human Protein Atlas) [4][5][6]. The strong junction that is mediated by the homophilic trans-interactions of Cdh23 serves as a metastasis suppressor for solid cancers including sarcoma, adrenocortical carcinoma, and cervical cancer (The Cancer Genome Atlas, TCGA) [6,7]. Interestingly, Cdh23 is also known for its strong binding with protocadherin-15 (Pcdh15) at the tips of stereocilia in neuroepithelial cells, where the complex serves as a gating spring under sound stimuli [5,6]. While the structural detail of the heterophilic complex with Cdh23 is understood [8], the molecular details of the homophilic complex have not yet been identified. Here, we aim to understand the molecular mechanism of the homophilic interactions of Cdh23 that mediate a robust cell-cell adhesion.
Cdh23 comprises a cytosolic domain and a single-pass transmembrane region followed by 27 extracellular (EC) domains, unlike the 5 EC domains of classical cadherins [9][10][11]. However, similar to classical cadherins, Cdh23 causes cells to adhere using two types of homophilic interactions, trans and cis [4,12]. Electron tomography revealed a unique pattern for the cis-homodimer of Cdh23: a pair of Cdh23 molecules aligned in the same orientation and intertwined to form a helical complex through interactions between all EC domains except a few terminal domains [5]. The domains at the N termini that are exposed outward are available for trans-interactions (Fig. 1A). Using single-molecule Förster resonance energy transfer (smFRET) and live-cell aggregation assays, here we first identified the number of domains that participate in the trans-dimerization. Relevantly, the engagement of the two terminal domains in a heterophilic trans-interaction with protocadherin-15 (Pcdh15) was already reported in tip links in the inner ear [8].
Classical cadherins undergo homophilic trans-interactions using the outermost terminal domain, EC1 [13,14]. They first form an X-dimer, a kinetically driven interaction, which then converts to a thermodynamically stable strand-swap dimer (S-dimer) via an intermediate with the overlap of the linker region between two terminal domains [15,16]. Among nonclassical cadherins, desmosomal cadherins form an S-dimer [17], whereas T-cadherin and R-cadherin form only an X-dimer [18]. Cdh23, however, lacks the sequence determinants for either S- or X-dimerization. Moreover, the EC1 domain of Cdh23 has several unique features: a 3 10 -helix just prior to the A* β-strand, an α-helical loop connecting two β-strands, and most strikingly, an additional Ca 2+ -binding site toward the N terminus [19]. Together, these features indicate a unique interface for Cdh23-mediated homophilic trans-interactions. It was therefore imperative to decipher the molecular details of the Cdh23-mediated homophilic trans-interactions.
Fig. 1. (A) The schematic indicates that most of the EC domains of Cdh23 are engaged in cis-interactions between proteins from the same cell surface, and only a few N-terminal EC domains are available for trans-interactions to mediate cell-cell junction. (B) Analytical SEC of Cdh23 EC1-2 and Cdh23 EC1-3 showed two peaks, whereas Cdh23 EC1 eluted in a single fraction. The first peak (P1) corresponded to a dimer, and the second peak (P2) corresponded to a monomer. (C) SDS/PAGE for all the SEC fractions of (i) Cdh23 EC1-3, (ii) Cdh23 EC1-2, and (iii) Cdh23 EC1 (A) along with molecular weight ladder is shown here. All the proteins appeared at their respective theoretical molecular weights: Cdh23 EC1-3 (37 kDa) (i), Cdh23 EC1-2 (26 kDa) (ii), and Cdh23 EC1 (15 kDa) (iii). (D) The linear plot between the partition coefficient and molecular weight of standard proteins maps the molecular weight of the proteins eluted at different fractions in the SEC (B). The apparent molecular weights for the two elutions, P1 and P2, of Cdh23 EC1-2 (blue) and Cdh23 EC1-3 (olive green), are marked in the calibration map. The single elution for Cdh23 EC1 is marked in red in the calibration curve. All the SECs were run in triplicate. The error bars represent the standard error of the mean (SEM) with N = 3.
Using small-angle X-ray scattering (SAXS) in combination with in silico measurements including docking and molecular dynamics, we identified the molecular structure of the trans-homodimer of Cdh23. We verified the binding interface with a single point mutation that impaired the dimer. Finally, using analytical methods including ultracentrifugation (AUC) and single-molecule force spectroscopy (SMFS) with an atomic force microscope (AFM), we estimated the thermodynamic and kinetic parameters of the homophilic complex.

Results
The homodimer of Cdh23 is a trans-dimer interacting via the N termini of the two outermost domains (EC1-2)
The two outermost domains of Cdh23 are known to form the trans-heteromeric complex with Pcdh15 in tip links. In order to determine the number of EC domains required for the trans-homodimerization, we expressed Cdh23 with varying lengths of EC domains: first domain alone (EC1), first two domains (EC1-2), and first three domains (EC1-3). All the constructs were expressed in E. coli BL21 RIPL following a reported protocol (Materials and methods) [19] and purified in two steps by Ni 2+ -NTA-based affinity chromatography followed by size exclusion chromatography (SEC) (Fig. 1B). We ran SEC for all three constructs at a high concentration (100 µM) and observed two distributions in the elutions, P1 and P2, for EC1-2 and EC1-3, whereas EC1 eluted as a monomer (Fig. 1B). We subsequently ran SDS/PAGE for all the eluted fractions and observed a single band at 26 kDa (Fig. 1C) for Cdh23 EC1-2 and 38 kDa for Cdh23 EC1-3, suggesting that P2 and P1 corresponded to the monomer and a higher-order association, respectively. To determine the molecular weights of the elutions, we developed a calibration curve for SEC with standard proteins of varying molecular weights under the same conditions (Fig. 1D). From the standard curve, we then estimated the apparent molecular weights of proteins eluted at the P2 and P1 fractions. The apparent molecular weights corresponded to 27 and 51 kDa for Cdh23 EC1-2 and 45 and 76 kDa for Cdh23 EC1-3, respectively. These estimated molecular weights agreed with the theoretical monomer and dimer molecular weights, 26 and 52 kDa for Cdh23 EC1-2 and 39 and 79 kDa for Cdh23 EC1-3, respectively, confirming a dimer in the higher-order association. The small deviations of the apparent molecular weights from the theoretical values may arise from the difference in shape between Cdh23 and the standard globular proteins.
Further, we noticed a significantly higher intensity for P1 than P2 for Cdh23 EC1-2, and the reverse for Cdh23 EC1-3, although the loading concentration of proteins was the same for both SEC runs (Fig. 1B). Thus, the higher intensity of P1 than P2 for Cdh23 EC1-2 indicates a higher affinity of Cdh23 EC1-2 toward homodimerization than the other constructs.
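The calibration step described above can be illustrated with a minimal sketch. It assumes the usual semi-log form, in which the partition coefficient K av = (V e − V 0 )/(V t − V 0 ) is linear in log 10 (MW); the standards below are hypothetical values generated from an arbitrary line and are not the data of Fig. 1D. The function names are illustrative only.

```python
import numpy as np

def partition_coefficient(v_elution, v_void, v_total):
    """K_av = (Ve - V0) / (Vt - V0) for a size-exclusion column."""
    return (v_elution - v_void) / (v_total - v_void)

def fit_calibration(mw_kda, k_av):
    """Least-squares line K_av = slope * log10(MW) + intercept."""
    slope, intercept = np.polyfit(np.log10(mw_kda), k_av, 1)
    return slope, intercept

def apparent_mw(k_av, slope, intercept):
    """Invert the calibration line to obtain an apparent MW in kDa."""
    return 10 ** ((k_av - intercept) / slope)

# Hypothetical standards (NOT the paper's data): MW in kDa, with K_av
# generated from an arbitrary straight line purely for illustration.
standards_mw = np.array([13.7, 29.0, 44.0, 75.0, 158.0])
standards_kav = -0.30 * np.log10(standards_mw) + 0.95

slope, intercept = fit_calibration(standards_mw, standards_kav)
# Apparent MW of a peak with a hypothetical K_av of 0.45
print(apparent_mw(0.45, slope, intercept))
```

Because elongated proteins elute earlier than globular standards of the same mass, apparent molecular weights obtained this way are expected to deviate somewhat from theoretical values.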
Next, we performed smFRET to decipher the orientations and the extent of overlap of the constituent monomers in the dimer. We used Cdh23 EC1-2 and Cdh23 EC1-3 for the experiment and not Cdh23 EC1, as Cdh23 EC1 showed the weakest affinity toward dimerization in the SEC. Since we performed smFRET on a glass coverslip using a total internal reflection fluorescence microscope (TIRFM), we covalently anchored the C terminus of the proteins, Cdh23 EC1-2 and Cdh23 EC1-3 individually, to the coverslips using sortagging chemistry as reported [20,21]. All protein constructs were recombinantly modified with the sortase recognition sequence (-LPETGSS) at the C terminus. Prior to the surface attachment, proteins were modified with a donor (D) dye, Cyanine3 (Cy3, λ ex = 545 nm; λ em = 560 nm), at the N terminus. For N-terminal modification, we recombinantly mutated the valine at position 3 to cysteine (V3C) and attached Cy3 using the thiol-maleimide Michael addition reaction. Thus, irrespective of the Cdh23 constructs, the surface-anchored proteins always had donor dyes at the N terminus. Next, the protein-modified surface was incubated with the second batch of proteins for 30 min for the homodimers to form, followed by vigorous washing to remove nonspecific attachments. The protein in solution was tagged with an acceptor (A) dye, Cyanine5 (Cy5, λ ex = 645 nm; λ em = 660 nm), either at the C terminus or the N terminus. The N-terminal modification was done using V3C protein constructs, whereas the C-terminal modification was done using sortagging followed by the thiol-maleimide Michael addition (Materials and methods). For all the constructs, the final dye to protein ratio was maintained at 1 : 1. smFRET was measured for two different combinations of each protein construct, N D N A and N D C A . N D N A is an N-terminal donor (D) with an N-terminal acceptor (A) dye, and N D C A is an N-terminal donor (D) with a C-terminal acceptor (A) dye.
Altogether, we have four such combinations, N D N A and N D C A for Cdh23 EC1-3 homodimers (Fig. 2A(i)) and Cdh23 EC1-2 homodimers (Fig. 2A(ii)). We excited the donor molecules with a 532 nm laser and subsequently monitored emissions in the donor and acceptor channels for 400 s using an EMCCD (Materials and methods). We used ISMS software (The Birkedal lab, Aarhus Universitet) for the analysis of FRET traces at each colocalized spot of a single donor and a single acceptor (Materials and methods) [22]. The FRET efficiency (E FRET ) was estimated from the intensity ratios between acceptor and donor and plotted as distributions (Fig. 2B, left panel). The unimodal distribution of E FRET for all the combinations indicates a single conformational population of the dimer (Fig. 2B, right panel). We used the peak maxima of the distributions as the most probable efficiency (E mp FRET ) and estimated the donor-acceptor separations for each E mp FRET . The maximum E FRET was measured for the N D C A combination of Cdh23 EC1-2 homodimers with an E mp FRET of 0.60 ± 0.007, indicating the closest proximity between the N terminus and C terminus of Cdh23 EC1-2 constructs (Fig. 2C). Interestingly, for all other combinations, we measured lower E mp FRET values with negligible differences among them. The closer association of the N D C A than the N D N A termini in Cdh23 EC1-2 confirms the trans-conformation of the dimer (Fig. 2C). The lower E mp FRET for the N D C A combination of Cdh23 EC1-3 than Cdh23 EC1-2 indicates that the overlap extends only up to the EC1-2 domains. Finally, the absence of significant differences in the E FRET distributions of Cdh23 EC1-3 for the N D C A and N D N A combinations supports an extended overlap of the EC1-2 domains in the trans-homodimer as depicted in the models (Fig. 2A, Table S1). The comparable E mp FRET of N D N A for Cdh23 EC1-2 and Cdh23 EC1-3 further supports the model of extended overlap between EC1-2 domains in the trans-homodimer (Fig. 2A(i) and Table S1).
Thus, we conclude from our smFRET measurements and SEC results that Cdh23 forms a trans-homodimer with the two outermost domains (EC1-2) alone. In all our following experiments, we used Cdh23 EC1-2 only, unless otherwise mentioned.
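The conversion from FRET efficiency to donor-acceptor separation can be sketched as follows, using the standard Förster relation E = 1/(1 + (r/R 0 ) 6 ). The Förster radius of 5.4 nm for the Cy3-Cy5 pair and the correction factor gamma = 1 are assumptions made for illustration; neither value is stated in this section, and the function names are hypothetical.

```python
def fret_efficiency(i_donor, i_acceptor, gamma=1.0):
    """Apparent FRET efficiency from background-corrected intensities.

    gamma corrects for differences in detection efficiency and quantum
    yield between the two channels (1.0 is assumed here).
    """
    return i_acceptor / (i_acceptor + gamma * i_donor)

def fret_distance(efficiency, r0_nm):
    """Donor-acceptor separation from E = 1 / (1 + (r / R0)**6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)

R0_CY3_CY5_NM = 5.4  # assumed Förster radius; not given in the text

# Separation implied by the most probable efficiency reported for the
# N-donor / C-acceptor combination of Cdh23 EC1-2 (E = 0.60)
print(fret_distance(0.60, R0_CY3_CY5_NM))
```

Since the efficiency depends on the sixth power of r/R 0 , separations are best resolved near R 0 , and an efficiency above 0.5 corresponds to a distance shorter than R 0 .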

Cdh23 EC1-2 can mediate cell-cell adhesion
A549 cells endogenously express Cdh23 (full-length, variant-1) [6]. To check the functional significance of Cdh23 EC1-2 relative to the full-length construct, we measured the in vitro binding of live A549 cells on Cdh23 EC1-2-modified surfaces. For the cell adhesion experiment, the C terminus of the Cdh23 EC1-2 proteins was covalently attached to the coverslip using sortagging and then incubated with ~10 4 live A549 cells for 2 h, followed by repeated gentle washes with a buffer containing 2 mM Ca 2+ ions. We subsequently imaged the coverslips under bright field and observed that a large number of cells remained adherent to the surface (Fig. 2D, left panel). Since cadherin-mediated interactions are Ca 2+ -dependent, as a control for the specificity of the cell surface interactions, we monitored the number of cells adhered to surfaces after washing with Ca 2+ -free EGTA buffer (Fig. 2D, right panel). We noticed an ~86% reduction in the number of cells adhered to the surface (Fig. 2D, Inset), as expected. We further quantified the number of live cells adhered to the surface by colorimetric MTT assay and measured 55.8% fewer cells for EGTA-washed coverslips (Fig. 2E). These studies corroborate the smFRET results, supporting that the EC1-2 domains of Cdh23 are enough to mediate Ca 2+ -dependent cell-cell adhesion. We extended the demonstration to the aggregation of live cells (HEK-293) transfected transiently with Cdh23 EC1-2 [23]. Untransfected HEK-293 cells were used as a control. For both, cells were pretreated with Cdh23-specific siRNA to silence the endogenous expression of Cdh23. We observed that cells transfected with Cdh23 EC1-2 formed aggregates within 120 min (Fig. 2F(i)), while the untransfected HEK-293 cells did not show any aggregates (Fig. 2F(ii)).

SAXS envelope portrays a compact extended-handshake conformation for the dimer
To reveal the shape of the Cdh23 EC1-2 trans-homodimer in solution, we performed SAXS measurements with proteins at 10 mg·mL −1 (Fig. 3A), for a q-range of 0.01 to 0.45 Å −1 using a SAXSpace instrument (Anton Paar GmbH, Graz, Austria) (Table S2). The monodispersity of the sample was verified from the linearity of the Guinier plot (Fig. 3A, inset). The folding of the proteins was confirmed from the strong agreement between the normalized Kratky plots (I(q)*(q*R g ) 2 /I(0) vs q*R g ) of the experimental data and the theoretical SAXS profiles of the crystal structures/models (Fig. 3B, inset). It is pertinent to mention here that the peaks of the normalized Kratky plots deviate from the value of 1.73. This can be attributed to the rod-like shape of the protein, as the theoretical SAXS profile of the docked model also showed the same profile (Fig. 3B, inset) [24]. The molecular weight was estimated as 54 kDa from the volume of correlation (V c ) [25] and 60 kDa using lysozyme as standard. From the indirect Fourier transformation of the SAXS profile using GNOM [26], we generated the pair distribution function (P(r)), which provided the frequency of the interatomic vectors inside the predominant shape of the proteins in real space (r). P(r) curves estimated for the dimer showed a maximum linear dimension (D max ) and R g values of 11.6 nm and 3.2 ± 0.2 nm, respectively (Fig. 3B). The D max further supports the extended-handshake conformation for the dimer as predicted from smFRET. The peak position and shoulder profile of the computed P(r) supported the multidomain shape of Cdh23 connected by a nonflexible linker.
To obtain a three-dimensional shape of the dimer, we generated ten independent dummy residue models using DAMMIF [27], averaged using DAMAVER, refined using DAMMIN [28], and compared with each other by calculating the normalized spatial discrepancy (NSD). NSD is a measure of the similarity in the shapes of the models, with a value of < 1 indicating that the models are relatively similar and values above 1 indicating that the models are systematically different from each other. The mean NSD between the 10 models was 0.602 with a standard deviation of 0.027, indicating that all the models were very similar to each other. We averaged all ten models and obtained an envelope with dimensions of 14 (L) × 5.76 (W) × 4.1 (H) nm (Fig. 3C). We repeated the modeling protocol several times and validated the protocol as robust and reproducible.
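The Guinier analysis mentioned above can be sketched as a linear fit of ln I(q) against q 2 , where the slope equals −R g 2 /3. The sketch below is a generic illustration and not the authors' processing pipeline; the limit q·R g ≤ 1.3 is the common rule of thumb for globular particles (elongated particles usually need a tighter limit), and the input curve is synthetic, with q in nm −1 .

```python
import numpy as np

def guinier_fit(q, intensity, qrg_max=1.3, n_iter=20):
    """Estimate Rg and I(0) from ln I(q) = ln I(0) - (Rg**2 / 3) * q**2.

    The fit window is shrunk iteratively until q_max * Rg <= qrg_max.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = np.ones_like(q, dtype=bool)
    for _ in range(n_iter):
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        rg = np.sqrt(-3.0 * slope)
        new_mask = q * rg <= qrg_max
        # Stop when the window is stable or too small to fit a line
        if new_mask.sum() < 3 or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, np.exp(intercept)

# Synthetic, noise-free Guinier curve (illustrative values only)
q_demo = np.linspace(0.05, 1.0, 200)          # nm^-1
i_demo = 10.0 * np.exp(-(q_demo * 3.2) ** 2 / 3.0)
rg_demo, i0_demo = guinier_fit(q_demo, i_demo)
print(rg_demo, i0_demo)
```

A linear Guinier region with randomly scattered residuals is the usual indication that the sample is free of aggregation and interparticle interference.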

Homodimerization of Cdh23 EC1-2 is not mediated by tryptophan
The next step was to identify a dimer structure that could fit the SAXS envelope. We first considered the W-conformation of the Cdh23 dimer proposed previously [19]. It was predicted that Cdh23 EC1-2 might form a trans-homodimer through π-stacking of the indole ring of the sole tryptophan at position 66 (W66). The driving force behind such W-mediated interactions is the switch of the W-environment from hydrophilic to more hydrophobic. Since W is known to feature a solvatochromic shift in fluorescence, we designed experiments to probe the W-emission of the monomer and dimer and decipher its role in dimerization. Accordingly, we monitored the steady-state emission of W66 (λ ex = 295 nm). As the protein concentration increased from a monomer concentration to beyond the K D (~18 µM) of the dimer (Fig. 3E), we did not observe any shift in W-emission, excluding its possible involvement in dimerization. Further, we examined the mobility of W66 in the monomer and dimer using time-resolved fluorescence anisotropy. Trp in a protein can have two rotational components, a fast rotation around its axis and a relatively slower rotation along with the protein. Dimerization via the π-stacking of the W residues is expected to constrain the free local rotation around its axis and therefore delay the anisotropy decay of the fast component. We probed this anisotropy decay of W66 at two concentrations of proteins, ranging from a monomer-populated solution to a dimer-dominated solution (Fig. 3F), and observed no change in the decay rate of the faster component. This indicated a lack of constraints on the local mobility of W66, thus confirming that W66 did not play a role in dimerization. However, we observed a significant increase in the decay for the global rotation of W66 upon dilution, indicating a decrease in overall size. The decay at low concentration matches the global rotation of Cdh23 in the presence of EGTA, which is expected to be in the monomeric form (Fig. 3F and Table S3).
Having excluded the W-conformation as the dimer structure, we focused on searching for new structures by docking Cdh23 EC1-2 (PDB ID: 2WHV) using PatchDock. Of the first ten models ranked by docking score, seven models showed docking through the N terminus of the protein and the others through the C terminus. We considered these seven models and superimposed them with the SAXS envelope using SUPCOMB, which showed comparable NSD values varying from 0.7 to 0.8 for all the structures. However, rank 1 appeared as the best-fit model from the z-tests based on R g and FRET-based end-to-end distances (P = 0.20) (Fig. 3D and Table S4).
We next performed molecular dynamics (MD) simulations of the rank 1 for 100 ns using GROMACS and delineated residue-wise interactions (Materials and methods, Table S5). Stability of the dimer was achieved within 2 ns of the simulations to an average  (Fig. 4A,B). To identify the residues that are involved in dimerization, we compared the root mean square fluctuations (RMSF) of all residues between the monomer and dimer anticipating that residues buried within interacting surfaces may fluctuate less (Fig. 4C). The residues other than glycine that showed significant fluctuation differences are highlighted (Fig. 4C, inset).
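The RMSF comparison described above can be illustrated with a short NumPy sketch. It assumes that the trajectory frames have already been aligned to a reference and are available as an array of shape (frames, atoms, 3); the function names are hypothetical and this is not the authors' analysis code.

```python
import numpy as np

def rmsf(coords):
    """Per-atom RMSF from aligned coordinates of shape (frames, atoms, 3)."""
    coords = np.asarray(coords, dtype=float)
    mean_pos = coords.mean(axis=0)                    # (atoms, 3)
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=2)   # (frames, atoms)
    return np.sqrt(sq_dev.mean(axis=0))               # (atoms,)

def rmsf_difference(monomer_coords, dimer_coords):
    """Positive values mark atoms that fluctuate less in the dimer."""
    return rmsf(monomer_coords) - rmsf(dimer_coords)

# Toy trajectory: atom 0 is static, atom 1 oscillates by +/- 1 along x
toy = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
])
print(rmsf(toy))
```

Residues with a large positive difference are candidates for the binding interface, although reduced fluctuation alone does not prove direct contact.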
The time trace analysis of the MD results also identified these residues as responsible for the interactions. We divided these residues into two main interacting interfaces based on their positions in the EC domains (Fig. 4E-J). Interface one is dominated by electrostatic interactions that are mediated by the antiparallel overlap of strand F of the EC1 repeats. The residues involved in the interactions are K76, S77, E78, N97, and Q99, which are conserved in human, mouse, rat, and zebrafish (Fig. 5). Interestingly, S77 was found mutated to leucine (L) in patients who have cutaneous cancer. Some of these residues are also instrumental in heterodimerization with Pcdh15 at tip links. The other interface, which is amphiphilic, is formed by anchoring of the elongated N-terminal strand of EC1 of one monomer to the β-strand (G-strand) of EC2 of the other protein. None of these residues are occluded by glycosylation, indicating that the interfaces are physiologically relevant (Fig. 6A). In vitro cell-binding assays also showed the arrest of A549 cell lines on surfaces coated with Cdh23 EC1-2 wild-type (WT) with no post-translational modifications. To understand the stability of the interface, we introduced a single point mutation at E78 to a positively charged residue, K (E78K). Proper folding of the mutant was verified using SEC and circular dichroism (Fig. 6B,C). To check the formation of the homodimer with mutants on a surface, we performed a live-cell binding assay using A549 cells incubated on a glass-surface 96-well plate premodified with Cdh23 EC1-2 (E78K) as described before (Materials and methods). We observed a significant decrease (78% drop) in the number of cells adhered to Cdh23 EC1-2 (E78K)-modified surfaces compared with WT (Fig. 6D). Quantitative estimation using the MTT assay also measured a 35.8% decrease in the total number of viable cells bound to these surfaces compared with Cdh23 EC1-2 (WT)-coated surfaces, suggesting the lack of homophilic interactions for the mutant (Fig. 2E).
Similarly, we observed the complete disruption of the dimer structure for the E78K mutant in solution even at a concentration of 10 mg·mL −1 (440 µM). The R g of E78K obtained from SAXS is 3.0 ± 0.2 nm, corresponding to a monomer (Fig. 7A). The indirect Fourier transform gave a D max of 10 nm (Fig. 7B). These values correlated well with the R g and D max of the monomeric WT protein (2.9 and 10.0 nm, respectively) as observed in the X-ray crystal structure (PDB ID: 2WHV). Also, the molecular weight was estimated to be 23 kDa based on V c and 30 kDa based on the scattering intensity of lysozyme. The deviation from the expected mass while using lysozyme as standard might arise from an error in concentration. More importantly, both methods estimated the molecular weight of Cdh23 EC1-2 (E78K) to be nearly half that of the native Cdh23 EC1-2 dimer. We were able to fit the SAXS envelopes (Fig. 7C) obtained for E78K to the crystal structure of the monomer and the docked structure by aligning their inertial axes using SUPCOMB (Fig. 7D).

The trans-homodimer of Cdh23 EC1-2 has a high dissociation constant but is long-lived
Once we deciphered the molecular structure of the homodimer of Cdh23, we were interested in measuring the kinetic and thermodynamic parameters of the complex. We performed sedimentation velocity (SV) experiments with Cdh23 EC1-2 (WT) (Fig. 7E,I) using analytical ultracentrifugation (AUC) at two protein concentrations, 10 µM and 36 µM (Fig. 8A-D), with a rotor speed of 142 000 g at 20°C. The SV run at the higher concentration produced two distributions of sedimentation coefficient (C(s)). The peak that appeared at 2.1 S corresponds to the monomer, while the peak at 4.77 S with a lower frictional coefficient ratio (f/f 0 ) of 1.34 corresponds to the dimer (Fig. 7E(i)). Further, the dimer was confirmed from the estimation of the molecular weight (52 kDa) and R H (4.1 ± 0.3 nm). The similar SV run for Cdh23 EC1-2 (E78K) (Figs 7E(ii) and 8E,F), however, featured only a single distribution for a monomer even at a concentration of ~440 µM (Fig. 7E(ii)). The R H estimated for Cdh23 EC1-2 (E78K) was 2.9 ± 0.1 nm (Fig. 7E(ii)). We then performed sedimentation equilibrium (SE) using AUC with 40 µM and 60 µM of Cdh23 EC1-2 (WT) protein at three different rotational speeds (46 000, 63 000, and 82 000 g) and estimated the K D of the monomer-dimer equilibrium as 18 ± 4 µM from the global fitting of the equilibrium curves using the free software SEDPHAT (Fig. 8I) [29]. The K D obtained for Cdh23 EC1-2 (WT) is significantly lower than that of E-cadherin (96.5 ± 10.6 µM) [30], but comparable with those of N-cadherin (25.8 ± 1.5 µM) [30] and R-cadherin (13.7 ± 0.2 µM) [31]. Interestingly, we observed only one dominating peak with a C(s) of 1.96 S and an f/f 0 of 1.51 corresponding to a monomer when we ran SV with 10 µM of Cdh23 EC1-2 (WT) (Fig. 8C,D), even though the dimer-to-monomer ratio at 10 µM of protein is nearly 1/4 for a homodimer with a K D of 18 ± 4 µM.
The absence of a detectable dimer here could be an artifact of the detection limit of the instrument or could be due to a slow on-rate of Cdh23 EC1-2 (WT) toward dimerization [32]. Moreover, it is well documented that the K D obtained from SV or SE varies widely for the same proteins, and these variations are mainly attributed to the differences in internal pressure in the AUC cell arising from different run-times [33][34][35][36].
Since interface one is mediated by EC1 alone, we checked whether interface one interacts independently to form the homodimer. As expected, SV with Cdh23 EC1 (WT) showed a population of both monomers and dimers, indicating that EC1 alone can form a homodimer with a lower affinity (K D ~ 52 ± 7 µM, Fig. 8G,H,J) than EC1-2. Interestingly, we did not observe any dimer in SEC when we injected 100 µM of Cdh23 EC1, which should contain 80% of the dimer. The nonappearance of the dimer peak in SEC could be due to in-column dilution of the protein before elution.
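The expected populations at a given loading concentration can be estimated with a minimal sketch of a simple monomer-dimer equilibrium. It assumes an ideal two-state scheme, 2M ⇌ D, with K D = [M] 2 /[D] and the total protomer concentration equal to [M] + 2[D]; the function name is hypothetical and the scheme ignores nonideality and kinetics.

```python
import math

def monomer_dimer(total_um, kd_um):
    """Solve 2M <-> D with K_D = [M]**2 / [D] and total = [M] + 2[D].

    Substituting [D] = [M]**2 / K_D into the mass balance gives
    2[M]**2 + K_D [M] - K_D * total = 0, solved for the positive root.
    Returns (free monomer, dimer) concentrations in µM.
    """
    m = (-kd_um + math.sqrt(kd_um ** 2 + 8.0 * kd_um * total_um)) / 4.0
    d = m ** 2 / kd_um
    return m, d

# Conditions taken from the text: 10 µM and 36 µM protein, K_D = 18 µM
for total in (10.0, 36.0):
    m, d = monomer_dimer(total, 18.0)
    print(total, m, d, d / m)
```

The same function applies to any K D and loading concentration, which makes it easy to see how dilution during a run shifts the equilibrium toward the monomer.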
Finally, we performed SMFS with Cdh23 EC1-2 (WT) to estimate the strength of the interfaces, as in cellulo studies measured strong adhesive properties of Cdh23 that suppress tumor metastasis. For SMFS, we covalently attached the C terminus of the protein to AFM cantilevers (Si 3 N 4 ) and glass coverslips using the sortagging protocol [37] (Fig. 9A, Materials and methods). A typical single-molecule unbinding force curve (unbinding event, black line) and a no-event (blue) curve obtained from SMFS are shown in Fig. 9B. Overall, we observed 8-10% unbinding events, 97% of which featured single unbinding force curves that fit the freely jointed chain (FJC) model (Fig. 9B). These data corroborate well with the Poisson statistics used for single-molecule sorting. We plotted the maximum unbinding forces for seven different loading rates as histograms (Fig. 9C). The contour length (L c ) was estimated as an FJC fitting parameter for each of the individual unbinding curves, and its values were distributed as a single Gaussian centered at 58.7 ± 1.0 nm irrespective of the pulling rate (Fig. 9B, inset). This finding is typical for the stretching of two PEG molecules of 5 kDa molecular weight in series. The unbinding force distributions for Cdh23 EC1-2 (WT), however, were bimodal, with two well-separated peaks (Fig. 9C). We hypothesized that the two distributions correspond to two different binding states: The low-force distribution is due to fast binding by either of the two interacting interfaces, and the high-force distribution is due to a complete handshake binding between Cdh23 EC1-2 (WT) partners. For verification, we repeated the force spectroscopy with Cdh23-EC1 (WT) alone and measured the strength of interface one. As confirmation, the unbinding force distribution obtained for EC1 alone (red, Fig. 9C) overlaid closely with the low-force distribution of Cdh23 EC1-2. We also measured a higher probability of events for Cdh23-EC1 (WT) (~10%) than for EC1-2 (WT) (~6%). This may reflect a higher on-rate of Cdh23-EC1 (WT) than Cdh23 EC1-2 (WT).
We then plotted the most probable forces (F mp ) obtained from the peak maxima of the distributions with increasing loading rates. The loading rate for each velocity was estimated by considering the molecular tether and cantilever in series (Materials and methods) [38,39]. The data were fitted to the Bell-Evans model (Eq. 6) [40,41], and the intrinsic lifetime (τ 0 ) and the distance to the transition state (x b ) were estimated (Fig. 9D,E). For Cdh23 EC1-2 (WT), we measured two F mp values for two force distributions at each loading rate. Our results indicate a very high τ 0 of 1224.2 s for Cdh23 EC1-2 (WT) for the high-force distribution and x b of 0.31 nm. The τ 0 and x b obtained for the low-force distributions are 1.86 s and 0.55 nm, respectively, which corresponded to the values obtained for Cdh23-EC1 (WT) alone within a 95% confidence interval, corroborating the SV run. Next, we repeated the force spectroscopy experiments with the Cdh23 EC1-2 (E78K) mutant and measured a very low frequency of events, similar to nonspecific measurements, confirming again that Cdh23 EC1-2 (E78K) mutants do not interact with each other.
Fig. 9. SMFS of Cdh23 EC1-2 dimers reveals two binding conformations. (A) Schematics show the AFM cantilever and coverslip functionalized with proteins for the single-molecule dynamic force spectroscopy experiments. The C terminus of the protein is covalently attached to the cantilever and coverslip using sortase enzyme chemistry. A mixture of monofunctional and bifunctional polyethylene glycol (PEG, 5 kDa) is used as a spacer to minimize nonspecific and multiple events. (B) A typical single-molecule unbinding force curve featuring a characteristic stretching of PEG is shown in the black solid line. The fit to the freely jointed chain (FJC) model is shown in red. The rupture force was estimated from the peak maximum and contour length (L c ) from the FJC fit. The inset shows a Gaussian distribution for L c for the measurements performed at a 13841 pN/s loading rate. The mean L c was obtained from the Gaussian fit (solid line). The blue line represents no unbinding events. (C) The histograms of unbinding forces with increasing loading rates show a bi-Gaussian distribution for Cdh23 EC1-2 (gray) and uni-Gaussian for Cdh23 EC1 (red). Solid black curves were generated from the bi-Gaussian fitting of the distribution and used as a visual guide. From the peak maxima, we obtain the two most likely forces (F mp ) for Cdh23 EC1-2 and one for Cdh23 EC1 at each loading rate. (D) The dependence of F mp on ln(loading rate) is shown for Cdh23 EC1-2 (black circles for the high-force regime and black triangles for the low-force regime) and Cdh23 EC1 (red circles). The error bars represent the SEM estimated using data from five different experiments (N = 5). Each set of data is fitted to a Bell-Evans model (superimposed solid lines). Dotted lines indicate the 95% confidence intervals of the data. (E) The table summarizes the parameters, x b (maximum distance to unbinding in the reaction coordinate) and τ 0 (lifetime at zero force) for EC1 and EC1-2 obtained from the Bell-Evans model fit.
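The loading-rate analysis can be illustrated with the standard form of the Bell-Evans relation, F mp = (k B T/x b ) ln(r x b τ 0 /k B T), where r is the loading rate. Eq. 6 is not reproduced in this section, so the exact form used for the fits is an assumption here, as is the temperature of 25 °C; the function names are hypothetical and the data are synthetic.

```python
import numpy as np

# k_B * T in pN·nm at an assumed temperature of 25 °C (298.15 K)
KBT_PN_NM = 1.380649e-23 * 298.15 * 1e21

def bell_evans_force(loading_rate, x_b, tau0, kbt=KBT_PN_NM):
    """Most probable unbinding force (pN).

    loading_rate in pN/s, x_b in nm, zero-force lifetime tau0 in s.
    """
    return (kbt / x_b) * np.log(loading_rate * x_b * tau0 / kbt)

def fit_bell_evans(loading_rate, f_mp, kbt=KBT_PN_NM):
    """Linear fit of F_mp against ln(loading rate); returns (x_b, tau0)."""
    slope, intercept = np.polyfit(np.log(loading_rate), f_mp, 1)
    x_b = kbt / slope                                  # slope = kBT / x_b
    tau0 = kbt * np.exp(intercept / slope) / x_b
    return x_b, tau0

# Synthetic check: generate forces from the reported high-force
# parameters (x_b = 0.31 nm, tau0 = 1224.2 s) and recover them by fitting.
rates = np.logspace(2, 5, 7)                           # pN/s, illustrative
forces = bell_evans_force(rates, 0.31, 1224.2)
x_b_fit, tau0_fit = fit_bell_evans(rates, forces)
print(x_b_fit, tau0_fit)
```

In this form the slope of F mp against ln(r) fixes x b , and the intercept then fixes τ 0 , so a steeper line implies a shorter distance to the transition state.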

Discussion
Cdh23 mediates stronger cell-cell adhesion via homophilic trans-binding than classical E-cadherin [10]. The physiological implication of such strong adhesion by Cdh23 is metastasis suppression [6]. It therefore became imperative to decipher the structural basis of this strong adhesion. From a range of biophysical and biochemical data, we present here the first view of the homophilic trans-binding structure of Cdh23. The binding interface is formed by an extended antiparallel overlap of the two terminal domains, covering a surface area of 1796.3 Å² per monomer. Experiments including analytical SEC at 4°C, smFRET at 20°C, AUC at 20°C, SAXS at 20°C, and DLS at 20°C were performed to verify and model the homodimer. Irrespective of the methods and their working temperatures, physical parameters such as the molecular weight (MW), the radius of gyration ($R_g$), and the radius of hydration ($R_H$) estimated from the different methods all converged (Table S6). Even the geometry of the dimer estimated from the $R_g/R_H$ values from the different techniques indicated a nonspherical shape, as obtained from SAXS (Table S7). Molecular details from in silico MD studies further revealed two distinct interfaces in the homodimer: electrostatic interactions between EC1 and EC1, and hydrophobic interactions between EC1 and EC2 of the opposing partners. The extended overlap of domains may confer stronger adhesion, as observed in cellulo, than the single-domain overlap in classical cadherins. In classical cadherins, the thermodynamically stable trans-dimer is formed by the exchange of the N-terminal β-strands of the distal domain alone.
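The shape argument based on $R_g/R_H$ can be illustrated with a short Python sketch. It relies on the textbook result that a uniform hard sphere has $R_g/R_H = \sqrt{3/5} \approx 0.775$; the function names and the tolerance are hypothetical, and the measured values belong to Table S7 and are not reproduced here.

```python
import math

# For a uniform hard sphere of radius R: R_g = sqrt(3/5) * R and R_H = R.
SPHERE_RATIO = math.sqrt(3.0 / 5.0)


def shape_ratio(rg_nm, rh_nm):
    """Dimensionless shape factor R_g / R_H."""
    if rg_nm <= 0 or rh_nm <= 0:
        raise ValueError("radii must be positive")
    return rg_nm / rh_nm


def deviates_from_sphere(rg_nm, rh_nm, tolerance=0.05):
    """True when R_g/R_H differs from the hard-sphere value by more than
    `tolerance` (an arbitrary, illustrative threshold)."""
    return abs(shape_ratio(rg_nm, rh_nm) - SPHERE_RATIO) > tolerance
```

Ratios well above the hard-sphere value are typical of elongated or extended particles, which is the sense in which the ratio points to a nonspherical dimer.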
To quantify the adhesive strength of Cdh23 homodimers, we measured the lifetime ($\tau_0 = 1/k_{off}$) of the complex from SMFS experiments using AFM. Two binding conformations were identified: one involving interface one (EC1) alone, at a lower force range but with a higher binding probability, and the other, comprising both interfaces, at a higher force range. The strength of interface one was weak ($\tau_0$ = 1.86 s) and comparable to that of classical cadherins (Table S8). The strength of the final structure ($\tau_0$ = 1224 s) is among the strong interactions reported between cadherins. The longer lifetime of the Cdh23 trans-dimer was expected from the extended binding interface. The long lifetime of the Cdh23 homodimer thus explains the higher aggregation index measured for cells expressing Cdh23 than for cells adhering via classical E-cadherin. However, the $K_D$ measured for Cdh23 is in the micromolar range, comparable to that of classical cadherins (Table S8). A similar trend was also observed for nonclustered protocadherins (protocadherin-19, for example), which also form homodimers with an extended overlap between multiple EC domains and $K_D$ values in the micromolar range [2]. The disparity between the $K_D$ values and the area of the interacting interface may be attributed to the slow on-rate (~10⁶ M⁻¹ s⁻¹) of the interactions, which can be computed from the ratio of the off-rate to $K_D$.
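The relation $\tau_0 = 1/k_{off}$ used above is simple enough to check directly. The following Python sketch converts the two reported lifetimes into zero-force off-rates; the function name is hypothetical and only the lifetimes come from the text.

```python
def off_rate(tau0_s):
    """Zero-force off-rate k_off = 1 / tau_0, in s^-1."""
    if tau0_s <= 0:
        raise ValueError("lifetime must be positive")
    return 1.0 / tau0_s


# Lifetimes reported for the two binding conformations
lifetimes_s = {"interface one (EC1)": 1.86, "full dimer (EC1-2)": 1224.2}

for label, tau0 in lifetimes_s.items():
    print(f"{label}: k_off = {off_rate(tau0):.2e} s^-1")
```

For the full dimer this gives an off-rate of about 8.2 × 10⁻⁴ s⁻¹, in line with the low off-rate of ~8 × 10⁻⁴ s⁻¹ quoted for the complex.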
The interface of the Cdh23-mediated trans-homodimer is amphiphilic and involves residues that are conserved across a variety of species (Fig. S2). A missense mutation at the binding interface was identified in patients with cutaneous cancer, indicating the physiological relevance of the interface. Further, the interface was validated by introducing a single point mutation (E78K), which disrupted the dimer complex.
Deciphering the contribution of nonclassical cadherins to cell adhesion is gaining interest, especially for the giant cadherins with long extracellular domains [42][43][44]. The primary focus in this area has been to identify nonclassical cadherins that mediate cell adhesion junctions [45][46][47] and to understand their packing conformations at the junction [42,48], the function of the junction [44,49,50], and the molecular structure of the junction [8]. Deciphering the molecular structure of the trans-homodimer of Cdh23 is an essential and timely step in this context, which may pave the way for understanding the molecular details of the strong cell adhesion junctions formed by atypical cadherins.

Analytical size exclusion chromatography
We injected 100 µM of purified Cdh23 EC constructs onto a Superdex 200 Increase 10/300 column (GE Healthcare) at a flow rate of 0.3 mL·min⁻¹. Prior to loading the protein, the column was washed with degassed ultrapure water and equilibrated with degassed SEC buffer (25 mM HEPES, 25 mM KCl, 50 mM NaCl, 2 mM CaCl2, pH 7.5). The column was calibrated using a standard protein mix kit (Merck-Sigma) containing cytochrome-C (12 000 Da), carbonic anhydrase (29 000 Da), albumin (66 000 Da), and alcohol dehydrogenase (76 000 Da). The standard calibration curve was generated by estimating the partition coefficient ($K_{av}$) using the equation $K_{av} = (V_e - V_0)/(V_c - V_0)$, where $V_e$ is the elution volume, $V_0$ is the void volume (8 mL), and $V_c$ is the column volume (24 mL for the Superdex 200 Increase size exclusion column). The eluted fractions for all the proteins were run on SDS/PAGE for molecular weight confirmation.
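The calibration step can be sketched in Python as follows. The sketch assumes the usual linear relation between $K_{av}$ and log10(MW) for the standards; the function names are hypothetical, and no elution volumes are taken from the text, so any values passed in are the user's own measurements.

```python
import math


def k_av(v_e_ml, v_0_ml=8.0, v_c_ml=24.0):
    """Partition coefficient K_av = (V_e - V_0) / (V_c - V_0)."""
    if not v_0_ml <= v_e_ml <= v_c_ml:
        raise ValueError("elution volume must lie between V_0 and V_c")
    return (v_e_ml - v_0_ml) / (v_c_ml - v_0_ml)


def fit_calibration(mw_da, elution_ml):
    """Least-squares line K_av = slope * log10(MW) + intercept from the standards."""
    xs = [math.log10(mw) for mw in mw_da]
    ys = [k_av(v) for v in elution_ml]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(v_e_ml, slope, intercept):
    """Apparent molecular weight (Da) of a sample from its elution volume."""
    return 10 ** ((k_av(v_e_ml) - intercept) / slope)
```

With the default volumes, a protein eluting at 16 mL has $K_{av}$ = 0.5, halfway between the void volume and the column volume.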

Single-molecule FRET measurements
For N-terminal labeling with dyes, we recombinantly mutated valine at position 3 to cysteine (V3C) and attached maleimide dyes (supplied by Lumiprobe) using the thiol-maleimide Michael addition reaction. The unreacted dye was removed using spin columns (10 kDa MWCO). The labeling ratio was measured from the absorbance at 280 nm for protein, 545 nm for Cy3, and 645 nm for Cy5. For C-terminal labeling, we used a sortase A (SrtA)-mediated enzymatic reaction. For sortagging, we recombinantly modified the C terminus of the proteins with -LPETGGS. SrtA recognizes the sequence and inserts polyglycine by cleaving the T-G peptide bond. To introduce the dye, we used dye-modified polyglycine (GGGC-dye) [31].
smFRET measurements were performed using an IX83 P2ZF inverted microscope (Olympus, Shinjuku, Tokyo, Japan) combined with an IX3 TIR MITICO TIRF illuminator equipped with a 532 nm diode laser system for Cy3 excitation and a 645 nm diode laser system for Cy5 excitation. Fluorescence was collected using an oil-immersion objective (60X, NA 1.45, Olympus) into an EMCCD camera (Q-Imaging Rolera Thunder, Surrey, BC, Canada). Image acquisition and processing were performed using CellSens Dimension (Olympus) software. iSMS software was used to localize the single-molecule dye pairs on the surface and to analyze their intensity profiles. The background subtraction and drift correction were also done using the same software. The FRET efficiency for each pair of donor-acceptor proteins was estimated using the same software. From the efficiency distributions, we estimated the distance between the FRET pairs using the Förster relation, $E = 1/[1 + (r/R_0)^6]$, where $E$ is the FRET efficiency, $r$ is the donor-acceptor distance, and $R_0$ is the Förster radius of the dye pair.
For smFRET on surfaces, glass coverslips were freshly cleaned, silanized, pegylated, and finally modified specifically with the C terminus of the proteins following a sortagging protocol described elsewhere [31].
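Converting a FRET efficiency into a donor-acceptor distance can be sketched as follows, assuming the standard Förster relation $E = 1/[1 + (r/R_0)^6]$. The function names are hypothetical, and the Förster radius must be supplied by the user for the specific dye pair and conditions; no value is taken from this text.

```python
def fret_efficiency(distance_nm, r0_nm):
    """FRET efficiency E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def fret_distance(efficiency, r0_nm):
    """Donor-acceptor distance r = R0 * (1/E - 1)^(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)
```

Because of the sixth-power dependence, the conversion is most sensitive for efficiencies near 0.5, where the distance equals the Förster radius.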

Live-cell binding and cell-cell aggregation assays
For live-cell binding to surfaces, glass-bottom 96-well plates were cleaned, silanized, pegylated, and specifically modified with proteins. The C terminus of the protein was covalently attached to the surfaces using sortagging, as described before. Protein-coated surfaces were incubated with live A549 cells (10⁴ cells per well) in 2 mM Ca²⁺ buffer. After incubation for 2 h, the surfaces were gently washed twice for 5 min with HEPES-Ca²⁺ buffer and imaged to assess cell density. The Ca²⁺ was then chelated from the surface by incubating with 1 mM EGTA, and the cell density was monitored again. The number of live cells adhered to the surfaces was quantified using MTT reagent. MTT reagent was added to each well and incubated for 1 h. The cells were then lysed with DMSO, and the OD was monitored at 570 nm.
For the cell-cell aggregation assay, cells were counted 36 h after transfection with Cdh23 and resuspended in HBSS buffer supplemented with Ca²⁺ ions to a final cell count of ~10⁵ cells. Aggregation was initiated by incubating the cells at 80 rpm, and the cells were then imaged in bright field at 10× magnification using a Leica microscope.

SAXS data acquisition and analysis
The SAXS data were acquired over a q-range of 0.1 to 4 Å⁻¹ on a SAXSpace instrument (Anton Paar GmbH). The X-ray scattering setup had a slit-collimated X-ray source with a wavelength of 0.154 nm. The data were collected on a Mythen detector (Dectris, Baden-Daettwil, Switzerland) placed at a distance of 317.6 mm from the sample for 60 min (20 min × 3 frames). SAXStreat software was used to calibrate the data for the beam position. The SAXSquant software was then used to subtract the buffer contribution, set the usable q-range, and desmear the data using the beam profile. For each experiment, 100 µL of protein solution (Cdh23 EC1-2 and its mutant E78K) and the corresponding buffers were exposed to X-rays in a quartz capillary at a temperature of 10°C. Data processing provided the scattering intensity ($I$) as a function of the momentum transfer vector $q$ ($q = 4\pi \sin\theta/\lambda$, where $\theta$ and $\lambda$ are the scattering angle and the X-ray wavelength, respectively). The normalized Kratky plots ($I(q) \times (q \times R_g)^2/I(0)$ vs. $q \times R_g$) were made from the SAXS data using the program SCATTER (http://www.bioisis.net/) to assess whether the protein remained folded during the SAXS data collection. The Guinier approximation was carried out using PRIMUSQT [24] of the ATSAS 2.7 suite of programs [51] to estimate the radius of gyration ($R_g$) of the major scattering species. Using the GNOM program [26], we carried out the indirect Fourier transformation of the SAXS data to obtain the probability distribution of the pairwise vectors ($P(r)$ curve) arising from scattering of the protein molecule in solution. The $P(r)$ curve analysis was done to obtain the maximum linear dimension ($D_{max}$) and the $R_g$ in real space.
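The Guinier step can be illustrated with a short Python sketch. It assumes the standard Guinier approximation, $\ln I(q) = \ln I(0) - (R_g^2/3)\,q^2$, fitted as a straight line in $q^2$; this is not the PRIMUSQT implementation, and the function names are hypothetical.

```python
import math


def momentum_transfer(theta_rad, wavelength_nm=0.154):
    """q = 4*pi*sin(theta)/lambda, with theta as used in the formula above.
    The result is in nm^-1 when the wavelength is given in nm."""
    return 4.0 * math.pi * math.sin(theta_rad) / wavelength_nm


def guinier_fit(q_values, intensities):
    """Fit ln I = ln I(0) - (Rg^2 / 3) * q^2 by least squares.
    Returns (Rg, I0); Rg is in the inverse of the units of q."""
    xs = [q * q for q in q_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)
```

The approximation only holds at small angles, so the points passed to `guinier_fit` should be restricted to the low-q region (a limit of roughly $q R_g < 1.3$ is a common rule of thumb for globular particles).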

Shape reconstruction
Ten independent models were generated using the DAMMIF program [27]. The models were aligned, averaged, and filtered using the DAMAVER suite of programs [52]. The averaged envelope was further refined using the DAMMIN program [28]. This procedure provided an envelope that reflects the shape of the protein molecule in solution.

Protein docking
PatchDock is an online tool that performs rigid-body docking optimization based on shape complementarity and two-point interactions between hot-spots. Hot-spots are selected based on residues that are conserved in protein-protein interaction surfaces and mediate salt-bridge interactions, H-bonding, hydrophobic interactions, aromatic π-stacking, etc.
The available crystal structure of monomeric Cdh23 EC1-2 was docked using the PatchDock server [53] to generate the homodimer. A Z-test was performed between the docked models and the SAXS-based envelope using SAXS-based parameters, such as the linear dimension and $R_g$, and FRET-based end-to-end distances (Table S7). The structures that agreed best with the SAXS-based envelope of the protein were overlaid on it by computational alignment using the SUPCOMB program [54]. PYMOL [55] was used for graphical analysis and figure generation.

Steady-state fluorescence and time-resolved anisotropy experiments
The fluorescence properties of Trp in the proteins were probed by exciting at 295 nm and monitoring the emission from 310 to 450 nm ($\lambda_{em}$ = 338 nm) at 25°C using a Jobin Yvon Fluoromax-4 spectrofluorometer equipped with a PMT detector. The slit width (2 nm), step size (0.1 nm), and integration time (0.05 s) were kept the same for all experiments.
Fluorescence anisotropy decay measurements were performed using TCSPC (Fluorocube, Horiba Jobin Yvon, Kyoto, Japan). For decay measurements, the emission polarizer was set to 0° with respect to the excitation polarizer for parallel measurements and to 90° for perpendicular measurements. A 293 nm laser diode was used as the excitation source, and the emission monochromator for tryptophan was fixed at 342 nm with a slit width of 8 nm. The instrument response function was measured using 2% LUDOX (Sigma-Aldrich).
The anisotropy decay was calculated using Equation (2), where $I_\perp$ is the vertical emission and $I_\parallel$ is the horizontal emission, and was fitted to a biexponential decay using Equation (3) to determine the rotational correlation times. The values are tabulated in Table S6.
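As a hedged illustration of the two steps, the Python sketch below uses the standard textbook forms: the anisotropy $r(t) = (I_\parallel - G I_\perp)/(I_\parallel + 2 G I_\perp)$ and a biexponential decay $r(t) = r_0[\beta_1 e^{-t/\phi_1} + (1 - \beta_1) e^{-t/\phi_2}]$. The G-factor, the symbols $r_0$, $\beta_1$, $\phi_1$, $\phi_2$, and the function names are assumptions made for this sketch. Here "parallel" and "perpendicular" refer to the emission polarizer at 0° and 90° relative to the excitation polarizer.

```python
import math


def anisotropy(i_parallel, i_perpendicular, g_factor=1.0):
    """Anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp).
    G = 1 (no instrument correction) is an assumption."""
    denominator = i_parallel + 2.0 * g_factor * i_perpendicular
    if denominator <= 0:
        raise ValueError("total intensity must be positive")
    return (i_parallel - g_factor * i_perpendicular) / denominator


def biexponential_anisotropy(t_ns, r0, beta1, phi1_ns, phi2_ns):
    """Model decay with a fast (phi1) and a slow (phi2) rotational correlation time."""
    return r0 * (beta1 * math.exp(-t_ns / phi1_ns)
                 + (1.0 - beta1) * math.exp(-t_ns / phi2_ns))


def anisotropy_decay(parallel_counts, perpendicular_counts, g_factor=1.0):
    """Point-by-point anisotropy from two decay curves of equal length."""
    return [anisotropy(a, b, g_factor)
            for a, b in zip(parallel_counts, perpendicular_counts)]
```

In a two-component model of this kind, the slower correlation time is usually read as the tumbling of the whole particle and the faster one as local motion of the fluorophore.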

MD simulations
MD simulations were performed on an in-house workstation. The structure of the docked model that agreed best with the SAXS-based envelope was used for the simulations. Simulations were performed with GROMACS 5.0.1 using the all-atom OPLS force field and the TIP4P water model. All analyses were performed using VMD. For MD simulations, the Cdh23-dimer model was placed at the center of a 13.6 × 5.7 × 5.8 nm triclinic box filled with four-point charge water molecules such that no atom of the protein was closer than 1 nm to the walls of the box. The MD equilibration system consisted of 54 435 atoms on average. Charge neutrality of the system was maintained by adding Na⁺ counterions to the box as needed, along with buffer ions (50 mM NaCl, 50 mM KCl, and 2 mM CaCl2). To ensure that the solvated Cdh23-dimer model had no steric clashes or inappropriate geometries, we first energy-minimized the system. Periodic boundary conditions were assumed in all simulations. A cutoff of 1 nm was used for van der Waals interactions. Electrostatic interactions were calculated with the particle mesh Ewald technique with a 1 nm cutoff. We equilibrated the water molecules and ions around the Cdh23-dimer model in two phases. In the canonical (NVT) phase, the system was established at a constant reference temperature of 300 K. The pressure of the system was then stabilized under isothermal-isobaric (NPT) conditions. Following equilibration, 100 ns MD simulations were run with 2 fs integration steps, and frames were recorded at 1 ps intervals. We repeated the simulations 3

Analytical ultracentrifugation
Analytical ultracentrifugation (AUC) experiments were carried out using a Beckman XLA/I ultracentrifuge equipped with a Ti50An rotor, using 12-mm six-channel cell centerpieces with sapphire windows and UV detection at 280 nm. Both sedimentation velocity and equilibrium experiments were performed at 20°C in pH 7.6 buffer containing 25 mM HEPES, 50 mM NaCl, 25 mM KCl, and 2 mM CaCl2. Prior to each experiment, the protein sample was dialyzed against the buffer for 18 h. Sedimentation velocity experiments were performed for all proteins at 142 000 g, and 300 scans were taken consecutively. Data were analyzed using SEDFIT software with the continuous C(s) distribution, C(M) distribution, and C(s, f/f_0) models. For equilibrium experiments, samples were first spun at 99 000 g for 20 h to reach equilibrium rapidly. We then took one scan every hour for 5 h and checked the rmsd fluctuations to test whether the samples had reached equilibrium. Then, we decreased the rotor speed to 32 000 rpm and waited for 4 h before taking three consecutive scans. The same procedure was followed for two other speeds, 28 000 and 24 000 rpm. Buffer viscosity and density were measured using SEDNTERP (http://sednterp.unh.edu/). SEDPHAT was used to estimate the dissociation constant. We performed global fitting with mass conservation following a monomer-dimer association model, keeping the baseline, meniscus, bottom, and binding affinity as floating parameters.
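Because some speeds above are quoted in g and others in rpm, a small conversion helper can be useful for comparing them. The Python sketch below uses the relation RCF = ω²r/g₀ with ω = 2π·rpm/60. The radial position is not stated in the text, so the radius argument is left to the user and any value passed in is an assumption; the function names are hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (in multiples of g) at a given radius."""
    omega = 2.0 * math.pi * rpm / 60.0           # angular velocity, rad/s
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) that produces a given RCF at a given radius."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY / (radius_cm / 100.0))
    return omega * 60.0 / (2.0 * math.pi)
```

For example, `rpm_from_rcf(99000, radius_cm)` gives the rotor speed that corresponds to the fast equilibrium spin once the radial position of the cell is known.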
For quantitative estimation of force from each experiment, the spring constant of the cantilever was measured from the thermal noise using the thermal fluctuation method [56]. For dynamic force spectroscopy, numerous force-distance curves were recorded at different pulling rates (500, 750, 2000, 5000, 7500, 10 000, and 15 000 nm/s) while keeping the approach and retract distance (200 nm), the sampling rate (6 kHz), and the contact time (500 ms) constant. A total of around 6000 curves were recorded at each velocity.
The analyses of the force curves were done in MATLAB with home-written programs. The single-molecule events were selected from the fit to the freely jointed chain (FJC) model using the following equation:

$$l(F) = L_c \left[ \coth\left( \frac{F a}{k_B T} \right) - \frac{k_B T}{F a} \right]$$

where $a$ is the Kuhn length, $L_c$ is the contour length (CL), $l(F)$ is the stretching of PEG at force $F$, $k_B$ is the Boltzmann constant, and $T$ is the absolute temperature. The unbinding forces were estimated from the peak maxima of the single-molecule unbinding events for each loading rate and plotted as distributions. The bin width was estimated using Scott's method [57]. Each force distribution was then fitted to a Gaussian distribution, and the most probable force ($F$) for each loading rate was obtained from the fit. The loading rate at each velocity was calculated using the following equation [39]:

$$\frac{1}{\nu_F} = \frac{1}{v} \left[ \frac{1}{k_c} + \frac{L_c a}{k_B T} \left( \left( \frac{k_B T}{F a} \right)^2 - \mathrm{csch}^2\left( \frac{F a}{k_B T} \right) \right) \right]$$

where $a$ is the Kuhn length, $v$ is the pulling velocity, $k_c$ is the spring constant of the cantilever, $L_c$ is the contour length, and $\nu_F$ is the loading rate. The most probable force ($F$) as a function of the loading rate ($\nu_F$) was fitted to the Bell-Evans model [40,41] to estimate kinetic parameters such as the off-rate ($k_{off}$) and the transition distance ($x_\beta$) using the equation:

$$F = \frac{k_B T}{x_\beta} \ln\left( \frac{\nu_F \, x_\beta}{k_{off} \, k_B T} \right) \qquad (6)$$

Nonspecific binding rates were estimated in two different ways: by modifying either of the two surfaces (the cantilever or the coverslip) identically but without attaching proteins, or by performing the same single-molecule protein-protein unbinding experiments in the absence of Ca²⁺ ions in the Chelex buffer and EGTA buffer. In both cases, the nonspecific events accounted for < 0.5% of the total number of selected PEG stretching events.
for funding. J.S.S. is thankful to the Centre of Excellence (COE) in Frontier Areas of Science and Technology (FAST) program of the Ministry of Human Resource Development, Government of India for financial support. M.K.S. thanks the Wellcome Trust/DBT India Alliance.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Fig. S1. Multiple sequence alignment showing key residues driving homodimerization in type I and type II cadherins versus Cdh23.
Table S1. End-to-end distances obtained from smFRET data analysis along with photophysical properties of fluorophores.
Table S2. Data collection parameters are tabulated along with the software used for analysing the scattering data.
Table S3. Determination of the rotational correlation decay time of Trp66 (W66) at three different concentrations of Cdh23 EC1-2 WT.
Table S4. Quantitative comparison of different PatchDock structures.
Table S5. Parameters used during MD simulations.
Table S6. Physical parameters of the trans-homodimer of Cdh23 EC1-2 (WT), estimated from various techniques.
Table S7. Estimation of $R_g/R_H$ of the trans-homodimer of Cdh23 EC1-2 (WT) obtained from various methods.
Table S8. A comparison of dissociation constants and off-rate values for the first two domains of various cadherins.