Three‐dimensional structures of avian beta‐microseminoproteins: insight from the chicken egg‐specific beta‐microseminoprotein 3 paralog

Beta-microseminoproteins (MSMBs) are small disulfide-rich proteins that are conserved among vertebrates. These proteins exhibit diverse biological activities and have mainly been reported to play a role in male fertility, immunity, and embryogenesis. In this work, we focused on the chicken MSMB3 protein, which was previously described as an egg antibacterial protein. We report that MSMB3 is exclusively expressed in the reproductive tissues of laying hens (in contrast to the chicken MSMB1 and MSMB2 paralogs) and is incorporated into the egg white during egg formation. We also show that chicken MSMB3 has highly conserved orthologs in bird species, including Neognathae and Palaeognathae. Chicken MSMB3 was purified from egg white using heparin affinity chromatography and analyzed by top-down and bottom-up proteomics. Several proteoforms were characterized, and a homodimer was further evidenced by NMR spectroscopy. The X-ray structure of chicken MSMB3 was solved for the first time, revealing that this protein adopts a novel dimeric arrangement. The highly cationic MSMB3 protein exhibits a distinct electrostatic distribution compared with chicken MSMB1 and MSMB2 structural models and with published mammalian MSMB structures. The specific incorporation of the MSMB3 paralog into the egg, its phylogenetic conservation in birds, its peculiar homodimer arrangement, and its physicochemical properties suggest that MSMB3 has evolved to play a critical role during the embryonic development of avian species. These new data are likely to stimulate research to elucidate the structure/function relationships of MSMB paralogs and orthologs in the animal kingdom.

The human prostate secretory protein 94 (PSP94, also named beta-MSMB or beta-inhibin) is considered the archetype of MSMB proteins. Its amino acid sequence was published in 1985 [13]. Human PSP94 is a sperm-coating antigen isolated from human seminal plasma, and its high expression in the prostate gland [14] has motivated many researchers to evaluate its relevance as a biomarker of prostate cancer [15]. Regardless of the species, MSMB proteins have mainly been identified in mucous glands and secretions [16]. Their known biological functions are essentially associated with male fertility and include spermatozoon maturation/capacitation (by binding to cysteine-rich secretory protein 3, CRISP3) and the acrosome reaction [17-24]. Only one article reports a role in female reproduction [25]. Besides these physiological functions, some MSMB proteins are thought to participate in innate immunity, given their activity against pathogenic Candida yeasts and bacteria [26,27], while other MSMB proteins have been reported to display lymphocyte-stimulating activities [28,29]. In parallel, some members of this protein family have antitoxin properties, through binding to secretory toxins present in snake venoms [10,21]. In avian species, an MSMB protein has been identified in the ostrich pituitary gland, but its physiological function has not yet been characterized [7]. Three chicken paralogs, named MSMB1, MSMB2, and MSMB3, located on chromosome 6 and flanked by the WASHC2C (alias FAM21C) and NPY4R (alias PPYR1) genes, have been described previously [1]. The function and tissue distribution of chicken MSMB1 (LOC101750594) are not known. In contrast, chicken MSMB2 (LOC100858647) has been identified in the eggshell [30] and in both sperm and seminal plasma of male chickens [31]. The presence of chicken MSMB2 in semen is consistent with a potential role in male fertility, similar to mammalian MSMBs.
Chicken MSMB3 (LOC101750704) was first purified from egg white and was reported to exhibit antibacterial activity against Listeria monocytogenes and Salmonella enterica Enteritidis [26,32]. To our knowledge, chicken MSMB1 and MSMB2 have never been identified in egg white or in egg yolk. Based on these scarce data in avian species, the functions of chicken MSMBs in male reproduction and immunity resemble those described for mammalian MSMBs. Interestingly, some published articles have highlighted a potential role of chicken MSMB proteins in the early stages of chicken embryonic development, specifically during the formation of mesodermal structures [33]. In addition, a homolog of chicken MSMB2 characterized in amphioxus (29% protein sequence identity) was reported to be potentially involved in ectoderm differentiation during embryonic development [34], and likewise, in Xenopus, an MSMB protein was shown to be essential for regulating neural crest migration [35]. The high variability in MSMB protein sequences that has arisen during speciation is likely associated with distinct physicochemical properties and potentially distinct three-dimensional structures, which may ultimately result in diverse biological activities. For example, the heparin-binding domain of chicken MSMB3 seems to be involved in the antibacterial activity of the protein [26].
In the present article, we focused on the three chicken MSMB paralogs. We first evaluated the tissue specificity of the three paralogs in male and female chicken tissues. We also compared the chicken MSMB1, MSMB2, and MSMB3 protein sequences and searched for MSMB3 orthologs in other avian species. We show that the MSMB3 protein sequence is highly conserved in bird species, in contrast to the other MSMB proteins, which are present in many vertebrates. MSMB3 purified from chicken egg white was analyzed by mass spectrometry to verify its protein sequence and identify proteoforms, and by NMR to assess its behavior in solution. The X-ray structure of MSMB3 was solved and compared with (a) published structures and (b) chicken MSMB1 and MSMB2 structural models built by homology modeling. Altogether, our data highlight some MSMB3-specific features, which suggest that this protein plays a crucial role in avian reproduction, although its precise function in the egg remains puzzling.

Results
The chicken genome contains three MSMB genes localized on chromosome 6
The exact localization of MSMB1, MSMB2, and MSMB3 on chicken chromosome 6 remains controversial. MSMB1 (LOC101750594) and MSMB2 (LOC100858647) are co-localized within a 30 to 35 kb locus on chromosome 6 (6:18655953-18660462 and 6:18666862-18670817, respectively), regardless of the genome assembly version. However, MSMB3 was identified in the Gallus_gallus 4.0 assembly but was withdrawn from the latest genome assemblies (5.0 and 6.0; Fig. 1) due to a lack of supporting evidence in the current genome build. Nevertheless, using Batch Coordinate Conversion (liftOver) [36], the genomic region corresponding to the MSMB3 gene in the Gallus_gallus 4.0 assembly (chr6:17390530-17392831) could still be mapped to chr6:18677529-18679830 in the Gallus_gallus 6.0 assembly (GCA_000002315.5), although this region lacks annotation in the currently available assembly.
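The coordinate arithmetic in this paragraph can be checked with a few lines of Python. This is an illustrative sketch only: it assumes the quoted positions are 1-based and inclusive (as in UCSC-style chr:start-end strings), and it reads the MSMB2 end coordinate as 18670817, since the value printed as "8670817" in earlier versions of this text would precede its own start. It does not reproduce liftOver, which relies on chain files; it only confirms that the lifted MSMB3 interval keeps the same length with a constant offset between assemblies, and reports the spacing between the three genes in 6.0 coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A chromosome interval; coordinates assumed 1-based and inclusive."""
    name: str
    start: int
    end: int

    def __post_init__(self) -> None:
        # Reject intervals whose end precedes their start (e.g. a truncated coordinate).
        if self.end < self.start:
            raise ValueError(f"{self.name}: end {self.end} precedes start {self.start}")

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def gap_between(left: Interval, right: Interval) -> int:
    """Number of bases strictly between two non-overlapping intervals."""
    return right.start - left.end - 1


# Coordinates quoted in the text (chr6); the MSMB2 end is read as 18670817 (assumption).
msmb1 = Interval("MSMB1", 18_655_953, 18_660_462)
msmb2 = Interval("MSMB2", 18_666_862, 18_670_817)
msmb3_v4 = Interval("MSMB3, Gallus_gallus 4.0", 17_390_530, 17_392_831)
msmb3_v6 = Interval("MSMB3, Gallus_gallus 6.0 (lifted)", 18_677_529, 18_679_830)

if __name__ == "__main__":
    print("MSMB3 length 4.0 / 6.0:", msmb3_v4.length, msmb3_v6.length)
    print("offset start / end:", msmb3_v6.start - msmb3_v4.start, msmb3_v6.end - msmb3_v4.end)
    print("gap MSMB1-MSMB2:", gap_between(msmb1, msmb2))
    print("gap MSMB2-MSMB3:", gap_between(msmb2, msmb3_v6))
    print("span MSMB1 start to MSMB3 end:", msmb3_v6.end - msmb1.start + 1)
```

With these inputs, the MSMB3 interval has the same length in both assemblies and both ends shift by the same offset, which is what a gap-free mapping of this region would produce.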
The protein sequence of chicken MSMB3 is highly conserved in avian species
The comparison of the protein sequences of the three chicken paralogs indicates moderate sequence identity, ranging from 34% (MSMB2 and MSMB3) to 42% (MSMB1 and MSMB2; Fig. 2A). Among the chicken paralogs, chicken MSMB2 has the highest sequence identity with human and porcine MSMBs (46.8% and 46.1%, respectively; Fig. 2A), as well as conserved motifs such as DXKG, HXXN, ISCC, VVEKXD, KTC, CYFXP, and the final W residue, which are not found in chicken MSMB1 and MSMB3. These observations suggest that chicken MSMB2 and the porcine and human MSMB genes may be orthologous. As expected, using the BlastP program [37], the highest sequence similarities with chicken MSMB3 were found with proteins from bird species. Figure 2B illustrates the alignment of six avian MSMB3 sequences encompassing the Neognathae (two Galloanserae and two Neoaves) and Palaeognathae (including two ratites, a flightless lineage) subclasses. The percent identity matrix comparing these avian MSMB3 sequences reveals that they share at least 80% sequence identity, indicating that this protein is highly conserved among bird species. To our knowledge, there are no MSMB3 orthologs in nonavian species.
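Percent identity matrices such as those in Fig. 2 are built from pairwise comparisons of aligned sequences. The sketch below implements one common convention, identical residues divided by the number of columns in which neither sequence has a gap. Other tools divide by the alignment length or by the shorter sequence, so this convention is an assumption rather than the exact formula behind Fig. 2, and the toy sequences are invented placeholders, not MSMB sequences.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    paired = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not paired:
        return 0.0
    identical = sum(x == y for x, y in paired)
    return 100.0 * identical / len(paired)


def identity_matrix(alignment: dict[str, str]) -> dict[str, dict[str, float]]:
    """Symmetric pairwise percent identity matrix for a gapped alignment."""
    names = list(alignment)
    matrix = {n: {n: 100.0} for n in names}
    for p, q in combinations(names, 2):
        value = percent_identity(alignment[p], alignment[q])
        matrix[p][q] = matrix[q][p] = value
    return matrix


if __name__ == "__main__":
    # Invented toy alignment (placeholders, not real MSMB sequences).
    toy = {"seq1": "ACD-EFGH", "seq2": "ACDKE-GH", "seq3": "ACNKEFGW"}
    for name, row in identity_matrix(toy).items():
        print(name, {k: round(v, 1) for k, v in row.items()})
```

Because gap columns are excluded, two sequences that differ only by indels score 100% under this convention, which is one reason published matrices can differ between tools.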
Chicken MSMB3 is essentially expressed in the magnum tissue that is responsible for egg white formation
The expression of chicken MSMB genes was analyzed in various reproductive and nonreproductive tissues of hens (adult female chickens), and in the liver and reproductive tissues of roosters (adult male chickens), to assess their tissue specificity. The MSMB1 gene is almost exclusively expressed in the liver of both male and female chickens (Fig. 3A,D) and is barely detectable in reproductive tissues. Among the three chicken MSMB genes, MSMB2 seems to have the broadest tissue expression pattern, as its expression is detected in several tissues [duodenum, lung, and female reproductive organs, including theca, magnum, white isthmus, uterus, and vagina (Fig. 3B)]; however, MSMB2 expression is very weak in the male reproductive tract (Fig. 3E). In contrast to the two other MSMB paralogs, MSMB3 is expressed in the magnum and, to a lesser extent, in the white isthmus, which are involved in the secretion of proteins composing the egg white and the eggshell membranes, respectively (Fig. 3C). Notably, MSMB3 expression is barely detectable in male tissues (Fig. 3F).

Mass spectrometry analysis unveiled the presence of MSMB3 homodimer
Chicken egg MSMB3 purified from egg white was analyzed by top-down proteomics using matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry and micro-liquid chromatography coupled to high-resolution mass spectrometry (µLC-MS).
First, top-down analysis was performed without any reduction or alkylation in order to characterize whole, intact proteoforms. The MALDI-TOF spectrum of the MSMB3 fraction showed several m/z peaks related to MSMB3 (Fig. 4). Interestingly, the major m/z peak at 9858.88 is not consistent with the theoretical [M + H]+ average mass of MSMB3 considering its five disulfide bridges (9916.66, Table 1). This value instead coincides with the singly charged average mass of the MSMB3 sequence with five disulfide bridges lacking a G residue at its carboxy-terminal extremity (MSMB3(-G)), whose theoretical average mass is 9859.61 Da (Fig. 4, inset). In addition, the peak at 19716.31 m/z is consistent with a dimer of MSMB3(-G).
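The mass reasoning in this paragraph reduces to simple arithmetic on average masses. The sketch below uses approximate reference values that are assumptions here (average mass of a glycine residue, 57.0519 Da; proton mass, 1.00728 Da; average hydrogen atom, 1.00794 Da) together with the [M + H]+ values quoted in the text; it is not the software used to compute Table 1. It checks that the 9916.66/9859.61 difference matches one Gly residue, derives the expected [2M + H]+ of an MSMB3(-G) dimer from the monomer [M + H]+, and shows how each disulfide bridge lowers the mass by two hydrogen atoms.

```python
# Approximate reference masses (Da); values are assumptions for this sketch.
GLY_RESIDUE_AVG = 57.0519   # glycine residue, C2H3NO, average mass
PROTON = 1.00728            # mass added by protonation in [M + H]+
H_ATOM_AVG = 1.00794        # average mass of a hydrogen atom


def dimer_mh_from_monomer_mh(monomer_mh: float) -> float:
    """[2M + H]+ of a homodimer, given the monomer [M + H]+."""
    neutral = monomer_mh - PROTON
    return 2 * neutral + PROTON


def disulfide_mass_loss(n_bridges: int) -> float:
    """Each S-S bridge removes two hydrogen atoms from the reduced chain."""
    return 2 * H_ATOM_AVG * n_bridges


if __name__ == "__main__":
    full_mh, minus_gly_mh = 9916.66, 9859.61   # theoretical [M + H]+ values quoted in the text
    observed_dimer = 19716.31                   # observed MALDI-TOF dimer peak

    print("full - (-G):", round(full_mh - minus_gly_mh, 2), "vs Gly residue", GLY_RESIDUE_AVG)
    expected_dimer = dimer_mh_from_monomer_mh(minus_gly_mh)
    print("expected [2M+H]+:", round(expected_dimer, 2))
    print("observed - expected dimer:", round(observed_dimer - expected_dimer, 2))
    print("mass loss for 5 bridges:", round(disulfide_mass_loss(5), 2))
```

With the quoted values, the observed dimer peak sits about 1.9 Da below the expected [2M + H]+, in line with the approximately 2 Da deviation that the µLC-MS analysis attributes to lysine modifications.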
The chicken egg MSMB3 fraction was further analyzed by µLC-MS to characterize intact MSMB3 proteoforms with better mass accuracy than the < 100 p.p.m. observed by MALDI-TOF. From the chromatogram (Fig. 5A), the spectra combined over the 30-32, 32-33, and 33-34 min retention-time ranges allowed characterization of MSMB3(-G) and the dimer, of MSMB3(-G) and MSMB3, and of all proteoforms with salt adducts including one sodium and two potassium ions, respectively (Fig. 5B1-B3). In addition, a modification with a mass shift of −1 Da was identified on two lysines, especially K63, corresponding to a potential oxidation of the lysine residue (Fig. S3). These mass deviations may explain the 2 Da mass difference observed for the homodimer by MALDI-TOF and µLC-MS (see previous paragraph).
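Two quantities recur in this paragraph: the relative mass error in p.p.m. and the mass shift caused by salt adducts. A minimal sketch follows, assuming the usual definitions (error = (observed - theoretical) / theoretical × 10^6; each Na+ or K+ adduct replaces one proton, adding the cation mass minus one hydrogen) and approximate average atomic masses taken as assumptions. Applied to the MALDI-TOF monomer peak (9858.88 observed vs 9859.61 theoretical), it gives an error of roughly -74 p.p.m., consistent with the < 100 p.p.m. figure.

```python
# Approximate average atomic masses (Da); assumptions for this sketch.
H_AVG = 1.00794
NA_AVG = 22.98977
K_AVG = 39.0983


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def adduct_shift(cation_mass: float, count: int = 1) -> float:
    """Neutral-mass shift when `count` metal cations each replace one hydrogen."""
    return count * (cation_mass - H_AVG)


if __name__ == "__main__":
    # MSMB3(-G): observed MALDI-TOF peak vs theoretical [M + H]+ quoted in the text.
    print("error (p.p.m.):", round(ppm_error(9858.88, 9859.61), 1))
    print("+1 Na shift (Da):", round(adduct_shift(NA_AVG), 2))
    print("+2 K shift (Da):", round(adduct_shift(K_AVG, 2), 2))
```

On deconvoluted spectra, a proteoform carrying one sodium adduct would therefore appear about 22 Da above the unmodified form, and one carrying two potassium adducts about 76 Da above it.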

MSMB3 homodimer is confirmed in solution by NMR experiments
The homodimeric state of MSMB3 in solution was investigated by NMR spectroscopy. First, we checked the sample quality by one-dimensional ¹H NMR and 2D ¹H total correlation spectroscopy (TOCSY; Fig. S4A,B, respectively). The 2D ¹H TOCSY spectrum showed good dispersion in the amide region, which indicates a highly structured protein. ¹Hα chemical shifts observed in the 4.5-5.7 p.p.m. region support that the protein is mainly in a β-sheet conformation. However, two types of correlation peaks coexisted in the amide region: some are very thin and intense, and some are very broad. This feature can be related to an exchange process on the NMR intermediate timescale, which can be due to conformational changes within the protein or to a monomer-dimer exchange that would only be detected for residues at the dimer interface. The translational diffusion coefficient determined from pulsed-field gradient NMR experiments (DOSY) was $1.29 \times 10^{-10}\ \mathrm{m^2\,s^{-1}}$. Compared with the diffusion coefficient values found in the literature (Table S1), this value is compatible with a dimeric form of the protein. Since high concentration could stabilize a dimeric conformation, the sample was then diluted to 50 μM. No quantitative change in the relative peak intensities was observed in the 2D TOCSY experiment, nor in the measured diffusion coefficient. To ensure that the dimeric conformation is present in solution, we built a calibration curve (Fig. S4C) and compared it with the same curve obtained with literature values (Fig. S4D). Both data sets can be described by the power laws $D = 3 \times 10^{-9}\,M_w^{-0.344}$ and $D = 3 \times 10^{-9}\,M_w^{-0.347}$, with $R^2$ values of 0.997 and 0.976, respectively. As shown in Fig. S4C, the measured diffusion coefficient of MSMB3 falls on the calibration curve at a molecular mass corresponding to a dimeric state of the protein.
These data clearly support that MSMB3 adopts a dimeric conformation in solution.
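The calibration step described above amounts to fitting a power law D = a·Mw^b and reading a molecular mass off the fitted curve. The sketch below fits the law by linear least squares in log-log space with NumPy and inverts it; it illustrates the technique under assumptions and is not the authors' fitting procedure. In particular, R² is computed on the log-transformed values, and the demonstration points are synthetic, generated from a = 3 × 10⁻⁹ and b = -0.344 rather than taken from Table S1. Because the units and reference conditions behind the published prefactor are not stated in this section, no molecular mass for MSMB3 is computed here.

```python
import numpy as np


def fit_power_law(mw, d):
    """Fit D = a * Mw**b by least squares on log10(D) versus log10(Mw).

    Returns (a, b, r2), with r2 computed on the log-transformed data.
    """
    x = np.log10(np.asarray(mw, dtype=float))
    y = np.log10(np.asarray(d, dtype=float))
    b, log_a = np.polyfit(x, y, 1)          # slope first, then intercept
    residuals = y - (log_a + b * x)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    return 10.0**log_a, b, r2


def mass_from_diffusion(d, a, b):
    """Invert D = a * Mw**b to estimate Mw from a measured D."""
    return (d / a) ** (1.0 / b)


if __name__ == "__main__":
    # Synthetic calibration points (not literature data) following an exact power law.
    mw = np.array([5e3, 1e4, 2e4, 5e4, 1e5])
    d = 3e-9 * mw**-0.344
    a, b, r2 = fit_power_law(mw, d)
    print(f"a = {a:.3e}, b = {b:.3f}, R^2 = {r2:.4f}")
    print("round trip for 2e4:", mass_from_diffusion(3e-9 * 2e4**-0.344, a, b))
```

Fitting in log space turns the power law into a straight line, so an ordinary linear fit suffices; the trade-off is that the residuals (and R²) are weighted in relative rather than absolute terms.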
The X-ray structure of chicken MSMB3 shows the homodimer structure and reveals subtle structural differences compared with MSMBs from nonavian species
The X-ray 3D structure of MSMB3 purified from chicken egg white was solved at 2.14 Å resolution (Table 2). Two molecules are present in the asymmetric unit (named monomer A and monomer B) and share a similar fold (RMSD = 1.4 Å on all Cα atoms). MSMB3 consists of two domains (amino- and carboxy-terminal) connected by a linker peptide. The spatial arrangement of the two domains is likely stabilized by the disulfide bridge C30-C66 (Fig. 6A). The amino-terminal domain (residues 1-45) contains a four-stranded antiparallel β-sheet (β1, β4, β5, and β6), whose topology corresponds to a Greek key motif, and a small two-stranded β-sheet (β2 and β3) inserted between strands β1 and β4. The four-stranded β-sheet is stabilized by two disulfide bonds that connect the β6 strand to β1 (C2-C43) and β5 (C33-C42). A third disulfide bond (C12-C35) constrains the small two-stranded β-sheet (β2 and β3). The carboxy-terminal domain (residues 46-86) has one antiparallel β-sheet composed of two strands (β7 and β8) and one disulfide bond (C57-C80) linking the carboxy-terminal residues to the β-sheet. Electron density was missing for the last residues of MSMB3(-G), G85 and V86, likely due to their high mobility. The 3D structure of chicken MSMB3 is globally similar to human (PSP94, hMSP) and porcine (pMSP) MSMBs (RMSD = 1.68, 2.49, and 3.00 Å, respectively; Fig. 6B). The main difference resides in the carboxy-terminal domain, in which only one double-stranded β-sheet is present in MSMB3, whereas the human and porcine proteins have two double-stranded antiparallel β-sheets. This missing β-sheet is likely due to the predicted mobility of the last two MSMB3 residues, G85 and V86 (G87 is missing in MSMB3(-G)).
By contrast, the last three carboxy-terminal residues of PSP94 are W92, I93, and I94 (W85, I86, and I87 in Fig. 7C), with W92 lying over two hydrophobic residues (P56 and V89). This feature is thought to stabilize the carboxy-terminal residues of PSP94. Despite the difference in the number of secondary structures, the fold of the carboxy-terminal domain of MSMB3 is conserved between species (Fig. 6B). The absence of this β-sheet in the MSMB3 structure is also seen for Viperidae snake small serum protein 2 (SSP-2), which has a shorter carboxy-terminal domain than the other members of the MSMB family.
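The RMSD values quoted above are computed after optimally superimposing matched Cα atoms. The sketch below shows one standard way to do this, the Kabsch algorithm (centering, SVD of the covariance matrix, and a reflection check), using NumPy. It assumes the two coordinate arrays are already paired residue by residue and have equal length; structure-comparison programs may add sequence alignment and outlier rejection, so this sketch would not necessarily reproduce the published values.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between paired N x 3 coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two paired N x 3 coordinate arrays")
    # Remove translation by centering both sets on their centroids.
    pc = p - p.mean(axis=0)
    qc = q - q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix.
    u, _, vt = np.linalg.svd(pc.T @ qc)
    # Guard against an improper rotation (reflection).
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = pc @ rotation.T - qc
    return float(np.sqrt((diff**2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ca_a = rng.normal(size=(20, 3)) * 10.0            # stand-in Cα coordinates (Å)
    angle = np.deg2rad(40.0)
    rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
    ca_b = ca_a @ rz.T + np.array([5.0, -2.0, 1.0])   # rotated and translated copy
    print("RMSD of a rigid copy:", round(kabsch_rmsd(ca_a, ca_b), 6))
```

A rigidly moved copy gives an RMSD of essentially zero, whereas any internal difference between the two sets (as between monomers A and B) remains after superposition.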
As two MSMB3 molecules were present in the asymmetric unit of the MSMB3 crystals, we investigated the quaternary assembly of the protein using the PISA server (https://www.ebi.ac.uk/pdbe/pisa/). The results show that monomers A and B may form a stable homodimer in solution with a buried area of 867 Å², stabilized by a total of eight H-bonds, including one salt bridge. The two monomers are held together mainly via their respective carboxy-terminal domains, primarily through a cluster of hydrophobic residues (V46, I48, V58, L60, F61, and I68; Fig. 7A). Four H-bonds are detected between the main-chain atoms of residues G59 and F61. Another H-bond, between atoms P49(A) O and K63(B) Nζ, reinforces the homodimer stability. In the amino-terminal part of the dimer, fewer contacts are found between the two domains. Two H-bonds involve residues Y31 and Y3, and one salt bridge is formed between residues R1(A) and D29(B) (Fig. S5). Most of these residues are well conserved within the MSMB3 family but not in other MSMB proteins (Figs 2 and 7C).
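Interface areas such as the 867 Å² reported by PISA are derived from solvent-accessible surface areas (SASA) of the isolated monomers and of the complex. The sketch below is a simplified Shrake-Rupley calculation in NumPy (probe radius 1.4 Å, evenly spread test points) that reports half of the total buried SASA, a convention commonly used for interface areas and assumed here to correspond to PISA's. The atomic radii, point density, and the convention itself are assumptions, so the sketch is not expected to reproduce PISA's numbers.

```python
import numpy as np


def sphere_points(n: int) -> np.ndarray:
    """Roughly uniform unit-sphere points (golden-spiral construction)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azimuth = np.pi * (1.0 + 5.0**0.5) * i
    return np.column_stack([np.cos(azimuth) * np.sin(polar),
                            np.sin(azimuth) * np.sin(polar),
                            np.cos(polar)])


def sasa(coords, radii, probe: float = 1.4, n_points: int = 2000) -> float:
    """Shrake-Rupley solvent-accessible surface area (Å^2) of a set of atoms."""
    coords = np.asarray(coords, dtype=float).reshape(-1, 3)
    expanded = np.asarray(radii, dtype=float) + probe
    unit = sphere_points(n_points)
    total = 0.0
    for i in range(len(coords)):
        points = coords[i] + expanded[i] * unit
        others = np.arange(len(coords)) != i
        if others.any():
            # A test point is buried if it lies inside any other expanded sphere.
            d2 = ((points[:, None, :] - coords[others][None, :, :]) ** 2).sum(axis=2)
            buried = (d2 < expanded[others] ** 2).any(axis=1)
        else:
            buried = np.zeros(n_points, dtype=bool)
        total += 4.0 * np.pi * expanded[i] ** 2 * (1.0 - buried.mean())
    return total


def interface_area(coords_a, radii_a, coords_b, radii_b, **kwargs) -> float:
    """Half of the SASA buried when A and B associate."""
    a = sasa(coords_a, radii_a, **kwargs)
    b = sasa(coords_b, radii_b, **kwargs)
    both = sasa(np.vstack([coords_a, coords_b]),
                np.concatenate([radii_a, radii_b]), **kwargs)
    return (a + b - both) / 2.0


if __name__ == "__main__":
    # Two single-atom "monomers" 3 Å apart (toy example, radius 1.5 Å each).
    area = interface_area([[0.0, 0.0, 0.0]], [1.5], [[3.0, 0.0, 0.0]], [1.5])
    print(f"interface area ~ {area:.1f} Å^2")
```

For this toy pair, the analytic answer is one spherical cap, 2π × 2.9 × 1.4 ≈ 25.5 Å², which the sampled estimate approaches as `n_points` grows.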
The MSMB3 dimer observed here differs from the one seen in PSP94 crystals (Fig. 7B,C). In PSP94, the dimer buried area is about 930 Å² and the dimeric association requires the β10 strand, which is absent in MSMB3. In addition, three sulfate ions arising from the crystallization conditions and bridging the carboxy-terminal domains of the two monomers were observed in the electron density maps (Fig. S6). Sulfate
The chicken MSMB1 and MSMB2 models (built by homology modeling) display distinct electrostatic potentials compared with chicken MSMB3
Because the calculated isoelectric point (pI) values of the three chicken paralogs are very different (pI = 4.7 for MSMB2, pI = 8.4 for MSMB1, and pI = 9.3 for MSMB3), we expected differences in the electrostatic distribution at the molecular surface of these proteins. To make this comparison, and because the 3D structures of MSMB1 and MSMB2 were not available, we built 3D models of both proteins by homology modeling using the X-ray structure of MSMB3 as the template. Figure 8 illustrates the solvent-accessible surface of MSMB proteins colored according to electrostatic potential values (blue: positive charges; red: negative charges). Unlike PSP94, pMSP, and SSP-2, in which the positive and negative charges are clustered, the positive charges of MSMB3 and the negative charges of MSMB2 are evenly distributed on the protein surfaces. To a lesser extent than MSMB3, MSMB1 also exhibits an even distribution of positive charges at its molecular surface.
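The pI values compared above come from ProtParam, which, according to the ExPASy documentation, relies on the Bjellqvist pKa scale. The sketch below illustrates the underlying principle instead: the net charge of a sequence is computed from the Henderson-Hasselbalch equation, and bisection finds the pH at which it equals zero. The pKa set is one common approximate textbook set and is an assumption, so values will differ from ProtParam's; the `include_cys` flag lets disulfide-bonded cysteines (such as the ten cysteines of MSMB3 engaged in five bridges) be excluded from the titratable groups.

```python
# Approximate pKa values; one common textbook-style set (assumption).
PKA = {"N_term": 8.6, "C_term": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
BASIC = {"K", "R", "H"}
ACIDIC = {"D", "E", "C", "Y"}


def net_charge(seq: str, ph: float, include_cys: bool = True) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA["N_term"]))
    negative = 1.0 / (1.0 + 10 ** (PKA["C_term"] - ph))
    for aa in seq.upper():
        if aa in BASIC:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA[aa]))
        elif aa in ACIDIC and (aa != "C" or include_cys):
            negative += 1.0 / (1.0 + 10 ** (PKA[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, include_cys: bool = True,
                      low: float = 0.0, high: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection on pH; the net charge decreases monotonically as pH rises."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid, include_cys) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    # Toy peptides (not MSMB sequences): a basic and an acidic example.
    print(round(isoelectric_point("GKKRKG"), 2), round(isoelectric_point("GDDEDG"), 2))
```

A sequence rich in Lys and Arg, like MSMB3, pushes the zero-charge point toward high pH, whereas Asp and Glu pull it down, which is the trend separating MSMB3 (pI = 9.3) from MSMB2 (pI = 4.7).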

Discussion
Microseminoproteins are widely distributed in the animal kingdom and display a broad range of biological activities. MSMB genes are present as a single copy in most mammals, while multiple copies have been identified in marsupials (14 paralogs (three paralogs), and chicken (three paralogs) [1,4]. The biological significance of these numerous MSMB proteins in some species is still obscure. The chicken MSMB3 paralog was recently purified, and its protein sequence was corroborated by mass spectrometry [26]. However, the genomic localization of the corresponding gene remains controversial because the MSMB3 gene was withdrawn from the last two chicken genome assemblies (Fig. 1). This inconsistency needs to be corrected, as the absence of chicken MSMB3 from protein databases, including the National Center for Biotechnology Information databank, may introduce bias when performing proteomics on egg-derived samples. Therefore, the presence and relative abundance of chicken MSMB3 in the egg may have been underestimated over the last decade (since the release of chicken genome assemblies 5.0 and 6.0). The present article provides further compelling evidence that the MSMB3 gene and its protein product both exist in chickens and, likely, in many other avian species. Indeed, this work reveals not only that MSMB3 orthologs are present in other bird species, including Palaeognathae and Neognathae, with a high percentage of protein sequence identity (at least 80%, Fig. 2B), but also that the MSMB3 gene is highly expressed in the chicken oviduct (Fig. 3), in relation to egg formation and avian reproduction. More importantly, MSMB3 protein has been purified from chicken egg white (this report and [26]), its sequence with several modifications has been characterized by proteomics, and its X-ray 3D structure has been solved (Figs 6 and 7).
The three chicken paralogs MSMB1, MSMB2, and MSMB3 share a relatively low percentage of sequence identity (34.5% between MSMB2 and MSMB3, 40.7% between MSMB1 and MSMB3, and 42% between MSMB1 and MSMB2), which may affect their respective biological functions. To better appreciate the specificities of each chicken MSMB gene, we first investigated their relative expression in various female and male chicken tissues. We show that the three chicken paralogs have distinct expression patterns: MSMB1 is essentially expressed in the liver, MSMB2 displays a more ubiquitous expression, and MSMB3 is almost exclusively expressed in the female reproductive tissue (magnum) that secretes egg white proteins. The high expression of MSMB1 in the liver of both male and female chickens suggests a crucial physiological role in liver metabolism, which remains to be defined. Concerning MSMB2, the detectable (although low) expression of this paralog in the testis, ductus deferens, and epididymis (Fig. 3E) of chickens is in accordance with the identification of MSMB2 protein in male reproductive secretions [31]. Thus, chicken MSMB2 may have a role in male reproduction, similar to its human homolog PSP94. However, MSMB2 is more widely expressed than human PSP94, whose expression is specific to the prostate [38]. The expression of chicken MSMB2 in the white isthmus, uterus, and vagina, which is corroborated by the identification of this protein in the eggshell and eggshell membranes [39,40], also suggests a function in eggshell formation and structure. Finally, MSMB2 was shown to be highly expressed in the duodenum and lung (Fig. 3B), and to a lesser extent in the liver (Fig. 3B,E); consequently, it may have other physiological functions besides reproduction.
Compared with MSMB1 and MSMB2, MSMB3 is mainly expressed in the reproductive tissues of female chickens, especially in the magnum (strong expression) and in the white isthmus (low expression), which is in accordance with the identification of MSMB3 protein specifically in egg white and eggshell membranes, respectively [26,41]. The tissue specificity of MSMB3 strongly supports a role in chicken embryonic development. Recently, an msmb3 gene from Xenopus was described as a potent regulator of neural crest migration [35]. However, the mature protein corresponding to the Xenopus msmb3 gene, as identified in databanks, shares only 38.37% sequence identity with mature chicken MSMB3, while it shows 47.67% and 46.67% sequence identity with chicken MSMB1 and MSMB2, respectively. This observation suggests that the Xenopus msmb3 gene is rather orthologous to the chicken MSMB1 or MSMB2 genes. Because of its antibacterial activity, we have previously suggested that chicken MSMB3 may participate in the protection of the embryo [26,32], together with the numerous antibacterial proteins and peptides that have been identified in the egg [42]. Interestingly, this protein persists in the egg even in the late stages of embryonic development [32], when the egg white is transferred into the amniotic fluid to be subsequently swallowed by the embryo. This intriguing stability of MSMB3 throughout embryonic development is likely due to the numerous protease inhibitors in egg white [43] that prevent proteolysis, but also to the predicted resistance of disulfide-rich proteins to proteolysis [44]. The fate of MSMB3 once orally absorbed by the embryo remains unknown, but it may participate in the gut immunity of chicks, provided it resists degradation by digestive enzymes [32].
To conclude, the very diverse expression profiles of the three paralogs are assumed to reflect diverging functions, which is further supported by the moderate sequence identity between them (Fig. 2A).
Chicken MSMB3 was purified from egg white using heparin affinity chromatography and size-exclusion chromatography [26] prior to mass spectrometry, in-solution NMR spectroscopy, and crystallization for X-ray diffraction analysis. Mass spectrometry analyses reveal that the major form of purified MSMB3 lacks the carboxy-terminal Gly residue, although the full-length form remains detectable at a much lower abundance (Figs 4 and 5). The hydrolytic mechanism underlying this glycine removal is not yet known. Indeed, only a few proteases and peptidases were identified in egg white, at very low abundance, and none of them has the substrate specificity required to cleave this carboxy-terminal residue (aminopeptidase Ey, renin, similar to transmembrane protease, serine 9, similar to carboxypeptidase D, similar to aminopeptidase A, aminopeptidase) [45]. In addition, the very abundant protease inhibitors restrict such proteolytic events in egg white [43]. In the future, it might be interesting to analyze whether secreted chicken MSMB1 and MSMB2 also lack the carboxy-terminal glycine.
Fig. 6. X-ray 3D structure of chicken MSMB3. (A) Overview of MSMB3 X-ray structure. (B) Superimposition of chicken MSMB3 (blue) and MSMB homologs in human by X-ray and NMR (PSP94, PDB:3ix0 [12], and hMSP, PDB:2iz3 [9], in yellow and orange, respectively), in pig by NMR (pMSP, PDB:2iz4 [9], pink), and in snake by X-ray (SSP-2, PDB:6imf [10], green).
In addition to this post-translational modification of the MSMB3 sequence, a low-amplitude m/z peak corresponding to a potential MSMB3 homodimer was detected in the mass spectra (Figs 4 and 5). Notably, this technique usually triggers the dissociation of noncovalent protein complexes; this molecular species is therefore likely underestimated. The presence of a dimer was further confirmed by NMR spectroscopy and X-ray diffraction. Both approaches indicate that the major form of MSMB3 is a homodimer. Whether chicken MSMB1 and MSMB2 are also prone to a similar homodimeric arrangement is not known, given their relatively low sequence identity with chicken MSMB3 (Fig. 2A).
The 2.14 Å crystal structure of chicken MSMB3 reveals that the monomer adopts a fold resembling those determined for human, porcine, and snake MSMBs. Chicken MSMB3 consists of an amino-terminal domain with a Greek key fold and a double-stranded β-sheet carboxy-terminal domain, held together by a disulfide bond. Compared with human and porcine MSMBs, chicken MSMB3 lacks the last β-sheet within the carboxy-terminal domain, likely due to the high mobility of its carboxy-terminal residues (G85, V86). This absence of two β-strands in MSMB3 results in a different dimerization pattern compared with human MSMB (PSP94), in which the two monomers adopt an edge-to-edge association ([12], Fig. 7B). Although MSMBs can be found as homodimers in solution, the monomeric proteins are commonly believed to be the biologically active species [46]. For example, the monomeric PSP94 protein was found to interact with the monoclonal natural killer-associated anti-Leu-111b antibody, whereas the dimeric form was inactive [47,48]. Moreover, the 3D structure of the SSP-2-triflin complex revealed that the venom inhibitor SSP-2 interacts as a monomer with the cysteine-rich secretory protein [10].
When comparing the distribution of negative and positive charges at the surface of chicken MSMB3 with the other available MSMB proteins, we observed a very different charge distribution pattern on chicken MSMB3. Furthermore, chicken MSMB3 is characterized by a high cationicity (pI = 9.3), as opposed to human, porcine, and snake MSMBs (acidic or neutral pI). Using chicken MSMB1 and MSMB2 structural models, we also showed that this feature is specific to the MSMB3 paralog: the positive charges on the MSMB1 model are more diffusely distributed despite a cationic pI (pI = 8.4), while the surface of the MSMB2 model (pI = 4.7) is essentially anionic. Such high cationicity of chicken MSMB3, together with the presence of numerous clusters of positive charges, may explain its affinity for heparin, a negatively charged glycosaminoglycan [26]. In addition, the peculiar conformation of the MSMB3 homodimer may create additional conformational clusters of positive charges that enhance heparin binding. Fortunately, MSMB3 was crystallized in a buffer containing sulfate ions, which could be easily assigned in the electron density maps. The X-ray 3D structure revealed that residues R54, K63, and K64 are involved in H-bonds with the sulfate ions, which suggests that these residues might be important for heparin binding and activity.
Fig. 8. Molecular surface of various MSMB structures (monomers) colored according to electrostatic potential values. Electrostatic potentials were calculated using the APBS server (http://www.poissonboltzmann.org/) [59,60] with default parameters. The same electrostatic potential energy scale was used for all representations in Chimera. Upper orientations are the same as in Fig. 6. Red, negative potential; white, neutral potential; blue, positive potential. Theoretical pI values were calculated using the ProtParam tool of the ExPASy server (https://web.expasy.org/protparam/) with sequences extracted from PDB files.
Because glycosaminoglycans are tightly associated with extracellular matrix proteins and cells, the affinity of MSMB3 for glycosaminoglycans may reinforce the hypothesis of a role during embryonic development [33]. MSMB3 might also play a role in the progression of extra-embryonic structures, namely the yolk sac, onto the inner surface of the perivitelline layer that encloses the yolk, for example through cell migration [49].

Concluding remarks
To conclude, given its specific expression in the oviduct (female reproductive tract), its physicochemical and structural specificities, and its high sequence conservation among birds, MSMB3 is believed to play a crucial role in the reproduction of avian species. The next challenges will be to decipher its exact physiological functions and regulation during incubation of fertilized eggs, and to investigate whether all three paralogs are expressed during embryonic development, by the embryo itself and/or by the extra-embryonic annexes. The divergent features of chicken MSMB paralogs and the selectivity of their tissue expression, which are likely associated with specific functions, may motivate further studies on the relationship between the structure of MSMB proteins and their respective activities in vertebrates.

Purification of MSMB3 from egg white
Briefly, egg whites were collected from freshly laid eggs (Isa-Hendrix, St Brieuc, France), homogenized, aliquoted, and kept frozen until further use. MSMB3 was then purified by heparin affinity chromatography followed by gel filtration, as described previously [26]. MSMB3 concentration was determined from the absorbance at 280 nm using its specific extinction coefficient (E1% = 15.18) [26]. The purity of MSMB3 was assessed by SDS/PAGE and mass spectrometry analysis.
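The conversion from A280 to concentration with a percent extinction coefficient is a one-line Beer-Lambert calculation, sketched below. E1% = 15.18 comes from the text; the 1 cm path length, the blank-corrected reading within the linear range of the spectrophotometer, and the function name are assumptions for illustration.

```python
def concentration_mg_per_ml(a280: float, e1pct: float = 15.18, path_cm: float = 1.0) -> float:
    """Convert A280 to mg/mL using a percent extinction coefficient.

    E1% is the absorbance of a 1% (w/v) solution, i.e. 10 mg/mL, over a
    1 cm path, so by Beer-Lambert c = 10 * A280 / (E1% * path length).
    """
    if e1pct <= 0 or path_cm <= 0:
        raise ValueError("E1% and path length must be positive")
    return 10.0 * a280 / (e1pct * path_cm)


# Hypothetical blank-corrected reading, not a value from the study.
print(round(concentration_mg_per_ml(0.759), 3))  # prints 0.5
```

Expressing the coefficient per 1% solution avoids needing the molar mass; the molar coefficient can be recovered if needed as ε = E1% × MW / 10.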

Sequence analyses
Alignments were performed using T-Coffee […]
[…] (5 µg) was reverse-transcribed using the Superscript II Kit (Invitrogen, Cergy Pontoise, France) and oligo(dT) primers (Promega, Madison, WI, USA), and the resulting cDNA was stored at −80°C until further use. Total RNA and cDNA from liver, ductus deferens, epididymis, and testis of breeder males were prepared as previously described [50].

Mass spectrometry analysis of egg MSMB3 samples
A bottom-up proteomic approach was used to characterize the sequence and potential post-translational modifications of MSMB3. Purified MSMB3 (20 µg) was reduced with dithiothreitol (5 mM final concentration) in 50 mM NH4HCO3 (30 min, 56°C), and then alkylated with iodoacetamide (12.5 mM final concentration) in 50 mM NH4HCO3 (20 min, room temperature, in the dark). In-solution proteolytic digestion was carried out overnight at 37°C with bovine trypsin (Sequencing Grade; Roche Diagnostics, Paris, France) at a final enzyme:substrate ratio of 1:100. The resulting peptides were acidified with formic acid (1% final concentration), and the peptide mixture was concentrated and desalted on C18 ZipTips (Millipore, Merck KGaA, Darmstadt, Germany) before analysis by on-line nanoflow liquid chromatography-tandem mass spectrometry (nanoLC-MS/MS).
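The peptides expected from this digestion can be anticipated with an in silico digest. The sketch below assumes the simple classic trypsin rule (cleavage C-terminal to K or R, except before P) and enumerates peptides with up to a given number of missed cleavages; search engines may use more refined enzyme definitions, and the function name and toy sequence are hypothetical.

```python
import re


def tryptic_peptides(seq: str, missed_cleavages: int = 2) -> list[str]:
    """In silico trypsin digest using the simple 'K/R, not before P' rule.

    Returns every peptide spanning 0..missed_cleavages internal sites.
    """
    # Zero-width split after K or R unless the next residue is P
    # (splitting on empty matches requires Python 3.7+).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", seq.upper()) if f]
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i : j + 1]))
    return peptides


# Hypothetical toy sequence, NOT the MSMB3 sequence.
print(tryptic_peptides("AKGRPCK", missed_cleavages=1))
# prints ['AK', 'AKGRPCK', 'GRPCK']
```

In the toy example, the R-P bond is left uncut, which is why GRPCK appears as a single peptide.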
All experiments were performed on a dual linear ion trap Fourier Transform Mass Spectrometer (FT-MS) LTQ Orbitrap Velos Pro (Thermo Fisher Scientific, Bremen, Germany) coupled to an Ultimate® 3000 RSLC Ultra High-Pressure Liquid Chromatographer (Thermo Fisher Scientific, Bremen, Germany), as previously described [31]. MS/MS ion searches were performed with the Mascot search engine version 2.7.0.1 (Matrix Science, London, UK) via PROTEOME DISCOVERER 2.1 software (Thermo Fisher Scientific) against a custom database that includes the chicken MSMB3 amino acid sequence. The selected parameters were trypsin as the protease with two allowed missed cleavages, and carbamidomethylcysteine (+57 Da), methionine oxidation (+16 Da), amidation (−1 Da), and protein N-terminal acetylation (+42 Da) as variable modifications. Mass tolerances were set to 5 ppm for precursor ions and 0.8 Da for fragment ions.
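The search parameters above amount to two checks: shifting theoretical masses by the variable modifications, and accepting matches within a relative (ppm) window for precursors and an absolute (Da) window for fragments. The sketch below illustrates both on neutral masses. The mass shifts are standard monoisotopic Unimod values for the listed modifications; the function names and the example masses are hypothetical, and Mascot itself scores m/z values with charge states rather than using this simplified check.

```python
# Monoisotopic mass shifts (Da) for the variable modifications in the text
# (Unimod values for Carbamidomethyl, Oxidation, Amidated, Acetyl).
MOD_DELTAS = {
    "carbamidomethyl": 57.021464,
    "oxidation": 15.994915,
    "amidation": -0.984016,
    "acetylation": 42.010565,
}


def modified_mass(base_mass: float, mods: list[str]) -> float:
    """Add the mass shift of each listed modification to a neutral mass."""
    return base_mass + sum(MOD_DELTAS[m] for m in mods)


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_match(observed: float, theoretical: float, tol_ppm: float = 5.0) -> bool:
    """Precursor window: relative tolerance (5 ppm in the text)."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_match(observed: float, theoretical: float, tol_da: float = 0.8) -> bool:
    """Fragment window: absolute tolerance (0.8 Da in the text)."""
    return abs(observed - theoretical) <= tol_da


# Hypothetical peptide: 1500.000 Da unmodified, one carbamidomethyl-Cys.
theo = modified_mass(1500.0, ["carbamidomethyl"])
print(precursor_match(1557.0270, theo))  # True (about 3.6 ppm)
```

The asymmetry reflects the instrument: precursors are measured in the Orbitrap at high resolution, whereas a 0.8 Da window is typical of fragments read out in the linear ion trap.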
Mascot results were imported into SCAFFOLD software (v4.11.1; Proteome Software, Portland, OR, USA) using the protein cluster analysis option (grouping of proteins into clusters based on shared peptide evidence). Peptide and protein identifications were validated with the PeptideProphet and ProteinProphet algorithms at probability thresholds greater than 99.9% and 99%, respectively.
A top-down proteomic approach was applied to crude MSMB3, or to MSMB3 after reduction and alkylation, followed by desalting on C4 ZipTips (Millipore). On-line microliquid chromatography-tandem mass spectrometry (µLC-MS/MS) was carried out using the LTQ Orbitrap Velos mass spectrometer […]. MS/MS spectra of interest were manually extracted from Xcalibur and imported into PROSIGHT PC software v 4.0 (Thermo Fisher, San Jose, CA, USA) for deconvolution using THRASH (signal-to-noise ratio: 2), and searched against the two MSMB3 sequences (with or without the terminal G residue). Lists of monoisotopic fragment-ion masses were extracted and submitted to the ProSight Lite tool with a mass tolerance of 10 ppm at the fragment-ion level to record mass deviations (Δmass) and scores.
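Deconvolution of the multicharged-ion spectra rests on the relation between m/z, charge, and neutral mass for [M + zH]z+ ions. THRASH does considerably more (isotope-pattern fitting to find monoisotopic peaks), so the sketch below only illustrates the underlying arithmetic: recovering the charge from two adjacent charge states, converting m/z to neutral mass, and applying a 10 ppm window. Function names and the example 10 kDa species are hypothetical.

```python
PROTON = 1.007276  # proton mass, Da


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass of an [M + zH]z+ ion observed at m/z."""
    return z * (mz - PROTON)


def charge_from_adjacent(m1: float, m2: float) -> int:
    """Charge z of peak m1, given that m2 is the next charge state (z + 1).

    Requires m1 > m2. From M = z*(m1 - H) = (z + 1)*(m2 - H):
    z = (m2 - H) / (m1 - m2).
    """
    if m1 <= m2:
        raise ValueError("m1 must be the higher-m/z (lower-charge) peak")
    return round((m2 - PROTON) / (m1 - m2))


def within_ppm(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """Relative tolerance check (10 ppm at the fragment level in the text)."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# Hypothetical 10 kDa species observed as 10+ and 11+ ions.
z = charge_from_adjacent(1001.007276, 910.098185)
print(z, round(neutral_mass(1001.007276, z), 3))  # prints 10 10000.0
```

Using two neighboring peaks makes the charge assignment independent of any assumed mass, which is why charge-state series are so informative for intact proteins.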
Raw results from bottom-up and top-down mass spectrometry are available as Tables S2 and S3.
Ubiquitin, lysozyme, myoglobin, and albumin were purchased from Sigma-Aldrich and dissolved at a concentration of 100 µM in the same buffer as the MSMB3 protein.
All NMR experiments were performed at 298 K on an Avance III HD Bruker 700 MHz spectrometer equipped with a cryoprobe. NMR data were processed using Bruker's Topspin 3.2™ (Billerica, MA, USA). One-dimensional 1H and two-dimensional 1H TOCSY (mixing time = 80 ms) spectra were acquired on both samples of MSMB3 to assess whether the protein was highly structured.
DOSY experiments were acquired on all NMR samples using a standard Bruker sequence and diffusion protocol described in the NMR user manual.
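A DOSY experiment yields D from the decay of the signal with increasing gradient strength. The sketch below assumes the classic Stejskal-Tanner relation for rectangular gradient pulses, I = I0·exp(−D·γ²g²δ²(Δ − δ/3)), and fits D by linear least squares on ln(I). The exact delay correction depends on the pulse sequence actually used (bipolar-gradient sequences use a modified term), and the gradient values, delays, and function names are hypothetical.

```python
import math

GAMMA_1H = 2.6752218744e8  # 1H gyromagnetic ratio, rad s^-1 T^-1


def b_factor(g: float, delta: float, big_delta: float) -> float:
    """Stejskal-Tanner factor (s m^-2) for gradient g (T/m), gradient
    pulse length delta (s) and diffusion delay big_delta (s)."""
    return (GAMMA_1H * g * delta) ** 2 * (big_delta - delta / 3.0)


def fit_diffusion(gradients, intensities, delta, big_delta) -> float:
    """Least-squares slope of ln(I) versus b; returns D = -slope (m^2/s)."""
    xs = [b_factor(g, delta, big_delta) for g in gradients]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic, noise-free decay for an assumed D; not experimental data.
D_TRUE, DELTA, BIG_DELTA = 1.0e-10, 2e-3, 0.1
grads = [0.05 * k for k in range(1, 11)]  # 0.05 to 0.5 T/m
intens = [math.exp(-D_TRUE * b_factor(g, DELTA, BIG_DELTA)) for g in grads]
print(f"{fit_diffusion(grads, intens, DELTA, BIG_DELTA):.3e}")  # prints 1.000e-10
```

Because the gradient amplitude enters squared, any calibration error in g propagates doubly into D, which is why the gradient strength is calibrated against a reference sample.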
Calibration of the gradient strength was performed on a 99.9% D2O/0.1% GdCl3 sample. The physical observable derived from a diffusion NMR experiment is the diffusion coefficient D, which is sensitive to the molecular mass of the diffusing species. The empirical power law (Eqn 1) is probably the most useful relation correlating the MW with the diffusion coefficient:
$$D = K \cdot \mathrm{MW}^{\alpha} \qquad (1)$$
in which K is a molecule-dependent constant and α is a coefficient that depends strongly on the shape and type of the molecular species. Plotting the experimental diffusion coefficients of reference molecules and proteins, measured in the same buffer and at the same temperature (Table S3), against their known molecular masses on a log-log scale yielded a calibration curve (Fig. S3C). Error bars were estimated at 7% for all measurements and reflect the experimental error actually measured on the diffusion coefficient of the Tris molecule present in all samples. A similar curve was plotted using diffusion coefficients reported in the literature (Fig. S3D).
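Taking logarithms turns the power law into a straight line, log D = log K + α·log MW, so the calibration curve is a linear fit on a log-log scale that can then be inverted to estimate the apparent MW of an unknown. The sketch below illustrates this; the calibrant masses are approximate values for ubiquitin, lysozyme, myoglobin, and serum albumin, while the diffusion coefficients are synthetic values generated from an assumed K and α = −1/3 (the compact-sphere limit), not the measured data. Function names are hypothetical.

```python
import math


def fit_power_law(masses, diffusions):
    """Fit log10(D) = log10(K) + alpha * log10(MW); return (K, alpha)."""
    xs = [math.log10(m) for m in masses]
    ys = [math.log10(d) for d in diffusions]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    alpha = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return 10 ** (my - alpha * mx), alpha


def mass_from_diffusion(d, k, alpha):
    """Invert Eqn 1: MW = (D / K) ** (1 / alpha)."""
    return (d / k) ** (1.0 / alpha)


# Approximate calibrant masses (Da); D values are SYNTHETIC, generated
# from an assumed K (m^2 s^-1 Da^-alpha) and alpha, not measurements.
K_ASSUMED, ALPHA_ASSUMED = 3.0e-9, -1.0 / 3.0
masses = [8565.0, 14300.0, 16950.0, 66400.0]
diffusions = [K_ASSUMED * m ** ALPHA_ASSUMED for m in masses]

k, alpha = fit_power_law(masses, diffusions)
print(round(alpha, 3))  # prints -0.333
print(round(mass_from_diffusion(diffusions[1], k, alpha)))  # prints 14300
```

Inverting the power law amplifies errors: to first order, the relative error on MW is |1/α| times the relative error on D, so with α near −1/3 a 7% uncertainty on D corresponds to roughly 21% on the apparent MW. This margin is still sufficient to distinguish a monomer from a dimer, whose masses differ by a factor of two.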

X-ray structure of MSMB3
MSMB3 was concentrated to 10 mg·mL−1 before initial sparse-matrix crystallization screening using a Mosquito nanoliter pipetting robot (TTP Labtech Ltd., Melbourn, UK). The crystallization conditions were then manually refined in hanging drops to the final condition: 100 mM phosphate/citrate buffer pH 3.7, 100 mM Li2SO4, and 22% PEG1000. Crystals were grown at 20°C for 2 weeks, transferred into mother liquor supplemented with 25% ethylene glycol, and flash-frozen in liquid nitrogen. X-ray data were collected at 100 K on ESRF beamline ID29 and processed using XDS [52] and AIMLESS [53]. The 3D crystal structure of MSMB3 was determined at 2.14 Å resolution by molecular replacement with Phaser [54] from the Phenix suite [55], using Protein Data Bank (PDB) entry 3IX0 (PSP94) as the search model. The atomic model was refined using phenix.refine and manually improved using COOT [56]. Data collection and refinement statistics are listed in Table 2. Molecular graphics images were produced using UCSF Chimera [57].

MSMB1 and MSMB2 homology modeling
3D structure models of MSMB1 and MSMB2 were built from their amino acid sequences by homology modeling using the I-TASSER server [58]. All disulfide bonds were assigned as additional restraints to guide I-TASSER modeling.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Fig. S1. Multicharged-ion spectra for chicken MSMB3 proteoform lacking a G residue.
Fig. S2. Multicharged-ion spectra for native chicken MSMB3.
Fig. S3. Sequences and fragmentation patterns of the chicken egg purified MSMB3 proteoforms.
Fig. S4. Oligomeric state of MSMB3 in solution.
Fig. S5. Residue-residue interactions across dimer interfaces.
Fig. S6. Interactions of sulfate ions with MSMB3 homodimer.
Table S1. Values of diffusion coefficient for various proteins in water.
Table S2. Bottom-up data.