Analysis of nucleotide diphosphate sugar dehydrogenases reveals family and group‐specific relationships

UDP-glucose dehydrogenase (UDPGDH), UDP-N-acetyl-mannosamine dehydrogenase (UDPNAMDH) and GDP-mannose dehydrogenase (GDPMDH) belong to a family of NAD+-linked 4-electron-transferring oxidoreductases called nucleotide diphosphate sugar dehydrogenases (NDP-SDHs). UDPGDH converts UDP-D-glucose to UDP-D-glucuronic acid, a product that has different roles depending on the organism in which it is found. UDPNAMDH and GDPMDH convert UDP-N-acetyl-mannosamine to UDP-N-acetyl-mannosaminuronic acid and GDP-mannose to GDP-mannuronic acid, respectively, by a mechanism similar to that of UDPGDH. Their products are used as essential building blocks for the exopolysaccharides found in organisms like Pseudomonas aeruginosa and Staphylococcus aureus. Few studies have investigated the relationships between these enzymes. This study reveals the relationships between the three enzymes by analysing 229 amino acid sequences. Eighteen invariant and several other highly conserved residues were identified, each serving critical roles in maintaining enzyme structure, coenzyme binding or catalytic function. Also, 10 conserved motifs that included most of the conserved residues were identified and their roles proposed. A phylogenetic tree demonstrated relationships between each group and verified group assignment. Finally, group entropy analysis identified novel conservations unique to each NDP-SDH group, including residue positions critical to NDP-sugar substrate interaction, enzyme structure and intersubunit contact. These positions may serve as targets for future research. Enzymes: UDP-glucose dehydrogenase (UDPGDH, EC 1.1.1.22).

UDP-glucose dehydrogenase (UDPGDH), UDP-N-acetyl-mannosamine dehydrogenase (UDPNAMDH) and GDP-mannose dehydrogenase (GDPMDH) belong to a small group of NAD+-linked 4-electron-transferring oxidoreductases termed nucleotide diphosphate sugar dehydrogenases (NDP-SDHs) [1]. UDPGDH was first detected in bovine liver in 1954 [2]. It was subsequently purified in 1969 [3] and sequenced in 1994 [4]. UDPGDH has since been identified as catalysing the rate-determining step in the conversion of UDP-D-glucose (UDP-Glc) to UDP-D-glucuronic acid (UDP-GlcUA), reducing two molecules of NAD+ to NADH through two cycles of oxidation [4].
UDPGDH is found in a variety of organisms, from bacteria to plants and animals, and its mechanism for converting UDP-Glc to UDP-GlcUA is consistent throughout. Nevertheless, UDPGDH has different quaternary structures in different organisms. In the bacterium Streptococcus pyogenes, UDPGDH (SpUDPGDH) exists as a homodimer [5], whereas studies report that it exists in bovines and humans (hUDPGDH) as a homohexamer with 'half-of-the-sites' reactivity, essentially acting as a trimer of dimers [1,6]. Similarly, UDP-GlcUA has different fates according to the organism in which it is found. In several strains of Streptococcus, UDP-GlcUA is the substrate for the production of polysaccharides that comprise the organism's capsule, which aids in surface attachment, increases antibiotic resistance and protects against phagocytosis [7]. UDP-GlcUA is also used by Burkholderia cepacia to synthesize the exopolysaccharide cepacian, a major virulence factor [8]. In mammals, UDP-GlcUA serves as a precursor to hyaluronan and various glycosaminoglycans. Hyaluronan is found in the extracellular matrix and plays a role in promoting cell growth and migration [9]. Interfering with proteoglycan synthesis reduces tumour growth and development [10,11]. Hence, glycosaminoglycans are associated with cancer metastasis [12]. Loss of UDPGDH function leads to major problems in embryogenesis, such as heart valve defects in zebrafish and defective vulval morphogenesis in Caenorhabditis elegans [13,14], while UDPGDH overexpression can lead to chondrogenesis [15]. Additionally, UDP-GlcUA in Drosophila melanogaster acts as a modifier for proteins involved in wing formation [16]. Another role of UDP-GlcUA is the glucuronidation of molecules in the liver, which targets these compounds for excretion [17]. Some lung and colon cancers exploit this activity for drug resistance [18,19]. UDP-GlcUA also serves as a precursor to UDP-xylose, a critical component of plant cell wall polysaccharides such as pectin and hemicellulose [20,21]. Notably, UDPGDH from Sphingomonas elodea has also been shown to exhibit ribonuclease activity [22].
GDPMDH uses a similar mechanism to convert GDP-mannose (GDP-Man) to GDP-mannuronic acid (GDP-ManUA) while in turn reducing two molecules of NAD+ to NADH. This enzyme in Pseudomonas aeruginosa (PaGDPMDH) catalyses the rate-limiting step in the synthesis of alginate, an exopolysaccharide that protects the organism from antibiotics and host defences and allows P. aeruginosa to act as an opportunistic pathogen. There is no equivalent enzyme in humans. PaGDPMDH shares only about 22% identity with SpUDPGDH and, unlike other NDP-SDHs, has a domain-swapped dimeric structure [23].
UDPNAMDH converts UDP-N-acetyl-mannosamine (UDP-ManNAc) to UDP-N-acetyl-mannosaminuronic acid (UDP-ManNAcA) while in turn reducing two molecules of NAD+ to NADH. The Cap5O enzyme in Staphylococcus aureus (SaUDPNAMDH) is a UDPNAMDH responsible for synthesizing UDP-ManNAcA for incorporation into the polysaccharide capsule of S. aureus. SaUDPNAMDH shares only approximately 20% identity with SpUDPGDH and PaGDPMDH. SaUDPNAMDH possesses a dimeric organization similar to that of SpUDPGDH. Also, tyrosine phosphorylation, most likely on Tyr89, has been shown to increase the activity of SaUDPNAMDH, similar to what has previously been demonstrated for UDPGDHs from E. coli and Bacillus subtilis [24-26].
The mechanism for UDPGDH, which is common to the other NDP-SDHs, proceeds by a Bi-Uni-Uni-Bi Ping Pong mechanism [27]. It begins with an aspartate residue (Asp264 in SpUDPGDH) acting as a general base by activating a water molecule [28]. This leads to the oxidation of the C6″ hydroxyl of UDP-Glc to form an aldehyde intermediate and the transfer of the pro-R hydride to NAD+ to form NADH [29]. Second, a cysteine (Cys260 in SpUDPGDH) acts as a nucleophile by attacking the aldehyde, yielding a covalent thiohemiacetal intermediate [30,31]. This is followed by the transfer of the remaining hydride (pro-S) at the C6″ position to a second NAD+ to again form NADH. The final, rate-limiting step of the UDPGDH mechanism is the hydrolysis of the remaining thioester intermediate, which is catalysed by Tyr10 in SpUDPGDH, to yield UDP-GlcUA [5,8,28].
The current era of genomics has yielded a large number of sequences for each type of NDP-SDH. In addition, tertiary structures are now available for each enzyme. Relationships between each enzyme have scarcely been addressed in previous studies. The goal of this research was to align a large number of protein sequences for each NDP-SDH homologue and to identify and confirm the structural and functional roles of residues and sequence motifs in UDPGDHs, UDPNAMDHs and GDPMDHs. Group entropy analysis was also performed to identify group-specific conservations for each NDP-SDH homolog, yielding new insights into the unique function of each enzyme.

Structure and residue conservations
A total of 229 amino acid sequences were aligned (Fig. 1) using tertiary structural alignment as a guide. The full alignment is available in Fig. S1. The sequences used included 92 bacterial and archaeal UDPGDHs, 55 eukaryotic UDPGDHs, 38 UDPNAMDHs and 44 GDPMDHs. Despite only about 20% sequence identity between each enzyme in the family, their tertiary structures are well conserved [5,6,23,24]. Eukaryotic UDPGDHs have a slightly longer loop after β-5 (alignment indices 179-183) as compared to the other NDP-SDHs. GDPMDHs have an extended loop after α-5 (alignment indices 202-206) and also after α-10 (alignment indices 386-387). UDPNAMDHs have an extended loop after α-6 (alignment indices 236-243).
The side chain hydroxyl of Thr83{173} hydrogen bonds to the 3′ hydroxyl of the nicotinamide ribose of NAD+. Thr118{217} interacts with a water molecule which in turn hydrogen bonds to the 2′ hydroxyl of the nicotinamide ribose of NAD+. This same water molecule is activated as a nucleophile by Asp264{415}, the general base, to initiate the catalytic mechanism [8]. Lys204{352} and Asn208{356} are 3.0 Å from the O6″ carboxylate oxygen of UDP-GlcUA and are involved in the electrostatic stabilization of the substrate in the first oxidation step [5,28].
Of the remaining invariant residues, the side chain of Ser117{216} is in the NAD+-binding pocket, but lies nearly 4.0 Å from the nicotinamide ribose. Pro140{257} positions Glu141{258} so that its side chain carbonyl can hydrogen bond with the main chain nitrogen of Leu143{260} (77% conserved). This interaction holds the loop between β-8 and α-7 in place. The main chain carbonyl of Glu141{258} also forms an ionic bond with the side chain of Lys204{352}, noted above. The side chain carboxyl of Glu201{349} is 2.7 Å from the main chain nitrogen of Gly122{221} (69% conserved), maintaining enzyme structure [5]. Interestingly, an E201D mutation in Streptococcus pneumoniae results in the absence of a capsule [7]. The amide nitrogen of the side chain of Asn287{442} is 2.9 Å from the main chain nitrogen of Tyr256{407} (not conserved). The side chain carbonyl oxygen of Asn219{367} is 2.8 Å from the main chain nitrogen of Ser253{404} (not conserved). Both asparagines (Asn219 and Asn287) hold the loop between α-10 and α-11 in position; α-11 includes the catalytic residues Cys260{411} and Asp264{415}. Lastly, Lys320{486} coordinates the diphosphate bridge of UDP-glucose [5].
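The interatomic distances quoted throughout this section are straight-line distances between atomic coordinates taken from the crystal structures. A minimal sketch of that calculation is given below. The coordinates are invented for illustration and are not taken from any PDB entry, and the 3.5 Å hydrogen-bond cutoff is an assumed rule of thumb, not a value stated in this study.

```python
import math

def distance(atom_a, atom_b):
    """Euclidean distance in angstroms between two (x, y, z) coordinates."""
    return math.dist(atom_a, atom_b)

def within_cutoff(atom_a, atom_b, cutoff=3.5):
    """True if two atoms lie within an assumed hydrogen-bonding distance."""
    return distance(atom_a, atom_b) <= cutoff

# Hypothetical coordinates for a side chain oxygen and a main chain nitrogen.
side_chain_oxygen = (12.0, 4.0, 7.0)
main_chain_nitrogen = (12.0, 6.7, 7.0)

separation = distance(side_chain_oxygen, main_chain_nitrogen)
print(f"{separation:.1f} A, candidate hydrogen bond: "
      f"{within_cutoff(side_chain_oxygen, main_chain_nitrogen)}")
```

In practice the coordinates would be read from the relevant PDB entry with a structure parser rather than typed in by hand.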
In addition to the 18 invariant residues, 20 residues were conserved in at least 80% of the aligned sequences and 52 additional residues were at least 60% conserved. With 90 residue positions conserved in NDP-SDH sequences that are roughly 425 amino acids in length, this represents a fairly high degree of conservation despite each enzyme using a different nucleotide-sugar substrate. Other highly conserved residues that play functional roles in UDPGDH include the following: Tyr10{92}, Asp29{114}, Asn39{124}, Glu145{262}, Arg244{393}, Asn325{491}, Arg327{493} and Ser329{495}. Tyr10{92} is 98% conserved in our alignment and catalyses the final hydrolysis of the enzymatic thioester intermediate [8]. Asp29{114} (99% conserved) coordinates both the 2′ and 3′ hydroxyls of the adenosine ribose of NAD+. Asn39{124} (83%), which is in α-2, hydrogen bonds via its side chain amide nitrogen to the main chain carbonyl oxygen of Ala63{181}, which is not conserved and lies in β-3. Glu145{262} (76%) is 4.7 Å from the UDP-glucose diphosphate bridge. Arg244{393} (76%) from subunit b of the dimer is 3.1 Å from the 2″ hydroxyl group of glucose in UDP-glucose bound in subunit a, and vice versa [5]. This intersubunit contact may participate in the communication that results in half-of-the-sites reactivity in mammalian UDPGDHs [6]. Asn325{491} in SpUDPGDH is at a position in the alignment that is 80% aspartate. The side chain amide nitrogen of Asn325{491} is 2.8 Å from the side chain carbonyl oxygen of Glu145{262}. Arg327{493} (99%) forms a salt bridge with the pyrophosphate of NAD+. The side chain hydroxyl of Ser329{495} (81%) is 2.8 Å from the main chain carbonyl oxygen of Leu317{483} (64%). In a large alignment, the inclusion of even one or a few sequences with variations can lead to critical residues no longer being invariant, but this does not diminish their critical roles, as was demonstrated in ALDHs [32]. Three sequences (MCIThaHYPO, SalPa-cUGD and UncBacHYPO) lacked tyrosine at index 92, one sequence (NatGarNSD) lacked an aspartate at index 114 and one sequence (UncBacHYPO) lacked an arginine at index 493. Of these four sequences, two were from uncultured bacteria from metagenomic studies [37,38] and the other two lacked a reference. Thus, none had proven enzymatic function.
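The three conservation tiers used above (invariant, at least 80% and at least 60%) can be illustrated with a short sketch that scores each alignment column by the frequency of its most common residue. This is an illustrative reconstruction rather than the procedure actually used here. The toy alignment, the function names and the choice to count gaps against conservation are all assumptions.

```python
from collections import Counter

def column_conservation(column):
    """Return (most common residue, its fraction of all sequences).

    Gaps ('-') are never reported as the conserved residue, but they stay in
    the denominator, so a gapped column cannot be invariant (an assumption).
    """
    counts = Counter(symbol for symbol in column if symbol != "-")
    if not counts:
        return None, 0.0
    residue, n = counts.most_common(1)[0]
    return residue, n / len(column)

def classify_columns(alignment):
    """Sort 1-based alignment indices into the three conservation tiers."""
    tiers = {"invariant": [], "at_least_80": [], "at_least_60": []}
    for index, column in enumerate(zip(*alignment), start=1):
        _, fraction = column_conservation(column)
        if fraction == 1.0:
            tiers["invariant"].append(index)
        elif fraction >= 0.8:
            tiers["at_least_80"].append(index)
        elif fraction >= 0.6:
            tiers["at_least_60"].append(index)
    return tiers

# Hypothetical five-sequence alignment, not taken from Fig. S1.
toy_alignment = ["GTAYK", "GTAYR", "GSAFK", "GTGYK", "GTA-K"]
print(classify_columns(toy_alignment))
```

With this scoring, a single divergent or misannotated sequence is enough to move a functionally critical position out of the invariant tier, which is the situation described above for indices 92, 114 and 493.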

Tyrosine phosphorylation
Phosphorylation of a tyrosine at index 157 in the alignment in UDPGDHs from E. coli (Tyr71) and Bacillus subtilis (Tyr70) has been shown to increase enzymatic activity [25,26]. Modelling in B. subtilis UDPGDH places this tyrosine at the surface near the NAD+-binding site. It has been suggested that phosphorylation of this tyrosine might make this binding site more accessible [25]. Tyrosine is not conserved at index 157, with only 8 out of 92 bacterial and archaeal UDPGDHs having tyrosine. Eukaryotic UDPGDHs, where phosphorylation has not been observed, mostly have a hydrophobic isoleucine or valine at index 157, while GDPMDHs have valines or phenylalanines here and UDPNAMDHs have an indel at this index.
SaUDPNAMDH is also activated by phosphorylation on Tyr89{183}. This tyrosine in SaUDPNAMDH lies at the bend in a long loop between β-D and α-4 near the enzyme surface. This residue lies before Cys92{186}, which may also be involved in regulation by forming a disulphide with the catalytic Cys258{411} [24]. Similar to UDPGDHs, this tyrosine at index 183 is not conserved, with only 3 of the 38 UDPNAMDHs aligned having a tyrosine at this position. Thus, it appears that tyrosine phosphorylation did not evolve at conserved tyrosine positions and therefore may not occur in all organisms or enzymes.

Conserved motifs
The 10 best-conserved sequence motifs were statistically identified using the MEME program. Seventeen of the 18 invariant residues cluster into 7 of the 10 conserved motifs (Table 1). Both the Rossmann fold, found between β-1 and α-1 in SpUDPGDH, and Tyr10 are located in Motif 5. The Rossmann fold allows close interaction with the adenosine ribose of NAD+ [39]. In addition to Motif 5, Motifs 4 and 7 also contribute to the N-terminal NAD+-binding domain of NDP-SDHs (Fig. 2A). Motif 4 contains invariant residues Pro140{257} and Glu141{258}. Motif 7 includes the fully conserved Thr83{173}.

Phylogenetic analysis
An unrooted bootstrapped phylogenetic tree of NDP-SDHs (Fig. 3) was generated using the neighbour-joining method. This method was chosen because maximum likelihood and parsimony methods are computationally prohibitive for larger data sets and because other studies have indicated that the neighbour-joining method yields quality evolutionary relationships in some families [40]. In fact, a bootstrapped parsimony tree using only 300 data sets (Fig. S2) was highly comparable to the neighbour-joining tree using 1000 replicates. The tree was used to support assignment of each NDP-SDH sequence into an appropriate group for group entropy analysis. The tree indicates that prokaryotic UDPGDHs are the most diverse group of sequences used. Eukaryotic UDPGDHs (823), UDPNAMDHs (995) and GDPMDHs (988) form distinct clades within the phylogenetic tree with high bootstrapping values (in parentheses, out of 1000 replicates). The eukaryotic UDPGDHs distinctly cluster within the more diverse prokaryotic UDPGDHs. The UDPNAMDHs and GDPMDHs cluster closely together on the tree, perhaps because both substrates involve a mannose sugar (UDP-ManNAc and GDP-Man, respectively). Within the clade containing the 38 UDPNAMDH sequences, eight sequences were annotated as prokaryotic UDPGDHs and two as GDPMDHs. Literature investigation revealed that all 10 of these sequences resulted from genome sequencing studies, indicating that these outliers could be misidentified, as none has a proven enzymatic function [41-46].
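The bootstrap values quoted above are counts of how many replicate trees recover a given grouping. The sketch below shows that tally. It assumes each replicate tree has already been reduced to a set of leaf groupings, which is a simplification of the bipartitions a consensus program works with for unrooted trees. The sequence labels and replicate data are hypothetical.

```python
def clade_support(replicate_trees, clade):
    """Count the replicate trees that contain `clade` as one of their groupings.

    Each tree is represented as a set of frozensets of leaf names.
    """
    target = frozenset(clade)
    return sum(1 for groupings in replicate_trees if target in groupings)

def support_percent(count, n_replicates):
    """Express a bootstrap count as a percentage of all replicates."""
    return 100.0 * count / n_replicates

# Four hypothetical replicate trees over made-up sequence labels.
replicates = [
    {frozenset({"seqA", "seqB"}), frozenset({"seqC", "seqD"})},
    {frozenset({"seqA", "seqB"}), frozenset({"seqC", "seqE"})},
    {frozenset({"seqA", "seqC"}), frozenset({"seqB", "seqD"})},
    {frozenset({"seqA", "seqB"}), frozenset({"seqD", "seqE"})},
]
count = clade_support(replicates, {"seqA", "seqB"})
print(count, "of", len(replicates), "replicates;",
      support_percent(count, len(replicates)), "%")
```

On the same scale, the values of 823, 995 and 988 out of 1000 replicates correspond to 82.3%, 99.5% and 98.8% support.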

Group entropy analysis of GDPMDHs
The GEnt program was developed as an algorithm to detect amino acid residues that are characteristic of an individual protein family from an alignment with other related proteins. The program calculates a 'Group Entropy' value that represents the degree of residue conservation at that position within the designated group and a 'Family Entropy' value that represents the degree of residue conservation at that position within the entire alignment. Residue conservations unique to and critical to the designated group of proteins would have a high Group Entropy value for a specific residue position, indicating it is highly conserved in that group of sequences, while also having a low Family Entropy value, indicating that that position is not as well conserved in the entire alignment. These positions would plot to the upper left quadrant of a Group Entropy vs. Family Entropy plot (Fig. 4). GEnt was initially used to identify critical, family-specific conservations in class 3 ALDHs [47]. It was used here to identify novel residue positions important to the unique function of each NDP-SDH homologue.
GEnt analysis of GDPMDHs revealed that residues His217{359}, Leu126{219}, Ala263{406}, Arg122{215} and Cys213{355} (PaGDPMDH residue identities with the alignment index position in curly brackets) have the highest Group Entropy scores (Table 2), indicating that these positions are specifically conserved in the GDPMDHs. The residues are listed in descending order of group entropy. The full GEnt results are available in Table S1. His217{359} was found in 43 of the 45 GDPMDH sequences aligned. However, the two sequences lacking histidine, annotated as GDPMDHs from Bacillus thuringiensis and Vibrio crassostreae, may be misidentified, as both replace histidine at this position with arginine, which is invariant in the UDPNAMDHs at this same position. The ND1 position of the His217{359} side chain is 2.9 Å from the 2″ hydroxyl of the mannose moiety of GDP-Man, coordinating the substrate [23]. Next, Leu126{219} is highly conserved (35 out of 45) in GDPMDHs and is located in a loop following β-5 which is distant from the active site. With some variations, all other NDP-SDH groups replace leucine with a proline, suggesting that Leu126{219} plays a role in the structure of that loop. With the exception of the same two sequences mentioned above, Ala263{406} is also conserved in 43 of 45 GDPMDHs. All other NDP-SDH families replace this alanine with a glycine that is 80% conserved in the entire alignment. The main chain carbonyl oxygen of Ala263{406} forms a water-mediated contact to the 2′ hydroxyl of the guanosine ribose of GDP-Man [23]. In SpUDPGDH index 406 is occupied by Gly255{406}, which is 5.6 Å from the oxygen at position 2 of the uridine ring in UDP-Glc. Thus, Ala263{406} allows for the proper shape of the binding pocket for the nucleotide of GDP-Man in GDPMDHs. Like His217{359} and Ala263{406}, Arg122{215} was found to be almost fully conserved (43 of 45 sequences) in the GDPMDH group, with the same two previously noted sequences having glutamate at this position. Lysines are found predominantly at index 215 in the rest of the NDP-SDH alignment. In PaGDPMDH the side chain of Arg122{215} is 2.1 Å from the hydroxyl group of Tyr191{315} (78% conserved in the entire alignment and 93% conserved in GDPMDHs) and 2.1 Å from the side chain amide oxygen of Asn155{256}, which lies before the invariant proline-glutamate sequence (Pro140{257}-Glu141{258} in SpUDPGDH), aiding in the positioning of that conserved sequence. Lastly, Cys213{355} is partially conserved (25 out of 45) in the GDPMDH group. It is found at the dimer interface region and appears to be associated with subunit contact, as it is located 3.9 Å from Ile245{388} on the neighbouring subunit. Cys213{355} is also 7.3 Å from the 3″ hydroxyl of the bound GDP-Man.
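The comparison of group-level and family-level conservation described at the start of this section can be sketched in a few lines. The scoring below is a simplified stand-in and should not be read as the published GEnt formula. It assumes conservation can be approximated as one minus the normalised Shannon entropy of a column, and the thresholds, function names and toy alignment are hypothetical.

```python
import math
from collections import Counter

def conservation(residues):
    """One minus normalised Shannon entropy (1.0 = invariant column).

    Gaps are treated as just another symbol, which is a simplification.
    """
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)

def group_specific_positions(alignment, group_rows, group_min=0.9, family_max=0.8):
    """Return 1-based indices that are conserved in the group but not family-wide."""
    hits = []
    for index, column in enumerate(zip(*alignment), start=1):
        group_score = conservation([column[row] for row in group_rows])
        family_score = conservation(column)
        if group_score >= group_min and family_score <= family_max:
            hits.append((index, round(group_score, 2), round(family_score, 2)))
    return hits

# Hypothetical alignment: rows 0-2 form the designated group.
toy_alignment = ["GHA", "GHS", "GHT", "GRA", "GRC", "GLV"]
print(group_specific_positions(toy_alignment, group_rows=[0, 1, 2]))
```

In this toy case only the second column is reported: the first is conserved across the whole family, and the third varies within the group. Positions of the second kind are the ones that would fall in the upper left quadrant of a Group Entropy vs. Family Entropy plot.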

Group entropy analysis of UDPNAMDHs
Group entropy analysis of the UDPNAMDH group (Table 3) found that residues Arg152{259}, Pro155{262}, Arg211{359}, Val261{414}, Val254{407}, His257{410}, Glu117{215}, His242{391} and Phe265{418} in SaUDPNAMDH (index positions in curly brackets) are specifically conserved in the UDPNAMDH group. The full GEnt results are available in Table S2. Arg152{259} is invariant in UDPNAMDHs and is replaced with a phenylalanine in all other groups. Overall, phenylalanine is 78% conserved in the entire alignment at this position. In SaUDPNAMDH the side chain of Arg152{259} is 2.9 Å from the [Eu(DPA)3]3− complex bound in the substrate site, which superimposes on the position where the substrate sugar is bound in SpUDPGDH [24]. A recent publication on a UDPNAMDH from Pyrococcus horikoshii also indicates that Arg152{259} lies in the substrate-binding site [46]. The main chain carbonyl of Phe142{259} in SpUDPGDH is 4.0 Å from the 4″ hydroxyl of UDP-GlcUA in UDPGDH and is located in the glucose-1-phosphate-binding pocket [5]. Because of its positioning, Arg152{259} might accommodate binding of a different sugar substrate in UDPNAMDHs. Phe158{259} in PaGDPMDH at this same index position lies at the dimer interface. Next, Pro155{262} is fully conserved within the UDPNAMDH group and in the two possibly misidentified GDPMDH sequences noted above. It is located in a loop following Arg152{259} that makes up the sugar-binding site, thus possibly providing an altered structure to accommodate a different sugar substrate as well. In all other families, glutamate (76% conserved in the entire alignment) replaces proline at this position. In SpUDPGDH the main chain nitrogen of Glu145{262} at this index position is 2.8 Å from an oxygen atom on the beta phosphate of UDP-GlcUA. The next residue, Arg211{359}, is also fully conserved in the UDPNAMDH group. The side chain of Arg211{359} is found in the sugar-binding site of UDPNAMDH. In the P. horikoshii UDPNAMDH the NE atom of Arg211{359} hydrogen bonds to the O2A atom of UDP-ManNAcA [48]. In fact, the two arginines identified by GEnt in UDPNAMDHs, Arg152{259} and Arg211{359}, are both involved in substrate specificity. In support of this observation, a R152F/R211L double mutant of SaUDPNAMDH is unable to oxidize the normal UDP-ManNAc substrate [24]. The next residue identified by GEnt in UDPNAMDHs is Val261{414}, which is found in 34 out of 38 UDPNAMDH sequences (89% conserved), with the other four UDPNAMDH sequences having either leucine or isoleucine. This index position is mostly occupied by lysine in the entire alignment (78% conserved). This position is adjacent to the fully conserved Asp262{415} and is in close proximity to the conserved 'GGXC' sequence involving the catalytic thiol. The side chain of Val261{414} does not face the catalytic site and lies about 5.5 Å from the side chain of Ile224{372} from the neighbouring subunit. Hence, this position in UDPNAMDH may play a role in intersubunit contact. In SpUDPGDH Val261{414} is replaced by Lys263{414}, which is 2.9 Å from the 2′ hydroxyl group of the nicotinamide ribose of NAD+. Another valine identified by GEnt in SaUDPNAMDH is Val254{407}, which is found in 35 out of 38 UDPNAMDH sequences, with the other three UDPNAMDH sequences having leucine. In the P. horikoshii UDPNAMDH Val254{407} lines the pocket where the uridine group of UDP-ManNAcA is located [48]. Similarly, the α-carbons of Tyr256{407} in SpUDPGDH and Phe264{407} in PaGDPMDH are within 5 Å of C1D of the ribose ring of UDP-xylopyranose and GDP-mannopyranosyl ester, respectively. Hence, the main chain position of this residue contributes to binding of the NDP portion of the substrate. Next, His257{410}, which is invariant in UDPNAMDHs, is located between two invariant glycines and the invariant cysteine in the sequence 'GGHC'. The side chain of His257{410} is about 4 Å from the nicotinamide ring of NAD+ and approximately 5.5 Å from the 2′ and 3′ hydroxyls of the nicotinamide ribose in SaUDPNAMDH. In the P. horikoshii UDPNAMDH the His257{410} side chain hydrogen bonds to a water molecule that in turn is bonded to the O2B atom of UDP-ManNAcA [48]. Tyr259{410} at the corresponding position in SpUDPGDH is approximately 4.5 Å from the 2′ and 3′ hydroxyl groups of the nicotinamide ribose of NAD+. The next residue identified by GEnt in UDPNAMDH is Glu117{215}, which is invariant in the UDPNAMDH group and is mostly replaced with lysine or arginine in other groups. The side chain carbonyl of Glu117{215} in SaUDPNAMDH is 2.6 Å from the side chain hydroxyl of Tyr184{315}, which is also invariant in UDPNAMDH. The side chain amine of Lys116{215} in SpUDPGDH is 2.8 Å from the main chain carbonyl of invariant Pro140{257}, which is located in the loop between β-8 and α-7. Next, His242{391} is found in 35 out of 38 UDPNAMDH sequences (92% conserved) and is replaced in other NDP-SDH groups by aspartate, which is 78% conserved in the entire alignment. The side chain of His242{391} forms an ion pair with Glu207{355} from the neighbouring subunit [48]. Interestingly, Arg244{393} in SpUDPGDH interacts with the 2″ hydroxyl of glucose in UDP-Glc. Glucose and mannose are epimers at the 2″ position. GDPMDHs have Lys250{393} at this index position, but it does not form intersubunit contacts due to the domain-swapped structure of GDPMDH [23]. Lastly, the side chain of Phe265{418} in subunit b of SaUDPNAMDH is 4.3 Å from the side chain of Ile224{372} from subunit a. Thus, in UDPNAMDHs this position plays a role in intersubunit contact. Phenylalanine is found at index 418 in 35 out of 38 UDPNAMDH sequences, with the other three UDPNAMDH sequences having tyrosine. However, Gln267{418} in SpUDPGDH, Ala275{418} in PaGDPMDH and Asn283{418} in hUDPGDH at this index position all lie in the middle of α-11 and do not form any apparent intermolecular contacts.

Group entropy analysis of UDPGDHs
The UDPGDH GEnt analysis (Table 4) indicated that the residues Leu211{359}, Phe218{366}, Ile27{112}, Lys116{215}, Ser8{90}, Ala207{355} and Gly238{387} in SpUDPGDH (index positions in curly brackets) are uniquely conserved in the UDPGDH group. The full GEnt results are available in Table S3. Leu211{359}, which is 93% conserved in the UDPGDHs, forms the pocket for the sugar group of UDP-Glc, making van der Waals contact with the C2″ ring position [23]. Second, Phe218{366} in the UDPGDHs lies at a hydrophobic position. The side chain of Phe218{366} is roughly 4.0 Å from Ile245{394}, indicating that its function might be hydrophobic packing. Next, Ile27{112} is commonly replaced with alanine, cysteine and valine in the UDPGDH group. It is located 5.0 Å from the Rossmann fold and likely serves in structural positioning. The side chain hydroxyl group of Ser8{90} is 4.5 Å from the gamma carbon of Ile27{112}. This close interaction may lead to a compensatory change in other NDP-SDHs, with index 90 being a hydrophobic amino acid, often leucine, and index 112 being a smaller residue, often glycine, to facilitate packing interactions. For example, PaGDPMDH has a hydrophobic leucine (Leu8) at index 90 and a glycine (Gly28) at index position 112.
The side chain amine of Lys116{215}, which is 92% conserved in the UDPGDH group, is 2.8 Å from the main chain carbonyl of the invariant Pro140{257}. This interaction likely coordinates the position of the critical loop that contains both Pro140{257} and Glu141{258}. Next, the side chain of Ala207{355}, which is 81% conserved in UDPGDHs, is 6.8 Å from the 3″ hydroxyl of the bound UDP-xylopyranose in SpUDPGDH. This position in PaGDPMDH is occupied by Cys213{355}, which lies at the dimer interface region. Lastly, Gly238{387} is mostly glycine and alanine in the bacterial and archaeal UDPGDHs and eukaryotic UDPGDHs, respectively. This index position is occupied mostly by hydrophobic residues in other NDP-SDH groups. Val244{387} lies at this position in PaGDPMDH and is involved in subunit interactions.
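The notation used throughout, a residue identity followed by its alignment index in curly brackets, amounts to a mapping from ungapped residue numbers to alignment columns. A minimal sketch of that bookkeeping follows. The gapped sequence is invented, one-letter codes are used instead of the three-letter codes in the text, and numbering is assumed to start at 1 with no missing residues.

```python
def residue_to_alignment_index(aligned_sequence, first_residue_number=1):
    """Map each residue number to (residue letter, 1-based alignment column)."""
    mapping = {}
    residue_number = first_residue_number
    for column, symbol in enumerate(aligned_sequence, start=1):
        if symbol != "-":  # gaps consume a column but not a residue number
            mapping[residue_number] = (symbol, column)
            residue_number += 1
    return mapping

def label(mapping, residue_number):
    """Format a residue in the 'identity, number, {alignment index}' style."""
    symbol, column = mapping[residue_number]
    return f"{symbol}{residue_number}{{{column}}}"

# Hypothetical gapped sequence: two leading gaps and one internal gap.
toy_mapping = residue_to_alignment_index("--MK-IAV")
print(label(toy_mapping, 3))
```

Because every sequence shares the same column numbering, residues such as Lys116{215} in SpUDPGDH and Arg122{215} in PaGDPMDH can be compared directly even though their own residue numbers differ.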

Common group entropy positions
Several index positions identified by GEnt demonstrated group-specific conservations in multiple NDP-SDH groups, yielding novel insights into the critical differences between each enzyme. First, index position 359 is the highest scoring position for group entropy in GDPMDHs and UDPGDHs, and is also highly scoring in UDPNAMDHs. This position is clearly responsible for substrate specificity, as was initially proposed by Snook and colleagues [23]. The side chain of His217{359} in PaGDPMDH is 2.9 Å from the 2″ hydroxyl of the mannose moiety of GDP-Man, coordinating the substrate [23]. Leu211{359} in SpUDPGDH forms the pocket for the sugar group of UDP-glucose, making van der Waals contact with the C2″ ring position [23]. As previously noted, glucose and mannose are epimers at the 2″ position, and the N-acetyl group of UDP-ManNAc is also attached to the 2″ position. Hence, this should be a key location for determining substrate specificity. The side chain of Arg211{359} in SaUDPNAMDH is also found in the sugar-binding site of UDPNAMDH. However, the specific substrate interaction of Arg211{359} is not clear from that structure, as a [Eu(DPA)3]3− complex was crystallized in place of the specific substrate. The recently published P. horikoshii UDPNAMDH structure shows that the guanidinium group of Arg211{359} hydrogen bonds to O2A and O1A of the α-phosphate of UDP-ManNAcA, while a different arginine, the conserved Arg244{393}, hydrogen bonds to the carbonyl oxygen of the N-acetyl group of UDP-ManNAcA [48].
Second, index position 215 was also identified by GEnt with high group entropy scores in all three NDP-SDH groups. This position apparently serves to maintain critical enzyme structure in each group by interacting with a conserved tyrosine. The side chain of Arg122{215} in PaGDPMDH is 2.1 Å from the hydroxyl group of Tyr191{315} (78% conserved in the entire alignment) and 2.1 Å from the side chain amide oxygen of Asn155{256}, which lies before the invariant proline{257}-glutamate{258} sequence, aiding in the positioning of that conserved loop between β-8 and α-7. The side chain carbonyl of Glu117{215} in SaUDPNAMDH is 2.6 Å from the side chain hydroxyl of Tyr184{315}, which is also invariant in UDPNAMDH. In SpUDPGDH the side chain of Lys116{215} is 2.8 Å from the main chain carbonyl of invariant Pro140{257}, also holding the same loop in place. Lys116{215} does not interact with the conserved tyrosine at index 315, however, as it is replaced by Leu181{315} in SpUDPGDH. In hUDPGDH, by contrast, the side chain of Lys129{215} does interact with the side chain of Tyr199{315}, as seen in these other NDP-SDHs.
Lastly, index 355 is identified in the top six group entropy scores for both GDPMDH and UDPGDH, and is the sixteenth highest group entropy score in UDPNAMDH. Index 355 appears critical for intersubunit contact. The side chain of Ala207{355} in the monomeric SpUDPGDH structure lies 6.8 Å from the 3″ hydroxyl of the bound UDP-xylopyranose. The side chain of the equivalent residue in hUDPGDH, Ala223{355}, is 7.0 Å from the 3′ hydroxyl of UDP-Glc, but is 4.0 Å from Ile255{388} in the neighbouring subunit. In PaGDPMDH the side chain of Cys213{355} is 3.9 Å from Ile245{388} on the neighbouring subunit and is also 7.3 Å from the 3″ hydroxyl of the bound GDP-Man. In SaUDPNAMDH the side chain carbonyl of Glu207{355} is 4.5 Å from the [Eu(DPA)3]3− complex, which sits in the substrate-binding site, and is 3.9 Å from His242{391}, also identified by GEnt (see above), from the neighbouring subunit. A glutamate at index 355 also lies in the binding site for UDP-ManNAcA in the P. horikoshii UDPNAMDH. The overall and group-specific conservations identified here could serve as targets for site-directed mutagenesis by other researchers. The identification of these positions may also aid in drug discovery for bacterial isoforms that assist in capsule formation.

Materials and methods
The project began by obtaining the amino acid sequence of UDPGDH from Streptococcus pyogenes (PDB entries 1DLJ and 1DLI) from the RCSB Protein Data Bank. The sequence was then used to perform a PSI-BLAST [49] search of the nonredundant protein database at the National Center for Biotechnology Information (NCBI). A total of 229 related UDPGDH, GDPMDH and UDPNAMDH amino acid sequences were collected, with per cent identities ranging from 99% to 15%. These sequences were initially aligned using T-Coffee [50]. To improve alignment quality, the alignment was manually adjusted using tertiary structure comparison through the RCSB PDB Protein Comparison Tool-jFATCAT method [51,52] as a guide, comparing Streptococcus pyogenes UDPGDH (SpUDPGDH, PDB entry 1DLJ), Pseudomonas aeruginosa GDPMDH (PaGDPMDH, PDB entry 1MV8), human UDPGDH (hUDPGDH, PDB entry 3TDK) and Staphylococcus aureus UDPNAMDH (SaUDPNAMDH, PDB entry 3OJL). The alignment editor used was GENEDOC [53]. Conservations within the alignment were analysed for structural or functional significance. Molecular visualization was performed using RASMOL [54]. Analysis of conserved sequence motifs was facilitated by the MEME program [55]. Group entropy analysis (GEnt) [47] was performed to compare the UDPGDH, UDPNAMDH and GDPMDH groups to each other.
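The per cent identities quoted for the collected sequences depend on how aligned positions are counted. The sketch below shows one common convention, assumed here and not stated in the text: identity is computed over the columns in which both sequences have a residue. The two aligned fragments are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Per cent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments.
print(percent_identity("MKI-AVG", "MRIQA-G"))
```

Other denominators, such as the full alignment length or the length of the shorter sequence, give different values for the same pair, so the convention should be stated whenever identities are compared between studies.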
The PHYLIP suite of programs was used to generate the phylogenetic tree [56]. First, the alignment was trimmed using TrimAl [57]. One thousand bootstrapped data sets of the trimmed alignment were then generated using the SEQBOOT program. Next, distances for the data sets were determined by the PROTDIST program using the Jones-Taylor-Thornton matrix. Phylogenetic trees for each data set were generated using the NEIGHBOR program. Lastly, the unrooted consensus tree was generated using the CONSENSE program. The tree graphic was generated using FIGTREE (available at http://tree.bio.ed.ac.uk/software/figtree).
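The bootstrapping step in this workflow resamples alignment columns with replacement to produce replicate data sets of the same size as the original. The sketch below illustrates that general idea. It is not the SEQBOOT implementation, and the function names, seed and toy alignment are assumptions.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one replicate by sampling columns with replacement."""
    n_columns = len(alignment[0])
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    # Apply the same column picks to every sequence so columns stay intact.
    return ["".join(sequence[i] for i in picks) for sequence in alignment]

def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Generate n_replicates resampled alignments from a seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]

# Hypothetical trimmed alignment of three short sequences.
toy_alignment = ["MKIAV", "MRIAV", "MKLGV"]
replicates = bootstrap_replicates(toy_alignment, n_replicates=1000, seed=42)
print(len(replicates), replicates[0])
```

Each replicate would then go through the distance and tree-building steps, and the resulting trees would be summarised in a consensus tree whose branch values report how often each grouping was recovered.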

Supporting information
Additional supporting information may be found in the online version of this article at the publisher's web site: Fig. S1. Complete alignment of 229 NDP-SDHs sequences (MSF format). Fig. S2. Bootstrapped parsimony tree of NDP-SDHs.