In silico analysis of heme oxygenase structural homologues identifies group‐specific conservations

Heme oxygenases (HO) catalyze the breakdown of heme, aiding the recycling of its components. Several other enzymes have homologous tertiary structures to HOs, while sharing little sequence homology. These homologues include thiaminases, the hydroxylase component of methane monooxygenases, and the R2 component of Class I ribonucleotide reductases (RNR). This study compared these structural homologues of HO, using a large number of protein sequences for each homologue. Alignment of a total of 472 sequences showed little sequence conservation, with no residues having conservation in more than 80% of aligned sequences and only five residues conserved in at least 60% of the sequences. Fourteen additional positions, most of which were critical for hydrophobic packing, displayed amino acid similarity of 60% or higher. Ten conserved sequence motifs were identified in HOs and RNRs. Phylogenetic analysis verified the existence of the four distinct groups of HO homologues, which were then analyzed by group entropy analysis to identify residues critical to the unique function of each enzyme. Other methods for determining functional residues were also performed. Several common index positions identified represent critical evolutionary changes that resulted in the unique function of each enzyme, suggesting potential targets for site‐directed mutagenesis. These positions included residues that coordinate ligands, form the active sites, and maintain enzyme structure. Enzymes Heme oxygenase (EC 1.14.14.18), methane monooxygenase (EC 1.14.13.25), ribonucleotide reductase (EC 1.17.4.1), thiaminase II (EC 3.5.99.2).

Heme oxygenases (HO) catalyze the breakdown of heme, aiding the recycling of its components. Several other enzymes have homologous tertiary structures to HOs, while sharing little sequence homology. These homologues include thiaminases, the hydroxylase component of methane monooxygenases, and the R2 component of Class I ribonucleotide reductases (RNR). This study compared these structural homologues of HO, using a large number of protein sequences for each homologue. Alignment of a total of 472 sequences showed little sequence conservation, with no residues having conservation in more than 80% of aligned sequences and only five residues conserved in at least 60% of the sequences. Fourteen additional positions, most of which were critical for hydrophobic packing, displayed amino acid similarity of 60% or higher. Ten conserved sequence motifs were identified in HOs and RNRs. Phylogenetic analysis verified the existence of the four distinct groups of HO homologues, which were then analyzed by group entropy analysis to identify residues critical to the unique function of each enzyme. Other methods for determining functional residues were also performed. Several common index positions identified represent critical evolutionary changes that resulted in the unique function of each enzyme, suggesting potential targets for site-directed mutagenesis. These positions included residues that coordinate ligands, form the active sites, and maintain enzyme structure.
Heme oxygenase (HO, EC 1.14.14.18) aids in recycling iron and porphyrin rings by catalyzing the decomposition of heme to biliverdin, CO, and free iron. The reaction requires three molecules of oxygen, NADPH, and cytochrome P450 reductase. In animals, biliverdin is converted to bilirubin, which acts as an antioxidantprotecting neurons [1] and which is later combined with glucuronic acid and excreted [2]. CO functions as a signaling molecule via guanylate cyclase, playing several roles in vasoregulation [3]. The iron released is the main source of metal in regenerating heme and nonheme cofactors [4,5]. Two HO isoforms exist in vertebrates, heme oxygenase-1 (HO1) and heme oxygenase-2 (HO2). HO1 and HO2 are both membrane-bound proteins and share 55% identity in humans [6]. Human HO2 is expressed constitutively in the liver, brain, and testes. Human HO1 is an inducible protein found in most tissues, especially the spleen and liver [7,8]. HO1 is also expressed in cancer cells. HO1 has been associated with adhesion and signaling in cancer cells, enabling growth and spread of tumors [9]. In addition, elevated HO1 levels in mice appear to correlate with insulin resistance, while HO1-knockout mice were insusceptible to insulin resistance and exhibited lower levels of inflammation [10]. A heme oxygenase-3 enzyme has been identified in humans, but is an inferior catalyst for heme breakdown [8]. In Arabidopsis, which possesses four HO genes, HO is essential for the synthesis of light-sensing phytochrome chromophores [11,12]. Corynebacterium diphtheriae uses a HO homologue (HmuO), which shares 33% sequence identity to human HO1, to acquire iron for growth from its host's own heme molecules [13].
There are several different enzymes with diverse functions that share structural homology to HOs. One of these enzymes is another oxidoreductase, methane monooxygenase (MMO, EC 1.14.13.25). MMO is found in aerobic methanogens and catalyzes the NAD (P)H-dependent conversion of methane and oxygen to methanol [14], although it can act upon slightly longer hydrocarbons [15]. There are three components to the structure of a soluble MMO: a reductase, a regulatory B component, and a hydroxylase (MMOH). The structure of MMOH has three subunits (a, b, and c) and a hydroxo-bridged diiron cluster. This is found in the a subunit only near the hydrophobic pocket that is the active site and is positioned between helices aB, aC, aE, and aF. Interaction of the b subunit with the reductase component of MMO affects the redox potential of this diiron cluster in the a subunit [16]. Both the a and b subunits of MMOH show homology (approximately 12% sequence identity) to HOs and the other homologues addressed here [17].
Ribonucleotide reductases (RNRs, EC 1.17.4.1) are catalysts in the process of forming deoxyribonucleotides from ribonucleotides. The four deoxyribonucleotides that are produced are important for DNA synthesis in living organisms. RNRs have been subdivided into three classes based upon the way they form a radical in the reaction mechanism. Class I RNRs have a diiron cluster, while Class II RNRs use adenosylcobalamin and Class III RNRs utilize a iron-sulfur cluster and S-adenosylmethionine. While members of the three classes share little sequence homology, there is a core tertiary structure common to each [18]. However, it has been noted that Class I RNRs also share structural homology to HOs, thiaminases, and MMOHs [17]. Class I RNR in Escherichia coli is made up of two dimeric proteins, R1 and R2, which differ in structure. R1's structure includes binding sites for substrates, allosteric sites (the specificity and activity sites), and critical cysteine residues. R2, which is a homologue of HOs, has a highly oxidized redox center that is formed by a diiron cluster and a critical tyrosyl radical on Tyr122 that activates the substrate ribose. The tyrosyl radical is needed for the protein to act as an enzyme [19]. Like MMOH, it utilizes oxygen and a diiron cluster; however, RNRs reduce hydroxyls on ribose, while MMOH acts to form a hydroxyl on methane. Class I RNRs have been subdivided into two groups: Ia, which are represented by E. coli R2 subunits, and Ib, which are represented by the R2F subunit from RNR from Salmonella typhimurium. Ib RNRs (RNR1Bs) appear to be solely in bacteria, while Ia RNRs (RNR1As) are in both bacteria and eukaryotes. While sharing a similar structure to Ia, Ib differs functionally through a lack of allosteric regulation and the need for small redox protein produced from the same operon as the RNR [20].
TenA modifies the aminopyrimidine to hydroxypyrimidine and is found throughout all three domains of life [21]. The TenA gene product was initially identified as a transcriptional regulator (Treg) of exoenzyme secretion in Bacillus subtilis [22]. However, the TenA homologue in Pyrococcus furiosus lacks appropriate surface charges and therefore may not participate with DNA interactions [17]. Pyrococcus furiosus TenA only shares 14% sequence identity to human HO1 [17]. The Saccharomyces cerevisiae THI20 protein is actually a trifunctional enzyme which includes a C-terminal tetrameric TenA-like domain fused to an N-terminal ThiD domain which catalyzes another step in thiamin metabolism [23]. French et al. [23] also identified two structural subclasses of TenAs: one group similar to B. subtilis TenA with a conserved cysteine and the other similar to P. furiosus TenA with a conserved glutamate instead.
There has not been an extensive comparative study performed on HO and its homologues: thiaminases, MMOHs, and Class I RNRs. The goal of this research was to align a large number of HO protein sequences and those for these related homologues. We then attempted to identify the structural and possible functional roles of conserved residues and sequence motifs in these enzymes. Group entropy analysis and other methods indicated group-specific conservations for each homologue, identifying key positions that may contribute to the unique function of each enzyme.

Results and Discussion
Structure and residue conservations A total of 472 sequences were aligned (Fig. 1) using tertiary structural alignment as a guide (entire alignment available in Fig. S1). Above each amino acid position column is an index number, which is numbered concurrently from the beginning of the alignment; these index numbers will be used to reference each position throughout this manuscript. Two hundred of the sequences analyzed were HOs from animal, plant, and bacterial species; 141 of the sequences were thiaminases; 41 of the sequences were MMOHs (30 alpha chains and 11 beta chains); and 90 of the sequences were Class I RNRs. MAPSCI structural alignment [24] indicates that the tertiary structures of all four homologues (human HO1, PDB ID: 1NI6; P. furiosus thiaminase, PDB ID: 1RTW; Methylosinus trichosporium MMOH alpha chain, PDB ID: 1MHY; and RNR beta chain from E. coli, PDB ID: 1XIK) are conserved with a core RMSD value of 1. 37 A and core size of 77, which represents the percentage of the length of the shortest protein aligned (1RTW at 201 amino acids) that have 100% conservation and are within 4.0 A of each other in the final tertiary alignment. Interestingly, one can see that the active sites of HOs and thiaminases utilize different spatial coordination of the hydrophobic pocket, perhaps due to the differences in their enzymatic function (Fig. 2).
The overall residue conservation within the HO structural homologues was low. No residues showed conservation of 80% or higher in the entire alignment, five residues exhibited conservation of 60% or higher, and 40 additional residues displayed conservation of 40% or higher. The majority of these conservations were found within the a-helices. ConSurf analysis [25] was used to visualize these evolutionary conservations. In human HO1 (hHO1, sequence HomSapHOx1, PDB ID: 1N45) and the thiaminase from P. furiosus (PfThiaminase, sequence tactPyrFur, PDB ID: 1RTW), highly conserved residues were located in the core of the structure (Fig. 3). In hHO1, these residues conserved in the entire alignment line the heme-binding site (Fig. 3A). Alignment index positions ( Fig. 1) for each residue are provided in curved brackets. These conservations include His25{142} which coordinates the heme iron (Fig. 3C). Tyr134{325} and Arg183 {384} form hydrogen bonds to the porphyrin ring. Mutations of the equivalent residue at index 384 in Corynebacterium diphtherae HO, Arg177{384}, caused a significant decrease in catalytic rate [27]. Ala28{145} and Gly143{335} make hydrophobic contacts to the porphyrin ring. A G143H mutation inactivated both mouse and human HO1 [28,29]. Tyr58{184} is part of a larger hydrogen bonding network, which will be discussed below [30]. Note that the degree of conservation in the entire alignment, highlighted in Fig. 1, is influenced by the number of sequences for each homologue. For example, HOs represent 200 of the total 472 amino acid sequences aligned. In PfThiaminase, several conserved residues also lined the binding site for HMP-P (Fig. 3B). These residues include Phe22 {151} which packs with Phe36{163} near the end of the active site pocket. The hydroxyl of Tyr105{268}, which is near the back of the active site pocket, hydrogen bonds with the side chain carboxylate of Glu198 {423}, which coordinates the bound HMP-P ( Fig. 3D) [17]. Y112F and E205A mutations at indices 268 and 423, respectively, in B. subtilis TenA (BsThiaminase) both caused significant decreases in catalytic function [21]. None of the other residues that contact HMP-P are conserved above 60% in the entire alignment, perhaps because thiaminases represent only a quarter of all aligned sequences.
Five residue positions are conserved above 60% in our entire alignment: Leu17{132}, Phe33{151}, Gly40 {166}, Phe47{173}, and Tyr114{268} (hHO1 residue identity with alignment index position in curved brackets). Four of five residues are involved in hydrophobic packing, as observed in the hHO1 structure [30]. Leu17{132} is 69% conserved overall and was leucine in thiaminases, RNR1As, and HOs except plant species. This position also has 84% similarity with aliphatic hydrophobic residues (I, V, L, and M) throughout the entire alignment. This residue is found in a-1 and is involved with hydrophobic packing with Ala203 {416} (not conserved) and Tyr134{325} (44% conserved), which is located in the heme-binding pocket of HO. Phe33{151} is 63% conserved overall and is found in thiaminases and HOs except plant species. This residue is found in the proximal helix of HOs and functions in hydrophobic packing with Val50{176} (not conserved). Gly40{166} is 63% conserved overall and is found in thiaminases and HOs except plant species. This residue is found in the turn between a-1 and a-2 [30]. Conserved glycines are often found in turns in enzyme structures or at positions where a side chain would lead to steric conflicts in many enzyme families [31][32][33][34]. Phe47{173} is 71% conserved overall and is found in RNR1As, most thiaminases, and HOs except Fig. 1. Summary alignment showing a representative sequence for each group of HO homologues: human HO1 (HomSapHOx1) as a HO, Pyrococcus furiosus thiaminase (tactPyrFur) as a thiaminase, the alpha (MMOAMetTri) and beta chains (MMOxMetTri) of the MMOH from Methylosinus trichosporium as MMOH chains, the RNR beta chain from Escherichia coli (RDR1Esccol) as a RNR1A, and the RNR R2F protein from Salmonella typhimurium (RRR2SalEnt) as a RNR1B. The entire alignment, which contains 472 protein sequences, is found in Fig. S1. Residue positions are colored based upon their conservation in the entire alignment (Fig. S1) as follows: green = 60-79% conserved and blue = 40-59% conserved. Indel (gap) positions from the entire alignment ( Fig. S1) are retained to allow correlation with index position numbers (numbers shown above the alignment columns) that are noted within the text. plant species. This residue is found in the beginning of a-2 and is involved in hydrophobic packing with Ala151{348} (not conserved) that is found in the distal helix of HOs. Tyr114{268} is 66% conserved overall and is found in all HOs and most thiaminases except proteins identified as Tregs. This residue is found in a-4 in hHO1 [30]. In hHO1, the side chain hydroxyl of Tyr114{268} is 3.1 A from NH1 of Arg136{327} (not conserved) from the distal helix of HOs, and 3.0 A from ND1 of Asn210{423}. In PfThiaminase, the side chain hydroxyl of Tyr105{268}, which is part of the catalytic triad [21], is 2.7 A from the side chain carboxylate of Glu198{423}, which contacts the bound HMP-P [17]. Glutamate is 46% conserved at index 423 in the alignment, primarily in all RNRs and thiaminases except Tregs.
Fourteen additional residue positions also display amino acid similarity of 60% or higher within these Evolutionarily conserved residue positions as determined by the CONSURF program [25]. Shown are human HO1 (PDB ID: 1N45) with bound heme ring in black (A) and TenA thiaminase from Pyrococcus furiosus (PDB ID: 1RTW) with bound HMP-P in black (B). Residue conservation scale is from CONSURF website. Note how most conserved positions surround the active site and protein core. The thiaminase is shown as a ribbon diagram as the HMP-P is more buried and would be harder to see. LIGPLOT+ diagrams [26] are shown to highlight residues that contact heme (designated Hem300) in human HO1 (C) and HMP-P (designated Mp51213) in P. furiosus thiaminase (D) (similar to figure previously published [17]). Blue boxes surrounding the residue names indicate that these residues are at least 40% conserved in the entire amino acid sequence alignment. Most of the residues in each active site are not conserved in the entire alignment due to the diversity of the structural homologues aligned.
structural homologues. All but one of these positions are hydrophobic residues involved in hydrophobic packing. Index position 152 (Met34 in hHO1) exhibits 79% similarity of aliphatic hydrophobic residues (I, V, L, and M) in HOs except plant species, thiaminases except Tregs, and RNR1As. Met34{152} faces ameso-face of heme in the hHO1 structure [30]. Index 168 (Val42 in hHO1) shows 62% similarity of aliphatic residues in HOs except plants species and most thiaminases. Index 177 (Met51 in hHO1) displays 63% similarity of aliphatic residues in a few HOs, all thiaminases, and RNR1As. Index 230 (Leu93 in hHO1) shows 71% similarity of aliphatic residues in all HOs, a few thiaminases, MMOHs, and all RNRs. Index 272 (Leu118 in hHO1) exhibits 93% similarity of aliphatic residues in all HOs, most thiaminases, and all RNRs. Index 320 (Val130 in hHO1) shows 84% similarity of aliphatic residues in most HOs, most thiaminases, the beta chain of MMOHs, and RNR1As. Index 340 (Leu147 in hHO1) shows 68% of aliphatic residues in all HOs, some thiaminases, and the alpha chain of MMOHs. This residue is located on the distal helix and contacts with heme in HOs [30]. The CD2 atom of Leu147{340} forms a hydrogen bond with the NE atom of Gln38{156} [35]. Val139{340} in PfThiaminase is about 3.6 A from Leu23{152}. Index 367 (Phe166 in hHO1) displays 71% similarity of aromatic residues with mostly phenylalanines in HOs and tryptophans in thiaminases. Index 387 (Met186 in hHO1) exhibits 85% similarity of aliphatic residues in HOs, MMOHs, RNR1As, and some thiaminases. Met186 {387} is located about 6.6 A from the heme molecule in hHO1. In PfThiaminase, Leu172{387} is located 4.2 A from Phe122{320}. Index 388 (Asn187 in hHO1) shows 82% similarity of aspartate or asparagine (asparagine is 58% conserved in the entire alignment) in HOs, RNR1As, and some thiaminases. Index 388 appears to maintain protein folding. The side chain OD1 atom of Asn187 {388} is 3.2 A away from the main chain nitrogen of Leu13{128} in hHO1. In PfThiaminase, the ND atom of Asn173{388} is 3.0 A from the hydroxyl of Ser3 {129}, and the OD1 atom is 2.8 A from the main chain nitrogen of Ser3{129}. Index 391 (Leu189 in hHO1) displays 85% similarity of aliphatic residues in HOs, MMOHs, RNR1As, and some thiaminases. Index 412 (Val199 in hHO1) exhibits 61% similarity of aliphatic residues in most HOs, some thiaminases, and the alpha chain of MMOHs. Index 427 (Phe214 in hHO1) shows 63% similarity of aromatic residues (phenylalanine or tryptophan) in HOs except plant species and most thiaminases. It is found on the wall opposite the a-meso-face of heme in hHO1 [35]. Index 430 (Leu217 in hHO1) displays 74% similarity of aliphatic residues in HOs, MMOHs, RNRs, and a few thiaminases.
MAST was used to search for matches in the protein database using the motifs. The strongest matches were RNRs (lowest E-score = 3.4e-154), followed by HOs (lowest E-score = 7.5e-30). As noted, the ten motifs identified in this study were only present in RNRs and HOs. Two weaker scoring matches that occurred multiple times were the prokaryotic MreB rod-shape determining bacterial actin protein (A0A1H1VW64_9PSED) and the SecB protein-export protein (A0A062MXT3_9-GAMM). MreB showed homology to motifs 3 and 10; however, the E-score was only 0.068. The MreB protein from Caulobacter crescentus (PDB ID: 4CZM) [37] did not show any structural homology to HOs or its related homologues. SecB showed homology to motifs 6 and 10; however, the E-score was only 1.7. The SecB chaperone protein from E. coli (PDB ID: 1QYN) [38] also did not show any structural homology to HOs or its related homologues.

Phylogenetic analysis
An unrooted neighbor-joining phylogenetic tree of the HO structural homologues (Fig. 5) was generated from the entire amino acid sequence alignment. This neighbor-joining method was chosen because it has been shown to yield excellent evolutionary relationships in some families [34,39]. Maximum-likelihood and parsimony methods require excessive computations for large data sets. A bootstrapped parsimony tree using only 50 data sets (Fig. S2) had similar sequence groupings and group arrangements to the neighbor-joining tree using 250 replicates. The neighbor-joining tree was subsequently used as the basis for assigning each sequence into a group for group entropy analysis. HOs, thiaminases, MMOHs, and RNRs cluster into distinct clades and these four groups were used for group entropy analysis. Groups were named based upon the representative tertiary structures present in each clade. MMOHs and RNRs are more closely related to each other, but both are more distantly related to HOs.
Distinct subgroups were also evident for each group. For example, in HOs vertebrate HO1s and HO2s, as well as plant and bacterial HOs, were isolated in distinct clades. In thiaminases, three distinct subgroups were apparent. There was a group of sequences similar to PfThiaminase and another group similar to BsThiaminase, as previously observed [23]. There was also a subgroup of Tregs identified in the thiaminase group. It is possible that these Tregs may not possess the enzymatic function of other thiaminases as they are missing certain residues found in other thiaminases, such as Glu205{418} and Cys135{322} in BsThiaminase [21], where Tregs have threonine and alanine in those positions, respectively. In MMOHs, the alpha and beta chains cluster separately. In the RNRs, both RNR1As and RNR1Bs are located in distinct clades.

Group entropy analysis of HOs
The GEnt program [40] was used to calculate 'Group Entropy' and 'Family Entropy' values for each position in the alignment. Residue positions studied here  had the highest Group Entropy scores within each homologue group, indicating they are well conserved within its group, and low Family Entropy scores, indicating they are not well conserved throughout the entire alignment. These residues would indicate unique residue conservations within the enzyme group. For the current study, residue positions examined had a Group Entropy score of at least 10.0, with Family Entropy score of no more than 3.0. The GEnt program utilizes the Kullback-Leibler method to calculate a divergence measure to detect covariance. It was chosen as it allows the user to define each group, to which the user can assign sequences from an alignment. Also, GEnt does not consider positions with gaps [40]. The GEnt program has been used to identify group-specific conservations in Class 3 aldehyde dehydrogenases [40] and NDPsugar dehydrogenases [34]. Eight residues were found to have the highest Group Entropy scores in the HO group (Table 2, complete GEnt results can be found in Table S1). Examination of residues was made with hHO1 (sequence HomSa-pHOx1, PDB ID: 1N45). The residue position with the highest Group Entropy score is index 142. His25{142}, which is fully conserved on the proximal helix in all HOs, is the critical heme iron ligand [30] (Fig. 3C). A H25A mutation in hHO1 led to inactivation of the enzyme [41]. Next, Tyr58{184} interacts with several polar residues near pyrrole B [30]. In hHO1, the hydroxyl of Tyr58{184} hydrogen bonds with the side chain carboxylate of Asp140{332} and is 3.5 A from the NE atom of Arg136{327}. Index 322 had the third highest Group Entropy score. The NE2 position of His132{322}, which is invariant in HOs, forms a salt bridge to the side chain carboxylate of Glu202 {415} [30]. Next, Arg136{327} is a residue that is invariant in the HOs of vertebrates but is isoleucine for plant species. The side chain of Arg136{327} is part of a hydrogen bonding network that also includes Asp140{332}, Asn210{423}, Tyr58{184} (also identified by GEnt above), and Tyr114{268}, which is 66% conserved in the entire alignment. This network may serve to help activate the oxygen in the catalytic mechanism [35]. A D140A mutation in hHO1 also converts the enzyme into a peroxidase, illustrating the critical nature of this hydrogen bond network [42].
Glu62{188} is conserved in all but one HO sequence, which has a conservative replacement with an aspartate. The side chain carboxylate of Glu62 {188} is located 3.4 A from the main chain nitrogen of the conserved Arg85{222}, acting to maintain enzyme structure. Next, Tyr182{383} is conserved in the HOs of vertebrates but is valine in plants. Its side chain hydroxyl is nearly 4 A from the main chain nitrogen of Tyr134{325} and the main chain oxygens of Ala133{323} and Val77{205}. Tyr134{325}, which is conserved in all HOs, is located on the distal helix and contacts the beta edge of heme [30,35]. Lastly, Pro170{371} lies on a surface loop and is a hydrophobic residue in the other groups.

Group entropy analysis of thiaminases
Eight residue positions in thiaminases had the highest Group Entropy scores (Table 3, complete GEnt results can be found in Table S2). Residues noted here were from PfThiaminase (sequence tactPyrFur, PDB ID: 1RTW). Index 142 had the highest Group Entropy score in thiaminases. Trp14{142} is involved in hydrophobic packing and is located where the hemebinding site is in HOs. The NE1 position of Trp14 {142} also hydrogen bonds to the side chain hydroxyl of Ser195{420}. Next, Val47{184} forms the wall of the HMP-P binding pocket [17] (Fig. 3D). The side chain carboxylate of Asp43{180} forms hydrogen bonds with the amino groups of the HMP-P ligand. A D44A mutation at index 180 in BsThiaminase caused a significant decrease in catalytic function [21]. Index 371 had the fourth highest Group Entropy score. Trp156{371} also lines the HMP-P binding pocket, making hydrophobic contact with the ligand. Next, Phe46{183} is at an index position of aromatic residues in thiaminases. The pyrimidine ring of HMP-P is located between the side chains of Tyr132{332} and Phe46{183} [17]. Interestingly, a Y47F mutation at index 183 in BsThiaminase leads to a significant impairment of catalytic function [21]. Also, a F47Y mutation at index 183 in Helicobacter pylori TenA disrupts the helical folding of helices D and E [43].
Ala205{430} is near the C terminus on the surface of the protein, but not at the subunit interface. Index 430 is a position that is an aliphatic residue in 74% of all aligned sequences. Next, Phe152{367} lies at a position in thiaminases that is predominantly aromatic. Its side chain is nearly 8. 6 A from the bound HMP-P. Lastly, His20{149}, which is 90% conserved in the aligned thiaminases, forms a salt bridge with the side chain carboxylate of Asp203{428} [17].

Group entropy analysis of MMOHs
GEnt analysis of MMOHs revealed that ten residues have the highest Group Entropy scores (Table 4, complete GEnt results can be found in Table S3). Residues from both the alpha (sequence MMOAMetTri) and beta chains (sequence MMOxMetTri) were researched in the MMOH from M. trichosporium (PDB ID: 1MHY). Several residues identified here make intersubunit contact. The highest scoring index position is index 265. Trp181{265} from the alpha chain makes hydrophobic contact with Pro61{74} from the beta chain. The residue from the beta chain at index 265 is Tyr227{265}, which is involved in hydrophobic packing within the beta chain. Next, Leu82{142} is conserved in all MMOHs except for two that have alanine. This index position is involved in hydrophobic packing for both alpha and beta chains. The third highest Group Entropy position is Met184{268} and it lines the hydrophobic cavity where the diiron cluster is located in the alpha chain [16]. Index 268 is a tyrosine in 66% of all sequences in the entire alignment. Glu114{184}, which is fully conserved in MMOHs, coordinates the FE1 iron atom and is one of four residues that coordinate the iron and hydroxo ligands [16]. The fifth highest Group Entropy score is Tyr115 {185}, which is invariant in MMOHs. The hydroxyl of Tyr115{185} in the alpha chain is located 3.4 A from the main chain nitrogen of Asp176{202} in the beta chain, making intersubunit contact. Tyr160{180} in the beta chain is about 5 A from Trp37{74} and Phe35{72} in the alpha chain.
Next, Phe153{229} in the alpha chain is involved in hydrophobic packing with Leu55{68} from the beta chain, making intersubunit contact. The seventh highest scoring index position is Pro233{365}, which is fully conserved in both alpha and beta chains of MMOHs. This residue is at the beginning of a-F that lines the diiron cluster-binding site, likely positioning the helix. Next, the side chain of Arg98{168} in the alpha chain functions by forming a salt bridge to Asp365{515}. In the beta chain, Trp143{168} is involved in hydrophobic packing. The ninth highest scoring index position is index 169. Trp99{169} is a residue that is involved in hydrophobic packing in the alpha chain. The last residue examined is Gly113 {183}. This residue is fully conserved in the alpha chain and lines the pocket for the diiron cluster. The lack of a side chain on this residue could be important for positioning of the iron atoms.

Group entropy analysis of RNRs
Eleven index positions displayed the highest Group Entropy scores in RNRs (Table 5, complete GEnt results can be found in Table S4). Residues were examined using EcRNR1A (sequence RDR1Esccol, PDB ID: 1XIK). Trp49{138} had the highest Group Entropy score and is invariant among all RNRs. Trp49{138} likely serves as the tryptophan radical, providing an electron to completely reduce the dioxygen. It also hydrogen bonds to the iron ligand His118{223} via Asp237{375} [19]. Arg149{268} is the next residue and is also fully conserved in RNR1As but is lysine in RNR1Bs. The NE position of Arg149{268} is located 3.3 A from the side chain carboxylate of Glu283{423}. Arg149{268} also stacks against Trp286{426} [19], which is also identified by GEnt. Trp286{426} is invariant in RNR1As but is tyrosine in RNR1Bs. The NE1 position of Trp286{426} hydrogen bonds with the hydroxyl of Ser75{175}. Trp286{426} is also involved in hydrogen bond changes in a S211A mutant [19].
Phe212{336}, which is fully conserved in all RNRs, forms a conserved hydrophobic pocket with residues Phe208{332} and Ile234{371} for the binding of dioxygen [19]. Next, Asp84{184} is invariant in all RNRs and coordinates one of the iron ligands [19]. The position with the sixth highest Group Entropy score is Glu52 {142}, which is on the surface of the molecule near the subunit interface. Analysis of an E52Q mutant in the beta chain of RNR1A appears to suggest that Glu52 {142} is critical for the conformational change at the a/ b subunit interface that initiates radical formation [44]. Next, Phe47{135}, which is fully conserved in all RNRs, ring stacks with Trp286{426} between subunits. Another residue identified by GEnt that is invariant in all RNRs is Gln87{187}. The OE1 position of Gln87 {187} is located 2.9 A from the side chain carboxylate of Glu204{328}, which is an iron ligand [19]. Next, the hydroxyl of Tyr79{179} hydrogen bonds to the side chain carboxylate Glu283{423}, which also interacts with Arg149{268} noted above. The Ca of Thr81{181}, which is fully conserved in all RNRs, is 3.8 A away from the side chain of Tyr122{227}, which is suggested to be the radical in the enzyme mechanism [19]. Lastly, Leu203{327} is involved in hydrophobic packing with the side chain of Gln87{187}, also identified by GEnt.

Alternate methods for determining functional residues
The evolutionary trace method was developed to identify critical residues in active sites and clusters of residues at functional interfaces in proteins. The program generates a dendrogram of aligned sequences and creates evolutionary partitions that divide the tree into distinct families and subgroups. Residues unique to each group, both buried and on the surface, can be identified [45,46]. For HOs, partition 2 groups all HOs in one group, but does not identify any group-specific residues. It is not until partition 8, where HOs are divided into 16 subgroups, that six group-specific residues are identified ( Table 6). Of these, only His25{142} and Tyr58{184} are among the residues in HOs with the highest Group Entropy scores (above). After the tenth and final partition, all but one, Pro170{371}, of the highest scoring residues in GEnt are identified by evolutionary trace, but there are an additional 28 residues also identified in the 21 HO subgroups. This highlights one of the advantages of GEnt that one can define each group with representative sequences, instead of following arbitrary partitions.
Evolutionary trace also does not result in any group-specific residues from partition 2 of thiaminases, where thiaminases were divided into only two groups. It is not until partition 8, where thiaminases are divided into 20 subgroups, that six group-specific residues are identified ( Table 7). Of these, only Trp14{142} and Val47{184} are among the residues in thiaminases with the highest Group Entropy scores (above). It is also interesting to note that the residue positions identified by evolutionary trace in HOs are the same ones identified in thiaminases. This coincides with the fact that HOs and thiaminases are on neighboring clades in the phylogenetic tree (Fig. 5).
Evolutionary trace partition 2 places MMOH alpha and beta chains in separate groups. Evolutionary trace partition 4, the last partition to have the alpha and beta chains in one group each, identified five groupspecific positions in the beta chain: Trp221{258}, Trp238{276}, Phe259{331}, Phe267{339}, and Asp277 {351}. However, Trp221{258}, Trp238{276}, and Phe267{339} are at alignment positions that are primarily indels in other families. In the M. trichosporium, MMOH alpha chain partition 4 identified 18 positions (Table 8). However, 15 of these residues are at or above alignment index 468, which constitute the C terminus, and residues that extend beyond the length of all of the other homologues (beyond index 540 of the summary alignment in Fig. 1). Thus, similar to the residues in the beta chain at indel positions, most of these positions do not have residues in other homologues to which to compare. Of the residues identified in both chains by evolutionary trace, only Leu82{142} and Glu114{184} identified in the alpha chain score highly for Group Entropy in GEnt. Another feature of GEnt analysis is that it does not consider positions with gaps. Thus, most positions identified by evolutionary trace for MMOHs were not identified by GEnt.
Evolutionary trace partition 2 places all RNRs in the same group, while partition 8 is the last to group RNR1A and RNR1B in one group each. Partition 2 did not identify any group-specific residues in all RNRs. Partition 8 identified 12 group-specific residues in RNR1As (Table 9). Partition 8 identified 11 groupspecific residues in RNR1Bs, all of which were also identified in RNR1As (Table 9). Of the residue positions identified in both RNR groups, indices 142, 179, and 184 were highly scoring for Group Entropy for all RNRs in GEnt.
In addition, six other algorithms were used to identify functional residues in HOs, thiaminases, MMOHs, and RNRs: Jensen-Shannon divergence, property entropy, VN entropy, relative entropy, Shannon entropy, and sum of pairs analysis [47]. Each algorithm was used with combinations of all possible backgrounds and matrices. Only residues that were identified for all 21 combinations of backgrounds and matrices were reported as results for each method (Tables 6-9).
Index 184 had the second highest Group Entropy score in HOs and thiaminases, fourth highest in MMOHs, and fifth highest in RNRs. In HOs, Tyr58 {184} is involved in a hydrogen bonding network close to pyrrole B that may activate the oxygen in the catalytic mechanism [35]. In thiaminases, Val47{184} forms the wall of the HMP-P binding pocket [17]. The residues in MMOH alpha chain, Glu114{184}, and RNR1A, Asp84{184}, both coordinate iron ligands [16,19]. Thus, this index position is critical for the unique function of each homologue.
the critical nature of this network. However in RNRs, Leu203{327} is involved in hydrophobic packing with the side chain of Gln87{187}, another residue identified by GEnt in RNRs, perhaps indicating a compensatory mutation. Although index 327 did not score highly for Group Entropy for thiaminases in GEnt, Cys135{327} in BsThiaminase serves as the catalytic nucleophile for the enzyme. A C135A mutation completely inactivates the enzyme [21].

Conclusions
While sharing some structural homology, HOs, thiaminases, methane monooxygenase hydrolases, and Class I RNRs share little sequence homology, with only five residue positions conserved in at least 60% of all 472 aligned sequences and fourteen more with amino acid similarities above 60% conservation. The most wellconserved sequence motifs were present in HOs and RNRs. Phylogenetic analysis revealed that the four homologues were distinct, and these four groups were used to perform group entropy analysis, as well as several other methods to predict functional residues. Despite the amino acid sequence and functional differences in each homologue, several common residue positions appeared critical in determining the unique function of each homologue. In HOs, the residue at index 142 coordinated the heme iron, while residues at indices 184, 268, 327, and 332 all participated in an essential hydrogen bonding network. Residues at indices 173 and 328 helped to maintain enzyme structure, and the residues at indices 179 and 183 line the heme-binding pocket [35]. In thiaminases, the residues at indices 142 and 173 are involved in hydrophobic packing. Index 179 maintains enzyme structure through a hydrogen bond. The residue at index 184 lines HMP-P binding site. The residues at indices 183, 268, 328, and 332 help to bind HMP-P, while the residue at index 327 acts as the catalytic nucleophile [17,21].
In MMOHs, the residues at indices 142, 173, and 179 contribute to hydrophobic packing. Residues at indices 184 and 328 coordinate the iron atoms, while residues at indices 183, 268, and 332 line the active site [16].

Materials and methods
The procedure used here was similar to that previously published [34].  [48] searches of the nonredundant protein database at the National Center for Biotechnology Information (NCBI). 472 related HO, thiaminase, MMOH, and RNR amino acid sequences were collected with percentage identities ranging from 99% to 4%. These sequences were initially aligned using T-COFFEE [49]. The alignment was manually adjusted using tertiary structure comparison of all structures using MAPSCI (http://www.geom-comp.umn.edu/mapsci/) [24] and through the RCSB PDB Protein Comparison Tool-jFATCAT method [50,51] using pairs of structures as a guide. The alignment editor used was GENEDOC [52]. Tertiary structures were used to assess the structural or functional roles of conserved residues within the alignment. Molecular visualization was performed using RASMOL [53]. Analysis of conserved sequence motifs was facilitated by MEME program, and these motifs were searched against a protein database using MAST [36]. Group entropy analysis (GEnt ) [40] was performed on the entire alignment to compare HO, MMOH, RNR, and thiaminase groups to each other. Evolutionary trace (http://mor dred.bioc.cam.ac.uk/~jiye/evoltrace/evoltrace.html) [45,46] was also performed on the entire alignment. Protein residue conservation prediction (http://compbio.cs.princeton.edu/c onservation/score.html) [47] was performed on subalignments of HO, MMOH, RNR, and thiaminase groups. Each algorithm was used using combinations of all three possible backgrounds (BLOSUM62, SwissProt, and PF) and all seven possible matrices (BLOSUM62, BLOSUM35, BLOSUM40, BLOSUM45, BLOSUM50, BLOSUM80, and BLOSUM100) distributed with the program. The PHYLIP suite of programs was used to generate phylogenetic trees [54]. First, the alignment was trimmed using TRIMAL [55]. 250 bootstrapped data sets of the trimmed alignment were then generated using the SEQBOOT program. Next, evolutionary distances for each data set were determined by the PROTDIST program using the Jones-Taylor-Thornton matrix. Phylogenetic trees for each data set were generated using the NEIGHBOR program. Lastly, the unrooted consensus tree was generated using the CONSENSE program. The tree graphic was generated using FIGTREE (available at http://tree.bio.ed.ac. uk/software/figtree). A parsimony tree was generated using 50 bootstrapped data sets using the PROTPARS program, followed by CONSENSE.

Supporting information
Additional Supporting Information may be found online in the supporting information tab for this article: Fig. S1. Complete alignment of 472 heme oxygenase homologue sequences (MSF format). Fig. S2. Unrooted bootstrapped parsimony tree of the heme oxygenase homologues. Branches are color-coded based on enzyme type: blue = HOs, purple = thiaminases, red = MMOHs and green = RNRs. Specific subgroups within each enzyme group are labeled. Table S1. Complete GEnt results of HOs. Table S2. Complete GEnt results of thiaminases. Table S3. Complete GEnt results of MMOHs. Table S4. Complete GEnt results of RNRs.