Methionine adenosyltransferase (MAT) catalyzes the biosynthesis of S-adenosyl methionine from l-methionine and ATP. MAT enzymes are ancient, believed to share a common ancestor, and are highly conserved in all three domains of life. However, the sequences of archaeal MATs show considerable divergence compared with their bacterial and eukaryotic counterparts. Furthermore, the structural significance and functional significance of this sequence divergence are not well understood. In the present study, we employed structural analysis and ancestral sequence reconstruction to investigate archaeal MAT divergence. We observed that the dimer interface containing the active site (which is usually well conserved) diverged considerably between the bacterial/eukaryotic MATs and archaeal MAT. A detailed investigation of the available structures supports the sequence analysis outcome: The protein domains and subdomains of bacterial and eukaryotic MAT are more similar than those of archaea. Finally, we resurrected archaeal MAT ancestors. Interestingly, archaeal MAT ancestors show substrate specificity, which is lost during evolution. This observation supports the hypothesis of a common MAT ancestor for the three domains of life. In conclusion, we have demonstrated that archaeal MAT is an ideal system for studying an enzyme family that evolved differently in one domain compared with others while maintaining the same catalytic activity.
- common archaeal ancestor
- ancestral sequence reconstruction
- approximately unbiased
- bacteria ancestors
- eukarya ancestors
- universal common ancestor
- methionine adenosyltransferase
- Methanococcus jannaschii
- nucleotide triphosphates
- Protein Data Bank
- Pyrococcus furiosus
- S-adenosyl methionine
- sequence similarity networks
Common descent is one of the fundamental aspects of Darwinian evolution. This theory emphasizes that modern-day species share a common ancestry []. The same principle applies to enzymes: modern enzyme superfamilies across the three domains of life evolved from a set of enzymes that were already present in a last universal common ancestor (LUCA) that can be dated to over 3.5 billion years ago [[2, 3]]. Many efforts have been made to infer the minimal set of LUCA enzymes [[3, 4]]. The main hypothesis is that the evolutionary trajectory of enzymes (i.e., gene trees) is closely associated with and influenced by the evolution of their respective host organisms (i.e., species trees) []. However, in reality, gene evolution is much more complicated and disagreement between species trees and gene trees (noncongruence) can occur because of a wide range of factors such as gene duplication, lateral gene transfer [[6, 7]], and hybridization []. Within this framework, a few studies have reported an unusual distribution of some enzymes among the three domains of life, whereby the archaeal enzyme shows a divergence from the more closely related eukaryotic and bacterial orthologs [[9-12]]. One example is the methionine adenosyltransferase (MAT) enzyme.
The unusual sequence similarity of MAT enzymes among the three domains is striking; the archaeal enzyme is almost equidistant from the bacterial and eukaryotic enzymes, with ˜ 20% sequence identity, while the bacterial and eukaryotic enzymes share greater than ˜ 60% identity [].
Methionine adenosyltransferase is a ubiquitous enzyme, present across all three domains of life, that catalyzes the biosynthesis of S-adenosyl methionine (SAM) from l-methionine and ATP. SAM is a primary methyl donor in nature, and it is also considered to be an ancient cofactor []. SAM-dependent enzymes are likely included in the LUCA []. Most of these ancient SAM-dependent enzymes are radical methyltransferases [[15-18]]. Nowadays, SAM is involved in four important functions (Scheme S1) []: (a) methylation of fundamental biomolecules (proteins, DNA, RNA) [[20, 21]]; (b) polyamine synthesis []; (c) glutathione biosynthesis reactions []; and (d) radical chemical reactions []. Moreover, methylated molecules or macromolecules by SAM play an important role in cellular processes such as epigenetics, signal transduction, protein function, and genetics []. The importance of these biological processes makes the SAM-synthesizing enzyme, namely MAT, an important and essential enzyme for cell growth and survival [].
The structure of the MAT enzyme is unique and classified by the SCOP database in α/β class []. Each monomer of MAT is composed of a βαββαβ secondary structure []. No sequence conservation is observed among the repeats. Like many other enzymes in nature, MAT is active as an oligomer (either as a homo-oligomer or as a hetero-oligomer; Fig. 1) []. The oligomeric state provides a clear advantage over the monomeric state in terms of fitness, stability, and functions [[30, 31]]. In dimeric MAT, helixes are exposed to the surface and strands form a hydrophobic interface []. The MAT homodimer consists of monomeric subunits (α-subunits) paired together along a large flat hydrophobic interface (henceforth referred to as large interface; Fig. 1) wherein the active sites are enclosed []. A homotetramer exists as a dimer of homodimers; therefore, it has been suggested that the preferred functional quaternary structure is an oligomer (possibly an obligate dimer) [].
It has been suggested that all extant MAT enzymes share a common ancestry [] despite expressing a highly divergent form in archaea, as mentioned above. This further indicates that the adaptation of enzymes during evolution can lead to the accumulation of mutations at the surface or interface regions, especially near the active sites. This phenomenon can allow for the divergence and acceptance of different substrates or functions [[34-39]]. Despite this contrast, it has also been established that archaeal MAT can still perform the same catalytic reaction as their orthologous counterparts (from bacteria and eukarya); therefore, the structural and functional implications of this divergence are not well understood. These observations prompted us to take a closer look at the evolutionary trends adapted by the active site residues and the large interface region of MAT. Therefore, we probed the putative evolutionary trajectory of archaeal MAT more systematically by conducting investigations at various levels, including sequence studies and structural comparisons, assessment of physiochemical properties, and ancestral sequence reconstruction (ASR). We further elucidated this potential evolutionary trajectory by resurrecting the ancestral archaeal MAT sequences and characterizing their biochemical properties. In conclusion, we showed that the archaeal MAT can display high degrees of divergence at the sequence/structural level and yet perform the same catalytic reactions as their orthologous counterparts.
Results and Discussion
Probing the MAT sequence space
In the present study, we conducted extensive searches to systematically probe the MAT sequence space across the three domains. The sequences were collected by combining the outcomes of different databases (nr db, eggnog, orthodb, etc.). We built sequence similarity networks (SSNs) upon compiling the first input dataset (˜800 sequences), which allowed us to check the MAT sequence space distribution across the three domains (Fig. 2A). During this analysis, we considered that the archaeal domain has not been sequenced as extensively as the other two domains of life []. Because phylogenetic trees may not be an optimum tool to highlight the overall sequence distribution pattern, we utilized two approaches, EFI and CLANS [], to visualize the sequence distribution pattern (Fig. S1). Despite the variation in the percentage sequence identity clustering ranging from 100% to 40% ID, the overall topology remained consistent with two major clusters representing: (a) ‘bacteria + eukarya’ and (b) archaea. The SSN outcomes reflected the sequence identities as observed in nature, with archaea found to be nearly equidistant from eukarya and bacteria, at ˜ 20% sequence identity. The SSN method provided an overall view of the topology for the dataset of interest and underlined the uniqueness of the archaeal domain.
Inferring the ancestral MAT sequences
We reconstructed the phylogenetic relationships by utilizing MAT sequences from the three domains of life (Fig. 2B). In the case of archaea, two major phyla [Crenarchaea and Euryarchaea (EuryAnc)] were tested for the overall tree topology based on the MAT sequences (gene tree), as well as corresponding 16s rRNA sequences (species tree) from the SILVA database (not shown here). Herein, the two topologies clearly indicated that the tree bifurcates into two distinct clades for Crenarchaea and EuryAnc; that is, the major phyla are well classified into their respective monophyletic groups. Subsequently, we prepared another set of trees by utilizing bacterial MAT sequences as an outgroup to extract the putative common archaeal ancestor ArchAnc (Fig. 3, violet star). We also extracted the ancestral sequences of Crenarchaea (CrenAnc) and EuryAnc MAT at the designated nodes highlighted by using a red and blue star, respectively (Fig. 3).
The sequence similarities of the ancestral sequences (including interface residues and active site residue comparisons) are detailed in the other sections.
For bacterial and eukarya MAT sequences, we prepared a separate phylogenetic tree (based only on MAT sequences). MAT sequences from these two domains constituted two distinct clusters (supported by principal component analysis clustering—data not shown), as observed in the tree topology as well (Fig. 4), which was then utilized to infer the common MAT ancestors for eukarya (EukaAnc) and bacteria (BactAnc).
Extant MAT sequences vs ancestors: active sites and evolutionary rates
We probed the evolutionary trend(s) for amino acid sites in MAT enzymes across the three domains of life. These trends were visualized by mapping the interface sites, as well as active sites on a representative X-ray structure (Fig. 5, Fig. S2). Additionally, we probed the level of conservation for active site residues in extant and ancestral sequences.
Active site residues
In the case of archaea, we observed that the MAT active site residues were highly conserved among the two major branches, that is, Crenarchaea and EuryAnc (residue numbers correspond to Pyrococcus furiosus (pfMAT) structure: Protein Data Bank (PDB): 6S83, []). These include the following residues involved in the interaction with SAM from chain A: H59, N61, D141, L142, and D280 and chain B: H30, D168, and Y268. Furthermore, these active site residues were completely conserved in the putative archaeal ancestor. A similar trend was observed in the cases of both eukarya and bacteria. This provides us with a key insight that despite observing differences in the active site residues, MAT enzymes from the three domains can still essentially retain the same catalytic reaction (Fig. 5).
To determine whether part of the protein evolved at a different rate compared with the rest of the structure, we determined the evolutionary rate distribution of all residues in the MAT structure (Fig. S2). Interestingly, most sites located along the large interface tend to evolve slowly (highlighted in blue), while the opposite trend is observed as we move further away from the large interface. For instance, some of the residues located along the surface α-helices and loop regions tend to experience higher evolutionary rates. This pattern can be seen in MAT enzymes from all three domains of life, thereby indicating that conservation at a large interface is probably critical for enzyme function despite the domain that the enzyme belongs to.
Large interface: charge and hydrophobicity distribution
In case of MAT enzymes, the constituent monomers pair up in an inverted configuration by exposing the α-helices toward the surface, while the β-strands interact to form a hydrophobic large interface that harbors the active site residues, thereby making this homodimer the obligate functional unit []. Herein, we noticed that divergence is not only limited to active residues, or the residues located close to the substrate, but most of the interdimer interface (large interface) residues also diverge across the three domains of life. For instance, in cases of bacteria (eMAT, PDB: 1RG9 []) and eukarya (hMAT1A, PDB: 6SW5 []) 38 out of the 51 aligned residues were identical. In contrast, archaea (pfMAT, PDB: 6S83) shares only 12 identical residues with bacteria (eMAT) and eukarya (hMAT1A; Figs S3 and S4). This further prompted us to perform a systematic analysis of the large interface region across the three domains of life, and thus, we assessed the charge and hydrophobicity distribution across the large interface and compared it with the ancestral sequences to gain more insights into the evolutionary patterns involved. We studied 17 experimentally solved MAT structures from PDB to probe the physiochemical properties, by creating two datasets: (a) for identification of 51 structurally aligned positions along the large interface (which includes up to five noninterface residues as well) and (b) for identification of 24 structurally aligned ‘interface-only’ residue positions.
Large interface region (extant)
Based on the 51 structurally aligned positions, even at the sequence level, the large interfaces of bacterial and eukarya MATs have a striking sequence identity of ˜ 50–70% (Figs S4 and S5). However, the archaeal interface sequences are nearly equidistant from the other two domains at ˜ 15–30% sequence identity, despite having a comparable size in terms of area (˜ 1660–2970 Å2). The hierarchical clustering plots (based on the pvclust package in r; Fig. 6 []) show that the sequence identities further dictate the physiochemical properties as well. For instance, a look at both the P-values provided by pvclust for comparison of the clusters, that is, approximately unbiased (AU) P-value and bp (bootstrap probability) value, reveals that the hydrophobicity and charge distribution are clustered together for bacterial and eukarya MAT structures, with the archaeal MAT structures clustering off separately. An additional analysis was also conducted for the second dataset, with an alignment of 24 ‘interface-only’ residues (Fig. S6), and in this case too, we observed similar results, with the archaeal MAT structures clustering off separately.
Ancestral vs extant (large interface)
In this study, we compared the large interface residues extracted from the ancestral sequences against their extant counterparts. A comparison of the large interface residues from the bacterial MAT ancestor with the corresponding residues from eMAT highlighted a difference of 13 out of 51 residues (Fig. S4). Likewise, we also observed similar changes in eukarya: hMAT1A vs eukarya MAT ancestor, with seven residue changes (out of 51 residues); and archaea: pfMAT vs archaeal MAT ancestor, with six changes (out of 51 residues).
In conclusion, these observations provide crucial insight that sequence identities further influence the distribution of corresponding physiochemical properties. Therefore, the hydrophobicity and charge distribution, as described in the hierarchical clustering (based on pvclust), are important in understanding the evolutionary trends adapted by the large interface. However, we observed that the active sites remained conserved to some extent across MAT enzymes.
Structural comparison supports sequence divergence
Structural comparisons of MAT full domains from archaea, bacteria, and eukarya were performed to identify their common and unique structural features. We noticed that at a structural level, bacteria and eukarya show a higher degree of structural similarity compared with archaea (Fig. S7). In addition, an N-terminal extension in archaea segment-swaps into the C-terminal subdomain in contrast to bacteria and eukarya, which lacks the N-terminal beta-strand. In return, bacterial and eukaryotic MATs display an extra C-terminal helix. These unique features may explain why archaeal MAT is not dissected into subdomains by either the SCOP or ECOD domain criteria. Topology connections according to PDBsum [] are shown in Fig. S7.
We further extended our structural comparisons to the subdomain level. In this analysis, constituent domains (according to ECOD [[46, 47]]) from bacterial and eukaryotic MATs were used as templates to manually dissect archaeal subdomains (Fig. S8). Constituent structural domains were then compared all-on-all within (Fig. S8), as well as across, archaea, bacteria, and eukarya (Fig. S9). Subdomain superposition showed that despite their identical topology, subdomains I to III within archaea, bacteria, and eukarya differ considerably in their secondary structure element orientation in space. In contrast, the subdomains across archaea, bacteria, and eukarya are much more similar. In this study, bacteria and eukarya displayed a higher degree of structural agreement within themselves than with archaea.
The structural comparison suggests that MAT emerged partially by duplications of subdomains within the same domain of life or those putatively shared by eukarya and bacteria, but not by archaeal enzymes. This observation supports the outcome of the analysis at the sequence level.
Change in substrate specificity along the archaeal phylogenetic tree
It has been previously reported that the archaeal Methanococcus jannaschii (MjMAT) is promiscuous toward different nucleotides; that is, it is able to accept different nucleotide triphosphates (NTPs), while the bacterial ortholog eMAT is specific for the double ring adenine nucleotide base [[13, 48]]. This further prompted us to study the specificity of the purine bases, ATP, and GTP within the archaeal phylogenetic tree and to determine whether this information could provide us with more clues toward the origin and evolution of MAT (Fig. 3). We expressed and tested the activity of the ancestral archaeal MAT sequences and compared them against the representative extant MAT sequences from the three domains of life (Figs 3 and 4). Data on the specificity of the representative bacterial and archaeal extant MAT have already been reported in literature [[13, 48]]. We chose hMAT1A as a representative of eukarya. Since our purpose was to analyze the change in specificity within archaeal domain, we expressed and tested the activity of the ancestral MAT sequences for EuryAnc, CrenAnc, and the common archaeal ancestor (ArchaeaAnc) (Figs S10–S13, Figs S14–S17). First, we performed one end-point assay with EuryAnc, CrenAnc ArchaeaAnc, and hMAT1A, where we observed CrenAnc displays similar promiscuities toward all the bases as the extant MAT enzyme (Fig. 3), while EuryAnc was losing promiscuity toward cytidine triphosphate (CTP). Interestingly, ArchaeaAnc acquires specificity for ATP and GTP (KM ATP ArchaeaAnc 0.15 mm, KM GTP ArcheaArc 1.63 mm; Fig. 3A and Figs S14 and S18). In contrast, eukarya hMAT1A shows specificity for the ATP substrate (KM ATP hMAT1A 0.47 mm (Fig. 4A and Figs S17 and S18). Unfortunately, the bacterial and eukaryotic ancestral MAT sequences could not be expressed in the soluble and insoluble fractions. Therefore, this could not be further studied. Furthermore, in the archaeal phylogenetic tree, we detected a change in specificity, starting from a more specific ancestor toward a promiscuous extant enzyme. We observed that the specificity for ATP vs GTP goes from a 10-fold difference in the ancestor ArchaeaAnc to no difference in MjMAT, which indicates that the ancestor was specific toward purine bases, but through the course of evolution, modern-day archaeal MAT became promiscuous toward different bases. Therefore, these data hint at the hypothesis that all the modern-day MAT enzymes from the three domains shared a common ancestor specific for the nucleobase and that archaeal MATs diverged in sequence, structure, and substrate specificity.
Additional factors that influence the evolutionary trajectory of MAT
In this study, we have shown that various factors could have contributed to the unusual evolutionary trajectory of the MAT enzyme across the three domains of life. However, some additional aspects could have played a significant role in shaping the evolutionary trajectory of the MAT enzyme in archaea, such as codon usage bias, tRNA bias, or an alternative/distinct SAM metabolism. It has been shown that in general, the archaeal genomes display a higher GC-rich tendency in contrast to bacterial and eukaryotic genomes. This tendency has a direct influence on codon usage. Codon usage bias also has a direct association with tRNA abundance for protein translation optimization [[49, 50]]. This includes avoiding slowly translated codons and utilizing codons with the most cognate abundant tRNAs from the genome. Additionally, in highly expressed genes, favored codons are easily recognized by abundant tRNA molecules [[51-53]]. This bias could be further augmented by the time and speed of gene expression, as it could also play a critical role in the presence of more abundant and less diverse tRNAs as well []. Another possible factor that might have influenced the divergence of archaeal MAT is the altered SAM metabolism. Although we acknowledge the presence of a different methyl donor in archaea, tetrahydromethanopterin [], no clear connection to the SAM metabolism was found to support the hypothesis of the alternative MAT evolutionary trajectory. Additionally, since their divergence from LUCA, the selection and adaptive forces operating on bacterial/eukaryotic and archaeal clades may have differed substantially due to a variety of factors, including different environmental and metabolic constraints, differences in the regulation of activity, changes in the oligomerization state, and intracellular turnover. It is well known that bacterial MAT is strictly dependent on GroEL [] and this dependency might play a role as an adaptive force; however, no information is available for the archaeal MAT. Therefore, we anticipate that these additional factors could also play a key role in guiding the evolution of highly diverged archaeal MAT, as well as other enzymes.
In brief, we expected the large interface region from MAT to be well conserved across the three domains of life, primarily because of the following reasons: (a) It accommodates the active site residues; (b) facilitates the same catalytic reaction, and (c) is composed of a large flat hydrophobic region composed of β-sheets, which are known to evolve more slowly than the α-helical regions localized at the surface in the case of MAT [[56, 57]]. However, our results clearly show that the active site interface is surprisingly divergent in the case of archaeal MAT, in contrast to bacterial/eukaryotic MAT. The present work highlights how these unique MAT features make this enzyme family an ideal system for studying one exceptional case where the canonical domain relationships are disobeyed.
Model data are available in the PMDB database under the accession numbers PM0083365, PM0083366, and PM0083367
ATP, GTP, CTP, uridine triphosphate (UTP), methionine, SAM, HEPES, MgCl2, KCl, IPTG, Tris/HCl, Na2HPO4, NaH2PO4, potassium phosphate, NaCl, imidazole, β-mercaptoethanol, DTT, kanamycin, glycerol, NaOH, HCl, ammonium acetate, Bacto agar, Bacto tryptone, and Bacto yeast extract all other chemicals, and HPLC grade solvents were purchased from commercial sources and used as supplied unless otherwise mentioned. Page Ruler prestained protein ladder, 10–180 kDa was purchased from Thermo Fisher Scientific (Waltham, MA, USA). BL21 (DE3) competent cells were bought from New England Biolabs (NEB, Ipswich, MA, USA). Benzonase and cOmplete His-Tag Purification Resin (NiNTA) were purchased from Sigma-Aldrich (St. Louis, MO, USA). Protein inhibitor cocktail (PIC) and lysozyme were purchased from Nacalai Tesque, Inc (Kyoto, Japan). 12% Mini-PROTEAN TGX Precast Protein Gels, 12-well, were purchased from Bio-Rad (Hercules, CA, USA). Amicon centrifugal filters were purchased from Merck. In-Fusion HD Cloning Kit was purchased from Takara Bio (Kusatsu, Japan). All experiments were performed using ultrapure water purification system from a Milli-Q Integral MT10 type 1 (Millipore, Burlington, MA, USA).
Sequence data collection
Methionine adenosyltransferase sequences from M. jannaschii (UniProt: Q58868, METE_METJA), Escherichia coli (UniProt: Q58605, METK_METJA), and MAT1A from Homo sapiens (UniProt: Q00266, METK1_HUMAN) were used to conduct NCBI BLAST [] searches across the nonredundant database. Furthermore, the database search conditions were filtered based on the NCBI recommended e-value cutoff, that is, 1e-5, and only search results with query coverage > 90% and sequence identity > 55% were considered in constructing the initial sequence dataset. The collected sequences were then subjected to reciprocal BLAST searches to confirm orthologous relationship(s), as it is a common computational method for predicting putative orthologs. METK sequences for archaea from the phylogenetic group asgard (superphylum) were not considered in this study owing to the lack of taxonomic classifications associated with the sequences that were reported as hits in the NCBI BLAST searches.
The CD-HIT program [] was utilized to reduce sequence redundancy, by clustering at an 80% sequence identity cutoff threshold, with default settings. The Gblocks program [] was implemented to identify highly conserved sites across sequence alignments, with assistance from the secondary structure information (from experimentally solved crystal structures) to guide the alignments. Sequence alignment was carried out using the MAFFT computer program [], followed by a manual check to identify any errors []. To reduce the sequence redundancy, we used CD-HIT, by clustering at 80% sequence identity cutoff threshold with default settings. The Gblocks program [] was used to identify highly conserved sites across sequence alignments with assistance from the secondary structure information (from experimentally solved crystal structures) to guide the alignments. Sequence alignment was carried out using the MAFFT [], followed by manual curation to check for any errors. Sequence alignments for WebLogo of active site residues and interface residues are provided in Appendices S1–S3.
Sequence similarity networks were constructed based on the output obtained from the EFI-Enzyme Similarity Tool (EST) server [], and subsequently, the cytoscape program (https://cytoscape.org/) was utilized to explore the SSNs via the organic layout. Here, we managed the initial dataset using CD-HIT-based clustering by selecting a sequence identity threshold of 80%. The ‘organic’ layout algorithm was chosen for graphical clustering.
Phylogenetic analysis and ancestral sequence reconstruction
ModelTest computer program [] was implemented to select the best evolutionary model for sequence alignments based on the Bayesian inference criterion. The LG model [] with invariant sites (+I) for discrete gamma categories (+G4) was selected to construct the maximum-likelihood (ML) phylogenetic trees. Initially, we constructed ML-based trees with the IQ-TREE program.
Phylogenetic analysis and ancestral sequence reconstruction
ModelTest [] was used to pick out the best evolutionary model for sequence alignments based on the Bayesian inference criterion. The LG model [] with invariant sites (+I) for discrete gamma categories (+G4) was selected to construct maximum likelihood (ML) phylogenetic trees. Initially, we constructed ML-based trees with the IQ-TREE program [] (using 10,000 bootstrap replicates) for the datasets, to inspect the topology distribution within the dataset. The confidence level for the nodes was assessed using Felsenstein's bootstrap method [], and the consensus tree was redrawn using FigTree (http://tree.bio.ed.ac.uk/software/figtree/). The tree file from the IQ-TREE program output was parsed to obtain the ancestral sequences at the highlighted nodes. In addition, the phylogenetic program MEGA X was consulted to check for ancestral sequences []. The NCBI taxonomy database was also consulted to check for major phylogenetic classifications [].
Molecular modeling and structural studies
Ancestral sequences (of interest) were modeled using the Swiss model at the ExPASy server (https://swissmodel.expasy.org/). 3D models were visualized using pymol (https://pymol.org/2), to identify, tabulate, and study the key interaction residues. The overall quality of the 3D models depends directly on the shared sequence identity between the target sequence and the sequence of the template structure; therefore, we utilized template structures with high sequence similarity. Interface residues were identified using pymol and the PISA server (https://www.ebi.ac.uk/pdbe/pisa/). Structural superposition studies were performed using the mtm-align server []. The 17 sequences for Fig. 6, Figs S4 and S5 were collected using blastp and the PDB database for sequence search. Escherichia coli (PDB 1RG9), human (PDB 4NDN), and pfMAT (PDB 6S83) sequences were used as input sequences for each search, and one representative sequence for each organism was manually selected. Two isoform structures were selected for human MAT, hMAT1A (6SW5), and hMAT2A (4NDN).
Charge and hydrophobicity calculations
Per-site hydrophobicity and hydrophilicity calculations were carried out using the ProtScale program [] from the Expasy server (https://web.expasy.org). We implemented the default ‘Kyte–Doolittle’ scale [] for the calculations, with a residue window size of three, while the rest of the settings were left as default. Charge distribution per-site was calculated using the charge server from EMBOSS (http://www.bioinformatics.nl/cgi-bin/emboss/charge). A window length of size 3 was implemented, and the rest of the settings were left as default.
Evolutionary rate calculations
The site-specific evolutionary rates were inferred using the IQ-TREE program, by implementing the estimated model parameters and applying an empirical Bayesian approach to assign site rates as mean over rate categories.
Hierarchical clustering with pvclust
An r language package was utilized to assess the hierarchical clustering pattern, as it provides statistical support for each cluster through P-values. It provides two types of P-values: AU P-value and BP value. The AU P-value, which is computed using multiscale bootstrap resampling, is a better approximation to the unbiased P-value than the BP value computed using normal bootstrap resampling.
Structural superpositions were carried out employing the PDBeFold [] server with default parameters, employing pdbs chain A of 1MXA, 2P02, and 4L4Q. Subdomain comparison was assessed following ECOD domain delimitation and manual dissection of archaeal MAT (Fig. S8).
CrenArc, EuryArc, ArcheaArc MAT, and the hMAT1A codon-optimized gene for E. coli were ordered from Twist Bioscience, San Francisco, and cloned into pet28a vector by infusion cloning using In-Fusion HD Cloning Kit by following the kit protocol. Primers used for cloning are listed in Table 1.
|Name||Froward primer||Reverse primer|
Protein expression and purification
CrenArc, EuryArc, and ArcheaArc MAT were expressed and purified following the previously published protocol []. Briefly, the plasmid encoding the genes for CrenArc, EuryArc, and ArcheaArc MAT in E. coli BL21(DE3) cells was growing at 37 °C in LB medium with 50 µg·mL−1 kanamycin. Once OD600 reached 0.6, culture was cooled down to 16 °C and induction was done with 1 mm IPTG overnight at 16 °C with shaking at 230 r.p.m. Cells were harvested by centrifugation at 4648 g for 20 min and stored at −80 °C until use. The bacterial pellet was resuspended in lysis buffer that contains Tris/HCl (50 mm), NaCl (300 mm), imidazole (10 mm), lysozyme (0.3 mg·mL−1), benzonase (2 U), and PIC, pH 8. The cells were incubated with lysis buffer for 30 min to 1 h, followed by sonication (Digital Sonifier Models 250, 5 s, on 5 s off for 5 min) on ice. Cell debris was removed by centrifugation at 20,960 g for 1 h. The supernatant was loaded onto the NiNTA chelating column, which was equilibrated with lysis buffer previously. The column was washed with wash buffer 1 that contained Tris/HCl (50 mm), NaCl (300 mm), and imidazole (50 mm), pH 8. The second wash with wash buffer 2 contained Tris/HCl (50 mm), NaCl (300 mm), and imidazole (100 mm), pH 8. Finally, the column was eluted with elution buffer containing Tris/HCl (50 mm), NaCl (300 mm), imidazole (500 mm), and glycerol (10%), pH 8. The protein content was checked by the NanoDrop 2000 spectrophotometer (Thermo Scientific), and elution buffer containing proteins was dialyzed using Amicon centrifugal filters 30,000 MWCO with dialysis buffer containing Tris/HCl (50 mm), DTT (1 mm), and glycerol (10%), pH 8, and stored at −80 °C.
hMAT1A pellet was processed like archaea but used lysis buffer containing 20 mm NaH2PO4, 300 mm NaCl, 10 mm imidazole, 1 mg·mL−1 lysozyme, 1 µL benzonase, and PIC, pH 7.8, and incubated with lysis buffer for 30 min to 1 h. Cells were lysed by sonication (Digital Sonifier Models 250, 5 s on 5 s off for 5 min) on ice. After centrifugation at 12,000 r.p.m. for 1 h, the supernatant was loaded onto the NiNTA column. First, the column was washed with wash buffer 1 containing NaH2PO4 (50 mm), NaCl (300 mm), and imidazole (50 mm), pH 8, then washed with wash buffer 2 containing NaH2PO4 (50 mm), NaCl (300 mm), and imidazole (100 mm), pH 8. Finally, the column was eluted with elution buffer containing NaH2PO4 (50 mm), NaCl (300 mm), imidazole (500 mm), and glycerol (10%), pH 8. Fractions containing protein were concentrated with exchange buffer Tris/HCl (25 mm) and KCl (80 mm), pH 8, using Amicon centrifugal filters 30,000 MWCO. The purity of the protein was confirmed by SDS/PAGE. Protein concentration was 2.5, 6.2, 4.4, and 14 mg·mL−1 for CrenArc, EuryArc, ArcheaArc, and hMAT1A, respectively.
MAT activity assay
Activity assay was performed as previously reported []. Briefly, ATP/GTP/CTP/UTP (5 mm), methionine (10 mm), HEPES (100 mm), MgCl2 (10 mm), KCl (50 mm), and MAT (20 µm) were mixed in water, following which the pH of the mixture was adjusted to 8. Reactions were incubated at 37 °C for eukarya and at 55 °C for archaea in a thermomixer comfort system (Eppendorf, Hamburg Germany) for 1 h. The reaction was quenched with acetonitrile, followed by centrifugation at 12,000 r.p.m. for 5 min to precipitate the enzymes. Finally, the supernatant was filtered through a 0.22-µm filter (Merck) and injected into the ultra-performance liquid chromatography (UPLC) for analysis. UPLC method (UPLC Acquity H class, Waters, MA USA): Diluted reaction aliquots were analyzed using UPLC with a HILIC column (SeQuant® ZIC®-cHILIC 3 µm, 100 Å 150 × 2.1 mm PEEK-coated HPLC column). An isocratic method was used with 35% solvent A (100 mm ammonium acetate, pH 5.3) and 65% solvent B (acetonitrile) for 15 min. Each injection had a volume of 3 µL, with a flow rate of 0.2 mL·min−1, and was detected at a wavelength of 260 nm.
For kinetic analysis, SGM was purified using the UPLC method mentioned above and standard curves were plotted for SGM and SAM. For kinetic assay of ArchaeaAnc (0.5 µm) and hMAT1A (0.5 µm), concentrations of the ATP and GTP were in the range of 0.1 to 2 mm and a saturating concentration of methionine 10 mm was used. Using the above-mentioned buffer, temperature was 55 °C for ArchaeaAnc and 37 °C for hMAT1A. Data were analyzed by UPLC, and the kinetic parameters were determined using the Michaelis–Menten equation using the graphpad prism 7.02 (San Diego, CA, USA; https://www.graphpad.com).
We gratefully acknowledge financial support from the Okinawa Institute of Science and Technology. We also thank Benjamin Clifton, Stefano Pascarelli, and Mark S. Johnson for critically reading the manuscript and providing feedback that greatly helped in improving its quality.
Conflict of interest
The authors declare no conflict of interest.
The data generated/used in this study are available in the article and online supplementary materials. In addition, any other data underlying this article will be readily shared upon request from the corresponding author.
BPSC and PL conceived the idea of this study. BPSC, MG, and DM performed the experiments. BPSC, MG, ST-P, and PL analyzed the data. BPSC, MG, STP, and PL wrote the manuscript and prepared the figures.
|feb413312-sup-0001-Supinfo.pdfPDF document, 2.6 MB||
Scheme S1. S-adenosylmethionine (SAM) biosynthesis and its functions by secondary metabolites.
Fig. S1. Sequence similarity networks for MAT enzymes.
Fig. S2. The evolutionary trends adapted by the interface sites (located along the large interface of the homo-tetramer) as well the active sites.
Fig. S3. An interface view of three representative MAT structures and Weblogo of large interface residues from the three domains of life.
Fig. S4. Structural alignment of the residues depicting the 51 interface residues from 17 MAT crystal structures.
Fig. S5. Structure-based sequence alignment.
Fig. S6. Clustering analysis of hydropathy (A) and charge (B) of representative structures from the three domains of life.
Fig. S7. Structural comparison of MAT representatives from the three domains of life.
Fig. S8. Comparison of MAT subdomains within domains of life.
Fig. S9. Structural comparison of MAT subdomains across the domains of life.
Fig. S10. SDS/PAGE of ArchaeaAnc MAT protein purification.
Fig. S11. SDS/PAGE of CrenAnc MAT protein purification.
Fig. S12. SDA/PAGE of EuryAnc MAT protein purification.
Fig. S13. SDS/PAGE of hMAT1A purification.
Fig. S14. UPLC chromatogram of the reaction between NTP, methionine, and ArchaeaAnc.
Fig. S15. UPLC chromatogram of the reaction between NTP, methionine, and CrenAnc.
Fig. S16. UPLC chromatogram of the reaction between NTP, methionine, and EuryAnc.
Fig. S17. UPLC chromatogram of the reaction between NTP, methionine, and hMAT1A.
Fig. S18. Kinetic parameters for the SAM and SGM formation by ArchaeaAnc and hMAT1A.
|feb413312-sup-0002-AppendixS1.docxWord 2007 document , 112.6 KB||Appendix S1. Bacteria.|
|feb413312-sup-0003-AppendixS2.docxWord 2007 document , 42.5 KB||Appendix S2. Archaea.|
|feb413312-sup-0004-AppendixS3.docxWord 2007 document , 21.9 KB||Appendix S3. Eukarya.|
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.
- 1. A formal test of the theory of universal common ancestry. Nature. 2010; 465: 219–U104.
- 2, . Divergence and convergence in enzyme evolution: parallel evolution of paraoxonases from quorum-quenching lactonases. J Biol Chem. 2012; 287: 11–20.
- 3. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat Rev Microbiol. 2003; 1: 127–36.
- 4, , , . A minimal estimate for the gene content of the last universal common ancestor–exobiology from a terrestrial perspective. Res Microbiol. 2006; 157: 57–68.
- 5. Trees and networks before and after Darwin. Biol Direct. 2009; 4, 43: discussion 43.
- 6, . Ancestral genome sizes specify the minimum rate of lateral gene transfer during prokaryote evolution. Proc Natl Acad Sci USA. 2007; 104: 870–5.
- 7, , . Highways of gene sharing in prokaryotes. Proc Natl Acad Sci USA. 2005; 102: 14332–7.
- 8, . Reconstructing reticulate evolutionary histories of plants. Trends Plant Sci. 2006; 11: 398–404.
- 9, , . Classification of AAA+ proteins. J Struct Biol. 2006; 156: 2–11.
- 10, , . Prokaryotic ancestry of eukaryotic protein networks mediating innate immunity and apoptosis. J Mol Biol. 2014; 426: 1568–82.
- 11, , , , , , et al. A CTP-dependent archaeal riboflavin kinase forms a bridge in the evolution of cradle-loop barrels. Structure. 2007; 15: 1577–90.
- 12, . Bacterial genes outnumber Archaeal genes in eukaryotic genomes. Genome Biol Evol. 2020; 12: 282–92.
- 13, , , , . Identification of a highly diverged class of S-adenosylmethionine synthetases in the Archaea. J Biol Chem. 2000; 275: 4055–9.
- 14. Activation of methionine for transmethylation. J Biol Chem. 1951; 189: 745–54.
- 15, , , , , , et al. The physiology and habitat of the last universal common ancestor. Nat Microbiol. 2016; 1:16116.
- 16, , . Mechanism of radical initiation in the radical S-adenosyl-l-methionine superfamily. Acc Chem Res. 2018; 51: 2611–9.
- 17, , , , , , et al. Atlas of the radical SAM superfamily: divergent evolution of function using a "plug and play" domain. Methods Enzymol. 2018; 606: 1–71.
- 18, , , . Radical S-adenosylmethionine enzymes. Chem Rev. 2014; 114: 4229–317.
- 19, , , , . S-adenosylmethionine: a metabolite critical to the regulation of autophagy. Cell Prolif. 2020; 53:e12891.
- 20, , , , , , et al. S-Adenosylmethionine and methylation. FASEB J. 1996; 10: 471–80.
- 21. DNA methylation and human disease. Nat Rev Genet. 2005; 6: 597–610.
- 22. Functions of polyamines in mammals. J Biol Chem. 2016; 291: 14904–12.
- 23. Regulation of glutathione synthesis. Mol Aspects Med. 2009; 30: 42–59.
- 24, , , , , , et al. Radical SAM catalysis via an organometallic intermediate with an Fe-[5'-C]-deoxyadenosyl bond. Science. 2016; 352: 822–5.
- 25, . S-adenosylmethionine in liver health, injury, and cancer. Physiol Rev. 2012; 92: 1515–42.
- 26, , , , . S-Adenosylmethionine synthetase is required for cell growth, maintenance of G0 phase, and termination of quiescence in fission yeast. iScience. 2018; 5: 38–51.
- 27, , , . SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995; 247: 536–40.
- 28, . Structure-function relationships in methionine adenosyltransferases. Cell Mol Life Sci. 2009; 66: 636–48.
- 29. A review on protein oligomerization process. Int J Precis Eng Manuf. 2015; 16: 2731–60.
- 30. Activating oligomerization as intermediate level of signal transduction: analysis of protein-protein contacts and active sites in several glycolytic enzymes. Front Biosci. 1999; 4: D557–70.
- 31, , , . The accessible surface area and stability of oligomeric proteins. Nature. 1987; 328: 834–6.
- 32, , , , , , et al. The interdimeric interface controls function and stability of Ureaplasma urealiticum methionine S-adenosyltransferase. J Mol Biol. 2019; 431: 4796–816.
- 33, , , , , . Structures of catalytic cycle intermediates of the Pyrococcus furiosus methionine adenosyltransferase demonstrate negative cooperativity in the archaeal orthologues. J Struct Biol. 2020; 210:107462.
- 34, , , . Redesigning the substrate specificity of an enzyme by cumulative effects of the mutations of non-active site residues. J Biol Chem. 1999; 274: 2344–9.
- 35, , , . How enzymes adapt: lessons from directed evolution. Trends Biochem Sci. 2001; 26: 100–6.
- 36, , , , , . The 'evolvability' of promiscuous protein functions. Nat Genet. 2005; 37: 73–6.
- 37, . Improving enzyme properties: when are closer mutations better? Trends Biotechnol. 2005; 23: 231–7.
- 38, , , . Mimicking natural evolution in metallo-beta-lactamases through second-shell ligand mutations. Proc Natl Acad Sci USA. 2005; 102: 13761–6.
- 39, , , , , , et al. Increased oligomerization and phosphorylation of alpha-synuclein are associated with decreased activity of glucocerebrosidase and protein phosphatase 2A in aging monkey brains. Neurobiol Aging. 2015; 36: 2649–59.
- 40, , , , . Status of the Archaeal and bacterial census: an update. MBio. 2016; 7: e00201-16.
- 41, . CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics. 2004; 20: 3702–4.
- 42, , , , . Crystal structure of the S-adenosylmethionine synthetase ternary complex: a novel catalytic mechanism of S-adenosylmethionine synthesis from ATP and Met. Biochemistry-Us. 2004; 43: 1821–31.
- 43, , . Structural basis of the dominant inheritance of hypermethioninemia associated with the Arg264His mutation in the MAT1A gene. Acta Crystallogr D. 2020; 76: 594–607.
- 44, . Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics. 2006; 22: 1540–2.
- 45, , , , . PDBsum: Structural summaries of PDB entries. Protein Science. 2018; 27: 129–34. https://doi.org/10.1002/pro.3289
- 46, , , , , , et al. ECOD: an evolutionary classification of protein domains. Plos Comput Biol. 2014; 10:e1003926.
- 47, , , . Manual classification strategies in the ECOD database. Proteins. 2015; 83: 1238–51.
- 48, . Enzymatic properties of S-adenosylmethionine synthetase from the archaeon Methanococcus jannaschii. J Biol Chem. 2002; 277: 16624–31.
- 49, . Codon preferences in free-living microorganisms. Microbiol Rev. 1990; 54: 198–210.
- 50, . The action of selection on codon bias in the human genome is related to frequency, complexity, and chronology of amino acids. BMC Genom. 2006; 7: 67.
- 51, , . The relationship between base composition and codon usage in bacterial genes and its use for the simple and reliable identification of protein-coding sequences. Gene. 1984; 30: 157–66.
- 52, , . Nitrogen-fixing aerobic bacteria have higher genomic GC content than non-fixing species within the same genus. Hereditas. 1998; 128: 173–8.
- 53, , . Synonymous codon usage bias in the rice cultivar 93–11 (Oryza sativa L. ssp. indica). Yi Chuan Xue Bao. 2003; 30: 335–40.
- 54. Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res. 2004; 14: 2279–86.
- 55, , , , , , et al. Proteome-wide analysis of chaperonin-dependent protein folding in Escherichia coli. Cell. 2005; 122: 209–20.
- 56, . A systematic search for positive selection in higher plants (Embryophytes). BMC Plant Biol. 2006; 6: 12.
- 57, , . Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be. Bioinformatics. 2010; 26: 625–31.
- 58, , , , . Basic local alignment search tool. J Mol Biol. 1990; 215: 403–10.
- 59, , , , . CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010; 26: 680–2.
- 60, . Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol. 2007; 56: 564–77.
- 61, , , . MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002; 30: 3059–66.
- 62, , , , , , et al. Enzyme function initiative-enzyme similarity tool (EFI-EST): a web tool for generating protein sequence similarity networks. Biochim Biophys Acta. 2015; 1854: 1019–37.
- 63. ModelTest server: a web-based tool for the statistical selection of models of nucleotide substitution online. Nucleic Acids Res. 2006; 34: W700–3.
- 64, . An improved general amino acid replacement matrix. Mol Biol Evol. 2008; 25: 1307–20.
- 65, , , . IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015; 32: 268–74.
- 66. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985; 39: 783–91.
- 67, , , , . MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 2018; 35: 1547–9.
- 68. The NCBI taxonomy database. Nucleic Acids Res. 2012; 40: D136–43.
- 69, , , , . mTM-align: a server for fast protein structure database search and multiple protein structure alignment. Nucleic Acids Res. 2018; 46: W380–6.
- 70, , , , , , et al. Protein identification and analysis tools in the ExPASy server. Methods Mol Biol. 1999; 112: 531–52.
- 71, . A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982; 157: 105–32.
- 72, . Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr D. 2004; 60: 2256–68.
- 73, , , , , , et al. Substrate dynamics contribute to enzymatic specificity in human and bacterial methionine adenosyltransferases. bioRxiv. 2021. [PREPRINT].