Caenorhabditis elegans phosphatase complexes in UniProtKB and Complex Portal

Phosphatases play an essential role in the regulation of protein phosphorylation. Less abundant than kinases, many phosphatases are components of one or more macromolecular complexes with different substrate specificities and specific functionalities. The expert scientific curation of phosphatase complexes for the UniProt and Complex Portal databases supports the whole scientific community by collating and organising small- and large-scale experimental data from the scientific literature into context-specific central resources, where the data can be freely accessed and used to further academic and translational research. In this review, we discuss how the diverse biological functions of phosphatase complexes are presented in UniProt and the Complex Portal, and how understanding the biological significance of phosphatase complexes in Caenorhabditis elegans


Introduction
The spatial and temporal phosphorylation of proteins and lipids by kinases is tightly regulated by their subsequent dephosphorylation by phosphatases and is necessary for the progression and/or termination of a multitude of signalling pathways. Phosphatases are dynamic hydrolases that catalyse the removal of phosphate groups from specific amino acid residues or lipid moieties in target protein or lipid substrates, respectively. Phosphatase activity is associated with various catalytic mechanisms and different protein folds which have been used for their classification [1,2].
Despite their relevance in signalling pathways and biological processes, phosphatase genes are significantly less abundant than kinase genes in most genomes, and constitute only between 0.5% and 1.5% of proteome content across a broad taxonomic range [3]. Phosphatases therefore employ various mechanisms to act on a similar range of targets as kinases. They can act as single units displaying a broad spectrum of substrates, such as the mitogen-activated protein kinase phosphatases (MKPs), which dephosphorylate the mitogen-activated protein kinases ERK1/2, p38 and JNK to terminate MAP kinase-mediated signalling cascades [4]. They can also stably associate with other structural proteins and form holoenzymes, such as PPP2CA with the constant subunit PPP2R1A, which in turn can associate with various regulatory subunits to confer substrate specificity and regulate catalytic activity of the phosphatase [5,6].
Despite their reduced gene number in comparison to kinases, phosphatases play vital roles in a variety of biological processes. In humans, disruption of the balance between kinase and phosphatase activities results in diseases such as diabetes, a range of cancers and neurodegenerative diseases such as Alzheimer's [7]. Drugs based on manipulating kinase activity have proved invaluable in the treatment of some of these diseases [8]. As modulators of phosphorylation, phosphatases are also appropriate candidates for drug development [1,9,10]. Indeed, manipulating or inhibiting phosphatase activity is a growing field, with many phosphatase inhibitors currently being considered as potential therapeutics, although to date only a few are available as licensed drugs [11]. For example, cyclosporine A and FK506, two inhibitors of protein serine/threonine phosphatase 2B (also known as calcineurin), have been used successfully as immunosuppressants for the treatment of patients following organ transplantation [12].
To support the progression of translational research and drug development, biological databases collate and organise data from 3D structure analyses and large-scale proteomics studies as well as from smaller-scale initiatives presented in the scientific literature. This facilitates the identification of specific phosphatases, and thus, such databases constitute invaluable research tools. Whilst some databases such as Phosphatome.net (phosphatome.net/3.0) focus specifically on phosphatases, databases such as UniProt (www.uniprot.org) provide a more general, but comprehensive overview on all proteins in a wide variety of taxa [13]. Other resources specialise in particular aspects of protein biology. One such example is the Complex Portal (www.ebi.ac.uk/complexportal) [14], which focuses on the arrangement of proteins in macromolecular complexes. In the phosphatase field, the collation of such data is of great relevance as a large number of human phosphatases are components of one or more macromolecular complexes with specific biological functions [1,15,16]. Specifically, UniProt and the Complex Portal are two resources that provide comprehensive functional and structural data on complexes of a large variety of species in different, but complementary ways. UniProt provides functional and structural data and sequences of proteins on a protein-by-protein basis for over 1 million species (UniProt release 2019_09). The Complex Portal also provides functional and structural data on proteins from a broad taxonomic range (including human and key model organisms such as Escherichia coli, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans), but within the context of macromolecular complexes.
In this review, we provide an analysis of the C. elegans phosphatome and phosphatase-containing complexes based on their curation and presentation in UniProt and the Complex Portal. We highlight the challenges in gathering data from the scientific literature and inferring biological function from orthologs, and discuss how the two databases display this data. Ultimately, we demonstrate how understanding the biological significance of phosphatase complexes in a model organism such as C. elegans allows for insight into how phosphatases are able to target such a diverse range of substrates in a variety of biological processes. This is particularly necessary for the identification of potential drug targets for therapies and treatments for a range of human diseases.

C. elegans phosphatome and complexes
The study of phosphatase evolution across species underwent an important step forward with the recent compilation of the phosphatomes for human and several model organisms including C. elegans [3]. However, whilst this study provides a list of phosphatases and a classification based on the fold of their catalytic domains, it does not provide information about the extent of their functions or their interactions. Phosphatase-specific databases such as DEPOD (www.depod.bioss.uni-freiburg.de/) provide information about interacting partners together with their substrates and biological functions; however, their focus is the characterisation of human phosphatases [17]. To help fill the gap this focus creates and provide contextual understanding of the dynamics of these proteins in C. elegans, we comprehensively updated the C. elegans phosphatome in the UniProt database with functional data from the scientific literature (for a description of the curation process, see [18]). We chose to focus on this nematode species as it has been an invaluable tool in understanding protein biology [19]. More than 50% of human phosphatases have a counterpart in C. elegans and many are involved in the same biological processes [20]. Moreover, the relative ease of manipulating the C. elegans genome, combined with its fast life cycle, makes it an ideal model to complement human and mouse studies and allows for the investigation of protein function in the context of a whole organism [19].
The first step in the curation of the C. elegans phosphatome was to create a list of phosphatases and their corresponding UniProtKB accessions. As a starting point, we used the gene list compiled by Chen et al. for C. elegans [3]. Of the 244 genes in this list, six corresponded to pseudogenes in the nematode database WormBase [21] and were discarded, and two genes had the same sequence and were thus counted as one. Secondly, to identify additional phosphatases, we took advantage of the cross references to the protein family database Pfam [22], which are automatically added to UniProtKB entries. Using Pfam signatures for phosphatase families (e.g. PF00102 Y_phosphatase), we could not find any phosphatases besides those identified by Chen et al. Our final list contains 237 phosphatase genes belonging to the CC1 (Cys-based class I fold), CC2 (Cys-based class II fold), CC3 (Cys-based class III fold), PPM (metal-dependent protein phosphatase fold), PPPL (protein phosphatase-like fold), HAD (haloacid dehalogenase-like fold), HP (histidine acid phosphatase fold), PHP (protein histidine phosphatase fold), RTR1 (regulator of transcription fold) and AP (alkaline phosphatase fold) superfamilies (Table S1). As the focus of this study is protein phosphatases, we did not include members of the inositol-4-phosphate, inositol-5-phosphate, glucose/lipid phosphatase and adenosine/fructose/inositol-x-phosphatases superfamilies. We used experimentally validated information extracted from the scientific literature to update the annotation of over 25 phosphatases of the 237 UniProt entries. In total, there are now 71 phosphatases (representing 30% of the C. elegans phosphatome) whose functions have been experimentally characterised (known as reviewed entries in UniProt) (Fig. 1A). This effort provides an up-to-date view of how much is currently known about the C. elegans phosphatome.
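The list-cleaning step and the resulting counts can be illustrated with a short sketch. It is not the procedure actually used for curation: the function name, gene names and sequences below are hypothetical toy values, and only the numbers 244, 6, 237 and 71 come from the text.

```python
# Illustrative sketch only: gene names and sequences are invented toy data.
def clean_gene_list(genes, pseudogenes):
    """Drop pseudogenes, then collapse genes that share an identical sequence."""
    seen_sequences = set()
    kept = []
    for name, sequence in genes:
        if name in pseudogenes:
            continue  # annotated as a pseudogene, so discarded
        if sequence in seen_sequences:
            continue  # identical sequence already represented by another gene
        seen_sequences.add(sequence)
        kept.append(name)
    return kept


# Toy input: five genes, one pseudogene (g4), two genes sharing a sequence (g2, g3).
toy_genes = [("g1", "MKT"), ("g2", "MLS"), ("g3", "MLS"), ("g4", "MAA"), ("g5", "MQQ")]
toy_kept = clean_gene_list(toy_genes, pseudogenes={"g4"})

# Counts reported in the text: 244 starting genes, six pseudogenes removed,
# and one entry lost by merging two genes with the same sequence.
final_count = 244 - 6 - 1
reviewed_fraction = 71 / final_count  # share of experimentally characterised entries
```

With the figures from the text, `final_count` is 237 and `reviewed_fraction` rounds to the 30% quoted above.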
Overall, the level of experimental characterisation is around 25-30% for the three major superfamilies CC1 (Cys-based class I), PPPL (protein phosphatase-like) and HP (histidine acid phosphatase) (Fig. 1B). Unsurprisingly, most of the well-characterised phosphatases have human orthologs (Table 1).
In parallel with this functional update effort, we also added experimental characterisation data for phosphatase complexes to both UniProt and the Complex Portal (see below for examples and an overview of the process). Among the 237 C. elegans phosphatases, 12 have been experimentally proven to be part of 16 stable complexes (Table 2). Each phosphatase-containing complex has specific functional features. In particular, among the 12 characterised phosphatases, MTM-9 (UniProtKB Q965W9) and the C. elegans-specific EGG-3 (UniProtKB Q20402), EGG-4 (UniProtKB O01767) and EGG-5 (UniProtKB O61789) are pseudophosphatases which lack catalytic activity [23,24]. Interestingly, whilst the majority of phosphatase complexes usually contain one active phosphatase or a combination of an active and an inactive phosphatase, three of the inactive phosphatases, EGG-3, EGG-4 and EGG-5, are found in a single complex, the Egg-3/4/5/MBK-2 complex (Complex Portal: CPX-3381), which does not contain an active phosphatase [24]. All characterised active phosphatases are Ser/Thr phosphatases, except for MTM-6, which is a phosphoinositide phosphatase [25]. The C. elegans phosphatase complexes are remarkably well conserved. To date, apart from the Egg-3/4/5/MBK-2 complex, which appears to be specific to C. elegans, all the other experimentally characterised phosphatase complexes are present in a wide variety of species including mammals, fruit fly and Saccharomyces cerevisiae. We describe below how UniProt and the Complex Portal present phosphatase complex data and use specific examples to highlight the broad data types.

UniProt and the Complex Portal databases
The Complex Portal and UniProt present complementary data on complexes to provide biologically relevant functional information. Both databases define macromolecular complexes as stable sets of two or more interacting protein molecules that can be copurified and that exist as functional units in vivo. These complexes may also contain nonprotein molecules such as small molecules and nucleic acids. Interactions between enzymes and their substrates are usually not annotated as complexes as they are often transient. However, if the substrate remains stably associated with the enzyme and their association is essential for the function, then they are annotated as complexes.
Identifying complexes and their components is experimentally challenging. To take this into account, UniProt and the Complex Portal curators use stringent criteria to determine what constitutes a complex. Briefly, copurification by sequential column chromatography, reciprocal co-immunoprecipitation and/or GST pull-downs, and biophysical techniques such as cocrystallisation, electron microscopy or fluorescence resonance energy transfer methods associated with functional assays are suitable evidence for the interaction. However, large-scale pull-downs followed by mass spectrometry or yeast two-hybrid assays are not considered unless the interactions are further confirmed with additional experiments. Also, we do not annotate complexes if they are only inferred from genetic interactions or are only detectable when amino acid residues are mutated to 'stabilise' the association. Nonetheless, if there is firm evidence in another species that the complex does exist, we do annotate the complex using orthology-based evidence (see below).
UniProt presents complex data on a protein-by-protein basis in the 'Interaction' section of each protein entry participating in the complex (Fig. 2A). If known, binding regions for each protein are provided in the subunit comment and/or in the sequence feature table and the ProtVista feature viewer (Fig. 2B,C) [26]. Proteins that are integral components of a macromolecular complex will have a supporting cross reference that provides a direct link to the relevant entry page in the Complex Portal (Fig. 3A). The Complex Portal presents functional and structural data on macromolecular complexes as a whole, which can be identified by a complex-specific accession number (Fig. 3A,E). Specific proteins within the complexes can be identified by UniProt accession numbers with a link to the corresponding UniProt records (Fig. 3D) [14].
The Complex Portal provides an interactive graphical view of the complex which includes not only protein participants, but also any small molecules, such as metal ions, that are an integral part of the protein function. For each key protein, domains, features and metal binding sites can be viewed (Fig. 3B) and where known, the specific amino acids involved in the interactions are specified (Fig. 3C).
There are multiple challenges associated with the annotation of macromolecular complexes in both databases, for example, (a) how we assign complex names, taking into account functional conservation among species, (b) how we annotate proteins that belong to multiple complexes and (c) how information about complex formation, stability and regulation is included. To illustrate how UniProt and Complex Portal curators address these challenges, we chose six phosphatases among the 12 phosphatases that have been experimentally shown to be part of complexes as they are well characterised. These are the three phosphatases LET-92 (UniProtKB G5EGK8), TAX-6 (UniProtKB Q0G819) and FEM-2 (UniProtKB P49594), and the three pseudophosphatases EGG-3, EGG-4 and EGG-5.
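The evidence criteria set out earlier in this section can be read as a simple decision rule. The sketch below is one possible encoding of that rule, not the curators' actual workflow: the evidence labels, set names and the function `annotation_basis` are all hypothetical and were chosen only to mirror the wording of the text.

```python
# Hypothetical labels mirroring the evidence types named in the text.
ACCEPTED = {
    "sequential_chromatography_copurification",
    "reciprocal_coimmunoprecipitation",
    "gst_pulldown",
    "cocrystallisation",
    "electron_microscopy",
    "fret_with_functional_assay",
}
NEEDS_CONFIRMATION = {"large_scale_pulldown_ms", "yeast_two_hybrid"}
INSUFFICIENT_ALONE = {"genetic_interaction", "stabilising_mutation_only"}
KNOWN_LABELS = ACCEPTED | NEEDS_CONFIRMATION | INSUFFICIENT_ALONE


def annotation_basis(evidence, firm_evidence_in_other_species=False):
    """Return 'experimental', 'orthology' or None (complex not annotated)."""
    evidence = set(evidence)
    unknown = evidence - KNOWN_LABELS
    if unknown:
        raise ValueError(f"unrecognised evidence labels: {sorted(unknown)}")
    # Any accepted method is sufficient; it also serves as the additional
    # experiment that confirms large-scale or yeast two-hybrid data.
    if evidence & ACCEPTED:
        return "experimental"
    # Otherwise the complex is annotated only if firmly established elsewhere.
    if firm_evidence_in_other_species:
        return "orthology"
    return None
```

Under this reading, a complex supported only by genetic interactions, such as the LET-92-PAA-1 core discussed below, would be annotated on an orthology basis provided the orthologous complex is firmly established in another species.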

LET-92: one phosphatase for 8 complexes
Reversible phosphorylation is essential for a range of cellular processes. In C. elegans, there are 438 kinases and to date, over half of them have been experimentally characterised [18]. By contrast, there are only 237 phosphatases in the C. elegans phosphatome (Table S1). How do so few phosphatases counteract the phosphorylation activity of such a high number of kinases in diverse processes? In order to achieve functional diversity, one phosphatase may associate with different 'regulatory' proteins, which confer substrate and functional specificity. The participation of a phosphatase in multiple complexes raises several questions in terms of how databases can comprehensively and clearly present this.
An interesting example is the PP2A family (Fig. 4). These proteins are among the most well-characterised serine/threonine phosphatases. They play essential roles in many key and diverse biological processes such as cell division and apoptosis [27,28]. Several PP2A-containing complexes have been experimentally characterised in mammals and functional conservation is apparent throughout the animal kingdom in species such as D. melanogaster and S. cerevisiae (Fig. 4A) [29-31]. In particular, a phylogenetic analysis of the catalytic subunit shows that PP2A is highly conserved across several species. In C. elegans, there is only one PP2A family member encoded by the let-92 gene (Fig. 4A). Based on protein sequence similarities, LET-92 most likely associates with the scaffolding protein PAA-1 (UniProtKB Q09543) to form the common heterodimeric core enzyme that is the basis of multiple PP2A complexes (Fig. 4B). The association of this core with regulatory proteins confers distinct functional properties. Experimental data demonstrate that this core is involved in processes including cell division, centriole duplication, microtubule outgrowth and vulval development in C. elegans [5,6,32].
Whilst the interaction between a PP2A catalytic subunit and a constant scaffolding protein has been experimentally demonstrated in mammals, to date, no experimental data confirms the physical association between C. elegans LET-92 and PAA-1 subunits (Fig. 4B). However, the existence of the LET-92-PAA-1 heterodimer has been inferred from genetic interaction studies, a common tool in C. elegans, whereby the analysis of animals lacking paa-1 and/or let-92 has shown that the constant and catalytic subunits share similar functions and appear to be part of a common signalling pathway [6]. Despite the lack of direct evidence for the existence of the LET-92-PAA-1 heterodimer, biochemical studies provide evidence for an interaction between the regulatory subunit RSA-1 (UniProtKB O02217) and the core proteins LET-92 and PAA-1 (Fig. 4B) [6], and between the regulatory subunit SUR-6 (UniProtKB G5EDR3) and LET-92 [5]. Other potential regulatory subunits have been identified based on sequence similarity (F43B10.1, F47B8.3, T22D1.5, Y71H2AM.20) or genetic interaction studies (pptr-1, pptr-2) (Fig. 4A) [33,34]. Based on this information, LET-92 appears to be the common component of at least 8 distinct complexes (Table 2).
How do UniProt and the Complex Portal represent these distinct complexes that share one or more components and how do they convey the type of evidence available for their existence? In both databases, the annotation of proteins and complexes is primarily based on experimental data. However, for proteins that are well conserved, as in the case of LET-92 and its interacting partners, functional data may be inferred from orthologs (Fig. 4A). In UniProt, in the 'Subunit structure' subsection of the 'Interaction' section of all the protein entries that are predicted or proven to be part of the complex, a general description of the complex is provided. As there are multiple regulatory subunits, and not all are experimentally confirmed, only the components of the common heterodimer core, LET-92 and PAA-1, are explicitly named (Fig. 4C). The interactions with specific subunits supported by experimental evidence are described separately. The 'Function' section of a UniProt protein entry provides more information about the individual role of each component in any complexes it may form. For example, during mitosis, the regulatory subunit RSA-1 recruits the heterodimeric core of LET-92 and PAA-1 to the centrosomes, and thereby regulates microtubule outgrowth from centrosomes and mitotic spindle assembly (Fig. 4D) [6]. By a different mechanism, the regulatory subunit SUR-6 associates with LET-92 to regulate centriole duplication during early embryonic cell divisions [5].
In UniProt, all the complexes that an individual protein is involved in are described together. In the Complex Portal, separate records are provided for each complex and so multiple complex entries may contain the same proteins, which is indeed the case for the LET-92-PAA-1 heterodimeric core. At the top of each Complex Portal entry page, the interactive graphical viewer shows the various components of a complex (Fig. 3B). When there is experimental evidence for the interaction between two components, a direct link is provided (Fig. 4B).
For LET-92- and PAA-1-containing complexes, there is no experimental confirmation for a direct physical interaction, and only genetic interaction studies are available. However, as the heterodimeric core is conserved between species, a direct connection between the catalytic and the constant subunits is inferred even though it is not supported by physical interaction data (Fig. 4B). Similarly, in six of the eight LET-92-containing complexes, there is no experimental evidence for the interaction between the various regulatory subunits and the heterodimeric core. However, as it is not always clear whether an orthologous complex with a functionally comparable regulatory subunit exists in other species, the subunit is included in the complex, but no binary interactions are presented between the subunit and the core: without experimental data, it is not possible to predict which proteins in the core interact with the regulatory subunit.
Complex names are assigned based on conservation between species. For the PP2A complexes, typically both the common catalytic subunit and the specific regulatory subunit are included in the name. For example, the LET-92/PAA-1/RSA-1 complex is called PP2A-RSA-1 phosphatase complex (Complex Portal: CPX-1361). However, for complexes with multiple components, a functionally specific name may be assigned such as in the case of RSA centrosome-targeting complex (Complex Portal: CPX-1357) (Fig. 4B). The 'Function' and 'Properties' sections provide more information about the biological role and features of the complex as a whole. In contrast to UniProt, individual protein functions are only indicated when integral to the function of the complex as a whole. In the 'Properties' section of all the PP2A complexes, the same description of the heterodimeric core enzyme is given, and in the 'Function' section, a description of the roles of the regulatory subunits in terms of the overall function of the complex is provided.
Therefore, in the Complex Portal, the number of individual complex records highlights the diverse functions of the regulatory subunits in the PP2A-containing complexes.
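The difference between the two views (one record per complex in the Complex Portal, all complexes of a protein described together in UniProt) amounts to inverting a membership mapping. The following is a minimal sketch of that idea and does not reflect either database's actual schema. The two CPX accessions and their members are taken from the text; the `HYPOTHETICAL-SUR-6` key is a placeholder invented for illustration, since the text gives no accession for a SUR-6-containing complex and infers its PAA-1 membership only from the conserved core.

```python
from collections import defaultdict

# Complex-centric records (Complex Portal style view).
COMPLEXES = {
    "CPX-1361": ["LET-92", "PAA-1", "RSA-1"],           # PP2A-RSA-1 phosphatase complex
    "CPX-3381": ["EGG-3", "EGG-4", "EGG-5", "MBK-2"],   # Egg-3/4/5/MBK-2 complex
    "HYPOTHETICAL-SUR-6": ["LET-92", "PAA-1", "SUR-6"],  # placeholder, not a real accession
}


def complexes_by_protein(complexes):
    """Invert complex -> members into protein -> complexes (UniProt style view)."""
    index = defaultdict(list)
    for accession, members in complexes.items():
        for protein in members:
            index[protein].append(accession)
    return {protein: sorted(accessions) for protein, accessions in index.items()}


def shared_core(complexes, accession_a, accession_b):
    """Return the members that two complex records have in common."""
    return sorted(set(complexes[accession_a]) & set(complexes[accession_b]))


by_protein = complexes_by_protein(COMPLEXES)
core = shared_core(COMPLEXES, "CPX-1361", "HYPOTHETICAL-SUR-6")
```

Here `by_protein["LET-92"]` lists both PP2A records, while `core` recovers the LET-92/PAA-1 pair that the separate complex entries share.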

TAX-6: a calcium-dependent complex
As the previous LET-92 example shows, the regulatory subunits in phosphatase complexes exert multiple roles, which allow the complexes to have an involvement in a wide range of processes. In addition to the diverse roles indicated above, regulatory subunits can also regulate the activation of their catalytic binding partner. For instance, in response to external signals, regulatory subunits may undergo post-translational modifications or, in some cases, facilitate the recruitment of other components to the complex, which ultimately leads to the activation of the phosphatase. One example of this is the calcineurin complex. The calcineurin-calmodulin complex consists of a catalytic subunit and a Ca2+-binding regulatory subunit. This heterodimeric complex is conserved throughout evolution. It is present in mammals and D. melanogaster as well as in S. cerevisiae [35-37]. Whilst in humans, the catalytic subunit is encoded by three genes (PPP3CA, PPP3CB and PPP3CC) and the regulatory subunit by two genes (PPP3R1 and PPP3R2), in C. elegans there is only one gene each for the catalytic (tax-6) subunit and the regulatory subunit (cnb-1; UniProtKB G5EDN6), which facilitates functional studies (Table 3). The catalytic subunit, also known as calcineurin A (CNA), is a Ca2+-dependent, calmodulin-stimulated serine/threonine-protein phosphatase which plays an essential role in the transduction of intracellular Ca2+-mediated signals [38]. In mammals, CNA is essential for immunity and for synaptic transmission in the brain [39-43]. However, studies in C. elegans show that CNA may have more widespread functions. Indeed, TAX-6 regulates a wide range of biological processes including egg laying, fertility, growth, lifespan, movement, sensory behaviours, and a specific set of endocytic processes such as coelomocyte endocytosis and synaptic vesicle recycling [44-51].
Figure 5A shows the current model for the assembly and regulation of the calcineurin-calmodulin complex based on experimental evidence from structural and biochemical studies across various species. In humans, the Ca2+-induced complex is inferred based on experimental evidence [52]. In C. elegans, experimental evidence for the interaction is only available between TAX-6 and CNB-1, and for TAX-6 and calmodulin (CMD-1; UniProtKB O16305), and requires calcium [38,53]. As this complex is present in a range of species, we assessed the evidence across species and inferred information to complete the characterisation. As in humans, the C. elegans catalytic subunit TAX-6 forms a stable association with the regulatory subunit CNB-1. CNB-1 contains four Ca2+-binding sites, two with high affinity and two with low affinity. When intracellular calcium levels are low, only the two high-affinity binding sites are occupied. Following an increase in intracellular Ca2+, the occupancy of the low-affinity sites on CNB-1 by Ca2+ causes a conformational change of the C-terminal regulatory domain of TAX-6, resulting in the exposure of the calmodulin-binding domain and in the partial activation of TAX-6 [38]. The subsequent binding of Ca2+-bound CMD-1/calmodulin leads to the full displacement of the autoinhibitory domain from the active site and possibly of the autoinhibitory segment from the substrate binding site, which fully activates TAX-6. In UniProt, the stable binary interactions between TAX-6 and CNB-1, and TAX-6 and CMD-1 are presented in the 'Interaction' section for the relevant protein entries (Fig. 5B). As this complex is only formed in response to an increase in intracellular Ca2+, the 'Interaction' section and 'Activity regulation' subsection of the TAX-6 UniProt protein entry contain a detailed description of how the induced complex is formed, how it is regulated by calcium, and how it leads to the activation of TAX-6.
Interactions with small molecules such as calcium are indicated on a protein-by-protein basis (Fig. 5C). In the Complex Portal, all the stable and transient interactions between all proteins and all small molecules in the heterotrimeric complex are consolidated into a single calcineurin-calmodulin complex entry (Complex Portal: CPX-1128) (Fig. 5D). The assembly and the conditions and requirements for TAX-6 activation are described in the 'Properties' section (Fig. 5D). This demonstrates how the two databases display complementary information to provide a complete overview of the complex and its members.
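The stepwise activation model described in this section can be summarised as a small state function. This is a deliberately simplified sketch of the model as worded in the text, not a quantitative description of calcineurin: the function name, the boolean inputs and the state label 'autoinhibited' for the low-calcium condition are assumptions made for illustration.

```python
def tax6_activation_state(low_affinity_sites_occupied, ca_calmodulin_bound):
    """Map the two events of the model onto a TAX-6 activity state.

    low_affinity_sites_occupied: Ca2+ occupies the low-affinity sites on CNB-1,
        which exposes the calmodulin-binding domain of TAX-6.
    ca_calmodulin_bound: Ca2+-bound CMD-1/calmodulin is bound to TAX-6.
    """
    if not low_affinity_sites_occupied:
        # Assumption: without the conformational change the calmodulin-binding
        # domain is not exposed, so calmodulin binding is ignored here.
        return "autoinhibited"
    if ca_calmodulin_bound:
        return "fully active"
    return "partially active"


# The three stages of the model, in the order described in the text.
stages = [
    tax6_activation_state(False, False),  # low intracellular Ca2+
    tax6_activation_state(True, False),   # Ca2+ rise, low-affinity sites filled
    tax6_activation_state(True, True),    # Ca2+-bound CMD-1 recruited
]
```

Encoding the order of events this way makes explicit that, in this model, calmodulin binding follows the calcium-induced conformational change rather than acting independently of it.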

FEM-2: ferrying substrate to complexes
The LET-92 and TAX-6 examples above show how single phosphatases can function in a range of processes by associating with different partners. In the next example, we describe the functional diversity achieved by the association of a phosphatase complex with another macromolecular complex, and explain how UniProt and the Complex Portal capture this information.
To date, the experimental data available show that FEM-2 associates with both FEM-1 and FEM-3 and that FEM-1, by interacting with ELC-1 (UniProtKB Q9BKS1), a component of the CBC E3 ubiquitin ligase complex, mediates the interaction between the two complexes (Fig. 6A) [57,58]. This macrocomplex is essential to control male sexual development in C. elegans [55,58]. Several models have been proposed to explain how the complex regulates this process. In one model, the FEM-2 phosphatase complex interacts with the zinc finger transcription factor TRA-1 (UniProtKB P34708), a sex determination regulator in C. elegans [58]. This association leads to the repression of the transcription of male-specific genes in somatic cells, and results in TRA-1 dephosphorylation. FEM-1 then associates with the CBC-fem-1 ubiquitin ligase complex and facilitates the ubiquitin-mediated degradation of TRA-1 (Fig. 6A).
UniProt presents the two experimentally proven FEM-2-containing complexes in the 'Interaction' sections of the individual protein entries for FEM-1, FEM-2 and FEM-3 (Fig. 6D). UniProt entries do not usually include substrates as a complex component due to their transient association. However, as TRA-1 appears to interact relatively stably with the complex prior to its degradation, it is included in the composition of the CBC-fem-1 ubiquitin ligase complex. By contrast, the Complex Portal is more stringent and does not present ligands as components of the complex unless they have been shown to be integral to the function of the complex. As TRA-1 acts as a substrate and is dispensable for the complex to function (other substrates may exist), it is not included in either of the Complex Portal entries for FEM-2-containing complexes (Fig. 6B). The Complex Portal does not standardly present more than one complex per entry. However, as the CBC-fem-1 ubiquitin ligase complex exerts its function on the TRA-1 substrate as a consequence of the activity of the Fem-2 phosphatase complex, this complex is also present on the entry page of the CBC-fem-1 ubiquitin ligase complex (Fig. 6B). Thus, by combining the information in UniProt and the Complex Portal, users have access to all binary one-to-one interactions, the context for the interactions and substrate interaction data in the context of a complex.

Egg-3/4/5/MBK-2 complex: a kinase trapping complex
Phosphatases within complexes have dynamic roles. In the previous FEM-2 example, we have seen that phosphatases are not only present in a complex to exert their catalytic activity, but they may also facilitate the delivery of a protein to another complex for degradation. In the next example, we use the Egg-3/4/5/MBK-2 complex to demonstrate how subcellular location dictates the function of a phosphatase complex (Fig. 7A). The Egg-3/4/5/MBK-2 complex consists of three inactive phosphatases (or pseudophosphatases), EGG-3, EGG-4 and EGG-5, and the serine/threonine-protein kinase MBK-2 (UniProtKB Q9XTF3). Whilst MBK-2 is conserved throughout the animal kingdom in species such as yeast and human [59], the three pseudophosphatases appear to be specific to nematodes as no homologues exist in other species. Therefore, the Egg-3/4/5/MBK-2 complex is likely to be unique to C. elegans and other nematode species. A combination of yeast two-hybrid, coimmunoprecipitation and in vitro pull-down assays provides experimental evidence for the existence of the interactions between the participants [24,60,61]. Interestingly, the interaction of EGG-4 and EGG-5 with MBK-2 relies on EGG-4 and EGG-5 pseudophosphatases, which, despite the lack of catalytic activity [24], have the capacity to bind to phosphorylated proteins. This feature allows the pseudophosphatases to sequester MBK-2 and thus prevent MBK-2 from phosphorylating its substrates.
Some of the steps required for the assembly/disassembly of this complex have been elucidated based on the subcellular location of the different complex components in C. elegans lacking either egg-3, egg-4 or egg-5 (Fig. 7A). This complex is transiently formed in oocytes, and is required for the oocyte-to-zygote transition [62]. It localises to the cortex of the maturing oocyte and it is retained there until the end of meiosis I. Whilst EGG-4 and EGG-5 bind and inhibit MBK-2 kinase activity, EGG-3 acts as a scaffold to tether MBK-2, EGG-4 and EGG-5 to the oocyte cortex, and thus prevents MBK-2 accessing its cytoplasmic substrates during meiosis I. During anaphase of meiosis I, EGG-3 relocalises to the cytoplasm and is probably degraded by the anaphase-promoting complex/cyclosome (Complex Portal: CPX-3382), resulting in MBK-2 release from the complex. In UniProt, the information regarding the subcellular localisation of the individual protein components in this complex is described in the 'Subcellular location' section (Fig. 7B) and the role of each component is documented in the 'Function' section of each protein entry. In the Complex Portal, this localisation information is summarised in the 'Function' section as the subcellular localisation determines how the complex functions (Fig. 7C). Through the complementary data captured by each resource, users are provided with a summary of subcellular location at the level of the individual complex components as well as a description of the role of location of the complex in determining its activity.

Discussion
Phosphatases are largely signal-transducing molecules that control the transmission of intracellular signals by dephosphorylating molecules, reversing the action of cellular kinases. For most species, the phosphatome contains significantly fewer phosphatases than there are kinases in the kinome, and protein phosphatases [...] process or pathway. Thus, simply documenting the function of one biological entity in one record without considering interactors, protein networks and pathways is not feasible or indeed useful to scientific communities. UniProt acts as a central hub of knowledge, both describing interactions and pathways within entries and linking out to more detailed information in domain-specific resources, such as model organism databases and molecular interaction resources such as the IMEx databases [63], which provide more details of binding partners. However, as described above, to fully understand the role of a phosphatase it is often necessary to comprehend how that protein operates in the context of a macromolecular complex of which it is a member. The Complex Portal supplies this additional level of detail, enabling a full picture of the way in which a phosphatase molecule may be regulated, may achieve functional diversity, and may dephosphorylate multiple substrates in a dynamic, but tightly controlled, series of events required for the fine control of cellular function. Apart from the Egg-3/4/5/MBK-2 complex, all the complexes experimentally characterised in C. elegans are conserved in human (Table 3). Several of the phosphatases contained in these complexes are associated with diseases in human (Table 3). Therefore, mechanistic information on the regulation of phosphatase activity is becoming critical, as drug development based on inhibiting or enhancing phosphatase activity is an actively growing field.
Some phosphatase inhibitors are already available as licensed drugs, such as the immunosuppressants cyclosporin A and tacrolimus (FK506), which inhibit calcineurin activity [64]. The PP2A phosphatase complexes have been shown to be involved in several disease processes [65], and a range of drugs, although not yet licensed, are already available which inhibit some of their activity, either directly or indirectly by targeting their interactors and substrates [66]. Conversely, drugs with neuroprotective properties designed to prevent inhibition of PP2A in Alzheimer's disease are also currently being developed [67,68]. A full understanding of the mechanisms by which these molecules function and are regulated is critical for a comprehension of human health and disease.
We describe above a comprehensive review of the existing experimental literature pertaining to the C. elegans phosphatome by expert biocurators, and the capture of summary data into the UniProt and Complex Portal databases. Now that this proof-of-concept exercise has been completed for C. elegans, efforts need to switch to updating the human phosphatome and creating the human phosphatase complexeosome. Both of these are much more challenging tasks owing to the much higher number of catalytic molecules and regulatory subunits, and to unanswered questions as to the role of the many paralogous proteins which exist in the human proteome as a result of gene duplication. Ongoing efforts will also ensure that entries for both species are kept up to date with new experimental data as it appears in the literature, and that key information is transferred to less well-studied, but closely related, organisms. For example, the annotation of C. elegans entries can be used to inform the study of parasitic worms which infect both humans and animals, causing weakness and disease of the host. The work of the expert biocurators is critical to support and enhance the work of laboratory researchers in the fields of both molecular mechanistic studies and health and disease.