A putative RNA binding protein from Plasmodium vivax apicoplast

Malaria is caused by Apicomplexa protozoans from the Plasmodium genus entering the bloodstream of humans and animals through the bite of the female mosquitoes. The annotation of the Plasmodium vivax genome revealed a putative RNA binding protein (apiRBP) that was predicted to be trafficked into the apicoplast, a plastid organelle unique to Apicomplexa protozoans. Although a 3D structural model of the apiRBP corresponds to a noncanonical RNA recognition motif with an additional C‐terminal α‐helix (α3), preliminary protein production trials were nevertheless unsuccessful. Theoretical solvation analysis of the apiRBP model highlighted an exposed hydrophobic region clustering α3. Hence, we used a C‐terminal GFP‐fused chimera to stabilize the highly insoluble apiRBP and determined its ability to bind U‐rich stretches of RNA. The affinity of apiRBP toward such RNAs is highly dependent on ionic strength, suggesting that the apiRBP–RNA complex is driven by electrostatic interactions. Altogether, apiRBP represents an attractive tool for apicoplast transcriptional studies and for antimalarial drug design.

Malaria is one of the most devastating parasitic diseases in the world, causing the death of 1-2 million people per year, mostly children. This disease is caused by Apicomplexa protozoans from the Plasmodium genus, which are passed on to humans and animals through the bite of female mosquitoes from the Anopheles genus [1]. The discovery of an essential plastid organelle, the so-called apicoplast, has rekindled the search for new drugs to fight malaria [2]. Widely found in Apicomplexa except for Cryptosporidium species [3], the apicoplast is a vestigial nonphotosynthetic plastid surrounded by four membranes due to its secondary endosymbiotic origin. Indeed, an ancient eukaryotic cell engulfed a cyanobacterium to become a photosynthetic eukaryotic alga. Then, a Plasmodium predecessor ingested the eukaryotic alga to establish a new symbiosis and preserved it as a plastid [4]. Hence, the prokaryote-derived metabolic pathways within the apicoplast are substantially different from those of the human host, bringing new opportunities to design drugs against malaria [5].
The apicoplast genome of Plasmodium spp. consists of a highly conserved ~35-kb circular, double-stranded DNA lacking the genes encoding proteins involved in photosynthesis [6]. Unlike mRNAs produced in the parasite nucleus, those synthesized in the apicoplast are polycistronic, and they mainly provide the essential machinery for transcription and translation needed for organelle housekeeping functions [7]. An important exception is the presence of the SufB protein (ycf24), which is a constituent of the [Fe-S] biogenesis pathway [8]. To our knowledge, only a small number of proteins involved in transcription, control of mRNA stability, and translation within the organelle have been studied [7,9,10].
Many genes initially encoded in the apicoplast have been transferred to the parasite nucleus to avoid deleterious mutations of nonrecombinant genomes. Consequently, apicoplast biogenesis and function critically rely on targeting nuclear-encoded proteins back to the organelle by means of distinctive apicoplast targeting motifs. Apicoplast proteins present a bipartite leader sequence at their N terminus consisting of a hydrophobic signal peptide (SP) and a chloroplast-like transit peptide (TP). While the SP allows entry into the secretory pathway, the TP is a relatively simple and flexible trafficking signal for post-translational targeting and translocation to the apicoplast [2,11] (Fig. 1).
Up to 466 predicted apicoplast proteins were identified in the P. falciparum genome using the PlasmoAP and PATS algorithms, which extract the amino acid features of the TPs that target proteins to the organelle [12,13]. A similar analysis in the P. vivax genome revealed the presence of 316 proteins predicted to be targeted to the apicoplast [14,15]. One of them (PVX_084415 or apiRBP in this manuscript) is a putative RNA binding protein (RBP) displaying a predicted N-terminal signal peptide and a single RNA recognition motif (RRM). In P. falciparum, 189 genes have been annotated as putative RBPs: 179 of them possess an ortholog in P. vivax, including the one encoding apiRBP [16]. Unfortunately, most of them lack definitive functional annotations and only a few have been structurally characterized (e.g., PDB: 2N7C and 2MYF).
To unveil fundamental aspects of RNA metabolism in Plasmodium that may hint at new targets for antimalarial therapy, we took on the challenge of characterizing apiRBP, which is extremely prone to aggregation. In this work, we resorted to theoretical solvation analysis of a 3D model of the protein to design a soluble GFP-fused chimera. Finally, we showed by calorimetry assays that apiRBP is indeed able to recognize target RNA stretches, presumably driven by electrostatic interactions.

Materials and methods

apiRBP plasmid constructs
Three different apiRBP plasmid constructs were used in this study. The first one, His-apiRBP* (residues 76-182), was obtained by amplification of the PVX_084415 gene from P. vivax Sal1 cDNA and was cloned into the NcoI-NotI cloning sites of the pETM-11 vector (Invitrogen) for protein expression in Escherichia coli. The second one, His-apiRBP (residues 81-182), is derived from His-apiRBP* but lacks the first five amino acids (NSITL) (Fig. 2A). It was cloned into the pIVEX2.4d cell-free expression vector with the In-Fusion cloning kit (Clontech) following the manufacturer's instructions. The third construct, apiRBP-GFP-His, was cloned into a modified pET21a(+) vector. This vector was kindly provided by Prof. Frank Bernhard (Frankfurt, Germany) and contains a C-terminal GFP-His-tag (superfolderGFP) as a quantitative reporter of gene expression. In this variant, the apiRBP gene also comprises residues 81-182. All primer sequences are available upon request.
Fig. 1. Protein targeting to the Plasmodium spp. apicoplast. Most apicoplast proteins traffic to the organelle due to N-terminal bipartite leaders (SP and TP). The nascent apicoplast proteins are targeted to the endoplasmic reticulum (ER) membrane, where their SP is removed cotranslationally by a signal peptidase. The exact mechanisms that lead the transport from the ER lumen to the apicoplast are not fully understood but may involve ER-apicoplast communication by vesicular transport and TP recognition [2].

Protein expression
His-apiRBP* (residues 76-182) recombinant protein was expressed in Luria-Bertani (LB) medium in E. coli BL21 (DE3) cells. Cultures were grown at 37°C with continuous agitation at 180 r.p.m. Protein expression was induced by the addition of isopropyl-β-D-thiogalactopyranoside (IPTG) to a final concentration of 1 mM once an OD600 of 0.8 was reached. Cells were collected by centrifugation after 5 h of continuous agitation at 30°C. To isolate the inclusion bodies, cells obtained from 1 L of culture were resuspended in 20 mL of 100 mM Tris/HCl buffer (pH 8) with 100 mM NaCl, 1 mM dithiothreitol (DTT), and 1 mM phenylmethylsulfonyl fluoride (PMSF). Lysozyme was added to the cell suspension (0.35 mg·mL⁻¹), which was then sonicated. After that, the suspension was treated with DNase I (20 mg·mL⁻¹) for 1 h at 37°C and centrifuged at 30 000 g for 15 min at 4°C. The pellet, containing the inclusion bodies, was washed twice with ice-cold PBS buffer supplemented with 1% Triton X-100. The inclusion bodies were then resuspended in 2 mL of ice-cold 100 mM Tris/HCl buffer (pH 8) containing 100 mM NaCl, 1 mM DTT, and 6 M guanidinium chloride. The suspension was incubated at 20°C for 2 h and centrifuged at 100 000 g for 20 min at 4°C. The inclusion bodies were soluble in the supernatant fraction, which mainly contained the recombinant His-apiRBP*, so protein refolding was performed without further purification. Two milliliters of the solubilized inclusion bodies was added to 200 mL of ice-cold refolding buffer containing 100 mM Tris/HCl (pH 8), 5 mM EDTA, and 0.5 M L-arginine with low agitation at 15°C [17]. After 24 h of incubation at 15°C with continuous agitation, the protein solution was centrifuged. Most of the expressed His-apiRBP* was soluble in the supernatant, which was dialyzed against 100 mM Tris/HCl (pH 8) buffer.
Fig. 2. (A) ... His-apiRBP* (residues 76-182) and His-apiRBP (residues 81-182). The five N-terminal amino acids that are included in the first construct, but not in the second, are marked by an orange square and an asterisk. The apiRBP SP is colored in purple. Secondary structure elements predicted by JPred V.4 [30] (α-helix and β-strand) and RNA binding sites predicted by BindN software [31] are also indicated. (B) (Left) SDS/PAGE Coomassie gel (12%) of His-apiRBP* (residues 76-182) expression in E. coli. M stands for molecular mass marker in kDa. Lanes 1 and 2 are the pellet and the supernatant from cell sonication, while lane 3 stands for the supernatant of inclusion bodies refolding. The protein was found in fractions from lanes 1 and 3 (marked with a red asterisk) with a molecular weight of 19 kDa. All lanes come from the same gel, but it has been spliced and put together (dashed line between lanes 2 and 3). (Right) Far-UV (190-250 nm) CD spectrum of the refolded protein indicating the contents of secondary structure elements.
Finally, the apiRBP-GFP-His chimera was produced in LB medium in E. coli BL21 (DE3) cells. Cultures were grown at 37°C with continuous agitation at 180 r.p.m. Protein expression was induced by the addition of IPTG to a final concentration of 0.5 mM after reaching an OD600 of 0.3. Cells were collected by centrifugation after 16 h of continuous agitation at 20°C. After cell disruption by sonication in the presence of 1 mM PMSF and cOmplete Protease Inhibitors (one tablet per 50 mL extraction solution; Sigma), extracts were centrifuged at 17 000 g and the His-tagged protein was then purified by nickel affinity chromatography (Ni Sepharose 6 Fast Flow; GE Healthcare) applying an imidazole gradient (10-300 mM) in 20 mM Tris and 100 mM NaCl, pH 7.4. The purified fractions containing apiRBP-GFP-His were then subjected to FPLC (ÄKTA Prime) to remove protein contaminants. Protein concentrations were determined spectrophotometrically by the Bradford assay, and the molecular weight of the constructs was verified by MALDI-TOF mass spectrometry.
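The refolding of His-apiRBP* relies on diluting the denaturant: 2 mL of protein in 6 M guanidinium chloride is added to 200 mL of refolding buffer. The arithmetic can be sketched as follows, assuming additive volumes and a refolding buffer free of guanidinium chloride (both are simplifying assumptions; the function name is illustrative).

```python
def diluted_concentration(stock_molar, stock_volume_ml, buffer_volume_ml):
    """Concentration of a solute after mixing a stock into solute-free buffer.

    Assumes volumes are additive, which is only approximately true
    for concentrated guanidinium chloride solutions.
    """
    total_volume_ml = stock_volume_ml + buffer_volume_ml
    return stock_molar * stock_volume_ml / total_volume_ml


# Values taken from the refolding protocol described in the text.
STOCK_GDNHCL_M = 6.0      # guanidinium chloride in the solubilized sample
SAMPLE_VOLUME_ML = 2.0    # solubilized inclusion bodies
BUFFER_VOLUME_ML = 200.0  # refolding buffer

residual_m = diluted_concentration(STOCK_GDNHCL_M, SAMPLE_VOLUME_ML, BUFFER_VOLUME_ML)
dilution_factor = (SAMPLE_VOLUME_ML + BUFFER_VOLUME_ML) / SAMPLE_VOLUME_ML

print(f"dilution factor: {dilution_factor:.0f}-fold")                  # 101-fold
print(f"residual guanidinium chloride: {residual_m * 1000:.1f} mM")  # 59.4 mM
```

Under these assumptions the denaturant drops to about 59 mM before the final dialysis step removes it.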

Circular dichroism spectroscopy
Circular dichroism (CD) spectra were recorded on a Jasco J-815 spectropolarimeter equipped with a Peltier temperature control system. The secondary structure analysis of His-apiRBP* (residues 76-182) was performed by recording far-UV CD spectra (190-250 nm) with 3 µM samples in H2O at 25°C. The spectrum was an average of 20 scans. The α-helix and β-sheet content was obtained with CDPRO software [18], which includes the algorithms CONTIN, SELCON and CDSSTR and the CLSTR option to compare the protein folding with a set of similar folded proteins.

Isothermal titration calorimetry
Isothermal titration calorimetry (ITC) experiments were performed using a microcalorimeter (TA Instruments) at 25°C titrating apiRBP-GFP-His or GFP-His (10 µM) over 10-mer U-rich RNA oligonucleotides (1 µM; Sigma). Both the protein and oligonucleotide samples were in 20 mM Tris (pH 7.4) buffer with or without 50 mM NaCl (as indicated in each experiment). Measurements were repeated at least twice. The integrated data of heat per injection normalized per mol of injectant versus molar ratio were analyzed with AFFINIMETER software v2.1608 and were fitted to a 1:1 interaction model.
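For orientation, the 1:1 interaction model used for the fitting can be illustrated with the closed-form solution of the binding equilibrium P + L ⇌ PL. The sketch below is not the AFFINIMETER fitting routine: it only computes the fraction of RNA bound at several protein/RNA ratios, ignores dilution during the titration, and uses a hypothetical dissociation constant of 1 µM chosen purely for illustration.

```python
import math


def complex_concentration(p_total, l_total, kd):
    """Equilibrium [PL] for a 1:1 complex; all arguments in the same units.

    Solves [PL]^2 - (Pt + Lt + Kd)[PL] + Pt*Lt = 0 and takes the
    physically meaningful (smaller) root.
    """
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


KD_UM = 1.0   # hypothetical Kd in µM (illustrative, not a measured value)
RNA_UM = 1.0  # RNA concentration in the cell, as in the methods

for ratio in (0.5, 1.0, 2.0, 5.0):
    protein_um = ratio * RNA_UM
    bound = complex_concentration(protein_um, RNA_UM, KD_UM)
    print(f"protein/RNA = {ratio}: fraction of RNA bound = {bound / RNA_UM:.2f}")
```

In a real ITC fit, the heat of each injection is proportional to the change in [PL] between injections, and Kd, ΔH, and the stoichiometry are adjusted to reproduce the measured heats.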

Molecular dynamics computations
Molecular dynamics (MD) computations were carried out in a periodic orthorhombic box using AMBER12 with the AMBER 14SB force field [21] and PME electrostatics with an Ewald summation cutoff of 9 Å. For this purpose, the system was neutralized with seven Cl⁻ ions and solvated with 6058 OPC [22] water molecules. The protein model side chains were relaxed by energy minimization. Then, solvent and counter-ions were subjected to 5000 energy minimization steps followed by 500-ps NPT-MD computations using isotropic molecule position scaling and a pressure relaxation time of 2 ps at 298 K. Temperature was controlled by a Langevin thermostat [23] with a collision frequency of 5 ps⁻¹. The density of the system reached a plateau during the first 50 ps. Then, the whole system was energy-minimized and submitted to NVT-MD computations at 298 K. The SHAKE algorithm [24] was used to constrain bonds involving hydrogen atoms. Trajectory analyses, including grid inhomogeneous solvation theory (GIST) [25] analyses, were carried out with CPPTRAJ [26]. Grid spacing was 0.5 Å. UCSF Chimera [27] was used for molecular graphics and as the modeling interface.

apiRBP annotation and secondary structure
A search of the apiRBP gene (transcript ID PVX_084415) in PlasmoDB (the official database of Plasmodium sequencing projects [28]) predicted an RNA binding protein (RBP) of 182 amino acids. At its N-terminal end, a 17-residue SP was also annotated by combining the predictions of SignalP 3.0 with orthology information (Fig. 2A). However, the plastid TP of apiRBP has not been delimited yet, probably because TPs are normally variable in length, have no primary consensus sequence, and are only distinguished by positively charged residues and an abundance of hydroxylated residues [29]. The InterPro Domain search revealed that apiRBP contains a single RNA recognition motif (RRM) of about 90 amino acids whose secondary structure analysis by JPred V.4 [30] corresponded to the canonical β1α1β2β3α2β4 topology, with a four-stranded β-sheet packed against two α-helices. Different lengths of the RRM are proposed, depending on the superfamily taken as a reference; for instance, the SSF54928 superfamily aligns with residues 75-169, while the PS50102 Prosite domain aligns with residues 83-163. Hence, different construct lengths were tested in this report (see below). BindN software [31] also identified several potential RNA-binding residues within the RRM taking into account the pKa value, hydrophobicity index, and molecular mass of each amino acid. Of note, positively charged residues (Arg and Lys stretches) are prevalent among those binding residues, especially in the β2-strand and the C-terminal tail (Fig. 2A).
His-apiRBP* (residues 76-182, closer to the SSF54928 superfamily) was produced in E. coli cells and isolated from inclusion bodies due to its low solubility in the supernatant fraction (Fig. 2B, left). Correct folding and secondary structure content were assessed by CD (Fig. 2B, right). The secondary structure content was 14.8% α-helix, 37.8% β-strand, 22.6% turn, and 23.8% random coil, in agreement with the expected values for an RRM domain, although the α-helix content was slightly higher than expected [32,33]. Unfortunately, protein stability was too low to perform further functional assays. To test whether such instability was caused by the presence of the end of the TP at the N terminus of the apiRBP construct, we designed a shorter construct excluding amino acids 76-80, His-apiRBP (residues 81-182, closer to the PS50102 Prosite domain) (Fig. 2A), and cloned it into a vector compatible with cell-free protein expression. Protein solubility increased but remained insufficient (with or without detergents) to allow the structural and functional characterization of the protein under any of the conditions tested (Fig. S1).

apiRBP structural modeling
A 3D structural model for apiRBP was obtained by the comparative methods implemented in the Robetta server (http://robetta.bakerlab.org) (Fig. 3A). The structural model, built using the structure of the histone-lysine N-methyltransferase SETD1A (PDB: 3S8S) and other RRM-containing proteins as templates, displayed the expected β1α1β2β3α2β4 topology of canonical RRMs followed by an extra α-helix (herein named α3). Moreover, a structural homology search using the DALI database [34] returned a Z-score of 14.4, with an RMSD value of 1.8 Å, between apiRBP and SETD1A, including the novel C-terminal α3-helix found in apiRBP. Comparison of apiRBP with other RBPs, such as ELAV-like protein 1 (PDB: 4FXV), returned a Z-score of 12.3, although DALI consistently excluded the α3-helix of apiRBP from such structural alignments. The 10-residue α3-helix is oriented parallel to the β-sheet, mainly through contacts between Leu167 at α3 and the β-sheet face (Phe86 in β1 and Ile130 in β3), along with the H-bond between the Oδ1 atom of Asn164 (α3) and the guanidinium group of Arg117 (β2). Additionally, the sequence of apiRBP shows a two-residue insertion at β2 (Ala114 and Arg115) with respect to homologous RRMs such as those from ELAV-like protein 1 (31.2% identity; PDB: 4FXV), the RNA recognition motif 1 from HuR (31.2% identity; PDB: 3HI9), or RNA binding domain 1 from HuC (31.9% identity; PDB: 1D8Z). However, the secondary structure of β2 seems to be unaffected in the model. Interestingly, the aromatic residues typical of canonical RRMs [33] are not conserved in apiRBP, which shows only two aromatic residues (Phe84 and Phe86) at the RNP2 of the β1-strand (Fig. 3A). Further, the loop joining β4 and α3 covers the aromatic ring of Phe86, probably preventing it from forming π-stacking interactions with RNA. To check the behavior of β2 and α3, and to understand the experimentally observed instability of the protein, a 50-ns molecular dynamics computation was carried out (Fig. 3).
The RMSD evolution depended on whether residues 81-161 or 81-182 were used for the structural alignments (Fig. 3B). This indicates that the C-terminal end of apiRBP, including α3, is more mobile than the canonical RRM core. The secondary structure timeline (Fig. 3C) shows that the structure of α3 is stable along the trajectory. Thus, the RMSD differences in Fig. 3B can be attributed to a slight rigid-body motion of α3 and the high mobility of the last C-terminal residues (stretch 176-182). Interestingly, the three-residue sequence (from 162 to 164) neighboring the N terminus of the α3-helix folds into a short 3₁₀-helix along the MD trajectory (highlighted in ocher in Fig. 3C). Consistently, residues 162-182 show the highest fluctuations and RMSD values when comparing the average structure to the initial, energy-minimized model (Fig. 3C). In addition, β2 proved to be unstable, probably because of the above-mentioned two-residue insertion (highlighted in yellow in Fig. 3A) with respect to other RRMs. Notably, the sequence stretches flanking the insertion showed high fluctuations and RMSD values with respect to the initial, energy-minimized model (Fig. 3A and 3C).
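The sensitivity of the RMSD to the residue range can be illustrated with a toy calculation. The sketch below is not the CPPTRAJ analysis used here: it assumes the two frames are already superimposed, uses one pseudo-atom per residue, and applies made-up displacements (0.5 Å for the core, 3 Å for the C-terminal region) only to show how a mobile tail inflates the value.

```python
import math


def rmsd(coords_a, coords_b, residues):
    """RMSD over the selected residues, assuming pre-superimposed frames.

    coords_a and coords_b map residue number -> (x, y, z) in Å.
    """
    squared_sum = 0.0
    for res in residues:
        squared_sum += sum((a - b) ** 2 for a, b in zip(coords_a[res], coords_b[res]))
    return math.sqrt(squared_sum / len(residues))


# Toy reference: one pseudo-atom per residue placed along the x axis.
reference = {res: (float(res), 0.0, 0.0) for res in range(81, 183)}

# Toy frame with hypothetical displacements: small for the core (81-161),
# large for the C-terminal region (162-182).
frame = {
    res: (x + (0.5 if res <= 161 else 3.0), y, z)
    for res, (x, y, z) in reference.items()
}

core = range(81, 162)         # residues 81-161
full_length = range(81, 183)  # residues 81-182

print(f"RMSD over 81-161: {rmsd(reference, frame, core):.2f} Å")
print(f"RMSD over 81-182: {rmsd(reference, frame, full_length):.2f} Å")
```

Including the 21 C-terminal residues raises the RMSD even though the core is unchanged, which mirrors the qualitative difference between the two alignments.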

Solvation analysis of apiRBP
To understand the high propensity of apiRBP to aggregate in solution, MD computations were performed with explicit OPC water molecules, a model chosen because it closely reproduces the bulk properties of water [22]. Then, the trajectories were analyzed using grid inhomogeneous solvation theory to assess the number density of water molecules around the domain (Fig. 4A and 4B). Notably, the density of water molecules decreased near a surface patch of the RRM, and the translational entropy of water molecules in this region increased with respect to the bulk. This revealed a hydrophobic cluster near the C terminus involving side chains from Ile173 at α3 and Val176, Leu177, and Pro179.

apiRBP binding to U-rich RNA stretches
Taking into account the results from the solvation analysis, a new version of the apiRBP construct (residues 81-182) was designed in-frame with a C-terminal GFP-His-tag (namely apiRBP-GFP-His). Presumably, the high stability and solubility of the GFP-tag would serve as a stabilizing factor for the insoluble apiRBP, as previously seen for other protein targets [35]. As anticipated, when recombinant apiRBP-GFP-His was expressed in E. coli, the chimera was stable enough to perform further binding analysis (Fig. S2). As no RNA binding activity has been reported for the structurally related SETD1A RRM domain [36,37], nor for its yeast homolog SET1 RRM1 [38], in this work we tested the ability of apiRBP to bind uridine-rich (U-rich) 10-mer RNA. This is a consensus AU-rich element (ARE) at the 3′-UTR of mRNAs recognized by the ELAV family of proteins, which share ~31% sequence identity with apiRBP. Isothermal titration calorimetry (ITC) performed at low ionic strength showed that apiRBP-GFP-His recognized 10-mer U-rich RNA stretches (Fig. 5A, left) with an affinity within the µM range when fitting the stoichiometry to n = 1 (Table 1). This indicates that one molecule of apiRBP-GFP-His binds to one molecule of RNA (1:1). When 50 mM NaCl was added to the ITC buffer, the recognition of 10-mer U-rich RNA was negligible (Fig. 5B, left). In line with the high isoelectric point (pI) of apiRBP (pI = 9.72), its surface electrostatic potential at low ionic strength (Fig. 5A, center and right) showed a highly positively charged face (surrounding RRM β2 and α3) and a slightly negative patch on the opposite side of the protein. This reveals a dipole, which is partly attenuated at higher ionic strength (Fig. 5B, center and right), although no substantial differences in His-apiRBP solubility have been observed upon salt addition (data not shown).
The strong dependence of the binding on ionic strength indicated that the apiRBP-RNA complex is mainly driven by electrostatic interactions rather than π-stacking. This is consistent with the fact that aromatic residues placed at RNP2 are occluded by the extra α3-helix. However, we cannot exclude that binding to other RNA/DNA stretches would lead to higher protein-target affinities and/or sequence specificity. Finally, a negative control carried out with free GFP-His showed no interaction by ITC with 10-mer U-rich RNA, indicating that the apiRBP moiety is responsible for RNA recognition. Additionally, the apiRBP-GFP-His dilution experiment in 20 mM Tris (pH 7.4) produced no substantial heats (Fig. S3).
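The screening of electrostatic interactions by salt can be made semi-quantitative with the Debye length, the distance over which charges are screened in an electrolyte. The sketch below treats the buffer as a simple 1:1 electrolyte in water at 25°C. Taking the ionic strength of 20 mM Tris (pH 7.4) as 0.02 M is an upper-bound simplification made here for illustration (the true value depends on how the buffer was titrated), and the relative permittivity of 78.4 is an assumed textbook value for water.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K
NA = 6.02214076e23       # Avogadro constant, 1/mol
E = 1.602176634e-19      # elementary charge, C


def debye_length_nm(ionic_strength_molar, temp_k=298.15, eps_r=78.4):
    """Debye screening length in nm for an aqueous electrolyte.

    eps_r = 78.4 is an assumed relative permittivity of water at 25 °C.
    """
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_squared = 2.0 * NA * E**2 * ionic_strength_si / (EPS0 * eps_r * KB * temp_k)
    return 1e9 / math.sqrt(kappa_squared)


# Assumed ionic strengths (illustrative): buffer alone and buffer + 50 mM NaCl.
LOW_SALT_M = 0.02
HIGH_SALT_M = 0.02 + 0.05

print(f"Debye length at I = {LOW_SALT_M} M: {debye_length_nm(LOW_SALT_M):.2f} nm")
print(f"Debye length at I = {HIGH_SALT_M} M: {debye_length_nm(HIGH_SALT_M):.2f} nm")
```

Under these assumptions, adding 50 mM NaCl roughly halves the screening length, which is qualitatively consistent with weaker long-range attraction between the positively charged face of apiRBP and the RNA backbone.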

Discussion
The cyanobacterial origin of the apicoplast makes this organelle an excellent target for pharmaceutical research [39]. Here, we have characterized a putative P. vivax RBP that is predicted to be targeted to the apicoplast [14,15]. The high conservation of its orthologs in the genus Plasmodium (72% amino acid similarity) indicates that this protein may play a key role in the malaria parasite life cycle. Most RBPs are composed of small RNA binding domains (RBDs) that are needed for their recruitment to specific RNA targets. Within RBDs, there are four prominent families: RNA recognition motifs (RRMs), zinc fingers, KH domains, and double-stranded RNA binding motifs [33].
The identification of secondary structure elements in apiRBP corresponded with a noncanonical RRM domain (a four-stranded β-sheet packed against two α-helices), which is extended by a C-terminal α3-helix. This is consistent with the increased α-helix content detected by CD (14.8%) in comparison with other RRMs [32]. Other noncanonical RRMs that include additional secondary elements have been extensively reported in the literature. Such is the case of the extra N-terminal α-helix (α0) in the third RRM of TIA-1 (RRM3) [40,41], the β-sheet extended by an additional fifth antiparallel β-strand in PTB RRM2 and RRM3 modules [42], or the RRM with an extra C-terminal helix (xRRM) in La and LARP7 proteins [43,44]. In addition, the RNP1 module of apiRBP is not fully conserved, whereas RNP2 is partly occluded by the α3-helix. Eukaryotic RRMs are often found as multiple copies within a protein, and together with other protein domains, they confer different affinity and specificity for the RNA sequences [45-53]. RRMs are also found in prokaryotes, where they tend to occur as single domains in small proteins, typically around 100 amino acids in length (Pfam ID PF00076 and InterPro ID IPR000504 [54,55]). Both the short length and the presence of a single RRM domain in apiRBP point to an endosymbiotic origin of the protein.
In most of the solved RRM-RNA complexes, two of the three aromatic residues in RNP1 make stacking interactions with the nucleobases, while the third can interact hydrophobically with the sugar rings. In addition, an Arg or a Lys residue at RNP1 forms a salt bridge with the phosphodiester backbone [33]. Particularly interesting in apiRBP is the lack of aromatic residues in the β3-strand, suggesting that the binding between apiRBP and RNA might occur by electrostatic steering rather than by nucleobase stacking. This is in agreement with the mapping of the surface electrostatic potential, which shows a highly positively charged face surrounding β2 and the C-terminal α3-helix, and also with the ITC finding that a higher salt concentration in the buffer disrupted the binding of apiRBP to target U-rich RNAs. Finally, the prediction server BindN [31] localized potential RNA binding sites outside the β1 and β3 strands (particularly concentrated in β2). If RNA recognition is β2-mediated, this highly dynamic secondary element could become fixed upon apiRBP-RNA complex formation. Examples of RRMs using unusual RNA binding surfaces are reported in the literature. In such cryptic, atypical RRMs, aromatic residues have been displaced from the β1-β3 motif to the β-connecting loops in qRRMs [56] or to the so-called RNP3 at β2 in xRRMs [43,44,57]. In the particular case of the apiRBP RRM, the β2-strand could bind to RNA through electrostatic contacts. Additionally, we cannot exclude the participation of the C-terminal end of apiRBP in the binding to RNA as it is also highly positively charged. Both β2 and α3 may be part of a wide RNA binding platform. Altogether, these findings suggest an uncommon mode of binding for an unusual RRM module and open the door to further structural and functional characterization to better understand those mechanisms.
The fold of a protein and its stability toward denaturation are governed by the sequence of amino acids and the environment (solvent, salts, pH, temperature, crowding, etc.) [58]. Expression of a soluble recombinant apiRBP in E. coli was unsuccessful under the conditions tested in this study. The solvation analysis of apiRBP showed an exposed hydrophobic cluster at the C terminus. We therefore fused a GFP-tag to the C-terminal end of the protein, generating a more stable chimera that was useful for functional assays. Indeed, a model of the apiRBP-GFP chimera showed that GFP blocked the C-terminal hydrophobic patch in the RBP, probably preventing aggregation and possibly interfering to some extent with RNA binding (Fig. S4). A recent thermodynamic and molecular modeling study revealed that tight nucleic acid-RRM interactions are mostly enthalpy-driven [59]. According to that analysis, all these interactions display a strong enthalpy-entropy compensation effect (see Fig. 4A in [59]). Even though we used a GFP-tagged chimera, our values, if plotted in Fig. 4A of that study, would match its fit at one extreme. Similar compensation effects have been observed in other systems [60,61]. Nevertheless, for a deeper structural approach, it would be of great interest to study the RRM module alone under native conditions.
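The thermodynamic bookkeeping behind enthalpy-entropy compensation follows from ΔG = RT ln Kd = ΔH − TΔS. The sketch below shows how the entropic term is obtained once Kd and ΔH are known from an ITC fit. The numerical inputs are hypothetical placeholders chosen for illustration, not the values in Table 1.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def delta_g_kj(kd_molar, temp_k=298.15):
    """Binding free energy in kJ/mol from a dissociation constant in M."""
    return R * temp_k * math.log(kd_molar) / 1000.0


def entropic_term_kj(delta_g, delta_h):
    """Return -T*dS in kJ/mol from dG = dH - T*dS."""
    return delta_g - delta_h


# Hypothetical inputs for illustration only (not the measured values).
KD_M = 1e-6               # dissociation constant in the µM range
DELTA_H_KJ = -50.0        # binding enthalpy

dg = delta_g_kj(KD_M)
minus_t_ds = entropic_term_kj(dg, DELTA_H_KJ)

print(f"dG    = {dg:.2f} kJ/mol")
print(f"-T*dS = {minus_t_ds:.2f} kJ/mol")
```

With these placeholder values, a favorable enthalpy is partly offset by an unfavorable entropic term, which is the pattern referred to as compensation.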
Additionally, the exact residue where the TP finishes and the RRM domain starts in apiRBP remains unknown. Interestingly, a study of the TP in Toxoplasma gondii showed that positive charges are more influential in the N-terminal portion of the TP, that arginine and lysine are equally suitable, and that the exact position of these charges is not important [11]. The accumulation of positive charges along the β2-strand and the extra C-terminal α3-helix of the RRM module is particularly interesting in apiRBP. Once cellular approaches have proven that apiRBP is targeted to the apicoplast, the possible role of those positive charges in the transit to the apicoplast could be investigated. This would resemble the functioning of the single RRM from the trypanosome TcUBP1 RBP, which behaves as a structural nuclear localization signal (NLS), alternating nuclear import and RNA binding [62]. Truncations and strategic point mutations of apiRBP would help to prove or reject this hypothesis.
Historically, antimalarial drug design has focused on compounds that modulate protein function, such as doxycycline [63], but recently, RNA has become a new target for pharmaceutical companies [64]. Aminoglycoside antibiotics that target the RNA component of the small ribosomal subunits are widely used for the treatment of bacterial infections. In addition, novel approaches to drug discovery are identifying potential target sites in mRNA molecules, in particular the binding sites of proteins that regulate mRNA translation or stability [64]. Proteins with a crucial role in the expression of plastid genes, such as apiRBP, are promising targets for RNA-based antimalarial drugs, and their future molecular characterization should support this line of research.

Supporting information
Additional Supporting Information may be found online in the supporting information tab for this article:
Fig. S1. Solubilization tests of His-apiRBP synthesized in cell-free extracts.
Fig. S2. Protein purification, stability, and identification of apiRBP-GFP-His.
Fig. S3. Control experiments of the apiRBP-GFP-His binding to RNA by isothermal titration calorimetry.
Fig. S4. apiRBP-GFP structural model.