miRNA‐mediated ‘tug‐of‐war’ model reveals ceRNA propensity of genes in cancers

Competing endogenous RNA (ceRNA) are transcripts that cross‐regulate each other at the post‐transcriptional level by competing for shared microRNA response elements (MREs). These have been implicated in various biological processes impacting cell‐fate decisions and diseases including cancer. There are several studies that predict possible ceRNA pairs by adopting various machine‐learning and mathematical approaches; however, there is no method that enables us to gauge as well as compare the propensity of the ceRNA of a gene and precisely envisages which among a pair exerts a stronger pull on the shared miRNA pool. In this study, we developed a method that uses the ‘tug of war of genes’ concept to predict and quantify ceRNA potential of a gene for the shared miRNA pool in cancers based on a score represented by SoCeR (score of competing endogenous RNA). The method was executed on the RNA‐Seq transcriptional profiles of genes and miRNA available at TCGA along with CLIP‐supported miRNA‐target sites to predict ceRNA in 32 cancer types which were validated with already reported cases. The proposed method can be used to determine the sequestering capability of the gene of interest as well as in ranking the probable ceRNA candidates of a gene. Finally, we developed standalone applications (SoCeR tool) to aid researchers in easier implementation of the method in analysing different data sets or diseases.


Introduction
Recent studies have unmasked the ability of a transcript to influence the expression of similar or different transcript via sequestration of shared microRNA (miRNA). Such transcripts are named as competing endogenous RNA (ceRNA), which include a mixed pool or individual pools of messenger RNA (mRNA), long-noncoding RNA (lncRNA), pseudogenes and circular RNA (circRNA) (Cesana et al., 2011;Hansen et al., 2013;Memczak et al., 2013). They regulate each other by competing for the common pool of miRNA (s)  and their shared miRNA response elements (MREs) act as natural molecular sponges for the miRNA (Cazalla et al., 2010;Jeyapalan et al., 2011;Kloc, 2008;Lee et al., 2009;Poliseno et al., 2010;Seitz, 2009). miRNA is a noncoding RNA of about 22 nucleotides modulating either degradation or silencing of the transcripts by binding to them (Bartel, 2009;Liu et al., 2013;Roy and Mallick, 2017;Samantarrai and Mallick, 2017) preferably on 3 / UTR. The interactions among the pool of miRNA and their target MREs in a given biological system give rise to an unprecedented complex network of miRNA-mediated regulations. The rationale for the existence of such a complex network is supported by the facts that (a) each transcript containing multiple MREs can be the target of multiple distinct miRNA (Rennie et al., 2016), and (b) each miRNA can have multiple target MREs belonging to the same or different transcripts . This redundancy in the synergy of the miRNA and the transcripts led to the discovery of ceRNA. Salmena and colleagues formally put forward the ceRNA hypothesis in August, 2011  that was later on endorsed by several studies demonstrating functions of the ceRNA in various biological processes such as viral infections (Cazalla et al., 2010), muscle development (Cesana et al., 2011), embryonic stem cell differentiation (Wang et al., 2013), angiogenesis (Gao et al., 2016a), tumorigenicity (Gao et al., 2016b;Jeyapalan et al., 2011;Li et al., 2014b) and metastases (Li et al., 2016;Zheng et al., 2015) in different cancers.
Although ceRNA research is in its infancy, emerging evidence suggests that ceRNA can regulate the function of miRNA by acting as oncogenes or tumour suppressors contributing to several tumorigenic processes. Hence, ceRNA are pivotal in understanding the additional dimension of post-transcriptional gene regulation that will unearth the underlying unknown mechanism of cancers. Among the ceRNA, mRNA have recently emerged as notable ceRNA candidates in some of the cancers as well as in normal cells. The competition between forkhead box protein O1 (FOXO1) and E-cadherin (CDH1) for miR-9 was reported wherein FOXO1 acted as a ceRNA resulting in inhibition of metastasis of breast cancer cells by inducing CDH1 expression (Yang et al., 2014;Zhou et al., 2014). A study reported significant positive correlation between erythropoietin receptor (EPOR) and erb-b2 receptor tyrosine kinase 2 (ERBB2) levels in breast cancer, both of which are targeted by miR-125b and behave as ceRNA (Ferracin et al., 2013). The ceRNA activity of versican, a chondroitin sulphate proteoglycan present in the extracellular matrix, against fibronectin and CD34 via miR-133a, miR-199a, miR-144 and miR-431 resulted in the development of hepatocellular carcinoma (Fang et al., 2013). A group studying HCT116 colon cancer cells detected KRAS and ZEB2 to regulate and be regulated by PTEN through ceRNA mechanism (Gao et al., 2012).
PTEN and ZEB2 are also found to modulate each other's expression in melanoma by competing for miR-181, miR-92, miR-25 and miR-200b (Karreth et al., 2011). In addition to these, several lncRNA and pseudogenes have been reported to be involved in different cancers by sponging miRNA. Linc-RoR function as a ceRNA and impairs miR-205-dependent repression of zinc finger E-box binding homeobox (ZEB1 and ZEB2), aiding in the progression of breast cancer by inducing epithelial mesenchymal transition (EMT) (Hou et al., 2014). An oncogenic lncRNA, highly upregulated in liver cancer (HULC) acts as a ceRNA of PRKACB through miR-372 (Pilyugin and Irminger-Finger, 2014). PTENP1 and KRAS1P affect the expression levels of their cognate genes, PTEN and KRAS, respectively, by sponging shared miRNA through ceRNA-mediated mechanisms in carcinomas (Alimonti et al., 2010;Poliseno et al., 2010;Trotman et al., 2003).
Several parameters that determine ceRNA activity have been proposed to mimic the milieu within a cell (Ala et al., 2013), which include (a) the common number of miRNA, (b) the number of shared MREs, (c) the concentration levels of ceRNA and miRNA and (d) the binding affinity of miRNA to a transcript. These along with the availability of a plethora of data encouraged researchers to develop computational methods and databases for prediction of the ceRNA. A method named cefinder predicted ceRNA by ranking mRNA according to the number of MREs shared between two mRNA (Sarver and Subramanian, 2012). The predictions of this method are publicly available through a database, ceRDB. The method opened up a new arena of ideas for the scientific community but did not consider some essential variables such as the concentrations of the transcripts and miRNA in the system to predict ceRNA precisely. starBase v2.0 is another database which predicts ceRNA using hypergeometric test (Li et al., 2014a) based on the number of common miRNA shared between the transcripts. The Database of Human long-noncoding RNA (lnCeDB) predicts lncRNA as candidate ceRNA by defining two scores (Das et al., 2014). However, this failed to infer whether highly ranked pairs with lower expression values or lowly ranked pairs with higher expression values should be preferred as ceRNA. A similar database, known as Pan-ceRNADB, predicted only mRNA as ceRNA candidates across 20 cancer types (Xu et al., 2015) using hypergeometric test. Although it took into account the correlation between the expressions of the genes, it did not consider other features of ceRNA which makes it less sensitive in making reliable predictions. TraceRNA predicts ceRNA by considering most of the known attributes of the miRNA/mRNA interactions (Flores et al., 2014). However, this method did not include miRNA expression, an important parameter while making predictions, which was later considered optionally by its sequel, NetceRNA. Moreover, the utility of Net-ceRNA was illustrated and evaluated only in breast cancer. SpongeScan (Furio-Tari et al., 2016) is another web interface to predict MREs of lncRNA that act as ceRNA. Apart from being concerned with only lncRNAs, the method talks about individual MREs in contrast to the entire transcript. The most recent addition to this effort is a machine-learning approach, called CERNIA (ceRNA predIction Algorithm) that considers the density, distribution and energy of the MREs along with the number of common miRNA as well as correlation of the gene expressions (Sardina et al., 2017).
At present, there is a lack of methods that enable to predict which of the transcripts in a ceRNA pair will act as the natural miRNA sponge in a particular cancer type. The majority of the available resources simply predict pairs of transcripts along with their shared miRNA and users have to intuit which transcript in the pair sequesters more. To address this issue, we propose a mathematical model with the ability to predict potential ceRNA pairs as well as assign relative sequestering strength to each of the genes in the pair. The model computes SoCeR (score of competitive endogenous RNA) by considering several key target-binding features of CLIP-identified miRNA targets, expression profiles of individual miRNA and mRNA as well as correlation in expression between the transcripts obtained from RNA-Seq data of TCGA. This score tells the user which of the transcripts sharing a miRNA pool is potentially the stronger attractive force and also indicates the strength of the competition between them in the specified cancer type.

Materials and methods
We adopted multiple steps and parameters ( Fig. 1) to incorporate ceRNA concept into a novel mathematical model to predict ceRNA of a GOI (gene of interest) that share MREs from a set of miRNA expressed in 32 cancer types (Table S1).

Data collection and processing
The expression profile of genes and miRNA of different cancer types were obtained from RNA sequencing (RNA-Seq) data available at The Cancer Genome Atlas (TCGA). We preferentially selected 32 cancer types for which transcriptional profiles of both genes and miRNA were available at TCGA. Importantly, we used level 3 RNA-Seq (IlluminaGA and Illu-minaHiSeq) data sets for genes as well as for miRNA that have normalized expression values in terms of RPKM/RSEM. These values indicate the abundances of an individual transcript in a sample from the transcriptome reads generated from the sequencing data. We parsed miRNA identifiers of TCGA using miR-Base (Release 20) (Kozomara and Griffiths-Jones, 2014) to get expression profiles of 1100 mature miRNA.
The targets of 1100 miRNA that comprised of 249 conserved and 851 nonconserved human miRNA were downloaded from miRSVR, which is a part of miRanda (http://www.microrna.org) (Betel et al., 2008). The target pool includes 2940741 MREs of 20 817 transcripts (18 913 genes), which contains information about its conservation, energy, alignment score and seed type residing on different transcripts. The miRSVR algorithm identifies a significant number of experimentally verified MREs based on a regression model trained on contextual features of CLIP-Seq predicted target sites (Betel et al., 2010). The genes considered for the study included mostly 17 685 (93.51%) mRNA in addition to 218 (1.15%) lncRNA, 166 (0.88%) pseudogenes and 844 (4.46%) miscellaneous RNA (Fig. S1).

Computation of positively correlated genes
An elevated expression of ceRNA leads to increased expression of its paired gene by limiting the amount of shared miRNA bound to the latter. Consequently, genes of a ceRNA pair are positively co-expressed in a cell (Chiu et al., 2015;Sumazin et al., 2011;Tay et al., 2011). Therefore, we computed the correlation between each pair of genes across all the available samples of cancer to identify preliminary sets of probable ceRNA pairs in them. We considered the transcripts with nonzero expression values (RPKM/RSEM) in a cancer type for correlation analyses. Spearman's correlation coefficient (q) was used over that of Pearson's as gene expression profiles of cancers are not normally distributed. Spearman correlation coefficient allows more robustness towards any extreme values in the expression profiles (Mukaka, 2012). We filtered only positively correlated genes with q > 0.0. The P-values of all positively correlated pairs were sorted and then FDR corrected. The pairs with corrected P-value ≤ 0.001 were considered in subsequent steps to find potential ceRNA pairs.

miRNA-mediated 'tug-of-war' model for predicting ceRNA and their strength
We devised a model based on the 'tug of war' of genes for the shared pool of miRNA that can precisely predict the possible ceRNA gene candidates (PCCs) responsible for modulating the expression of coexpressed genes indirectly through shared miRNA. Dynamic steady state of the cells precisely executes the processes of replication, transcription, and translation in a harmonic manner rejuvenating the cellular environment maintaining defined cellular fate. Here, the dynamic steady state is defined as the varying steady states of the tumour cells in the cancerous tissues. This is based on the observations that the primary tumour after attaining a steady state concentration in due time induces a secondary tumour which also attains a steady state concentration with time (Gatenby, 1995;Nishikawa et al., 1994).
We assume such a steady state system in the model to mimic the ceRNA mechanism in a cell and propose a discrete time model that is characterized by two stages of action (Fig. 2). In the initial stage, there is no interaction between the miRNA and target genes, and hence, the complete pool of miRNA is freely available for binding to different genes in the system. In the next stage, the available miRNA bind to preferred MREs on individual genes according to their inclination and the concentration of individual genes. These stages are recursive because the corresponding cell system will be back to the first stage after completion of the second stage when it is replenished with a fresh pool of unbound miRNA and transcripts. The propensity of miRNA binding to targets can be influenced by (a) relative expression of miRNA and target genes, (b) seed types (8mer, 7mer-m8, 7mer-A1, 6mer) (Grimson et al., 2007;Liu et al., 2013), (c) binding energy of the miRNA-target duplex, (d) alignment score of miRNA with target at the MRE and (e) conservation score of the MREs (Rennie et al., 2016). The seed types considerably influence the competition in binding to mRNA and the potency of repression (Agarwal et al., 2015;Chen et al., 2015;Denzler et al., 2016). Based on this, we define the weights of individual seed types to bring out their importance in calculating ceRNA propensity. We defined the binding affinity (b) of a miRNA towards a gene as the product of the features of miRNA-target duplexes.
where c, e, a and s are conservation score, binding energy, alignment score and seed type, respectively. We used a multiplicative combination that would ensure homogeneous changes in the binding affinity with changes in the features, that is proportional changes in the features will result in proportional changes in the binding affinity. The rate of miRNA-ceRNA target complex formation depends on the concentration and the relative binding affinity of the transcript. The relative binding affinity of a PCC with respect to its GOI (b PCCjGOI ) is defined as: where z is the coefficient of relative binding affinity and is found to be 2 as b g1|g1 should be 1. An advantage of calculating the relative binding affinity is to avoid the bias introduced by the different scales and their extreme values. In our miRNA-sponging model, we have integrated the expression values of targets with above targetbinding features, which will be useful to break the tie in cases of similar binding affinities. The expression value, E PCC , used is the average of the expressions of the PCC taken from all the samples. This term is termed as the total binding affinity (TBA) of a gene (denoted as B) and is used to predict PCCs of a GOI.
Here, PCC MREs are the MREs present on the PCC. The TBA of a PCC with respect to the GOI is the cumulative of the relative binding affinity of the MREs present on all the transcripts of the PCC. TBA is a measure of the rate of the complex formation expressed as the product of expression levels (E) and affinity of a gene towards a miRNA. Further, we have also included the expressions of miRNA in the model as it is a crucial factor in influencing the rate of degradation or inhibition of the genes. E miRNA is defined as the average of all the miRNA samples.
As the system is in a steady state, the time taken for formation of the duplex by both the genes of a ceRNA pair is equal. Therefore, Here, x is the part of the miRNA pool that is sequestered due to the TBA of the PCC. The equation (4) is reminiscent of the 'tug of war' between GOI and PCC for their shared miRNA with respect to their TBAs. In equation (4), the denominators are the rates at which the miRNA form duplexes while the numerators are the expression levels of miRNA. Thus, values on left-hand side and right-hand side of the equation (4) symbolize the time taken for formation of complex with PCC and GOI, respectively. We defined our score, SoCeR, by solving equation (4) and calculating the differences between the amount of miRNA binding to PCCs and GOIs. Mathematically, we present SoCeR (non-normalized) as: Theoretically, equation (5) gives the excess (or deficit) of amount of miRNA binding to PCC over GOI of a ceRNA pair sharing MREs, which is mathematically represented by a positive (or negative) score. The scores lie between À1 and +1 by normalizing with the highest absolute score for a particular GOI. A negative score implies that the GOI sequesters a higher number of miRNA in the pair, and thus, the GOI has higher ceRNA propensity in that system with respect to the same pool of miRNA. Similarly, a positive score suggests that the predicted gene has higher ceRNA propensity in that particular system. The magnitude of the score indicates the imbalance in sequestering strength of the genes of a ceRNA pair. A magnitude close to unity suggests the case of ceRNA dominance, while a score closer to zero (or exactly zero) indicates a state of ceRNA dependency (or no ceRNA activity) of the genes in the cell. The terms ceRNA domination and dependency are defined with respect to the power of individual ceRNA to influence the system. A dominant ceRNA can strongly influence its partner gene and thus, the system, whereas a dependent ceRNA has a weak influence on its partner gene and the combined effect of the pair is needed to influence the system. These extreme cases show a clear rift in the strength of the genes symbolizing weak ceRNA regulation either due to the inability of a gene to compete or due to the similarity in strength of the two genes to sponge miRNA.

Results and Discussion
Here, we have proposed a novel miRNA-mediated 'tug-of-war' sponging model and a well-defined score (SoCeR), which can accurately interpret all possible conditions while predicting ceRNAs in a system. Standalone applications, SoCeR (for Windows as well as Linux), have been developed to assist the scientists in performing similar analysis on different data sets and systems and is available at http://vvekslab.in/tools. html. Hereon forth, we use the notation GOI-PCC for the ceRNA pairs. SoCeR not only gives a measure of the sequestering capability of the GOI but can also be used for ranking the PCCs of a GOI. For example, in breast cancer, PTEN-PTENP1 is ranked 48th, and PTENP1-PTEN is ranked 4th when their ceRNA are sorted in ascending and descending manner, respectively. To test the robustness of SoCeR, we swapped the GOI and PCC of a ceRNA pair. As expected, we obtained a non-normalized score of the same magnitude but of opposite sign predicting the same conclusion as before. This ascertains the sturdiness of the score. Although the non-normalized SoCeR for a pair can be obtained by multiplying negative unity to the same of the opposite pair, it is not true for SoCeR as the latter gives the relative strength of a pair with respect to all of the ceRNA pairs of its GOI. Refer Table 1 to compare SoCeR of PTEN-PTENP1 and PTENP1-PTEN predicted across all cancers. Our model is further validated by considering different cases and interpreting their scores. We tested cases with respect to three possible conditions in a cell: (a) expression of miRNA is zero; (b) either or both, expression and affinity of a gene is zero; and (c) total binding affinity of a gene is greater than its ceRNA counterpart. The case with no miRNA in the system gives an obvious score of zero because a common pool of miRNA is absent, and hence, there would be no influence on either the genes or the system. The second condition can be regarded as a case of complete ceRNA dominance due to the inability of binding or unavailability of the other genes. The last case, as expected, gives a positive score if the PCC has a greater total binding affinity (than the GOI) symbolizing that more miRNA would be sequestered by the PCC. The above evaluations reveal SoCeR as a dependable score. Refer to Table S2 that enlists all the experimentally validated ceRNA predicted by our model in different cancers.

ceRNA profile across 32 cancers
The profiles of ceRNA across 32 cancers were predicted using our 'tug-of-war' model. We observed the highest percentage of ceRNA predicted in HNSC with 57.84% of all possible pairs followed by UCEC (53.01%) and OV (51.20%). Figure 3 gives the distribution of the percentage of ceRNA predicted in individual cancer types. From this, we can infer that mRNA-related ceRNA activity is highest in HNSC among all cancers, whereas lowest ceRNA activity is in KICH (42.84%). Interestingly, we noticed that our method can predict ceRNA that are already reported and validated in cancers (Table S2), which indicate the reliability of our method for predicting ceRNA of any GOI in any other cancers. Table 2 lists some of these pairs that can be justified in their sequestration strength. We have discussed some of these cases below in detail.

The ceRNA networks of PTEN
PTEN has been the subject of investigation by many researchers (Ioffe et al., 2012;Poliseno et al., 2010;Song et al., 2012;Sumazin et al., 2011;Yu et al., 2014) due to its tumour suppressing abilities across almost all known cancers that regulate the PI3K/AKT signalling pathway. The ceRNA network involving PTEN has been recorded in some cancers and speculated for others. The expression of PTEN is reported to be regulated by its pseudogene ceRNA, PTENP1 via their shared miRNA at the post-transcriptional level. Influenced by these, we used our model to find possible cases of the ceRNA crosstalks between PTEN and PTENP1 across 32 cancers (Table 1). We noticed that SoCeR is not only dependent on the number of common miRNA but also on the binding site features of the shared miRNA which are evident in the cancer types such as CHOL and KICH. Both the cancers have the same number of common miRNA shared between PTEN and PTENP1, yet CHOL endorses PTENP1 as a stronger ceRNA candidate of PTEN than KICH. Another befitting comparison that supported the above argument is the case of CHOL and READ. We observed that the absolute score for PTEN and PTENP1 is lesser in READ as compared to CHOL although a higher number of miRNA are shared in the former. Further, PTEN-PTENP1 pair was found to be strongly positively correlated in all cancers except acute myeloid leukaemia (LAML) where the correlation was found to be negative. Table S2 lists out the experimentally validated cases of PTEN interactions that have been predicted by our method. PTEN and CNOT6L predicted as a ceRNA pair by our method gives a negative score. This can be explained by the observations of Tay et al. (2011) which stated that a change in CNOT6L changes the expression levels of PTEN significantly, while a change in the expression levels of PTEN brings about significantly lesser change in CNOT6L. Thus, it can be concluded that CNOT6L sequesters majority of the common pool of miRNA as indicated by our score. Further, our model has predicted several other mRNA that may function as PTEN ceRNA and regulate PTEN expression in some of the cancers, which have already been reported experimentally elsewhere Poliseno et al., 2010;Qu et al., 2015;Tay et al., 2011;Yu et al., 2015). One such case is repression of PTEN by the oncomir miR-130b in oesophageal squamous cell carcinoma, which increases the proliferation and migration abilities of the tumour cells (Yu et al., 2015). SoCeR provides a negative score for the interaction of PTEN (GOI) and PTENP1 in this cancer mediated through miR-130b.   (He et al., 2015) In summary, the widespread occurrence of PTEN-PTENP1 ceRNA network in almost all cancers (Table 1) predicted in our study indicated a common regulatory activity of the tumour suppression pathway. Moreover, this case study authenticated our tug-of-war model by predicting already validated cases of the PTEN-centred ceRNA in cancers. Table S3 report the scores of all possible ceRNA of PTEN in breast cancer. Table S4 reports the shared list of miRNA along with their scores for the PTEN-PTENP1 interaction in breast cancer.

miRNA sponging by TSGs
While a cell in its normal condition is a quintessential blend of genes, the tumour-afflicted cell is the result of dysregulation of tumour suppressors among other reasons. SoCeR gives a measure of the miRNA-sponging ability (MSA) of a gene. A negative score would represent a higher MSA of the GOI or query gene. Similarly, a cumulative negative SoCeR would symbolize an overall downregulation of a gene in cancer on account of sequestering more miRNA than its counterparts. We selected three well-known TSGs: PTEN, TP53 and RB1 to infer the fate of their tumour suppressing activities in all 32 cancers. We obtained a negative score for these TSGs, propelling us to hypothesize that miRNA sponging involving TSGs could be a possible event promoting the formation of cancer cells. It is evident from Table 3 that PTEN, TP53 and RB1 are important TSGs active across all cancers as speculated.

Putative PTEN ceRNA regulate PTEN in human melanoma cells
Expression levels of PTEN can be modulated through miRNA by putative PTEN ceRNA . The study by Karreth et al., 2011 selected nine well-known PTEN ceRNA and silenced them in two different melanoma cell lines which resulted in attenuation of PTEN expression. We considered these nine candidates and pitted them individually against PTEN through our model. SoCeR of most of these pairs were found to be positive, revealing the ceRNA candidates have higher sequestering capability compared to PTEN (Table S2). This substantiates our model to be capable of predicting reported cases of ceRNA.
3.2.4. VCAN-CD34/FN1 ceRNA promoting oncogenesis VCAN was investigated by Fang et al. (2013) to explore its role in HCC. It was found to promote the proliferation, survival, migration and invasion of the cells by modulating the activities of miR-133a, miR-199a*, miR-144 and miR-431 which were sequestered by mRNA such as CD34 and FN1. Our method predicted VCAN-CD34 and VCAN-FN1 to be co-expressed and thus initially assumed as potential ceRNA pairs. Further, we noticed a positive SoCeR for both of these pairs with VCAN as the GOI, which indicated VCAN to be a stronger miRNA sponge compared to its partners. This observation supported the findings by Fang et al. about VCAN freeing CD34 and FN1 for protein translation through ceRNA mechanism by miR-431 sponging and promoting HCC. The other reported miRNA were absent in the mediation of the interactions in question as our method is dependent on data generated by other predictive tools such as mirSVR. Table S5 and S6 contain the scores of the shared pool of miRNA between VCAN-CD34 and VCAN-FN1, respectively.

EPOR and ERBB2 cooperatively regulate malignancy
Transfection of miR-125b by Ferracin et al. (2013) confirmed EPOR and ERBB2 crosstalk with each other through ceRNA mechanism. miR-125b is a wellknown tumour suppressor miRNA which has reduced expression in metastatic breast cancer. Ferracin et al. showed ERBB2 to act as a decoy of miR-125b and induce the expression of EPOR by reducing the amount of free cellular miR-125b. Our method also predicted ERBB2 to be the major miRNA sponge with a low positive score in breast cancer besides nine other cancers, for several miRNA but not miR-125b. Ferracin et al. realized the inability of ERBB2 alone to explain the sustenance of the malignant behaviour of breast cancer cells which was solved by including EPOR into the pool as ceRNA crosstalks. Thus, to maintain the malignancy, both the transcripts are functioning cooperatively. This brings out the true essence of our score as the low magnitude of SoCeR suggests a dependent behaviour of the ceRNA in regulating the cell state.

Comparison of predictions and threshold of SoCeR
The primary objective of our model is to predict the ceRNA propensity of the genes among the positively correlated pairs in a system. Moreover, it can also be used to predict ceRNA. Due to the unavailability of the negative class (putative non-ceRNA), we could not perform the receiver operating characteristic analysis to calculate sensitivity and specificity of our method. Alternatively, we computed the number of true positives (TP) and false negatives (FN) of our predictions. The validated sets of 34 ceRNA reported in three cancers (breast, prostate and brain) reported in the literature (Poliseno et al., 2010;Sardina et al., 2017;Sumazin et al., 2011;Tay et al., 2011) were used for comparison by calculating TP and FN. Of these, our method correctly predicted 27 pairs, while CERNIA, the most recent method which has been reported to outperform previous methods predicted 28 pairs (Fig. 4). This indicates our method is equally competitive with other methods. However, availability of more number of validated ceRNA in different cancers in future will reinforce this comparison. We have shown the statistics on the classes of genes in the predicted ceRNA pairs in Fig. S2 for five cancers.
In order to define a range of SoCeR that could be used as a threshold to find significant ceRNA pairs, we considered the validated ceRNA predicted by our model as well as CERNIA and assessed their SoCeRs. We recommend the minimum and maximum of the absolute values of SoCeR of these validated ceRNAs as the threshold. Therefore, we recommend the pairs whose absolute SoCeRs lie in the range of [0.00403, 0.88367]. The number of predicted pairs are cut down by a third when these cut-offs are enforced. Table S7 gives the percentages of the predicted pairs that fall in the suggested range.

Conclusions
In the current study, we devised a miRNA-sponging model for predictions of ceRNA of a GOI in 32 cancer types to complement the resources available for ceRNA, which is still in its nascent stage of research. This model is motivated by the lack of algorithms or tools for predicting which of the transcripts with shared miRNA will be the most potential ceRNA by defining SoCeR scores that range between À1 and 1. To the best of our knowledge, our model is the only model that is specialized to envisage potential of each transcript sharing the same set of miRNA/MREs to behave as ceRNA based on sequence, thermodynamic and conservation features of miRNA-target duplexes as well as cancer-specific co-expression patterns of genes and expression levels of shared miRNA. This method revealed the sponging ability of each of the transcripts expressed in a particular cancer type whose expression levels are obtained from RNA-Seq data available at TCGA. Further, our method includes targets of both conserved and nonconserved human miRNA predicted from Ago interaction sites of CLIP-Seq studies integrated into mirSVR of miRanda which enhances the reliability of our predictions by reducing the false miRNA targets in our miRNA-target duplex data sets. The method can be used for effective retrieval of ceRNA of GOI in specific cancer as well as in multiple cancers which will enable the researchers to compare ceRNA candidates of GOI in selected cancers and validate whether similar or dissimilar ceRNA regulatory mechanisms exist in these cancers. This will help to decipher common oncogenic processes operated by the same set of ceRNA in multiple cancers and will thus offer new opportunities to manipulate these common ceRNA networks through miRNA competition for their therapy, perhaps by adopting one strategy.
Our method demonstrated some limitations because of which some of the ceRNA reported previously by other labs were not detected by our method. Some of these are due to the dependency of our method on data retrieved from other resources such as TCGA and mirSVR. For example, our method failed to predict FOXO1-CDH1 as ceRNA reported earlier by Yang et al. (Yang et al., 2014). While investigating the reason for this miss, we surprisingly found their expressions to be negatively correlated. This is possibly due to anomalies in the expression data of these genes obtained from TCGA, which have prevented our method to predict these genes as a potential ceRNA pair. Moreover, we missed some miRNA in our prediction for few ceRNA pairs which are due to the absence of these miRNA in CLIP-verified targets available at mirSVR or TCGA.
As discussed previously, the ceRNA, although in its infancy, has the potential to regulate several biological processes, both in normal and cancer cells, and hence provides a new perspective to understand oncogenic processes. Thus, understanding the regulatory mechanisms and functions exhibited by them is very essential. Our method will help the researchers to gain an indepth insight into the myriad roles of ceRNA language in diverse cancer types and devise strategies for miRNA loss-of-function studies as a therapeutic measure. Yu

Supporting information
Additional Supporting Information may be found online in the supporting information tab for this article: Fig. S1. The overview of different classes of genes considered in our study. Fig. S2. The overview of different classes of ceRNA predicted by our model in five different cancers. Table S1. The abbreviations of cancer types considered in our study. Table S2. List of experimentally validated ceRNA predicted by our model in different cancers. Table S3. The probable ceRNA of PTEN predicted by our model in BRCA .  Table S4. List of shared miRNA of PTEN-PTENP1 pairs in BRCA along with their respective SoCeR. Table S5. The shared miRNA of VCAN-CD34 pairs in LIHC along with their corresponding scores. Table S6. The shared miRNA of VCAN-FN1 pair in LIHC along with their corresponding scores. Table S7. The percentage of predicted pairs lying in the cut-off range across cancers.