A six‐microRNA signature to predict outcomes of patients with gastric cancer

Gastric cancer (GC) is a common gastrointestinal tumor with a poor prognosis. However, conventional prognostic factors cannot accurately predict the outcomes of GC patients, so novel predictive markers are still needed to improve prognostic prediction. In this study, we obtained microRNA expression profiles of 385 GC patients from The Cancer Genome Atlas (TCGA). We performed Cox regression analysis to identify overall survival-related microRNA and then constructed a microRNA signature-based prognostic model. The accuracy of the model was evaluated and validated through Kaplan-Meier survival analysis and time-dependent receiver operating characteristic (ROC) curve analysis. The independent prognostic value of the model was assessed by multivariate Cox regression analysis. Enrichment analysis was performed to explore potential functions of the prognostic microRNA. A prognostic model based on a six-microRNA signature (miRNA-100, miRNA-374a, miRNA-509-3, miRNA-668, miRNA-549, and miRNA-653) was developed. Analyses in the training, test, and complete TCGA sets showed that the model could distinguish between high-risk and low-risk patients and predict 3-year and 5-year survival. The six-microRNA signature was also an independent prognostic marker, and enrichment analysis suggested that these microRNA may be involved in the cell cycle and mitosis. These results demonstrate that the six-microRNA signature-based model can be used to accurately predict the prognosis of GC patients.

Gastric cancer (GC) is one of the most common malignant gastrointestinal tumors. In 2015, 1 310 000 people worldwide were diagnosed with GC and 810 000 patients died of it. GC ranked 5th in incidence and 3rd in mortality among all malignant tumors [1]. Because early symptoms are atypical, most patients are diagnosed at an advanced stage, and their median overall survival is usually < 1 year [2,3]. Moreover, even among patients who have undergone radical surgery, 37%-48% die from recurrence or metastasis [4]. The prognosis of GC is therefore poor. Early diagnosis needs to be improved, and appropriate, individualized therapies need to be based on prognosis. The AJCC TNM staging system is the conventional prognostic indicator. However, an accurate stage is sometimes difficult to obtain in clinical practice, for example when fewer than 15 lymph nodes are dissected or the tumor cannot be removed completely. Moreover, the AJCC staging system cannot distinguish between patients at the same stage who have different survival times [5,6]. In the genomic era, the most likely explanation is molecular heterogeneity among patients within the same stage group. Recently, several novel molecular classification schemas of GC have been proposed on the basis of these heterogeneous molecular characteristics [7,8]. It is therefore also necessary to develop a novel prognostic model based on molecular characteristics to predict the outcome of patients with GC.
microRNA are a group of small noncoding RNA approximately 22 nucleotides in length. A single microRNA can regulate the expression of multiple mRNA, exerting its biological functions by promoting mRNA degradation or inhibiting mRNA translation [9,10]. A number of studies have shown that microRNA are involved in the proliferation [11], apoptosis [12,13], differentiation [14,15], invasion [16,17], and migration [18] of GC cells. Moreover, several studies have reported that some microRNA can also affect the survival of patients with GC [19,20]. It is therefore feasible to construct a prognostic model based on microRNA expression profiles.
In this study, we developed a prognostic model of GC based on a six-microRNA expression signature, using high-throughput microRNA sequencing data from The Cancer Genome Atlas (TCGA). The six-microRNA expression signature was associated with overall survival and could predict the 3- and 5-year overall survival of patients with GC. Moreover, it was an independent prognostic factor.

Genetic and clinical data acquisition and processing
Genetic and clinical data of patients with GC were obtained from TCGA (http://cancergenome.nih.gov/). Genetic data included microRNA and mRNA expression levels for each patient. Clinical information included age, gender, pathological stage, histological grade, survival status, and overall survival time. microRNA and mRNA expression levels were measured as log(RPM + 1) and log(FPKM + 1), respectively. microRNA that were not expressed in more than 50% of patients were removed. The patients were randomly divided into a training set and a test set using the sampling package in R (v3.5.0, The R Foundation, Vienna, Austria), with survival status balanced between the two sets.
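The filtering and splitting steps above can be sketched in plain Python. This is a minimal illustration, not the authors' R code, and it rests on several assumptions:

- "log" is taken as log2 (the text does not state the base).
- "Not expressed" is taken to mean a value of 0.
- A random split stratified by vital status stands in for what the R sampling package was used for.
- The function names `filter_and_log` and `stratified_split` are hypothetical.

```python
import math
import random

def filter_and_log(rpm, max_zero_frac=0.5):
    """Hypothetical helper. Drop microRNA that are unexpressed (value 0) in more
    than max_zero_frac of patients, then log2(x + 1)-transform the rest.
    rpm maps microRNA name -> list of RPM values (one per patient)."""
    kept = {}
    for name, values in rpm.items():
        zero_frac = sum(1 for v in values if v == 0) / len(values)
        if zero_frac <= max_zero_frac:  # removed only if unexpressed in > 50% of patients
            kept[name] = [math.log2(v + 1) for v in values]  # assumption: log base 2
    return kept

def stratified_split(patient_ids, vital_status, seed=0):
    """Hypothetical helper. Randomly split patients into training and test sets,
    shuffling deceased (1) and living (0) patients separately so that survival
    status is balanced between the two sets."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    train, test = [], []
    for status in (0, 1):
        stratum = [p for p, s in zip(patient_ids, vital_status) if s == status]
        rng.shuffle(stratum)
        half = len(stratum) // 2
        train.extend(stratum[:half])
        test.extend(stratum[half:])
    return train, test
```

Stratifying on vital status is one simple way to keep the proportion of deaths similar in both sets, which is the balance the text describes.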

Statistical analysis
Univariate and multivariate Cox regression analyses were used to identify survival-related microRNA in the training set. A prognostic model based on these microRNA was then constructed from the Cox regression model, in which the regression coefficients served as the weights of the microRNA expression levels. The risk score of each patient was calculated as the sum of the weighted microRNA expression levels. Patients in each set were classified into a high-risk group and a low-risk group, using the median risk score of the training set as the cutoff value. Kaplan-Meier survival analysis with the log-rank test was used to compare overall survival between the two groups. Univariate Cox regression analysis was used to calculate the hazard ratio (HR) between them. Time-dependent receiver operating characteristic (ROC) curve analysis was performed with the survivalROC [21] package in R. This analysis evaluated the sensitivity and specificity of the prognostic model for predicting 3- and 5-year overall survival in each set. In addition, multivariate Cox regression analysis was used to determine whether the microRNA signature was an independent prognostic marker.
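The log-rank comparison between the high-risk and low-risk groups can be sketched as the standard two-group statistic. At each death time, it compares observed with expected deaths in one group and sums the hypergeometric variances. This is an illustrative sketch, not the authors' implementation, and the function name `logrank_test` is hypothetical. The P value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math

def logrank_test(times, events, groups):
    """Hypothetical two-group log-rank test.

    times  : follow-up time for each patient
    events : 1 if the patient died, 0 if censored
    groups : 1 for the high-risk group, 0 for the low-risk group
    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in death_times:
        # Risk set: patients whose follow-up time is >= t
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In the setting described here, `groups` would be the high-risk and low-risk labels obtained from the median cutoff.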

Functional enrichment analysis
Because microRNA exert their biological activities by regulating mRNA in trans, the expression correlations between the microRNA and mRNA were analyzed by Pearson's correlation test. mRNA with a correlation coefficient < -0.3 and P < 0.05 were identified as potential target genes of the microRNA. Gene ontology (GO) enrichment analysis was then performed in the cellular component (CC), molecular function (MF), and biological process (BP) categories, together with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Both were performed and visualized with the clusterProfiler [22] package in R. P < 0.05 was considered significant.
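The target-selection rule (r < -0.3 and P < 0.05) can be sketched in plain Python. This is a sketch under assumptions, not the authors' code, and all function names are hypothetical. The P value uses the standard t-test for a Pearson correlation: t = r·sqrt((n-2)/(1-r²)) on n-2 degrees of freedom. Its two-sided tail reduces to the regularized incomplete beta function I at x = 1 - r² with parameters (n-2)/2 and 1/2. The code evaluates that function with the classic continued-fraction method.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def pearson_p(r, n):
    """Two-sided P value for H0: rho = 0 (t-test with n - 2 df)."""
    if abs(r) >= 1.0:
        return 0.0
    return _betai((n - 2) / 2.0, 0.5, 1.0 - r * r)

def negatively_correlated_targets(mirna, mrnas, r_cut=-0.3, alpha=0.05):
    """Hypothetical helper. Return mRNA whose expression correlates negatively
    with the microRNA (r < r_cut and P < alpha)."""
    hits = []
    for gene, expr in mrnas.items():
        r = pearson_r(mirna, expr)
        if r < r_cut and pearson_p(r, len(mirna)) < alpha:
            hits.append(gene)
    return hits
```

With the sample sizes used here, r < -0.3 is usually the binding condition. At a few hundred patients, a correlation of that size is already far below the P < 0.05 threshold.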

Preparation of genetic and clinical data
Genetic and clinical data of 385 patients with gastric adenocarcinoma were downloaded from the TCGA database and randomly assigned to the training set (n = 192) and test set (n = 193). There were no statistically significant differences in age, gender, pathological stage, histological grade, or survival status between the two sets (Table 1). After removing microRNA that were not expressed in more than half of the samples, 566 of the 1046 microRNA were retained for further analysis (Tables S1-S3).

Development of prognostic model in the training set
In the training set, univariate Cox regression analysis showed that the expression levels of 46 microRNA were related to the overall survival of patients (Table S4). Subsequent multivariate Cox regression analysis showed that the expression levels of six of these 46 microRNA were related to overall survival (Table 2), indicating that they were independent prognostic factors in GC patients. Among them, miRNA-100, miRNA-653, and miRNA-668 were risk microRNA, whereas miRNA-374a, miRNA-509-3, and miRNA-549 were protective microRNA.
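A univariate Cox model of the kind used to screen each microRNA can be sketched with Newton-Raphson on the partial likelihood. This is an illustrative sketch, not the authors' R call, and `univariate_cox` and its helper are hypothetical names. Ties are handled with the Breslow approximation, whereas R's `survival::coxph` defaults to the Efron method, so estimates can differ slightly when death times are tied. The covariate can be a continuous microRNA expression level or a 0/1 high-risk indicator. The 95% CI and P value are Wald-based.

```python
import math

def _score_and_information(beta, times, events, x):
    """Score U(beta) and observed information I(beta) of the Breslow partial likelihood."""
    score, info = 0.0, 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei != 1:
            continue  # censored patients contribute only through risk sets
        # Risk set at this death time: everyone with follow-up time >= ti
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk]
        s0 = sum(w)
        s1 = sum(xj * wj for xj, wj in zip(risk, w))
        s2 = sum(xj * xj * wj for xj, wj in zip(risk, w))
        score += xi - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def univariate_cox(times, events, x, max_iter=50, tol=1e-10):
    """Hypothetical helper: fit h(t | x) = h0(t) * exp(beta * x) for one covariate.
    Returns (hazard ratio, lower 95% CI, upper 95% CI, Wald P value)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, times, events, x)
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(beta, times, events, x)
    se = 1.0 / math.sqrt(info)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return (math.exp(beta), math.exp(beta - 1.96 * se),
            math.exp(beta + 1.96 * se), p_value)
```

Applied microRNA by microRNA, a screen like this produces one HR and P value per candidate. The multivariate step then refits the retained candidates jointly.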
To construct the prognostic model, multivariate Cox regression analysis was performed on the six microRNA with independent prognostic value. The weight of each microRNA expression level in the model was taken from its regression coefficient. The risk score was defined as follows:
Risk score = (0.336 × expression level of miRNA-100) + (-0.777 × expression level of miRNA-374a) + (-0.578 × expression level of miRNA-509-3) + (-0.487 × expression level of miRNA-549) + (0.618 × expression level of miRNA-653) + (1.223 × expression level of miRNA-668).
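The risk-score formula and the median-cutoff grouping can be written as a few lines of Python. The coefficients are the ones reported above. The names `COEFFICIENTS`, `risk_score`, and `assign_groups` are hypothetical. Expression values are assumed to be on the same log(RPM + 1) scale used to fit the model, and a score exactly equal to the cutoff is assumed to fall in the low-risk group.

```python
from statistics import median

# Regression coefficients of the six-microRNA signature, as reported in the text
COEFFICIENTS = {
    "miRNA-100": 0.336,
    "miRNA-374a": -0.777,
    "miRNA-509-3": -0.578,
    "miRNA-549": -0.487,
    "miRNA-653": 0.618,
    "miRNA-668": 1.223,
}

def risk_score(expression):
    """Weighted sum of one patient's expression levels.
    expression maps microRNA name -> log(RPM + 1) value."""
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())

def assign_groups(scores, cutoff):
    """Label each score 'high' if it exceeds the cutoff, otherwise 'low'
    (assumption: ties at the cutoff go to the low-risk group)."""
    return ["high" if s > cutoff else "low" for s in scores]
```

As described in the text, the cutoff is computed once on the training set and then reused unchanged for the other sets, for example `cutoff = median(train_scores)` followed by `assign_groups(test_scores, cutoff)`. Here `train_scores` and `test_scores` are hypothetical lists of risk scores.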
Based on this model, the risk score of each patient was calculated. Using the median risk score of the training set as the cutoff value, 96 training-set patients were assigned to the high-risk group and 96 to the low-risk group. Kaplan-Meier survival analysis with the log-rank test demonstrated a significant difference between the two groups. Patients in the low-risk group tended to have longer overall survival than those in the high-risk group (P < 0.001, Fig. 1A). Univariate Cox regression analysis indicated that the HR of the high-risk versus the low-risk group was 3.154 (95% CI: 1.899-5.24, P < 0.001, Table 3). Furthermore, time-dependent ROC analysis of the six-microRNA signature gave an area under the ROC curve (AUC) of 0.759 for 3-year and 0.821 for 5-year survival (Fig. 1B). Therefore, the six-microRNA signature-based model can predict the prognosis of patients.
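To make "AUC for predicting 3-year survival" concrete, here is a deliberately simplified time-dependent AUC. Cases are patients who died by the horizon, and controls are patients still followed beyond it. The AUC is the probability that a case has a higher risk score than a control, with ties counted as one half. This is not the survivalROC method used in the study. Patients censored before the horizon are simply dropped here, whereas survivalROC handles them with survival-function estimators. The function name `naive_time_auc` is hypothetical, and `horizon` must be in the same units as the follow-up times (e.g. 3 * 365 if times are in days).

```python
def naive_time_auc(times, events, scores, horizon):
    """Hypothetical helper: simplified cumulative/dynamic AUC at `horizon`.
    Cases: died at or before horizon. Controls: follow-up time beyond horizon.
    Patients censored before the horizon are excluded (a simplification)."""
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e == 1]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to a score that ranks cases no better than chance, and 1.0 to perfect ranking. This is the scale on which values such as 0.759 and 0.821 are read.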

Validation of the prognostic model in the test and entire TCGA sets
To assess the predictive value of this model, we further validated the six-microRNA signature in the test set. Using the same risk score calculation and the same cutoff value as in the training set, the 193 test-set patients were divided into a high-risk group (n = 87) and a low-risk group (n = 106). The result of Kaplan-Meier survival analysis was consistent with that in the training set. Patients in the low-risk group tended to have longer overall survival than those in the high-risk group (P = 0.023, Fig. 2A). According to univariate Cox regression analysis, the HR of the high-risk versus the low-risk group was 1.699 (95% CI: 1.07-2.698, P = 0.025, Table 3). In time-dependent ROC analysis, the AUC was 0.708 for 3-year and 0.729 for 5-year survival (Fig. 2B). These results showed that the model also performed well in the test set.
To further verify the robustness of the prognostic model, the six-microRNA signature was tested in the entire TCGA set. Using the same cutoff as above, patients in the entire TCGA set were classified into a high-risk group (n = 183) and a low-risk group (n = 202). Kaplan-Meier survival analysis with the log-rank test gave similar results. Patients in the low-risk group tended to have better overall survival than those in the high-risk group (P < 0.001, Fig. 3A). Univariate Cox regression analysis showed that the HR of the high-risk versus the low-risk group was 2.3 (95% CI: 1.646-3.216, P < 0.001, Table 3). Time-dependent ROC analysis showed that the AUC of the prognostic model was 0.71 for 3-year and 0.789 for 5-year survival (Fig. 3B). These analyses of the entire TCGA set confirmed the robustness of the six-microRNA signature.

Assessment of the independent prognostic value of the six-microRNA signature
To assess the independent prognostic value of the six-microRNA signature, multivariate Cox regression analyses were performed. Consistent results in the training, test, and entire TCGA sets showed that pathological stage, age, and the six-microRNA signature were independent prognostic factors (Fig. 4A,B). These results demonstrated that the six-microRNA signature was an independent prognostic marker in GC patients and was superior to pathological stage and age.

Functional enrichment analysis of the six microRNA
To explore potential functions of these six microRNA, 978 co-expressed mRNA, which may be target genes of the microRNA, were identified by Pearson's correlation test. GO enrichment analysis of the co-expressed mRNA identified the most significantly enriched terms in each category: "chromosome, centromeric region" (CC), "ATPase activity" (MF), and "mitotic nuclear division" (BP) (Fig. 5A-C, Table S5). KEGG pathway enrichment analysis indicated that the cell cycle was the most significantly enriched pathway (Fig. 5D, Table S6). In addition, these mRNA were enriched in molecular functions such as microtubule binding and tubulin binding, which have been shown to be related to cell proliferation. They were also involved in several cancer-related biological processes and signaling pathways, such as cell cycle phase transition, cell cycle checkpoint, regulation of cell division, and the p53 signaling pathway.
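Over-representation analysis of the kind run with clusterProfiler is typically based on a one-sided hypergeometric test. The test asks whether a GO term or KEGG pathway contains more of the query genes than expected by chance. The sketch below computes that raw P value only. It omits the multiple-testing adjustment that enrichment tools usually apply, and the function name `hypergeom_enrichment_p` is hypothetical. Background and term sizes would come from the annotation database and are not given in the text.

```python
from math import comb

def hypergeom_enrichment_p(hits_in_term, list_size, term_size, background_size):
    """Hypothetical helper. One-sided over-representation P value P(X >= hits_in_term).
    X is hypergeometric: list_size genes are drawn from background_size genes,
    of which term_size are annotated to the GO term or KEGG pathway."""
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(hits_in_term, upper + 1)
    )
    # Exact integer arithmetic; a single division converts to a float at the end
    return tail / comb(background_size, list_size)
```

Here `list_size` would be the number of co-expressed mRNA that have annotations (at most 978). `hits_in_term` would be how many of them fall in a given term such as "mitotic nuclear division".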

Discussion
In the present study, we identified six survival-related microRNA in patients with GC using the Cox regression model and proposed a six-microRNA signature-based prognostic model. The model could distinguish GC patients with poor and good prognosis, and ROC curve analysis showed that its AUC for predicting 3- or 5-year overall survival was > 0.7. In addition, multivariate Cox regression analysis showed that the six-microRNA signature was an independent prognostic marker. These results were consistent across the training set, test set, and entire TCGA set, illustrating that the six-microRNA signature-based model was robust in predicting the outcomes of patients with GC. Several similar studies have developed prognostic models of GC based on molecular profiles. Two studies constructed prognostic models based on mRNA signatures, but both had relatively small sample sizes [23,24]. Wang et al. [25] built a model based on a nine-mRNA signature to predict the prognosis of GC patients. It can distinguish high-risk from low-risk patients within a cohort but cannot predict the prognosis of a single patient, because its evaluation method was based on the median gene expression levels of the cohort. Recently, noncoding RNA have also been used to construct prognostic models. Miao et al. [26] proposed a four-lncRNA-based prognostic model of GC; however, the AUC of its time-dependent ROC curve for predicting 5-year overall survival was < 0.7. Another study [27] developed a microRNA-based model but did not evaluate its value in predicting 3- and 5-year survival. Compared with these studies, the model in the current study can distinguish patients with poor or good prognosis and also performed well in predicting 3- and 5-year survival.
In our study, six microRNA were identified as being associated with overall survival. The enrichment analyses revealed that their target mRNA took part in processes such as the cell cycle, mitosis, and the p53 signaling pathway, which may explain why the six microRNA were related to patient prognosis. Moreover, most of these microRNA have previously been linked to tumors. Among them, miRNA-100 and miRNA-374a are the most frequently studied, but reports on their roles in tumors are conflicting. miRNA-100 was upregulated in patients with diffuse-type GC and was related to the depth of invasion, lymph node metastasis, and stage [28]. In contrast, another study showed that miRNA-100 could promote the apoptosis of GC cells through the Notch-apoptosis pathway and improve the sensitivity of GC cells to chemotherapy [29]. miRNA-374a could promote the proliferation, migration, and invasion of GC cells by downregulating SRCIN1. Conversely, it inhibited the proliferation, invasion, migration, and intrahepatic metastasis of colon cancer cells by targeting CCND1 [30,31]. In our study, miRNA-100 was a risk microRNA and miRNA-374a was a protective microRNA. These inconsistent results may be due to differences in tumor type or in the microenvironment, such as in vitro versus in vivo conditions. miRNA-509-3 has previously been identified as a tumor suppressor in lung cancer [32], ovarian cancer [33], hepatoma [34], leukemia [35], renal cell carcinoma [36], and GC [37]. It has also been reported to be an independent prognostic biomarker in GC patients. These findings are consistent with ours. miRNA-668 may act as an oncogene [38] and could be associated with radioresistance in breast cancer [39]; consistently, miRNA-668 was a risk microRNA in GC in our study. To date, no studies have directly examined the relationships between miRNA-549 or miRNA-653 and tumors. However, our study showed that miRNA-549 was a protective microRNA and miRNA-653 was a risk microRNA, which deserves further study.
In summary, our study identified six survival-related microRNA (miRNA-100, miRNA-374a, miRNA-509-3, miRNA-668, miRNA-549, and miRNA-653) in GC patients and developed a prognostic prediction model. The model can be used to predict the risk of death and the 3- and 5-year overall survival of patients with GC. Moreover, the six-microRNA signature of the model was a novel independent molecular prognostic biomarker. These results may contribute to individualized therapies for GC patients.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Table S1. Training set.
Table S2. Test set.
Table S3. Clinical information.
Table S4. Survival-related microRNA according to univariate Cox regression analysis.
Table S5. GO enrichment analysis of the co-expressed mRNA.
Table S6. KEGG pathway enrichment analysis of the co-expressed mRNA.