Identification of nine microRNAs as potential biomarkers for lung adenocarcinoma

Lung cancer is a leading global cause of cancer‐related death, and lung adenocarcinoma (LUAD) accounts for ~ 50% of lung cancer. Here, we screened for novel and specific biomarkers of LUAD by searching for differentially expressed mRNAs (DEmRNAs) and microRNAs (DEmiRNAs) in LUAD patient expression data within The Cancer Genome Atlas (TCGA). The identified optimal diagnostic miRNA biomarkers were used to establish classification models (including support vector machine, decision tree, and random forest) to distinguish between LUAD and adjacent tissues. We then predicted the targets of identified optimal diagnostic miRNA biomarkers, functionally annotated these target genes, and performed receiver operating characteristic curve analysis of the respective DEmiRNA biomarkers, their target DEmRNAs, and combinations of DEmiRNA biomarkers. We validated the expression of selected DEmiRNA biomarkers by quantitative real‐time PCR (qRT‐PCR). In all, we identified a total of 13 DEmiRNAs, 2301 DEmRNAs and 232 DEmiRNA–target DEmRNA pairs between LUAD and adjacent tissues and selected nine DEmiRNAs (hsa‐mir‐486‐1, hsa‐mir‐486‐2, hsa‐mir‐153, hsa‐mir‐210, hsa‐mir‐9‐1, hsa‐mir‐9‐2, hsa‐mir‐9‐3, hsa‐mir‐577, and hsa‐mir‐4732) as optimal LUAD‐specific biomarkers with great diagnostic value. The predicted targets of these nine DEmiRNAs were significantly enriched in transcriptional misregulation in cancer and central carbon metabolism. Our qRT‐PCR results were generally consistent with our integrated analysis. In summary, our study identified nine DEmiRNAs that may serve as potential diagnostic biomarkers of LUAD. Functional annotation of their target DEmRNAs may provide information on their roles in LUAD.

Lung cancer is a leading global cause of cancer-related death, and lung adenocarcinoma (LUAD) accounts for~50% of lung cancer. Here, we screened for novel and specific biomarkers of LUAD by searching for differentially expressed mRNAs (DEmRNAs) and microRNAs (DEmiRNAs) in LUAD patient expression data within The Cancer Genome Atlas (TCGA). The identified optimal diagnostic miRNA biomarkers were used to establish classification models (including support vector machine, decision tree, and random forest) to distinguish between LUAD and adjacent tissues. We then predicted the targets of identified optimal diagnostic miRNA biomarkers, functionally annotated these target genes, and performed receiver operating characteristic curve analysis of the respective DEmiRNA biomarkers, their target DEmRNAs, and combinations of DEmiRNA biomarkers. We validated the expression of selected DEmiRNA biomarkers by quantitative real-time PCR (qRT-PCR). In all, we identified a total of 13 DEmiRNAs, 2301 DEmRNAs and 232 DEmiRNA-target DEmRNA pairs between LUAD and adjacent tissues and selected nine DEmiRNAs (hsa-mir-486-1, hsa-mir-486-2, hsa-mir-153, hsa-mir-210, hsa-mir-9-1, hsa-mir-9-2, hsa-mir-9-3, hsa-mir-577, and hsamir-4732) as optimal LUAD-specific biomarkers with great diagnostic value. The predicted targets of these nine DEmiRNAs were significantly enriched in transcriptional misregulation in cancer and central carbon metabolism. Our qRT-PCR results were generally consistent with our integrated analysis. In summary, our study identified nine DEmiRNAs that may serve as potential diagnostic biomarkers of LUAD. Functional annotation of their target DEmRNAs may provide information on their roles in LUAD.
Lung cancer is the most frequently diagnosed cancer and a leading cause of cancer-related death worldwide [1]. Approximately 4/5 diagnosed lung cancers are non-small-cell lung cancer (NSCLC), and lung adenocarcinoma (LUAD) is the most common histological subtype of NSCLC, accounting for~50% of lung cancer [2,3]. Due to the lack of effective diagnostic tools, most LUAD patients are diagnosed at the later stages, and the 5-year survival rate of LUAD has shown no significant improvement over the past few decades [4]. Hence, it is essential to identify the novel, non-invasive, and specific biomarkers for the diagnosis of LUAD.
The Cancer Genome Atlas (TCGA) is a central bank for multidimensional experimental cancer data that contributes to uncovering the molecular mechanisms of cancer. In the present study, we used TCGA to obtain and further analyze the miRNA and mRNA expression data of LUAD patients. The key miRNAs with great diagnostic value were identified and their roles in LUAD were further revealed by investigating the function of their target genes.

Identifying miRNAs and mRNAs to distinguish tumor from normal
Based on the read count of each sample, the differentially expressed miRNAs (DEmiRNAs) and mRNAs (DEmR-NAs) in LUAD compared to normal tissues were calculated via the R-bioconductor package DESEQ2 (http://bioconduc tor.org/packages/DESeq2/). We performed multiple comparisons by using the Benjamini and Hochberg method to obtain the false discovery rate (FDR). The threshold was FDR < 0.05 and |log 2 fold change| > 4 for DEmiRNAs. In order to gain an overview of the characteristics of the miRNA expression profile, a heat-map was further generated by hierarchical clustering analysis based on the normalized expression values of all DEmiRNAs using the R package (https://www.r-project.org/). For DEmRNAs, the threshold was defined as FDR < 0.05 and |log 2 fold change| > 1.5.

Statistics for classification and prediction
To identify optimal diagnostic miRNA biomarkers for LUAD, we performed a feature selection procedures as follows. Firstly, the importance value of each DEmiRNA was ranked according to the mean decrease in accuracy using the random forest analysis. Then, the optimal number of features was found by subsequently adding one DEmiRNA at a time in a top-down forward-wrapper approach. The diagnostic odds ratio (DOR) is defined as the ratio of the odds of the positivity of a diagnostic test among a diseased population relative to that in the non-diseased population. DOR was assessed by using the support vector machine (SVM) at each increment, and the optimal diagnostic miRNA biomarkers for LUAD were identified.
These optimal DEmiRNAs with diagnostic value for LUAD were used to establish classification models including decision tree, the SVM model, and random forest to better distinguish LUAD and adjacent tissues. The decision tree model was established by using the RPART package (https://cran.r-project.org/web/packages/rpart/). The SVM model was established by using the e1071 package in R, and the random forest model was established by using the 'RANDOMFOREST' package (https://cran.r-project.org/web/pac kages/randomForest/). The three kinds of classification models were compared by the average misjudgment rates of their 10-fold cross-validations. Diagnostic ability of classification prediction was evaluated by obtaining the area under a receiver operating characteristic (ROC) curve (AUC) and DOR. A heat-map of the optimal DEmiRNAs with diagnostic value for LUAD was generated by hierarchical clustering analysis by using the R package.

Identifying mRNA targets of miRNA
To uncover the potential roles of the identified optimal DEmiRNA diagnostic biomarkers, we predicted their targets. Considering the opposite trend in the expression of miRNA and their targets, we screened the significant negatively co-expressed DEmiRNA-DEmRNA pairs by pairwise Pearson correlation coefficients. DEmiRNA-DEmRNA pairs with P < 0.05 and r < 0 were defined as significant negative co-expression pairs. The putative targets of DEmiRNAs were predicted by six bioinformatic algorithms (RNA22, miRanda, miRDB, miRWalk, PIC-TAR2 and Targetscan) of MIRWALK2.0 (http://zmf.umm. uni-heidelberg.de/apps/zmf/mirwalk2/mir-mir-self.html). The targets recorded by more than four algorithms were served as target mRNAs of miRNAs. Moreover, the confirmed targets of miRNAs obtained by MIRWALK2.0 were served as target mRNAs of miRNAs as well. Finally, significant negatively co-expressed DEmiRNA-DEmRNA pairs overlapped with miRNA-target mRNA pairs were used to construct the DEmiRNA-DEmRNA interaction network by using CYTOSCAPE software (http:// www.cytoscape.org/).

Functional annotation of miRNA targets
To uncover the biological functions and detect the potential pathways of target DEmRNAs of DEmiRNAs, the online software GENECODIS was used to perform the functional annotation, including gene ontology (GO) classification (molecular functions, biological processes, and cellular component) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Statistical significance was defined as FDR < 0.05.

Confirmation of differentially expressed miRNAs and mRNAs
Ten LUAD tumor tissues and 10 normal adjacent tissues were obtained from 10 patients (P1-P10) who were diagnosed as LUAD based on pathological analysis. Patient characteristics are displayed in Table 1. We obtained written informed consent from the patients and approval from the ethics committee of Chinese PLA General Hospital. The study conformed to the Declarations of Helsinki. Total RNA was extracted with the TRIzol reagent (Invitrogen, Shanghai, China). One microgram RNA was used to synthesize cDNA using SuperScriptÒ III Reverse Transcriptase (Invitrogen). Quantitative real-time PCR (qRT-PCR) was performed with Power SYBRÒ Green PCR Master Mix (Applied Biosystems, Carlsbad, CA, USA) in the ABI7500 real-time PCR system (Applied Biosystems). And the reverse transcriptions of miRNAs were performed using the miScript II RT Kit (Qiagen, Hilden, Germany). qRT-PCR was performed with miScript SYBR Green PCR Kit (Qiagen). Relative gene expression was analyzed using the 2 ÀDDCt method. Statistical significance was assessed by one-way ANOVA. The human 18S rRNA and U6 were used as endogenous controls for mRNA and miRNA expression in the analysis.

Identification of differentially expressed miRNAs and mRNAs between LUAD and adjacent tissues
The mRNA expression profiles of 533 patients with LUAD and 59 normal adjacent tissues were obtained. The median age of these 533 patients was 67 years. Female and male patients account for 53.7% and 46.3% of the population, respectively. The information of the TNM stage was as follows: T1N0M0-T1N2M1 (32.8%), T2N0M0-T2N2M1 (53.8%), T3N0M0-T3N2M1 (9.2%) and T4N0M0-T1N3M1 (3.6%). A total of 19 450 mRNAs were detected. mRNAs with read count value = 0 in more than 20% (n = 107) of tumors or in more than 20% (n = 12) of adjacent tissues were considered to be low/not expressed mRNAs.
The miRNA expression profiles of 519 patients with LUAD and 46 normal adjacent tissues were obtained. The median age of these 519 patients was 67 years. Female and male patients account for 53.6% and 46.4% of the population, respectively. The information of the TNM stage was as follows: T1N0M0-T1N2M1 (33.3%), T2N0M0-T2N2M1 (53.0%), T3N0M0-T3N2M1 (9.4%) and T4N0M0-T1N3M1 (3.7%). A total of 1881 miRNAs were detected. miRNAs with read count value = 0 in more than 60% (n = 311) of tumors or in more than 60% (n = 28) of adjacent tissues were considered to be low/not expressed miRNAs.
After filtering the low/not expressed mRNAs or miR-NAs, 16 043 mRNAs and 559 miRNAs were retained for analysis, respectively. A total of 13 DEmiRNAs (three down-regulated and 10 up-regulated miRNAs) between LUAD and normal tissues were identified with FDR < 0.05 and |log 2 fold change| > 4. A total of 2301 DEmRNAs (933 down-regulated and 1368 up-regulated mRNAs) between LUAD and normal tissues were identified with FDR < 0.05 and |log 2 fold change| > 1.5. Hierarchical clustering analysis of DEmiRNAs and the top 100 DEmRNAs is displayed in Fig. 1A and B, respectively.

Identification of optimal diagnostic miRNA biomarkers for LUAD
To identify the optimal diagnostic miRNA biomarkers for LUAD, we performed random forest feature selection and classification (SVM, decision tree and random forest) procedures. All DEmiRNAs were ranked according to the standardized drop in prediction accuracy ( Fig. 2A). Then, we compared the DOR increment for a specific number of miRNAs by subsequently adding one miRNA at a time in a topdown forward-wrapper approach. We found that the DOR of nine DEmiRNAs reached the highest point for the first time (Fig. 2B). Hence, these nine DEmiRNAs were defined as the optimal diagnostic miRNA biomarkers for LUAD (Table 2). These nine optimal DEmiRNAs with diagnostic value for LUAD were used to establish classification models including decision tree, SVM and random forest. The 10-fold cross-validation indicated that the misjudgment rate of SVM, decision tree and random forest was 5.12%, 1.59% and 1.42%, respectively. This result suggested that the random forest model, with the smallest average misjudgment rate, could effectively predict LUAD. Based on the classification model of random forest, the DOR of these nine DEmiRNAs was 7735.1; the ROC curves are displayed in Fig. 2C and hierarchical clustering analysis of the nine DEmiRNAs is displayed in Fig. 2D.

Functional enrichment analysis of miRNA targets
The 116 target DEmRNAs of the DEmiRNAs were used to conduct the GO and KEGG enrichment analysis. GO enrichment analysis (Table 3) revealed that the miRNA targets were significantly enriched in circulatory system development (P < 0.05), plasma membrane region (P < 0.05), caveola (P < 0.05) and regulation of synapse assembly (P < 0.05). KEGG pathway  enrichment analysis (Table 3) showed that transcriptional misregulation in cancer (P-value < 0.05) and central carbon metabolism in cancer (P-value < 0.05) were two significantly enriched pathways.

ROC analyses
By using the PROC package in the R language, we performed ROC analyses to assess the diagnostic value of DEmiRNAs and their targets. The AUC under binomial exact confidence interval was calculated and a ROC curve was generated.

ROC curve analysis
We performed ROC curve analyses and calculated the AUC to assess the discriminatory ability of these nine DEmiRNAs and six differentially expressed targets between LUAD and adjacent tissues. The expression levels of these nine DEmiRNAs and six differentially expressed targets between LUAD and adjacent tissues are displayed in Fig. 4.

Confirmation of differentially expressed miRNAs and mRNAs
Quantitative real-time-PCR of 10 pairs of LUAD and the adjacent tissues were used to verify the expression of five DEmiRNAs (hsa-mir-210-3p, hsa-mir-486-3p, hsa-mir-153-3p, hsa-mir-9-5p, hsa-mir-577). Based on TCGA, hsamir-486-3p was down-regulated while the other four DEmiRNAs were up-regulated in LUAD compared to adjacent tissues. According to the qRT-PCR results, all the five DEmiRNAs were up-regulated, which was consistent with the results for TCGA, generally (Fig. 6).

Discussion
Lung adenocarcinoma is the most frequent subtype of lung cancer, with high global incidence and mortality   CEACAM1, CAV1, EMP2, ERG, MYH11, NTRK3, SIX1,  SOX4, TAL1, TGFBR2, THBS2, BASP1, FLRT3, TENM4,  RHOJ, TMEM204 [14]. It is essential to explore accurate and specific biomarkers for it. Accumulated evidence indicates that miRNAs play crucial roles in the progress of LUAD. However, research into the diagnostic and prognostic value of miRNAs is currently in its infancy.
In this study, we performed genome-wide analysis of miRNAs and mRNAs in a large number of patients with LUAD from TCGA. Altered miRNA expression patterns were found between LUAD and adjacent tissues, suggesting a potential diagnostic role for miRNA for LUAD. To identify the optimal LUAD-specific miRNA biomarkers, we searched for combinations of miRNAs among the 13 DEmiRNAs whose expression pattern may distinguish LUAD from adjacent tissues with the highest DOR by using random forest feature selection. A nine-miRNA combination including three down-regulated miRNAs (hsa-mir-486-1, hsa-mir-486-2, and hsa-mir-4732) and six up-regulated miRNAs (hsa-mir-210, hsa-mir-153-2, hsa-mir-9-1, hsa-mir-9-2, hsa-mir-577, and hsa-mir-9-3) were served as optimal LUAD-specific miRNA biomarkers. The 10-fold crossvalidation of three kinds of models (decision tree, SVM, and random forest) suggested that the random forest model, with the smallest average misjudgment rate of 1.42%, could effectively predict LUAD. Moreover, the AUC of all these nine DEmiRNAs in LUAD was more than 0.85. Taken together, our study has demonstrated the feasibility and potential diagnostic value of these nine miRNA biomarkers in LUAD.
Among them, hsa-mir-486-1 and hsa-mir-486-2 were two hub DEmiRNAs based on the DEmiRNA-DEmRNA interaction network, suggesting their importance in LUAD. Down-regulation of mir-486 has been demonstrated to be involved in various cancers including LUAD [15][16][17]. Moreover, mir-486 was reported to be cytotoxic in A549 LUAD cells by repressing the expression of bone morphogenetic protein-2 reporter gene [18]. Hsa-mir-153 was found to be down-regulated in NSCLC tissues and cell lines; it inhibits migration and invasion of NSCLC by targeting ADAM19 [19] and protein kinase B (AKT) [20]. These three miRNAs were speculated to serve as tumor suppressors in LUAD.
The other six DEmiRNAs were all up-regulated in LUAD compared to adjacent tissue based on our study, which suggested their potential oncogenic roles in LUAD. Hsa-mir-210 was the most up-regulated miRNA in LUAD compared to adjacent tissue based on our study, which was consistent with previous studies [21,22]. Moreover, up-regulation of hsa-mir-210 was found to be closely associated with distant metastases of LUAD [21]. Both hsa-mir-9 and hsa-mir-577 were also found to be up-regulated in human NSCLC tissues and cell lines compared to normal lung tissues  [16,23,24]. Hsa-mir-9 was reported to play an oncogenic role in the proliferation, invasion, and migration of NSCLC cells through mechanisms such as regulating SOX7 [23], eukaryotic translation initiation factor 5A2 [25], FoxO1 [26], epithelial-to-mesenchymal transition and the signal transduction pathway [27]. Hsamir-577 could play a role in the cell proliferation of esophageal squamous cell carcinomas and glioblastoma multiforme by regulating a tumor-associated antigen, testis specific 10 [28], and the Wnt signaling pathway [29], respectively. However, the precise role of mir-577 in LUAD remains unknown.
Although there is no study that reports an association between hsa-mir-4732 and LUAD, hsa-mir-4732 was indicated to be associated with breast cancer [30]. Our study is the first to report the down-regulation of hsa-mir-4732 in LUAD compared to adjacent tissues.
To deeply research the biological functions of these DEmiRNAs in LUAD, we conducted a functional annotation of the target DEmRNAs of these DEmiR-NAs. Two significantly enriched pathways in LUAD and their related DEmRNAs, namely transcriptional misregulation in cancer (TGFBR2, FLI1, ERG and SIX1) and central carbon metabolism (NTRK3 and SLC7A5), were speculated to play key roles in LUAD regulated by DEmiRNAs. TGFBR2 was a putative tumor suppressor gene in the TGF-b signaling pathway. Loss of Tgfbr2 in a K-ras-induced LUAD mouse model has been found to induce a highly invasive phenotype associated with lymph node metastasis and reduced survival. In our study, TGFBR2 was downregulated in LUAD tissues compared to adjacent tissues, which was in agreement with previous study. Both FLI1 and ERG are members of erythroblast transformation-specific oncogene family. FLI1 was down-regulated in squamous lung cancer [31] and could up-regulate TGFBR2 in LUAD and might be a potential regulator of LUAD [32]. Overexpressed ERG variant 2 was found in lung tumors compared to adjacent tissues, which suggested that ERG might exert an oncogenic effect in lung tumors through functions encoded by variant 2 [33]. Aberrantly methylated NTRK3 has been demonstrated to be involved in various cancers including colorectal cancer and lung cancer [34,35]. Poor survival in lung cancer was closely associated with NTRK3 [36]. TGFBR2, FLI1, ERG and NTRK3 were shared DEmRNAs of hsa-mir-9-1, hsa-mir-9-2 and hsa-mir-9-3, which suggested that mir-9 might play a crucial role in LUAD by regulating these four lung cancer-related genes. The SIX family was demonstrated to play crucial roles in the tumorigenesis of NSCLC and was associated with the prognosis of patients with NSCLC [37]. As a member of the SIX family, Six1 was found to be up-regulated in LUAD and contributes to preinvasiveto-invasive adenocarcinoma progression by inducing epithelial-mesenchymal transition and nuclear atypia [38]. The protein SLC7A5 is part of a two-protein complex with SLC3A2 and the gene was hypermethylated and its expression increased in lung cancer [35]. Moreover, SLC7A5 was also reported to be differentially expressed between adenocarcinoma and squamous cell lung cancers [39,40]. Both SLC7A5 and Six1 were target genes of hsa-mir-486 and hsa-mir-4732, respectively. We speculate that hsa-mir-486-SLC7A5 and hsa-mir-4732-Six1 interactions might also be involved in the pathogenesis of LUAD. Moreover, the ROC analysis of the present study indicated that all these six targets have great diagnostic value for LUAD and might be potential biomarkers of LUAD as well.
In conclusion, our study identified nine DEmiRNAs that serve as potential biomarkers of LUAD. Functional annotation of their target DEmRNAs in LUAD provided new clues for exploring their precise roles in LUAD. Further validation studies in prospective datasets are needed to test the predictive power for diagnosis before this is applied clinically.