Identification of modules and novel prognostic biomarkers in liver cancer through integrated bioinformatics analysis

In this study, we constructed nine coexpression modules and identified that miR‐105‐5p, miR‐767‐5p, miR‐1266‐5p, miR‐4746‐5p, miR‐500a‐3p, miR‐1180‐3p and miR‐139‐5p were differentially expressed in liver cancer. These miRNAs exhibited a significant association with prognosis of patients with liver cancer. Among them, miR‐105‐5p and miR‐139‐5p may be considered as independent prognostic factors. Thus, seven miRNAs could be prognostic biomarkers for liver cancer.

Liver cancer is a common malignant tumor with poor prognosis. Due to the lack of specific clinical manifestations at early stages, most patients are already at advanced stages of the disease by the time of diagnosis. Identification of novel biomarkers for liver cancer may thus enable earlier detection, improving outcome. MicroRNAs (miRNAs) are small endogenous noncoding RNAs of 18-22 nucleotides in length, which have a regulatory role in the expression of target proteins. Increased evidence suggests that miRNAs are abnormally expressed in a variety of cancer malignancies. Here, we combined RNA sequencing data and clinical information from The Cancer Genome Atlas Liver Hepatocellular Carcinoma database for weighted gene coexpression network analysis to identify potential miRNA prognostic biomarkers. We constructed nine coexpression modules, allowing us to identify that miR-105-5p, miR-767-5p, miR-1266-5p, miR-4746-5p, miR-500a-3p, miR-1180-3p and miR-139-5p are significantly associated with liver cancer prognosis. We found that these miRNAs exhibit significant association with prognosis of patients with liver cancer and confirmed the expression of these miRNAs in liver cancer tissues. Multivariate Cox regression analysis showed that miR-105-5p and miR-139-5p may be considered as independent factors. In summary, here we report that seven miRNAs have potential value as prognostic biomarkers of liver cancer.
Liver cancer is one of the most common cancers worldwide and the second leading cause of cancer-related deaths [1,2]. Despite much progress in diagnosis and treatment, the prognosis of patients with liver cancer is still poor. Due to the lack of specific clinical manifestations in the early stage, most patients are already in advanced stages of symptoms and miss the opportunity to undergo radical resection. Therefore, identification of liver cancer pathogenesis contributes to early diagnosis, choice of treatment methods, determination of follow-up timetable, and prognosis assessment, which can significantly prolong the survival time of patients with liver cancer [3].
Increased evidence suggests that microRNAs (miR-NAs) are abnormally expressed in a variety of malignancies and are closely related to the pathogenesis of cancers, including liver cancer. miRNAs participate in the development of liver cancer as tumor suppressor genes or oncogenes. Therefore, further study of miRNA expression patterns and effects can provide new diagnostic or therapeutic targets for liver cancer. miRNAs are small endogenous noncoding RNAs, 18-22 nucleotides in length, which have a regulatory role in the expression of target proteins via inhibiting protein translation or enhancing down-regulation of mRNA transcripts [4]. Long noncoding RNAs (lncRNAs) are a class of noncoding RNA transcripts; furthermore, their abnormal gene expressions promote tumor formation, progression, and metastasis, including liver cancer. In the cytoplasm, lncRNAs can regulate the expression of miRNA targets by competitively binding miRNAs to act as competitive endogenous RNA (ceRNAs) [5]. The ceRNA hypothesis shows that in the lncRNA-miRNA-mRNA ceRNA network, lncRNA competitively binds miRNA by sharing miRNA response elements and indirectly regulates mRNA expression levels [6]. At the same time, miR-NAs negatively regulate gene expression at the posttranscriptional level by binding to sequences (mostly located in the 3 0 UTR) and are partially complementary on their target mRNA [7].
Weighted gene coexpression network analysis (WGCNA) is one of the commonly used methods in coexpression module correlation analysis, which is widely applied in various biological processes (BP), especially for the identification of candidate biomarkers or therapeutic targets for many malignant tumors [8]. WGCNA is helpful to find associations between genes in different coexpression modules. Herein, a coexpression network was constructed through WGCNA to analyze the liver cancer expression profile dataset from The Cancer Genome Atlas (TCGA), which was used to explore possible carcinogenic mechanisms and potential hub genes as prognostic biomarkers. Then, we constructed a ceRNA regulatory network to understand the progress of liver cancer. Finally, based on the hub miRNAs, we constructed seven-miRNA modules. The seven miRNAs were confirmed between 22 pairs of hepatocellular carcinoma tissues and adjacent normal tissues by reverse-transcription quantitative PCR (RT-qPCR).

Data collection and preprocessing
The level 3 RNA sequencing data of Liver Hepatocellular Carcinoma (LIHC) were retrieved from TCGA data portal (https://cancergenome.nih.gov/), containing 371 liver cancer samples and 50 normal tissue samples. The mRNAs and lncRNAs were identified after annotation using Refseq transcript ID and Ensembl gene ID. A total of 335 liver cancer samples with complete clinical information in TCGA database were included in our study. The clinical information of patients with liver cancer included TNM, stage, grade, age and sex. The detail information of the dataset is shown in Table 1.
We obtained the log 2 (reads per million mapped reads (RPM) + 1) miRNA expression profile from TCGA database using the University of California Santa Cruz Xena (http://xena.ucsc.edu/). All data were downloaded in September 2017. The overall workflow of our study is shown in Fig. 1.

Differential expression analysis
The raw data from TCGA-LIHC database were filtered and normalized using the edgeR package in R 3.6.0 [9]. The lncRNAs, miRNAs or mRNAs with adjusted P < 0.05 and |logFC| > 1 between 371 liver cancer samples and 50 normal tissue samples were identified to be differentially expressed via the edgeR package. The P value was adjusted using the Benjamini-Hochberg (BH) method.

WGCNA
The coexpression network was constructed through WGCNA package version 1.49 [10]. To build a coexpression module, we filtered the power values. As an important parameter, power value could mainly affect the scale independence and the mean connectivity degree of coexpression modules. Scale independence and average connectivity analysis of modules with different power values from 1 to 30 were performed by gradient test. When the scale independent value was equal to 0.85, the appropriate power value was confirmed. After constructing coexpression modules, the gene information in each module was extracted and cluster analysis was performed at the appropriate threshold value. The relationships among different coexpression modules were analyzed. The strength of the relationship was performed using heatmap package (strong or weak degree).

Module-trait relationship analysis of liver cancer
Two methods were used to identify the module-trait relationships. Module-trait relationships were assessed by the correlation between the module eigengene (ME) and clinical traits. ME, as the major component for principal component analysis of genes in a module with the same expression pattern, may reflect the entire features of genes in a module. The clinical traits included TNM, stage, grade, age and sex. The correlation between ME and clinical traits was analyzed by Pearson's correlation tests, and P < 0.05 was considered to be significantly correlated. The module with the highest correlation coefficient and P < 0.05 was considered as a meaningful module.
In the intramodular analysis, for each expression profile, gene significance was calculated as the absolute value of the correlation between the gene expression profile and each clinical feature. Module Membership (MM) was defined as the correlation of expression profile and each ME. MM of a gene may be used to stand for the membership of the gene with respect to the module. Therefore, genes with a high significance for clinical traits and MM were identified.

Identification of interested module and function enrichment analysis
The gene enrichment analysis of the genes in the meaningful module was performed including Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) through the clusterProfiler package (version 3.12.0; up to September 2017) [11]. KEGG helps to better understand the advanced functions and utilities of biological systems, such as cells, organisms and ecosystems [12]. GO terms contain BP, molecular function (MF), and cellular component (CC). A P value < 0.05 after correction was set as the cutoff criterion.

Univariate and multivariate Cox regression analysis
Kaplan-Meier curves were drawn, and statistical assessment was performed using the log rank test. The hazard ratio (HR) and 95% confidence interval (CI) of the association between the differentially expressed miRNA and overall survival were assessed by univariate Cox regression analysis [13]. The differentially expressed miRNAs with P < 0.01 were identified to be related with prognosis; multivariate Cox regression analysis was then performed. According to the cutoff of expression value, we made survival analysis. Furthermore, an Akaike information criterion (AIC)-based stepwise factor reduction was performed to evaluate the goodness of fit of the model.

The ceRNA network construction
After identification of differentially expressed miRNAs related with prognosis, the ceRNA network was built. The miRNA-mRNA pairing relationship was extracted by miRDB (http://www.mirdb.org) and miRTarBase (http:// mirtarbase.mbc.nctu.edu.tw) databases. The targeted mRNAs were identified based on seven differentially expressed miRNAs related with prognosis and differentially expressed mRNAs in the blue coexpression module. Next, the miRNA-lncRNA pairing relationship was extracted from the miRcode (http://www.mircode.org) database, and the targeted lncRNAs were identified based on the seven miRNAs and the differentially expressed lncRNAs in the blue coexpression module. Finally, the ceRNA network was visualized using CYTOSCAPE (version 3.12.0).

RNA extraction and RT-qPCR
According to the manufacturer's instructions, total RNA was extracted from 22 pairs of hepatocellular carcinoma tissues and adjacent normal tissues using TRIzol. To detect miRNA expression, we synthesized cDNA with the miScript Reverse Transcription Kit (Qiagen, Hilden, Germany). Afterward, qPCR was carried out with the miScript SYBR Green PCR Kit (Qiagen). Glyceraldehyde-3 phosphate dehydrogenase (GAPDH) was used as an internal control. The relative expression levels of miRNAs were calculated with the 2 ÀΔΔCt method. The specific primers for miRNAs are listed in Table 2. This study was approved by the Ethics Committee of People's Hospital of Yichun City (2019002). All patients provided written informed consent. This study was strictly in line with the standards set by the Declaration of Helsinki.

Identification of differentially expressed lncRNAs, miRNAs and mRNAs
A total of 129 differentially expressed miRNAs (including 90 down-regulated and 39 up-regulated miR-NAs) were screened according to adjusted P < 0.05 and |logFC| > 1 using edgeR package (Table S1 and Fig. 2A).
Furthermore, differentially expressed lncRNAs and mRNAs were identified between 371 liver cancer samples and 50 adjacent normal tissue samples according to adjusted P < 0.05 and | logFC| > 1 using edgeR package. The results showed that there were 405 differentially expressed lncRNAs, including 107 down-regulated and 298 up-regulated lncRNAs (Table S2 and Fig. 2B). Furthermore, 2788 differentially expressed mRNAs, including 910 downregulated and 1878 up-regulated mRNAs, were identified (Table S3 and Fig. 2C).

Gene coexpression network construction
The outlier samples whose connectivity was less than À2.5 were excluded. Finally, the cluster analysis of 291 liver cancer samples was performed via the hclust tools package (Fig. 3A). After removing the outlier samples, the power value was calculated (Fig. 3B). The coexpression module was built by hierarchical clustering and dynamic branch cutting (Fig. 4). To explore the interaction between these coexpression modules, we calculated the connectivity of MEs and performed clustering analysis. Of these modules, nine coexpression modules with similar MEs were merged (Fig. 4). The gray module stands for the gene set that is not assigned to any module. The eigengene dendrogram and heatmap were performed to identify modules of correlated eigengenes, and the dendrogram suggested that these modules were associated with liver cancer clinical features (Fig. 5A,B).

Module-trait relationship construction
The relationships between nine coexpression modules and clinical traits are shown in Fig. 5C. Among them, the seven modules (turquoise, brown, green, blue, pink, black and yellow) had significant associations with liver cancer clinical traits, including event, TNM, stage, grade, age and gender. We found that blue module was associated with event (r = 0.3, P = 9eÀ6), T (r = 0.2, P = 0.003), stage (r = 0.21, P = 0.002) and grade (r = 0.25, P = 2eÀ4). Therefore, we further analyzed the genes in the blue module. A scatterplot of Gene Significance versus MM in the blue module is shown in Fig. 6A-H. There is a highly significant correlation between gene significance (for event, TNM, stage, grade and age) and MM in the blue module. The genes in the blue module are listed in Table S4.

Function enrichment analysis of genes in interested module
To further explore the function of differentially expressed genes in blue module, we conducted function enrichment analysis, including GO and KEGG analysis. The top 10 GO terms, including MF, CC and BP, were shown in Fig. 7A-C. We found that the differentially expressed genes were mainly enriched in several pathways, such as ATPase activity, chromosomal region, organelle fission and so on. Also, these differentially expressed genes were mainly enriched in  KEGG pathways, such as cell cycle, cellular senescence, oocyte meiosis, DNA replication, Fanconi anemia pathway, p53 signaling pathway and so on (Fig. 7D). Therefore, the differentially expressed genes in blue module could participate in the occurrence and development of liver cancer.

Survival analysis and ceRNA network construction
After extracting the differentially expressed genes in blue module, we performed univariate Cox regression analysis to identify differentially expressed miRNAs related with prognosis, which were used to construct the ceRNA network (P < 0.001). After matching miRNA-mRNA and miRNA-lncRNA relationships, the ceRNA network was built. There were 72 mRNAs, 3 lncRNAs and 7 miRNAs, all of which were closely related with prognosis (Fig. 8A). The miRNA-mRNA pairs are listed in Table S5. Univariate and multivariate analyses of various prognostic parameters in patients with liver cancer were performed. As shown in Table 3, univariate analysis results showed that T classification and M classification were closely correlated with overall survival of patients with liver cancer. Furthermore, T classification could become an independent prognostic factor for liver cancer according to multivariate analysis results. There were seven miR-NAs closely related with liver cancer prognosis, including miR-105-5p (P < 0.  (Fig. 9A). The seven-miRNA model was constructed, and survival analysis demonstrated that the model could significantly distinguish prognosis differences between highrisk and low-risk groups (Fig. 9B). Furthermore, we further validated the efficiency of the seven-miRNA model for prediction of liver cancer prognosis. We first calculated the score of each sample through the seven-miRNA model. The linear regression model after integrating various factors showed that the risk value was significantly associated with prognosis ( Table 4). The AIC value of the risk score + stage integrated model was the smallest through AIC, where risk score was smaller than stage, reflecting the effectiveness of the seven-miRNA model for liver cancer prognosis. The median value of the model was used to differentiate the samples into high-and low-risk groups, as shown in Table 5. As expected, there were significant differences in several clinical subtypes. Thus, the seven-miRNA model could become a better prediction model related to liver cancer prognosis.

Discussion
As molecular biomarkers continue to develop, prognostic indicators will be promising clinical tools for liver cancer [14]. WGCNA has been widely applied to screen novel markers [15]. In the coexpression networks, genes with similar expression patterns are clustered together in the same module, and these genes may have similar regulatory functions [16]. In addition, to explore the genetic mechanisms behind clinical features, the relationship between modules and clinical features was identified. The link between clinical features and genes in the module may help to understand the pathogenesis of liver cancer and screen for potential biomarkers. The results showed that the blue module is significantly related to event, T, stage and grade. Function enrichment analysis demonstrated that the differentially expressed genes were mainly enriched in several KEGG pathways, such as cell cycle, cellular senescence, oocyte meiosis, DNA replication, Fanconi anemia pathway, p53 signaling pathway and so on. Therefore, the differentially expressed genes in the blue module could participate in the occurrence and development of liver cancer. Therefore, we further analyzed the genes in this module. To further clarify biomarkers that can serve as prognostic factors, we used univariate Cox regression analysis to identify miRNAs associated with prognosis. The results revealed that seven differentially expressed miRNAs were closely related with prognosis. After matching miRNA-mRNA and miRNA-lncRNA relationships, the ceRNA network was then constructed and the seven miRNAs were identified in the network. The hub miRNAs represent the primary regulatory role of the blue module. To further confirm the prognostic value of these seven miRNAs in liver cancer, we presented multivariate Cox regression analysis. The results showed that the seven-miRNA model has a high sensitivity in predicting the prognosis of patients with liver cancer, and miR-105-5p and miR-139-5p could be an independent prognostic factor compared with other miRNAs. Survival analysis revealed that the seven-miRNA module had significant correlation with prognosis of liver cancer. More importantly, RT-qPCR confirmed that miR-105-5p, miR-500a-3p, miR-767-5p, miR-1180-3p, miR-1266-5p and miR-4746-5p were up-regulated in liver cancer tissues compared with normal tissues. Furthermore, miR-139-5p was down-regulated in liver cancer tissues compared with normal tissues. Human miR-105 is located in the intron region of GABR A3A , which is located on the X chromosome [17]. Compared with normal tissues, miR-105 is downregulated in many malignant tumors as a tumor suppressor or oncogene, such as breast cancer, non-small cell lung cancer, gliomas, colorectal cancer and so on [18][19][20][21]. Increasing evidence suggests that miR-105 can be used as a prognostic predictor, because its expression pattern is closely associated with prognosis of these cancers. It has been confirmed that miR-105 is down-regulated in hepatocellular cancer cell lines and tissues, which promotes proliferation and tumorigenicity of hepatocellular cancer cells in vitro and in vivo [22]. Furthermore, miR-105 acts as a tumor suppressor in hepatocellular carcinoma via inhibiting the phosphatidylinositol 3-kinase (PI3K)/AKT signaling pathway. In our study, univariate and multivariate Cox regression analysis showed that miR-105-5p could become an independent prognostic factor for liver cancer. miR-767 is up-regulated in human melanoma tissues and cell lines, which promote melanoma cell proliferation, and miR-767 acts as a tumor promoter in human melanoma by targeting CYLD [23]. In addition, miR-767 could become a prognostic factor for thyroid cancer [24]. It has been confirmed that miR-1266 contributes to several cancers. For example, miR-1266-5p is down-regulated in prostate cancer, which regulates the apoptotic pathway by targeting the antiapoptotic genes BCL2 and BCL2L1 [25]. Human telomerase reverse transcriptase (hTERT) is a catalytic subunit of the telomerase complex, and its increased expression is associated with the expansion and metastasis of gastric cancer. miR-1266 is identified as a hTERT inhibitor in gastric cancer, which interacts with the 3 0 UTR of hTERT, whereas miR-1266 is significantly reduced in gastric cancer tissues [26]. miR-1266 is significantly elevated in pancreatic cancer, which is associated with poor survival and chemotherapy response in patients with pancreatic cancer [27]. miR-4746 is differentially expressed in several cancers  *P < 0.0001.; **P < 0.01. through pan-cancer analysis [28]. miR-500a-3p is down-regulated in lung cancer, which is associated with poor prognosis in patients with lung cancer [29].
In addition, miRNA-500a-3p suppresses cell proliferation and invasion in human non-small cell lung cancer [30]. miR-500a-3p is highly expressed in hepatocellular carcinoma tissues and cells, which is associated with the prognosis of patients with hepatocellular carcinoma. Moreover, miR-500a-3p promotes cancer stem cell characteristics by activating the JAK/STAT3 signaling pathway [31]. The expression of miR-1180 is significantly increased in hepatocellular carcinoma cells and tissues, which promotes cell proliferation of hepatocellular carcinoma by targeting TNFAIP3 interacting protein 2 (TNIP2) [32]. Apoptosis resistance in human hepatocellular carcinoma is an important factor in carcinogenesis. The ectopic expression of miR-1180 has an antiapoptotic effect in hepatocellular carcinoma [33]. miR-139-5p plays a role in aerobic glycolysis, cell proliferation, migration, invasion and metastasis [34,35]. In addition, miR-139-5p is significantly associated with recurrence of hepatocellular carcinoma [36].
The earlier analysis revealed that the seven miRNAs could contribute to the development of liver cancer. Abnormal expression of these miRNAs could predict prognosis of liver cancer. Several similar studies have   the similar aims and objectives to identify the miRNA using Gene Expression Omnibus and TCGA databases. Wang et al. [37] identified several miRNAs that could become potential prognostic biomarkers for liver cancer by bioinformatics analysis. Li et al. [21] identified novel prognostic biomarkers for liver cancer by constructing a coexpression network. Furthermore, Zhang et al. [38] reported that lncRNAs had differential expression patterns and ceRNA potential in liver cancer between 372 liver cancer tissues and 48 adjacent normal tissues from TCGA and Gene Expression Omnibus databases. However, these biomarkers were not validated by basic experiments. In our study, we first constructed nine coexpression modules by WGCNA, and the blue module had a significant association with clinical traits of liver cancer. Furthermore, the genes in the blue module could participate in many signaling pathways. After identifying seven miRNAs related with prognosis, the ceRNA network revealed that the seven miRNAs had a complex regulatory network. Moreover, the seven-miRNA module could predict prognosis of patients with liver cancer. In our study, RT-qPCR results confirmed the expression patterns of seven miRNAs in liver cancer tissues compared with adjacent normal tissues. Therefore, the function of the seven miRNAs in liver cancer are worth more in-depth research.

Conclusion
In our study, we constructed gene coexpression modules related with clinical traits of liver cancer. Seven miRNAs were identified as prognostic biomarkers by univariate and multivariate Cox regression analysis. Furthermore, the seven-miRNA module possesses potential value to predict prognosis of liver cancer. Therefore, our study constructed coexpression modules by WGCNA and identified prognostic biomarkers for liver cancer. Cancer specific long noncoding RNAs show differential expression patterns and competing endogenous RNA potential in hepatocellular carcinoma. PLoS One 10, e0141042.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Table S1. Differentially expressed miRNAs for liver cancer. Table S2. Differentially expressed lncRNAs for liver cancer. Table S3. Differentially expressed mRNAs for liver cancer. Table S4. The genes in the blue module. Table S5. The miRNA-mRNA pairs in the ceRNA network.