Diagnostic value of strand‐specific miRNA‐101‐3p and miRNA‐101‐5p for hepatocellular carcinoma and a bioinformatic analysis of their possible mechanism of action

There is accumulating evidence that miRNAs might serve as potential diagnostic and prognostic markers for various types of cancer. Hepatocellular carcinoma (HCC) is the most common type of malignant lesion but the significance of miRNAs in HCC remains largely unknown. The present study aimed to establish the diagnostic value of miR‐101‐3p/5p in HCC and then to investigate the prospective molecular mechanism via a bioinformatic analysis. First, the miR‐101 expression profiles and parallel clinical parameters from 362 HCC patients and 50 adjacent non‐HCC tissue samples were downloaded from The Cancer Genome Atlas (TCGA). Second, we aggregated all miR‐101‐3p/5p expression profiles collected from published literature and the Gene Expression Omnibus and TCGA databases. Subsequently, target genes of miR‐101‐3p and miR‐101‐5p were predicted using the miRWalk database and then overlapped with the differentially expressed genes of HCC identified by natural language processing. Finally, bioinformatic analyses were conducted with the overlapping genes. The level of miR‐101 was significantly lower in HCC tissues than in adjacent non‐HCC tissues (P < 0.001), and the area under the curve of the low miR‐101 level for HCC diagnosis was 0.925 (P < 0.001). The area under the pooled summary receiver operator characteristic (SROC) curve was 0.86 for miR‐101‐3p and 0.80 for miR‐101‐5p. Bioinformatic analysis showed that the target genes of both miR‐101‐3p and miR‐101‐5p are involved in several pathways that are associated with HCC. The hub genes for miR‐101‐3p and miR‐101‐5p were also identified. Our results suggest that both miR‐101‐3p and miR‐101‐5p might be potential diagnostic markers in HCC, and that they exert their functions by targeting different prospective genes in the same pathways.

According to Cancer Statistics, 2017 [1], the incidence of liver cancer in the USA continues to increase rapidly (~3% per year in women and 4% per year in men), and the death rate rose by almost 3% per year from 2010 to 2014. In addition, the mortality rate is three times higher in men than in women. Asia has the highest incidence of liver cancer, and China in particular accounts for more than half of the global annual incidence and mortality [2]. Among the three histological types of liver malignancy, hepatocellular carcinoma (HCC) has become the leading cause of death from cancer. Because there are no biomarkers or common surgical techniques for early-stage HCC, the majority of patients with HCC are diagnosed late, which directly correlates with a poor outcome and low survival rate. As with other cancers, HCC development is a multistep process with abundant genetic and epigenetic alterations. A recent study confirmed that hepatocarcinogenesis can be caused by chronic hepatitis B virus (HBV) infection [3]. Much effort has been devoted to the treatment of HBV-infected HCC, but with only limited success. Thus, identifying novel biochemical markers for early HCC diagnosis is a matter of the utmost urgency. miRNAs, ~20-22 nucleotides in length, are a class of small endogenous non-coding RNA molecules. They post-transcriptionally regulate mRNA expression through imperfect base pairing with the 3′-untranslated region of target genes. As they have been studied more comprehensively, miRNAs have become star molecules of cancer research. miRNAs in human cancers are involved in several pivotal biological processes (BP), including cancer proliferation, differentiation, progression and cell apoptosis [4][5][6]. Although their functions remain elusive, up- and down-regulation of miRNAs have been widely reported in many kinds of cancer tissues in comparison with expression in the corresponding normal tissues [7,8].
In particular, miRNAs have been found to be biomarkers for cancer clinical diagnosis, histological classification and prognosis [9][10][11][12][13].
Accumulating evidence has clearly demonstrated that the aberrant expression of miRNAs may further influence the expression of tumor oncogenes and suppressor genes, thereby leading to the occurrence of a tumor [14][15][16][17]. Theoretically, mature miRNA generation requires a series of enzyme reactions. First, primary miRNA transcripts are cleaved in the nucleus by the Drosha enzyme to liberate the precursor miRNA (pre-miRNA) hairpin. Subsequently, the pre-miRNA is exported to the cytoplasm and further processed by the enzyme Dicer to produce two mature miRNAs (miR-5p and miR-3p) [18,19]. Even though the two mature miRNAs are transcribed from the same pre-miRNA, they may have different target genes and biological functions. A previous study [20] reported that the expression levels of the miR-5p and miR-3p mature sequences can be altered in different tissues.
Accumulating evidence [19,21-25] has shown that miR-101-3p/-5p is down-regulated in multiple malignancies, including HCC. For example, Hou et al. [26] profiled miRNA expression and revealed that miR-101 (3p and 5p were not distinguished) expression in HCC tissues was lower than in healthy controls. Wei et al. [27] also showed that miR-101 (3p and 5p were not distinguished) was down-regulated in HBV-associated HCC tissues and may have therapeutic potential in HCC. The function of these miRNAs has also been investigated. Zhang et al. [28] revealed that enforced expression of miR-101 (3p and 5p were not distinguished) by siRNA inhibited the cell proliferation and tumorigenicity of an HCC cell line in vitro. Sheng et al. [29] investigated how miR-101-3p regulated cell proliferation, cell cycle and apoptosis in HCC and found that overexpression of miR-101-3p caused an enhanced rate of apoptosis but no obvious change in the cell cycle. In addition, several oncogenes, such as EZH2, FOS, COX-2 and SOX9, have been found to be directly regulated by miR-101-3p/5p [30][31][32]. Recently the potential of the miR-101 family as diagnostic indicators has also attracted attention. He et al. [33] conducted a meta-analysis that summarized the diagnostic value of miRNAs in HCC and found that miR-101-5p had great diagnostic value, though only three data sets were included and the results need to be further validated. Furthermore, in humans, miR-101 precursor transcripts are encoded at two genomic loci (miR-101-1 and miR-101-2). Of the two mature miRNAs, miR-101-3p is generated from the 3′ ends of the precursors, and miR-101-5p from the 5′ end of pre-miR-101-1 (http://www.mirbase.org/). We speculated that miR-101-3p may also serve as a diagnostic marker for HCC. Since the seed regions of miR-101-3p and miR-101-5p differ, they are predicted to regulate different targets.
However, to the best of our knowledge, the comparative roles of miR-101-3p and miR-101-5p in HCC have not yet been fully studied.
The present study investigated miR-101-3p and miR-101-5p expression in HCC tissues compared with that in healthy controls. Published studies, Gene Expression Omnibus (GEO) microarray chips and The Cancer Genome Atlas (TCGA) data that included miR-101-3p or miR-101-5p expression information were pooled. Additionally, previous studies have mainly focused on a single gene [34][35][36], and studies have rarely focused on the function of coexpressed genes in cancers. To obtain a fuller understanding of the molecular mechanisms underlying HCC, comprehensive bioinformatics methods were used to investigate the function and pathways of target genes of miR-101-3p and miR-101-5p associated with HCC. In short, the present study aimed to analyze the expression and mechanism of miR-101-3p and miR-101-5p in the initiation and development of HCC. This exploration will provide novel insights into HCC. A flowchart of the study design is shown in Fig. 1.

Material and methods
The clinical role of miR-101 based on the public database TCGA
To verify the difference in the miR-101 expression levels between HCC and normal liver tissues, we downloaded relevant data from the public tumor database TCGA, in which samples from 362 HCC patients and 50 adjacent non-HCC tissues were included. Additionally, miR-101-1 and miR-101-2 levels were both calculated because the relevant sample data were provided in TCGA. miR-101-1 and miR-101-2 are two precursor hairpin structures of miR-101 miRNA that are located in the human genome on chromosome 1 (MI0000103) and 9 (MI0000739), respectively [37]. Both of them are processed by the Dicer enzyme to form the mature miRNA. All of the available clinical parameters were analyzed by SPSS STATISTICS 22.0 (IBM Corp., Armonk, NY, USA).

Search strategy and study selection
Comprehensive literature searches were conducted on electronic databases PubMed, EMBASE, Web of Science, the Cochrane Library, and Chinese National Knowledge Infrastructure (CNKI) up to 29 December 2016. No language limitations were imposed. Qualifying articles were screened by combining the following keywords: 'miR-101' OR 'miRNA-101' OR 'miRNA-101' OR 'miR101' OR 'miRNA101' OR 'miRNA 101' OR 'miR-101-5p' OR 'miRNA-101-5p' OR 'miRNA-101-5p' OR 'miR-101-3p' OR 'miRNA-101-3p' OR 'miRNA-101-3p' AND malignan* OR cancer OR tumor OR neoplas* OR carcinoma AND hepatocellular OR liver OR hepatic OR HCC AND diagnos* OR receiver operating characteristic (ROC) OR specificity OR sensitivity OR DEGs OR DEMs OR 'differentially expressed'. In addition, the reference lists were also manually searched to reduce article omission. The title and abstract of the obtained studies were scanned to exclude any clearly irrelevant publications. In addition to searching the literature, we also searched the GEO database for eligible microarrays with the following terms: malignan* OR cancer OR tumor OR neoplas* OR carcinoma AND hepatocellular OR liver OR hepatic OR HCC.

Criteria for inclusion and exclusion
Studies that met the following criteria were included: (a) investigated HCC; (b) measured the level of miR-101, miR-101-3p or miR-101-5p in HCC tissue, plasma or serum; (c) included the diagnosis of HCC or the clinical parameters; and (d) reported true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs) or sensitivity and specificity of miR-101. In addition, (e) if the studies did not provide a fourfold contingency table, they were included if the original data were available; and (f) microarrays were included if they enrolled more than three patient samples and measured the miR-101 profile for HCC.
Articles that met the following criteria were excluded: (a) studies without sufficient data, such as reviews or systematic reviews, (b) repeat reports, (c) studies conducted on cell lines or animals and (d) letters to the editor or conference abstracts.

Data synthesis and analysis
For studies that did not provide TPs, FPs, FNs and TNs but gave sensitivity and specificity or the original data, these counts were derived using MEDCALC 11.4.2.0 (MedCalc Software, Ostend, Belgium). To reduce inaccuracy in the relevant data extracted from the included studies, three independent researchers (XY, PL and JMC) performed the data extraction separately.
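The conversion from reported sensitivity and specificity to a fourfold table can be sketched as follows. This is only an illustration in Python, not the MEDCALC procedure used in the study; the function name `reconstruct_2x2` and the example numbers are hypothetical, and the sketch assumes the numbers of HCC cases and controls are known for each study.

```python
def reconstruct_2x2(sensitivity, specificity, n_cases, n_controls):
    """Rebuild TP/FP/FN/TN counts from sensitivity and specificity.

    Assumes sensitivity and specificity are proportions in [0, 1] and that
    the group sizes are known. Counts are rounded to whole samples, so the
    result may differ by one sample from the original table.
    """
    tp = round(sensitivity * n_cases)      # cases correctly called HCC
    fn = n_cases - tp                      # cases missed
    tn = round(specificity * n_controls)   # controls correctly called non-HCC
    fp = n_controls - tn                   # controls wrongly called HCC
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical study: 50 HCC samples, 40 controls
table = reconstruct_2x2(0.8, 0.9, 50, 40)
```

Rounding is the main design choice here: published sensitivity and specificity are usually rounded themselves, so an exact recovery of the original counts is not guaranteed.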

Statistical analysis
All statistical analyses were performed using SPSS STATISTICS 20.0 or STATA 12.0 (StataCorp, College Station, TX, USA). For the clinical parameter analysis, miR-101 expression was represented as the mean ± standard deviation. The standards for assessing the area under the curve (AUC) in the ROC curve were as follows: 0.5-0.7 represented poor evidence for diagnosis, 0.7-0.9 represented moderate evidence for diagnosis and 0.9-1.0 represented high evidence for diagnosis. The correlation between miR-101 expression and the clinicopathological parameters was investigated with Spearman's rank correlation. The significance of the difference between HCC and non-cancerous liver tissues was studied using Student's t test. The significant differences among three groups were examined by one-way ANOVA. For data mining, the pooled sensitivity, specificity, positive likelihood ratios (PLRs), negative likelihood ratios (NLRs), and diagnostic odds ratio with their corresponding 95% confidence intervals (CIs) were calculated with the bivariate regression model. Additionally, the summary receiver operator characteristic (SROC) curve with the area under the SROC curve was calculated [38]. In addition, the Q test and the I² measure of inconsistency were used to quantify heterogeneity between studies [39]. The possibility of publication bias was finally explored by Deeks' funnel plot, and P values < 0.1 were considered significant.
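The per-study quantities named above follow directly from a fourfold table, and a minimal Python sketch may make the definitions concrete. It does not reproduce the bivariate pooling performed in STATA; the function names are hypothetical, the handling of the AUC band boundaries (0.7 and 0.9) is an assumption because the text leaves them ambiguous, and all four cells are assumed to be non-zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Per-study accuracy measures from a 2x2 table (all cells assumed > 0)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)          # positive likelihood ratio
    nlr = (1 - sens) / spec          # negative likelihood ratio
    dor = plr / nlr                  # diagnostic odds ratio = (TP*TN)/(FP*FN)
    return {"sensitivity": sens, "specificity": spec,
            "PLR": plr, "NLR": nlr, "DOR": dor}


def grade_auc(auc):
    """Map an AUC to the evidence bands used in the text.

    Assumption: boundary values 0.7 and 0.9 are assigned to the higher band.
    """
    if auc >= 0.9:
        return "high"
    if auc >= 0.7:
        return "moderate"
    if auc >= 0.5:
        return "poor"
    return "uninformative"


def i_squared(q, n_studies):
    """I^2 inconsistency (%) from Cochran's Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


metrics = diagnostic_metrics(tp=40, fp=4, fn=10, tn=36)  # hypothetical counts
```

An I² above 50% is the threshold the Results section uses to flag substantial non-threshold heterogeneity.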

Natural language processing
Natural language processing (NLP) is a novel computerized approach to analyze electronic free text to achieve 'humanlike language processing'. With this approach, programmers create software to 'read' text and extract key pieces of information from clinician notes, procedure/radiology/pathology reports and laboratory results [41,42]. We performed a literature search in PubMed to obtain all related electronic records. The detailed process was described in our previous articles [43,44]. Finally, 1800 genes that were differentially expressed in HCC were identified for further analysis.

Functional and signaling pathway analyses
A set of condition-specific genes, consisting of the genes shared between the target prediction software and NLP, further underwent functional and signaling pathway analyses on a public database platform, the Database for Annotation, Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov/), which provides a functional interpretation of massive gene lists derived from genomic studies. The analyses included Gene Ontology (GO) function analysis (http://www.geneontology.org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg/) analysis. The GO function analysis categorized selected genes into groups in accordance with three independent classification standards: BPs, cellular components (CCs), and molecular functions (MFs). The top 10 terms of each GO category and top 30 pathways of the KEGG pathways were visualized as GO maps and KEGG maps, respectively, via CYTOSCAPE v3.4.0 (http://cytoscape.org/).
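The idea behind term or pathway enrichment can be illustrated with a plain hypergeometric upper-tail test. This is a sketch only: DAVID reports a modified Fisher exact P value (the EASE score), which is more conservative than the unmodified test shown here, and the function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(hits, list_size, annotated, background):
    """P(X >= hits) when list_size genes are drawn from a background in
    which `annotated` genes belong to the term or pathway of interest."""
    if not (0 <= hits <= list_size <= background
            and 0 <= annotated <= background):
        raise ValueError("inconsistent counts")
    total = comb(background, list_size)
    upper = min(list_size, annotated)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 6 of 73 list genes fall in a pathway that contains
# 150 of 20000 background genes.
p_value = hypergeom_upper_tail(hits=6, list_size=73,
                               annotated=150, background=20000)
```

A small P value indicates that the pathway contains more list genes than expected by chance; in practice the P values across all tested terms would also need multiple-testing correction.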

Protein-protein interaction network construction
Overlapping genes were inputted to the STRING v10.0 online tool (http://string-db.org/) to construct the protein-protein interaction (PPI) network. The direct (physical) and indirect (functional) associations of proteins were derived from four methods: (a) literature-reported protein interactions, (b) high-throughput experiments, (c) genome analysis and prediction and (d) coexpression studies. By scrutinizing the connectivity degrees of the nodes in the PPI networks, we determined the hub genes. A node with a high degree of connectivity is perceived as a hub node.
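Hub selection by connectivity degree can be sketched in a few lines of Python. The edge list below uses placeholder node names rather than real STRING output, and the function names are hypothetical; the sketch assumes the network has been exported as undirected gene pairs.

```python
def node_degrees(edges):
    """Count distinct neighbours of each node in an undirected edge list."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def top_hubs(edges, n=3):
    """Return the n nodes with the highest degree (ties broken by name)."""
    degrees = node_degrees(edges)
    return sorted(degrees, key=lambda g: (-degrees[g], g))[:n]


# Placeholder interactions; the duplicated pair A-B is counted once
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
hubs = top_hubs(edges, n=2)
```

Using sets of neighbours means duplicated or reversed pairs do not inflate a node's degree.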

Results
Clinicopathological significance of miR-101-1/miR-101-2 in HCC tissues
The relationship between miR-101-1/miR-101-2 and clinicopathological parameters in HCC was mined from TCGA, as shown in the Tables. Compared with the expression in advanced stage (III and IV) HCC patients, the relative expression of miR-101 in early stage (I and II) patients was notably higher (P < 0.05), and the Spearman correlation test confirmed that the correlations between miR-101 and the pathological stage, pathological T stage and histological stage were r = −0.17, P = 0.001; r = −0.17, P = 0.001 and r = −0.18, P < 0.001, respectively.

Study selection
Through the literature search, 341 relevant articles were identified, 339 of which were excluded for being case reports, reviews, letters, repeat publications and studies not specifically pertaining to miR-101-3p/5p. The two remaining publications were examined by three researchers and ultimately included. Moreover, GEO microarrays that detected miR-101-3p and/or miR-101-5p were identified for further analysis and were combined after assessment. Finally, 12 datasets including 315 HCC and 330 normal control samples were downloaded from the GEO database to calculate the miR-101-3p diagnostic value (GSE39678, GSE21279, GSE67882, GSE65708, GSE12717, GSE10694, GSE22058, GSE21362, GSE40744, GSE41874, GSE54751 and GSE57555); five datasets including 308 HCC and 114 normal control samples were downloaded from the GEO database to calculate the miR-101-5p diagnostic value (GSE74618, GSE21362, GSE40744, GSE41874 and GSE57555). In addition, the precursors of miR-101 identified from TCGA were also considered.

Heterogeneity analysis
The analysis of heterogeneity is widely used to evaluate the accuracy of statistical pooling from multiple studies [45]. Since heterogeneity may arise from a threshold effect or a non-threshold effect, the threshold effect was first explored by the Spearman test to assess the heterogeneity of miR-101-3p/5p among the included studies. In other words, the correlation coefficient and P value between the logit of sensitivity and the logit of (1 − specificity) were calculated. As a result, the Spearman correlation coefficients for miR-101-3p and miR-101-5p were 0.386 (P = 0.215) and −0.059 (P = 0.912), respectively, indicating that heterogeneity from the threshold effect was not found. However, the I² values in the forest plots of sensitivity and specificity (more than 50%) revealed that we cannot ignore the non-threshold effect from the included studies.
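The threshold-effect check described above can be sketched in Python as a Spearman correlation between logit(sensitivity) and logit(1 − specificity) across studies. The function names and the example values are hypothetical, the sketch returns only the coefficient (not the P value), and it assumes every sensitivity and specificity lies strictly between 0 and 1.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def threshold_effect_rho(sensitivities, specificities):
    """Spearman rho between logit(sens) and logit(1 - spec) across studies."""
    x = [logit(s) for s in sensitivities]
    y = [logit(1 - s) for s in specificities]
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical studies in which sensitivity rises as specificity falls
rho = threshold_effect_rho([0.6, 0.7, 0.8, 0.9], [0.9, 0.8, 0.7, 0.6])
```

A strong positive coefficient suggests that studies trade sensitivity against specificity by using different cut-offs, which is the pattern a threshold effect produces.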

Publication bias
Publication bias was assessed using Deeks' funnel plot asymmetry test. The funnel plots, in which each point represents one study, were almost symmetric, suggesting that publication bias was absent among the included studies. The obtained P-values of 0.718 and 0.447 for miR-101-3p and miR-101-5p, respectively, also indicated the absence of publication bias (Fig. 7).

Bioinformatic analysis
To improve understanding of the function of miR-101, the potential target genes of miR-101-3p and miR-101-5p in HCC were identified separately. Based on the prediction software and NLP, 73 target genes corresponding to miR-101-3p and 90 target genes corresponding to miR-101-5p were obtained. Subsequently, bioinformatic analyses were conducted to investigate the function and pathways of target genes of miR-101 associated with HCC. All of the target genes were inputted into DAVID for bioinformatic analysis.
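The overlap step amounts to a set intersection between the predicted targets and the NLP-derived gene list. The Python sketch below uses placeholder gene symbols rather than the study's data, and the function name is hypothetical; it assumes both lists use the same gene nomenclature once case and whitespace are normalized.

```python
def overlap_genes(predicted_targets, nlp_genes):
    """Return the sorted gene symbols present in both lists."""
    def normalize(symbol):
        return symbol.strip().upper()

    predicted = {normalize(g) for g in predicted_targets}
    reported = {normalize(g) for g in nlp_genes}
    return sorted(predicted & reported)


# Placeholder symbols only
shared = overlap_genes(["GENE_A", "gene_b ", "GENE_C"],
                       ["GENE_B", "GENE_C", "GENE_D"])
```

In practice, alias and identifier mismatches between databases are a common reason for missed overlaps, so mapping both lists to one identifier system beforehand would be advisable.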

KEGG pathway enrichment analysis
Our study revealed that 23 KEGG pathways corresponding to miR-101-3p were enriched, from which the top five pathways in which target genes were enriched were (a) the adherens junction pathway (hsa04520: P = 8.  Table 3.

Protein-protein interaction network
A PPI network was constructed to screen out the hub genes according to the connectivity degree of each gene in the network. Here, the PPI network was constructed by using the STRING database. As shown in Figs 12 and 13, FOS, SMARCA4 and MAPK1 were the top three most important genes for miR-101-3p, while ESR1, KRAS, NRAS, FOXO1, CREBBP and SMAD3 were regarded as the hub genes for miR-101-5p.

Discussion
In the present study, we investigated the relationship between miR-101 expression and clinicopathological parameters. TCGA data showed that the miR-101 level was significantly lower in HCC than in adjacent non-cancerous liver tissues, and miR-101 showed high diagnostic value in HCC. Accumulating studies have indicated that dysregulation of circulating miRNAs could be a biomarker of tumorigenesis, development and invasion in various cancers including prostate cancer, gastric cancer, ovarian cancer, breast cancer and lung cancer [30,46-49]. A diagnostic value for circulating miR-101-3p/5p in HCC has also been reported [50,51]. Both of these studies validated that a lower miR-101-3p/5p level had diagnostic potential for HCC. However, due to the limited number of available publications, the exact diagnostic value of miR-101 and the difference between miR-101-3p and miR-101-5p are still unclear. Alpha-fetoprotein (AFP), as the traditional marker of liver diseases, has been used for HCC diagnosis in the clinic. Recently, He et al. [33] conducted a meta-analysis with 10 data sets (879 HCC patients and 1028 controls) assessing AFP for HCC diagnosis and revealed that the AUC-SROC of pooled AFP was 0.82 (95% CI: 0.78-0.85), with sensitivity of 0.631 (95% CI: 0.552-0.703) and specificity of 0.943 (95% CI: 0.875-0.975). Here, we combined gene expression microarray datasets from the GEO database and RNA-seq data from the TCGA database, as well as two published studies, to further confirm the diagnostic efficacy of miR-101-3p and miR-101-5p and then to identify the difference between the two mature miRNAs.
Fig. 10. GO functional analysis of miR-101-3p in HCC. Top 10 terms of each category are displayed, and every node represents a different BP term; node size represents the P value of the targets, with low values indicated by large nodes, and node color represents the gene count, with low values indicated by pink.
Our findings suggested that the pooled diagnostic accuracy of miR-101-3p for HCC (SROC: 0.86 (95% CI: 0.82-0.89); sensitivity and specificity of 78.0% (95% CI: 65.0-88.0%) and 79.0% (95% CI: 67.0-88.0%), respectively) was slightly higher than that of AFP. For miR-101-5p, the SROC was 0.80 (95% CI: 0.76-0.83), slightly lower than that of AFP, but it still indicated a moderate value for HCC diagnosis that is comparable to the diagnostic value of AFP.
Even though miR-101-3p/5p expression showed a high diagnostic value for HCC, the heterogeneity among the studies must be considered. Since our study indicated that heterogeneity from the threshold effect was absent, we deduced that the heterogeneity may be caused by the different data platforms and the large differences between the individual studies. Considering that the number of studies was small, we did not conduct a subgroup analysis.
Subsequently, bioinformatic analysis was performed to determine the molecular mechanism of miR-101-3p/5p in HCC. In the past, researchers exploring the molecular mechanism of miRNAs concentrated on only one or two target genes. For example, Varambally et al. [52] first reported that EZH2 was the target gene of miR-101 (3p and 5p were not distinguished) several years ago. Another study confirmed that miR-101 (3p and 5p were not distinguished) inhibits HCC progression and metastasis through EZH2 down-regulation [53]. Liu et al. [54] identified another target gene of miR-101 (3p and 5p were not distinguished), VEGF-C, which promotes invasion and migration. MCL-1 and COX-2, which play a role in tumorigenesis, have also been identified as target genes of miR-101 (3p and 5p were not distinguished) [31,55]. In addition, metastasis of HCC has been shown to be affected by different target genes of miR-101 (3p and 5p were not distinguished), such as STMN1 [56] and PTEN [57]. Since a single miRNA can target multiple genes to achieve its biological and clinical functions, exploring the relevant gene network can reveal the broader molecular mechanism of miR-101-3p/5p. Hence, we identified potential target genes of miR-101-3p/5p in silico. Moreover, we further narrowed the list by retaining the genes that overlapped with the differentially expressed genes of HCC identified via NLP. Next, these target genes were subjected to KEGG pathway annotation and GO enrichment analysis using DAVID.
The target genes of both miR-101-3p and miR-101-5p are involved in pathways in cancer, hepatitis B and the MAPK signaling pathway. These results reveal that miR-101 probably contributes to the tumorigenesis and metastasis of HCC. Previous studies have reported the role of these pathways in liver cancer [58,59]. The GO term analysis indicated that these potential target genes of miR-101-3p/5p were significantly involved in the regulation of the cell cycle and cell proliferation, which are associated with tumor occurrence or stepwise development.
Furthermore, we constructed the PPI network with potential target genes, showing that miR-101-3p probably targets FOS, SMARCA4, MAPK1, GSK3B and JAK2 to exert its function in HCC. Li et al. [60] reported that FOS acts as a regulator of cell proliferation, differentiation and transformation, and miR-101 inhibits cell invasion and migration via down-regulation of FOS. MAPK1 has also been reported to be involved in a variety of cellular processes, such as differentiation, proliferation and development through the MAPK pathway [61]. JAK2 is a protein tyrosine kinase, and recent evidence has demonstrated that miR-101 (3p and 5p were not distinguished) inhibits breast cancer cell proliferation and promotes apoptosis by targeting JAK2 [49]. Previous studies reveal that miR-101-3p might regulate the occurrence and development of HCC by targeting various genes, and tumorigenesis probably results from the abnormality of multiple genes. Of course, the correlation of those potential key genes of miR-101-3p needs further experimental validation. Next, a functional analysis of these target genes in vitro and in vivo will need to be conducted, such as by RNA interference and cellular transfection, luciferase reporter assay, western blot and so on. miR-101-5p possibly targets ESR1, KRAS, CREBBP, FOXO1 and SMAD3 through different pathways. Among them, KRAS, a Kirsten ras oncogene homolog, was reported to have functional synergy with HBx in HCC initiation and progression [62], and FOXO1 has been proposed to inhibit EMT transcriptional activators in HCC [63,64]. Additionally, Hishida et al. [65] indicated that ESR1 is a tumor suppressor gene in HCC. Taken together, the hub genes identified may perform key roles in HCC. Further investigation appears to be necessary to confirm their exact function in HCC.
Taken together, the present study validated the down-regulation of the two opposing strands, miR-101-3p and miR-101-5p, in HCC clinical specimens; however, miR-101-3p held a greater value for HCC diagnosis. Bioinformatic analysis revealed that miR-101-3p and miR-101-5p are involved in the same or similar signaling pathways through regulating different sets of target genes. The fact that miR-101-3p and miR-101-5p are involved in these signaling pathways suggests that the expression of miR-101-3p and miR-101-5p is similar in HCC tissues, and they may function cooperatively with each other in the differentiation, proliferation and development of HCC.
In conclusion, we provided a comprehensive analysis of miR-101-3p/5p and evaluated the value of miR-101-3p and miR-101-5p as biomarkers for the early diagnosis of HCC. In addition, we investigated the prospective molecular mechanisms of these two opposing strands in silico. Our results provide a deeper understanding of the role of miR-101-3p/5p in HCC and may facilitate the development of a miRNA-based targeted therapy for HCC. However, several limitations of this study should be considered. First, the total number of studies included was limited; second, further experiments in vitro and in vivo are still required to confirm the function of the target genes.
(GXMUYSF201624). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.