Tumor‐adjacent tissue co‐expression profile analysis reveals pro‐oncogenic ribosomal gene signature for prognosis of resectable hepatocellular carcinoma

Currently, molecular markers are not used when determining the prognosis and treatment strategy for patients with hepatocellular carcinoma (HCC). In the present study, we proposed that the identification of common pro‐oncogenic pathways in primary tumors (PT) and adjacent non‐malignant tissues (AT) typically used to predict HCC patient risks may result in HCC biomarker discovery. We examined the genome‐wide mRNA expression profiles of paired PT and AT samples from 321 HCC patients. The workflow integrated differentially expressed gene selection, gene ontology enrichment, computational classification, survival predictions, image analysis and experimental validation methods. We developed a 24‐ribosomal gene‐based HCC classifier (RGC), which is prognostically significant in both PT and AT. The RGC gene overexpression in PT was associated with a poor prognosis in the training (hazard ratio = 8.2, P = 9.4 × 10−6) and cross‐cohort validation (hazard ratio = 2.63, P = 0.004) datasets. The multivariate survival analysis demonstrated the significant and independent prognostic value of the RGC. The RGC displayed a significant prognostic value in AT of the training (hazard ratio = 5.0, P = 0.03) and cross‐validation (hazard ratio = 1.9, P = 0.03) HCC groups, confirming the accuracy and robustness of the RGC. Our experimental and bioinformatics analyses suggested a key role for c‐MYC in the pro‐oncogenic pattern of ribosomal biogenesis co‐regulation in PT and AT. Microarray, quantitative RT‐PCR and quantitative immunohistochemical studies of the PT showed that DKK1 in PT is the perspective biomarker for poor HCC outcomes. The common co‐transcriptional pattern of ribosome biogenesis genes in PT and AT from HCC patients suggests a new scalable prognostic system, as supported by the model of tumor‐like metabolic redirection/assimilation in non‐malignant AT. The RGC, comprising 24 ribosomal genes, is introduced as a robust and reproducible prognostic model for stratifying HCC patient risks. The adjacent non‐malignant liver tissue alone, or in combination with HCC tissue biopsy, could be an important target for developing predictive and monitoring strategies, as well as evidence‐based therapeutic interventions, that aim to reduce the risk of post‐surgery relapse in HCC patients.

Currently, molecular markers are not used when determining the prognosis and treatment strategy for patients with hepatocellular carcinoma (HCC). In the present study, we proposed that the identification of common prooncogenic pathways in primary tumors (PT) and adjacent non-malignant tissues (AT) typically used to predict HCC patient risks may result in HCC biomarker discovery. We examined the genome-wide mRNA expression profiles of paired PT and AT samples from 321 HCC patients. The workflow integrated differentially expressed gene selection, gene ontology enrichment, computational classification, survival predictions, image analysis and experimental validation methods. We developed a 24-ribosomal gene-based HCC classifier (RGC), which is prognostically significant in both PT and AT. The RGC gene overexpression in PT was associated with a poor prognosis in the training (hazard ratio = 8.2, P = 9.4 9 10 À6 ) and cross-cohort validation (hazard ratio = 2.63, P = 0.004) datasets. The multivariate survival analysis demonstrated the significant and independent prognostic value of the RGC. The RGC displayed a significant prognostic value in AT of the training (hazard ratio = 5.0, P = 0.03) and cross-validation (hazard ratio = 1.9, P = 0.03) HCC groups, confirming the accuracy and robustness of the RGC. Our experimental and bioinformatics analyses suggested a key role for c-MYC in the pro-oncogenic pattern of ribosomal biogenesis co-regulation in PT and AT. Microarray, quantitative RT-PCR and quantitative immunohistochemical studies of the PT showed that DKK1 in PT is the perspective biomarker for poor HCC outcomes. The common co-transcriptional pattern of ribosome biogenesis genes in PT and AT from HCC patients suggests a new scalable prognostic system, as supported by the model of tumor-like metabolic redirection/assimilation in non-malignant AT. The RGC, comprising 24 ribosomal genes, is introduced as a robust and reproducible prognostic model for stratifying HCC

Introduction
Hepatocellular carcinoma (HCC) ranks fifth among solid tumors and causes 70 000 annual deaths worldwide. It is the third leading cause of cancer-related mortality in males and is most prevalent in Asia and Africa. Unlike many solid tumors, the incidence and mortality of HCC have increased over the past decade (Ashtari et al., 2015). The absence or substantial progress of effective HCC therapies is indicated by the mortality rate, which is equivalent to the incidence rate in most countries (Bruix and Sherman, 2011).
Notable disease treatment challenges include the high heterogeneity of the primary tumor (PT) and the pathophysiological status of the adjacent non-malignant tissue (AT), which affects 70% of patients after resection or local ablation (Llovet et al., 2005).
Recent studies of various cancers (including HCC) have incorporated global molecular profiling using various 'omics' platforms (Wang et al., 2015a). These studies have enabled the development of multiple multigene prognostic biomarkers for stratifying cancer patients into risk subgroups that are relevant for potential treatments (Hoshida et al., 2012). Compared to several other cancers (e.g. breast, prostate and hematological), the molecular markers are not used for the diagnosis or determination of prognosis and treatment for HCC patients. Thus, evidence-based molecular markers that could accurately and reproducibly predict survival time and response to treatment must be identified (Bruix et al., 2016).
However, in this respect, significant challenges might reflect certain caveats: (a) poor clinical reproducibility (e.g. when a biomarker fails in an independent cohort validation) and/or poor genetic reproducibility (e.g. different enriched gene sets/deregulated pathways in an independent validation cohort), which limits or confounds the clinical and therapeutic utility of the biomarker, and (b) high diversity in the genetic status and high technical (e.g. different 'omics' platforms) and/or clinicopathological cross-cohort variability from independent clinical centers. To overcome these obstacles, a workflow for biomarker selection that identifies the most etiologically and pathobiologically essential genes, gene products and biological processes with high reproducibility and prognostically confident molecular patterns is needed.
It has been reported that AT substantially contributes to HCC and has an independent prognostic value (Hoshida et al., 2008), presumably reflecting the de novo multicentric occurrence of HCC in cirrhotic tissue, which impacts late HCC recurrence (> 2 years recurrence-free survival). Alternatively, PT cells that remain after resection can disseminate across the AT of specific HCC patients to contribute to early HCC recurrence (≤ 2 years of recurrence-free survival) (Hoshida et al., 2008). However, the relationships between the PT and AT can be more complex.
A resurging interest in cancer cell and host tissue interactions, including metabolism, biogenesis and secreted metabolites, metabolic reprogramming in the systemic modulating of gene expression, and signaling pathways in joint cancer-host tissue compartments, has been observed in recent years. According to the 'field cancerization' model (Vauthey et al., 2000), pathological and genetic changes in tissues peripheral to a tumor could result from 'preconditioning' of the affected organ by various carcinogenic agents. For example, most HCC arise in the background of chronic liver disease [e.g. hepatitis B virus (HBV) or hepatitis C virus (HCV) infection, hepatitis and cirrhosis]. After the surgical resection of PT in the absence of effective therapy of the background medical condition(s), the significant probability of the appearance and development of tumor(s) in field cancerization may be similar to those that prompt the primary HCC.
Alternatively, PT cell growth and progression can modulate the host tissue metabolic pathways and induce epigenetic reprogramming in AT (Skill et al., 2011). These changes often lead to tumor-like functional modulations of the gene expression profiles in non-malignant tissue cells (Lou et al., 2009). For example, the DNA methylation status of a tumor suppressor regulatory signal(s) in the AT can not only be distinct from that of the normal (Arai et al., 2009) and cirrhotic liver tissues of non-HCC individuals, but also similar to that of the PT (Lou et al., 2009). In addition, multiple metabolic coupling between PT and AT has been described. Experimental models have been developed (Shapot and Potapova, 1986;Shapot et al., 1972) of tumor-host tissue metabolic relationships demonstrating the competition of the tumor and host tissues for RNA precursors and other metabolites. These studies also characterized a phenomenon of the tumor capacity to act as 'a trap' for nitrogen and glucose, leading to tumor-like host cell metabolism and host tissue metabolic 'assimilation' peculiarities in glycolysis and RNA biosynthesis.
Cell-cell interactions, external regulatory signaling factors and the active transport of biomolecules that are secreted from PT via the exosomes/microvesicles may induce and maintain the specific tumor-driven pro-oncogenic biological processes and signaling pathways relevant to RNA biosynthesis and the ribosomal biogenesis in AT. We propose the presence of multiple tumor-producing molecular signals, secreting factors and tumor-induced external metabolite trapping processes that collectively establish tumor-like common RNA biosynthesis and ribosome biogenesis patterns and specific steady-state aberrant pathways in AT cells. The interactions between tumor and host tissue compartments assume the co-microevolution of metabolic processes in AT and AT cell populations during the course of cancer cell metabolism, genetic and epigenetic alterations, cell growth, death and migration, clonal selection and other factors of tumor progression.
In an effort to identify HCC biomarkers, many studies have developed models of gene expression profiling to discover biomarkers either in PT or AT samples (Hoshida et al., 2012). Some studies have additionally used the differentially expressed genes (DEGs) between PT and AT as prognostic biomarkers at the prognostic variable pre-selection step (Mah et al., 2014;Orimo et al., 2008). By contrast, in the present study, we hypothesize that the next-generation prognostic biomarkers for HCC prognosis and prediction can be determined via the automatic selection of a subset(s) of co-expressed survival significant genes in PT and AT pairs. This strategy uses the concepts of (a) field cancerization and (b) PT-initiated biogenesis assimilation in AT compartments as systemically affected by aggressively growing PT compartments. To our knowledge, such an approach to tumor-host tissue co-evolution analysis has not been considered previously in the literature in the context of prognostic/predictive biomarker discovery and disease risk stratification.
We aimed to develop a prognostic biomarker discovery model that would (a) identify common pro-oncogenic biological processes induced by aggressive PT growth in PT and AT and (b) select variables predicting HCC patient risks after curative resection. We suggested that the expression of potential prognostic genes pre-selected in the cells of AT, as a non-malignant cell population, was more genetically stable and less vulnerable to oncogenic reprogramming compared to the pre-selected genes in the PT cell population. We also assumed that the identification of statistically enriched gene subsets/pathways/biological mechanisms consistent in AT and PT pairs is another important step toward further shortlisting the most biologically/ pathologically essential gene candidates for biomarkers. We expected that the use of these two strict pathobiology-based pre-selection criteria would eliminate numerous indirectly correlated but non-essential gene confounders with less or nonreproducible mechanistic functions and prognostic abilities.
In the present study, we specified both a genetic and metabolic preconditioning 'field cancerization' model in PT and AT and a PT-like AT cell ribosomal biogenesis model. We analyzed the role of gene expression patterns in cancer predisposition and metabolism, coupling cancer and non-malignant host tissues as an interconnected two-compartmental tissue cell population system.
We developed a hypothesis-driven HCC biomarker discovery and statistically-based computationally prediction method. We selected a specific group of common prognostic genes (CPGs) co-expressed in both PT and AT, generating HCC patient risk stratification into relatively favorable and unfavorable disease outcome subgroups. The CPG analysis enabled the identification of translation elongation and ribosomal (TER) gene subsets with common pro-oncogenic prognostic patterns as the most over-represented gene subsets in PT and AT. We showed how the data analysis, integrating paired PT and AT samples, identified the HCC prognostic ribosomal gene classifier (RGC). This prognostic classifier demonstrated high confidence, robustness and reproducibility across the studied cohorts and identified the clinically and pathobiologically reproducible low-and high-risk patient subgroups with thoroughly characterized and druggable deregulated biological pathways. In summary, these results highlight the importance of the deregulation of the TER pathway in HCC, comprising the essential pathological feature common to PT and AT. We discuss the clinical perspectives of these findings and the potential applications of these tools and also propose a novel strategy for the identification of uniformly co-regulated, pathway-specific, statistically reliable and reproducible prognostic biomarkers.

HCC patients and tissue samples
We retrospectively analyzed the hepatic tissue samples of HCC patients from Singapore, which were collected after surgical treatment with informed consent from the patients. The patients underwent radical resection and follow-up between 2000 and 2013 at Singapore General Hospital, and their hepatic tissue samples were collected at the National Cancer Centre of Singapore/SingHealth Tissue Repository. HCC diagnosis and treatment for the HCC patients were based on established histological criteria (International Working Party, 1995). After surgical treatment, patients were followed up at least once every 3-6 months. Other post-surgery patient treatments included imaging and a fetal protein (AFP) monitoring on each follow-up. The Singapore HCC cohort (Singapore cohort) presented here included only HCC patients diagnosed with resectable HCC, with good liver function (Child status A or B), adequate future liver remnant and good general health. The HCC patients with unresectable lesions with poor liver function or general health  were excluded from the analysis. Linked clinical and histopathology data collected from the patient medical records were rendered anonymous. The study was approved by the Sin-gHealth Institutional Review Board. Cost and practical issues restricted the primary sample size in the present study to 125 HCC patients. PT and AT liver samples obtained at the time of definitive surgery were snap-frozen and preserved at -80°C. The inclusion of matching AT samples in the study was based only on the availability of the samples from the SingHealth Tissue Repository and satisfactory results of RNA and microarray quality control. A paired sample design was used and clinical data were available for analysis only after re-identification of the RNA samples and full completion of microarray profiling.
Median follow-up in the Singapore cohort was 1.17 years. The detailed patient and tumor features of the Singapore and the Liver Cancer Institute (LCI) cohorts are presented in Tables 1 and S1.

Study design and endpoint
The pressent study complied with the recommendations for reporting prognostic cancer biomarkers according to the REMARK statement (McShane et al., 2005) (Table S2) and the guidelines for evaluating prognostic biomarkers (Simon et al., 2009). The prognostic biomarker(s) demonstrated certain traits: (a) significant prognostic power in the training and validation sample sets; (b) statistically independent prognostic value in a multivariate analysis that included known clinicopathological predictive variables; and (c) significant predictive power confirmed in an external cohort reported by independent investigators using the same technology. The primary endpoint of the study was the patient overall survival (OS). OS was defined as the time between surgical resection and death of any cause at last follow-up. Other measures of patient benefits from use of prognostic biomarkers, such as disease-free or recurrence-free survival, were not used because they are surrogate to OS and may not always translate to longer OS Llovet et al., 2008a). Additionally, OS proved to be useful in large and successful targeted drug therapy trials of HCC (Llovet et al., 2008b).

Patient tissue sample processing, gene expression microarrays and quantitative RT-PCR analysis methods
All tissue samples were uniformly homogenized using a TissueLyser LT from Qiagen (Germantown, MD, USA) in accordance with the manufacturer's instructions. RNA isolation was performed for all tissue lysates. Sample preparation and hybridization of labeled cRNA to the HumanHT-12 v4 Expression BeadChip arrays (Illumina, Inc., San Diego, CA, USA) were conducted in accordance with the manufacturer's instructions. Data from 10 patients were excluded as a result of low RNA or microarray preparation quality (Kauffmann et al., 2009). Finally, we used 115 PT samples and 52 AT samples matched to 52 corresponding PT. After the completion of RNA and microarray quality control, the included samples were matched to corresponding clinical data and reidentified before data analysis. The quantitative PCR experiments were conducted using a QuantStudio TM 6 Flex Real-Time PCR system in accordance with the standard instructions for Power SYBR Green master mix from ABI systems (Applied Biosystems, Foster City, CA, USA). The qRT PCR signals were normalized with standard reference TBP and relative fold change abundances for desired genes were estimated.
The microarray data for the gene expression profiling of the HCC patient samples are publicly available at GEO: GSE76427. As a validation HCC cohort, we used publicly available data for the 206 HCC patients from the Liver Cancer Institute (the LCI HCC cohort, Fudan University, Shanghai, China, GSE14520) (Roessler et al., 2012) that passed the same microarray quality assessment (Kauffmann et al., 2009). We selected the LCI dataset for the Singapore data analysis validation because (a) the LCI group is a relatively large publicly available gene expression dataset (includes 206 paired PT-AT samples); (b) clinical data including patient survival data have been available for these 206 patients, allowing survival prediction analysis for the PT-AT paired expression dataset to be carried out; (c) Singapore and LCI datasets include mostly Chinese/Asian patients; and (d) patients in both groups have a predominantly HBV-driven ethology,

Statistical and bioinformatics analyses
To select the survival significant genes and to stratify patients into the relatively LR and HR HCC subgroups, we used the one-dimensional data-driven grouping (1-D DDg) method that was previously developed and successfully used for survival prognosis (Chan et al., 2012;Grinchuk et al., 2015;Motakis et al., 2009). We defined the CPG as the 1-D DDg-defined survival significant gene, and the expression value was binarized using a 1-D DDg cut-off value, with patients being stratified into low-risk (LR) or high-risk (HR) subgroups for identical PT and AT samples (e.g. either exclusively tumor suppressor-like or exclusively prooncogenic in both tissue types from the same patient).
The statistically weighted voting grouping (SWVg) method is a variable selection and multivariate prediction statistically-based voting prediction method (Kuznetsov et al., 2006) that uses the 1-D DDg-derived binary variables (predictors) of the patient risk groups (LR and HR) as an input dataset (Motakis et al., 2009). Following 1-D DDg, the input file is used to obtain the statistical voting stratification of a patient from the grouping information generated using the binary variables. For each patient, the SWVg score is calculated based on an optimized number of statistically weighted votes of the binary variables (Chen et al., 2017). The estimated SWVg score cut-off was determined by maximizing the significance of the patient separation into HR and LR subgroups. For each patient, the SWVg calculates the prognostic score quantifying the risk of disease development, rank orders the patients by their SWVg scores and separates the patients into the LR and HR subgroups according to the estimated cut-off score.
DEGs were identified using the GenePattern portal (Reich et al., 2006) and the significance of the RGC subgroup similarity was estimated using the 'SubMap' module in the portal (Hoshida et al., 2007). For Functional Annotation and Gene Ontology (FA/GO) and Pathway Maps analyses, we used either DAVID bioinformatics (da Huang et al., 2009a) or MetaCore (https://portal.genego.com/).
A support vector regression (SVR) analysis of the immunohistochemical section images from formalinfixed paraffin-embedded PT was performed as described previously (Smola and Vapnik, 1997). Briefly, each of the six representative HCC patients (three from the HR T RGC subgroup and three from the LR T RGC subgroup) had 54 (6 9 9) sub-images in the dataset. All sub-images from the LR T and HR T patients were labeled '0' and '1', respectively. This dataset was labeled with RGB covariance matrix-based features and used to train and test the SVR system. The image-processing pipeline, covariance-based feature extraction system and SVR system were implemented using IMAGEJ, OpenCV, C++ and R (see Supporting information: Image based analysis of immunohistochemistry slides). Categorical variables were analyzed using a two-sided Fisher's exact test or the Freeman-Halton extension of Fisher's exact test. The Mann-Whitney U-test was applied for continuous variables (CYTELSTUDIO, version 9; Cytel, Inc., Cambridge, MA, USA). Confidence intervals for the proportions of agreement were calculated according to the Wilson efficient-score method (Newcombe, 1998). Univariate and multivariate analyses were performed using the Cox proportional hazards regression model and the R 'survival' package.

ChIP-seq binding regions analysis in the proximal promoters of the RGC and other gene sets
To support the in silico prediction of the regulatory roles of MYC in the RGC and other gene sets, we used the publicly available ChIP-seq data for MYC

Identification of common prognostic genes in PT and AT samples and their specific biological characteristics
We retrospectively analyzed the clinical data and the hepatic tissue samples from 125 HCC patients (Singapore cohort) collected after surgical treatment. The clinicopathological characteristics of the HCC patients are presented in Tables 1 and S1. Figure 1A shows a conventional data analysis schema to identify prognostic biomarkers when a subset of survival significant genes is selected only in either PT or AT to identify diagnostic and prognostic biomarkers. This method may also use DEGs in the analysis and preselection of candidate prognostic genes ( Fig. 1B) (Mah et al., 2014;Orimo et al., 2008). By contrast, the predictor selection and prognostic method identifies the genes (represented by transcript isoforms) exhibiting common expression patterns in both the PT and AT samples termed PT-AT coexpressed CPGs (see Materials and methods). Here, we describe the procedure for selecting the CPGs used to construct the CPG-based prognostic classifier to predict HCC patient risks after curative resection.
By definition, the tumor-suppressor-like gene has higher expression values in a tissue sample set (e.g. PT or AT) than the corresponding 1D-DDg-defined gene expression cut-off value and the patient belongs to a LR subgroup (1-D DDg design 1). The pro-oncogenic gene has higher expression values in a tissue sample set than the 1D-DDg-defined gene expression cut-off value and the patient belongs to the HR subgroup (1-D DDg design 2). Figure S1 shows the results of the 1-D DDg method. We selected fructose-1,6-bisphosphatase 1, FBP1, encoding the gluconeogenesis regulatory enzyme, acting as a rate-limiting enzyme in gluconeogenesis and a well-known tumor suppressor in HCC (Hirata et al., 2016;Wang et al., 2014), and fibroblast growth factor 1, FGF1, encoding a well-known prooncogenic growth factor in HCC (Lee et al., 2015;Yang et al., 2017). Figure S1A shows the -log e P-value distribution over the gene expression level domain, and Fig. S1B shows two Kaplan-Maier survival functions representing the relatively LR and HR subgroups derived from 1-D DDg at the optimized gene expression cut-off value of FBP1 indicated in Fig. S1A. As expected, FBP1 exhibited a tumor suppressor-like expression pattern. In the case of FGF1, a pro-oncogenic expression pattern was observed (Fig. S1C,D). Notably, the terms 'pro-oncogenic' and 'tumor suppressor-like' are used in the specific context of the survival prognosis analysis (Fig. S1). However, such gene classifications are useful and often correlated with the functional classification of many known tumor suppressors and oncogenes, which are respectively defined based on the pathobiological roles of the genes and gene products in malignant cells (Chen et al., 2017).
We designated the CPG as a survival-significant gene, for which (a) gene expression was binarized according to the 1-D DDg gene expression cut-off value; (b) the gene expression cut-off value was used to stratify the cohort into LR and HR patient subgroups; and (c) the patient survival pattern in the given cohort was identical for PT and AT samples (e.g. either exclusively tumor suppressor-like or exclusively pro-oncogenic in both tissue types) (Fig. 1E). Figure S2 shows the main steps of the workflow, including descriptions of the input and output data sets and the analytical and experimental methods. We used the Singapore cohort to select CPGs observed in the 52 PT-AT paired samples. We used a multicriteria approach.
In the first step of the CPG selection process, we used 1-D DDg values as relatively weak selection criteria for the potential prognostic variables (Wald's statistics P ≤ 0.1), assuming that the next-level filters for accuracy, robustness and reproducibility criteria will enable optimization of the number and combination of high-confidence predictors composing the final prognostic signature. The Venn diagram analysis of the 1-D DDg-based whole transcriptome profile screen in PT and AT samples resulted in the identification of a large common gene subset (Fig. 1D) comprising 2390 unique (Hg19) gene IDs.
Next, we applied a FA/GO analysis using DAVID, version 6.7 (da Huang et al., 2009a) to shortlist the most significantly enriched biological processes and pathways, probably related to the most biologically or pathologically essential CPG subsets (Fig. S2). We identified the gene subsets of the survival significant pro-oncogenic (e.g. Figs 1E and S3A,B) and tumor suppressor-like (e.g. Fig. S3C,D) CPGs. We intended to use a sample size-balanced and multicriteria approach to select unbiased, specific and robust prognostic variables. The 1000 top most significant genes in each CPG subset have been selected. We did this because of a probable effect of sample disbalance: different gene list sizes may lead to a bias in the enrichment analysis test. This sample disbalance also may affect the ranking of large-sized GO categories, making it difficult to compare gene categories in the gene lists (da Huang et al., 2009b).
For the pro-oncogenic CPG subset, using DAVID, version 6.7, we found that most confidence biological process GO terms are determined by the genes encoding translational elongation proteins (P = 2.8 9 10 À25 ) and ribosome (P = 5.2 9 10 À19 ). We also found the moderately enriched term for mitochondrion (P = 0.01) (Fig. 1F). Figure 1E shows the results of the survival prediction analysis for the CPG RPS3A, the TER gene co-expressed in PT and AT samples. In each tissue type, RPS3A showed a pro-oncogenic prognostic pattern. The proportion of agreement between the PT and AT stratifications was 0.731 at the 95% confidence interval (CI) (0.587-0.84). Interestingly, no significant FA/GO terms were observed among the top 1000 survival significant tumor suppressor-like CPGs. Figure S3 shows two examples of the pro-oncogenic ribosomal CPG RPL3 and the tumor Biomarker candidates may be pre-selected using DEG analysis between PT and AT, followed by survival prognostic analysis. (C, D) Scheme for identification of tumor suppressor-like and pro-oncogenic CPGs in the Singapore HCC cohort using the 1-D DDg method for survival prognostic analysis. Only the CPG subsets with identical 1-D DDg design 1 (tumor suppressor-like CPGs) or only with 1-D DDg design 2 (pro-oncogenic CPGs; see Materials and methods) in both PT and AT were selected. (E) An example of pro-oncogenic CPG RPS3A identified in the Singapore HCC cohort. Kaplan-Meier survival curves were obtained using 1-D DDg by fitting the expression values to survival data. Analyses were performed independently in PT and AT for each gene in the Singapore (n = 52) HCC cohort. Vertical bars and P-values show the significant difference in the level of gene expression between the LR and HR patient subgroups (Mann-Whitney test). (F) FA/GO analysis of TER genes in two distinct biological contexts (DAVID bioinformatics software). The results of the FA/GO enrichment analysis are presented for the pro-oncogenic CPGs subset (1)  suppressor-like SPOP CPG according to the 1-D DDg analysis. Typically, the survival patterns predicted by CPG in PT-AT pairs were significantly concordant. For example, the proportion of agreement between the PT and AT stratifications given by the 1-D DDg SPOP classifier was 0.673 at the 95% CI (0.528-0.792) and that given by the 1-D DDg RPL3 classifier was 0.67 at the 95% CI (0.492-0.767). These findings suggest the pathobiological importance of SPOP, RPL3 and other translational elongation and ribosome biogenesis CPGs in AT as a novel class of potential non-malignant tissue clinical biomarkers for malignancy diagnostics, the prognosis of tumor aggressiveness and 'anti-cancerization' tissue therapeutic targeting.
We also analyzed the top 1000 significantly DEGs up-regulated in PT compared to AT (t-test q < 0.05; 115 PT versus 52 AT samples, respectively) (Fig. 1B). By contrast to the pro-oncogenic CPG subset, the list of the top 1000 DEGs up-regulated in PT compared to AT, included, as expected, cell cycle, cell division and mitosis gene enrichment; the FA/GO terms essentially predominated over the other FA/GO terms, including TER gene terms (Fig. 1F) (i.e. the FA/GO terms 'GO:0000278~mitotic cell cycle' with P = 3.7 9 10 À38 and 'GO:0006414~translational elongation' with P = 2.9 9 10 À4 ). Independently, PANTHER (Thomas et al., 2006) identified similar highly enriched and significant FA/GO terms (Fig. S4).
Thus, these findings suggest that, in AT, the genes encoding translation elongation, ribosome machinery components and ribosomal biogenesis in general might pre-exist in malignant predisposition (e.g. pre-cancer tissue initiated by mutations, viral, metabolic or epigenetic reprogramming) and/or PT-activated 'cancerization' behaviors. In both cases, pro-tumorigenic cellular/tissue behavior in AT may be the result of switching-on/exerting cellular extra-ribosomal functions (Coulouarn et al., 2006;Kim et al., 2004;Lindstrom, 2009;Wang et al., 2015b). In pathobiological and clinical contexts, the TER CPGs could be considered novel and perspective diagnostic factors of hostcancer interactions and prognostic biomarkers. The prognostic utility of TER CPGs is based on the detection of similar pathobiological alterations and disease outcome prediction patterns in PT and AT. The differences in the expression profiles between PT and AT, as expected, are predominantly defined by cell cycle/mitosis genes. As described below, we performed several basic computational and experimental analyses addressing the pathobiology characteristics of TER CPGs and their clinical significance.

RGC: identification, robustness and reproducibility across cohorts
Using Singapore cohort gene expression and patient's survival data, we tested the utility of the identified CPGs as multigene prognostic HCC biomarkers. Based on the previous steps of the workflow (Fig. S2), 'GO:0006412~translational elongation' displayed the strong term enrichment P-values among the other CPG GO terms (Figs. 1F and S4). We used the 44 gene symbols under this term, referred to as 'TER genes (Singapore)' (Table S3). The 44 gene subset includes two translational elongation genes (EEF1A1 and EEF1B2); the other 42 genes were specified by cellular component category as 'GO:0005840~ribosome'. By using the same method as we applied in Singapore cohort, we identified stratification cut-off values of individual genes and selected pro-oncogenic CPGs in the LGI cohort PT -AT paired samples. We also carried out GO enrichment analysis. As result, we identified 60 CPGs (Table S4) specified the highest enriched gene term 'GO:0006412-transcriptional elongation' (Fig. S5A). The next most significant GO term was 'GO:0005840~ribosome'. Remarkably, an independent identification of the pro-oncogenic CPGs in the Singapore and LCI cohorts led to a highly significant overlapping between the CPGs identified in the LCI and Singapore cohorts (hypergeometric test, P = 1.5 9 10 80 ; Fig. S5B). Next, we applied 1-D DDg to the 115 available PT samples in Singapore cohort as a discovery dataset and selected the CPGs satisfying the Wald statistics cut-off value at P < 0.05. As result of SWVg implementation (see Methods), the 24 most prognostically significant ribosome genes were selected from the PT TER genes (Singapore) and formed our HCC prognostic model (Table S5A). Table 2 provides an annotation of these 24 ribosomal genes. The integrated SWVg risk score was calculated for each individual patient. The SWVg scores for all patients were refitted to the survival data, and the optimal SWVg score cut-off value for patient stratification was determined (cut-off = 1.42; Fig. 2A). The classifier successfully stratified the Singapore HCC patients using either PT (Fig. 2C) or AT (Fig. 3A) microarray data sets: P = 9.3 9 10 À6 , hazard ratio = 8.20 (3.24-20.8) for PT-based prognostic stratification and P = 0.03, hazard ratio = 4.97 (1.21-20.35) for AT-based prognostic stratification. We referred to this new CPG-based HCC classifier as the RGC.
Next, to test the reproducibility and robustness of the RGC, we performed a validation analysis of the classifier using data from an independent HCC cohort (LCI cohort, 206 HCC patients, Methods). In the contexts of robustness and reproducibility of the potential RGC predictors, we carried out a comparison of the 1D DDg results obtained in Singapore and LGI datasets. Our results, presented in Table S5 suggest a high probability of occurrence of the same survival significant genes in PT of LGI cohort and ability to use Singapore PT data as a training set to develop multigene prognostic signature(s) (see also Supporting information: Comparison of 1D DDg results obtained in the Singapore and LGI datasets). The individual stratification gene expression cut-off values obtained from the 1-D DDg and the SWVg risk score cut-off value (cut-off = 1.42) in the Singapore cohort were used for LCI patient stratification (Fig. 2B). The strict fixation of the pro-oncogenic type of prognosis variables and parameter values in the 1-D DDg and SWVg procedures in the Singapore cohort (training) enabled a prognosis prediction blinded to the survival data (defined by overall survival, OS) in the validation (LCI) cohort (at Wald P = 0.004) (Fig. 2D). The stratification results in two datasets Singapore and LCI (Fig. 2C,D) can be represented by the fractions of patients within each cohort of particular interest, such as HR. This event is represented in the Singapore and LCI cohorts with the proportions 38/115 = 0.33 and 53/206 = 0.26, respectively. We found that the 95% CI for the difference between these proportions was not significant, which suggests subpopulation similarity across different cohorts. Independently, the intercohort subgroups agreement analysis using the Sub-Map algorithm (Hoshida et al., 2007) revealed that the ordered HR T and LR T subgroups based on the results of patient stratification displayed significant agreement between the Singapore and LCI cohorts (Fig. 2E).
Thus, these results suggest that the RGC enables the identification of the robust predicting system for HCC inter-cohort prognosis.
Finally, we also addressed the question of whether the highly enriched DEGs differentiating PT and AT could provide a reproducible prognostic signature of HCC. Using Singapore data, we selected the DEG subset identified under the three most highly enriched FA/GO terms ('GO:0000278~mitotic cell cycle', 'GO:0051301~cell division' and 'GO:0007067~mitotic nuclear division') (Fig. 1F). These genes were further processed using 1-D DDg, as described in the Materials and methods. We further selected the genes that displayed significant Wald P-values (P < 0.05) in the Kaplan-Meier survival analysis (Table S6). Next, SWVg analysis generated a 41-gene prognostic signature represented by mitosis/cell cycle genes (Table S6). The 41-gene prognostic signature displayed significant HCC patient partitioning in the Singapore cohort (Fig. S6A). However, this signature failed in validation in the LCI cohort (using the same procedure as that used for the RGC described above) (Fig. S6B). Non-significant result we also observed in AT of LCI cohort (Fig. S5D versus Fig. S5C). The lack of reproducibility could be explained by the relatively high inter-cohort variation in the environmental, ethnic, clinical and pathological parameters that are relevant to cell cycle/mitotic gene expression and relevant pathways. By contrast, the RGC signature (based on the CPG model; Table S5A) led to reproducible predictions and consistent results for individual predictor genes (see Tables 2 and S5A) and their multigene classifier (Fig. 2C,D; Table S5A).
Altogether, these results indicate the relatively high robustness of the prognostic model to inter-cohort variations in environmental, ethnic, clinical and pathological parameters to CPGs expression and relevant pathways.

RGC is an independent prognostic factor for HCC progression
The list of clinicopathological parameters available for analysis in Singapore and LCI cohorts is presented in Table 1. For many of these parameters, we detected significant differences in distributions between Singapore and LCI cohorts (Table 1; see also the Supporting information: Comparison of standard clinicopathological parameters between the Singapore and LCI cohorts). Such differences between the cohorts could probably be explained by the differences in the environmental factors, healthcare systems and non-identical study design between LCI (Roessler  et al., 2010) and Singapore cohorts and/or various accepted HCC patient diagnoses and treatment guidelines between LCI and Singapore cohorts (Han et al., 2011). We used univariate and multivariate Cox regression analyses to compare the prognostic performance of the RGC with these clinical factors (Table 3).
In the Singapore cohort, the parameters metastasis and extra-hepatic invasion were excluded from the analysis as a result of their low informativeness. HCV status was excluded because of a substantial amount of missing data in the Singapore and LCI cohorts (Table 1).

The PT RGC prognostic pattern is reproducible in AT data
We tested the performance of the RGC using the AT expression data of the Singapore (training) and LCI cohorts (validation). For prognosis using AT data, we implemented the same stratification algorithm as implemented for the discovery of RGC in the analysis of PT data (Fig. 2). The use of AT data for RGC training resulted in a significant Singapore HCC patient stratification (the HR A and LR A subgroups, Wald P < 0.05) (Fig. 3A). Additionally, the difference between the agreement scores of the HR proportions between PT and AT was not significant, suggesting a similarity of stratification between the PT and AT using RGCs in the Singapore cohort. The strict fixation of the pro-oncogenic type of prognosis variables and parameter values in the 1-D DDg and SWVg procedures in the Singapore cohort (training) enabled a prognosis prediction blinded to the survival data in the validation (LCI) cohort (at Wald P = 0.03) (Fig. 3B).
Notably, the 41-gene signature based on cell cycle genes generated from DEGs between PT and AT (Table S6) failed in cross-cohort validation in AT (Fig. S6C,D) and would have questionable clinical utility.

RGC power can be improved by combining PT and AT gene expression data
Next, we proposed that the grouping of common LR patients from relatively low-risk LR T and LT A prognosis subgroups identified in PT and AT could improve the stratifying power of the RGC. We combined personalized RGC data from the PT and AT datasets and constructed HR T&A and LR T&A subgroups (Fig. 3C). Using LCI data as an example, we observed a positive interaction effect. The Wald P-value in the combined classifier was P = 1.2 9 10 À4 (Fig. 3D) compared to P = 0.003 and P = 0.03 in the PT-based (Fig. 2D) and AT-based (Fig. 3B) stratification prognostic models, respectively. We also tested the other combination scenario, alternative to that shown in Fig. 3C, stratifying common HR patients from HR A and HR T subgroups as a combined HR T&A patient subgroup. However, an imbalanced stratification into risk subgroups (i.e. n = 196 in the LR subgroup and n = 10 in the HR subgroup) was observed that limits predictive power (not shown data). We concluded that the prognostic power of the original PT-based RGC can be improved using RGC obtained from AT.

The RGC-based stratification of HCC patients reveals deregulated pathways in PT and AT
The deregulated pathway associated with RGC-defined HCC patient risk subgroups might be useful for designing new molecular targets and developing therapies for RGC-stratified HCC patients. Using DAVID software, the FA/GO enrichment analysis of the upregulated DEGs observed in the HR subgroups (HR T and HR A ) revealed FA/GO terms related to ribosomes, translation and the cell cycle in both the PT and AT, although the high enrichment of terms related to cytoplasmic vesicles was only observed in AT (Fig. 4A). The DEGs down-regulated in the HR T and HR A subgroups displayed various common enriched terms, including deregulated liver metabolism [FA/GO terms 'lipid metabolism,' 'carboxylic acid metabolic process' (SP_PIR_KEYWORDS)] and mitochondrial dysfunction [e.g. FA/GO terms 'GO:0005739~mitochondrion' and 'oxidoreductase' (SP_PIR_KEYWORDS)] (Fig. 4A).
TGFBR2 and MYC were significantly overexpressed in the up-regulated DEGs of both HR T and HR A patient subgroups (Tables S8 and S9). Our results, shown in Fig. 5 and Supporting information (section: MYC as a key regulator of ribosomal pathway in HCC PT and AT), provide the evidences that MYC is a key positive transcription regulator of the RGC and TER genes in PT and AT. MetaCore identified MYC as the most significant transcription factor interactor among the DEGs in the both PT and AT ( Fig. 5D and Tables S8 and S9). Furthermore, combining gene expression correlation, FA/GO and ChIPseq analyses revealed that MYC might be the primary positive transcription regulator of the genes involved in the TER pathway ( Fig. 5; Tables S3 and S10) in both PT and AT cells (Figs 5C and S9; see also the Supporting information: MYC as a key regulator of ribosomal pathway in HCC PT and AT).
MetaCore Pathway Map analysis suggested that DNA double-strand break repair was characteristic of the PT but not AT (Fig. S7B). In both PT and AT, we identified the enrichment for cell cycle, cytoskeleton remodeling, and vascular endothelial growth factor and transforming growth factor-b receptor signaling. The 'GO localization' category identified common GO terms for PT and AT for extracellular vesicles and exosomes, cell adherence junctions, focal adhesions and ribosomes (Fig. 4B), which suggested a deregulation of extracellular exosomal transport in the HR T and HR A HCC subgroups. For the PT-specific gene subset, we found no association with extracellular vesicles. However, the AT-specific subset was additionally associated with extracellular exosomes and endosomes. In the common PT and AT gene subset, 15 RGC genes were annotated under the MetaCore term 'extracellular exosome' (Fig. S7C). In the AT-specific subset, at least eleven genes annotated under the term 'endosome' have previously been associated with endosome-specific functions (Fig. S7C and Table S11). Within the up-regulated AT-specific DEGs subset, we identified multiple enriched Pathway Maps, indicating an activated immune response in the AT (Fig. S7B). Importantly, the comparison between nonstratified PT and AT in the Singapore and LCI cohorts revealed global systematic differences in the gene expression profiles between these tissue types for common PT and AT prognostic pathways (Fig. 6A,B). However, after RGCbased stratification, the representative common prognostic pathways (TER, cell cycle/cell division, extracellular exosome, focal adhesion and liver metabolism) (Table S12) displayed distinct characteristic gene expression patterns within the PT and AT (Fig. 6C,D).

Validation of the RGC and RGC-associated genes using quantitative RT-PCR in the Singapore cohort
We assessed the reproducibility of the gene expression measurements in the microarray platform with quantitative RT-PCR experiments using RNA samples from the Singapore cohort. We selected eight representative ribosomal genes from the RGC, seven representative up-regulated WNT pathway genes and x-axis: Kendall's Tau correlation coefficient; y-axis: cumulative relative frequency. Black circles: correlation coefficients for TER gene sets ('TER gene set (Singapore)' and 'TER gene set (LCI)'; white circles: correlation coefficients for random control gene set (see also the Supporting information: MYC as a key regulator of ribosomal pathway in HCC PT and AT). Dashed lines indicate medians for correlation coefficients distributions. (C) Frequencies of MYC ChIP-seq binding regions in HepG2 cells in the vicinity of proximal promoters (+200/À500 bp) in TER gene sets and DEGs up-regulated in HR T subgroups. x-axis: various gene sets; y-axis: frequency of ChIP-seq binding regions (%). Differences in the frequencies assessed using Fisher's exact test (see the Supporting information: MYC as a key regulator of ribosomal pathway in HCC PT and AT). (D) MetaCore transcription factors interactors significantly enriched for the DEGs after RGC stratification in PT and AT. x-axis: Àlog 10 transformation of FDR-corrected P-values (P < 0.05); y-axis: MetaCore terms for transcription factor (TF) interactors. (E) Correlation analysis of MYC with eight representative TER genes and five genes involved in liver metabolism based on quantitative RT-PCR data from 92 PT samples from the Singapore cohort. PCC, Pearson's correlation coefficient. (F) Hypothetical data-driven model of the TER pathway in PT and AT of HCC patients. Red and blue arrows: genes up-regulated and genes down-regulated in HR HCC patient subgroups, respectively. TER, translation elongation/ribosomal genes; CC/CD, cell cycle/cell division genes; EE and FA, extracellular exosome and focal adhesion genes, respectively. LM, genes involved in liver metabolism. five representative down-regulated genes involved in liver metabolism in the HR T subgroups in both cohorts. For the eight tested RGC genes (RPL9, RPL12, RPL26, RPL31, RPL41, RPS9, RPS25 and RPS4X) and four tested up-regulated WNT pathway genes (MYC, ENC1, CTNNB1 and LEF1), we observed a strong and significant correlation with the microarray results (Fig. S10). We also tested the consistency in differential gene expression in these samples by comparing patients from the HR T and LR T subgroups originally classified using microarray gene expression data (Fig. S11). The quantitative RT-PCR analysis indicated that the 15 up-regulated genes from the HR T subgroup (based on the microarray data, Fig. S11A,B) were up-regulated in randomly selected patients from the HR T subgroup compared to the LR T subgroup (Fig. S11D); the five down-regulated genes from the HR T subgroup (based on the microarray data) (Fig. S11C) were downregulated in randomly selected patients from the HR T subgroup compared to the LR T subgroup (Fig. S11D).
Finally, the RGC stratification power using the quantitative RT-PCR expression data was comparable to that of the microarray expression data (Fig. S11E versus Fig. 2C). In total, we analyzed PT from 92 HCC patients from Singapore cohort. The patients were selected at random. Patient's stratification was  (Table S12) carried out according to the microarray identified RGC genes, detected in PT samples by the qRT-PCR (Fig. 2C). The quantitative RT-PCR based risk stratification analysis results (Fig. S11E) suggest the applicability of our prognostic model for quantitative stratification of the LR and HR HCC patients.
3.8. DKK1 is a potential prognostic factor and drug target DKK1 was found to be the most up-regulated gene in the HR T subgroups after whole transcriptome screening. We thus also tested whether DKK1 gene expression was associated with the RGC-based classification at the protein level in the Singapore HCC cohort. First, we selected six representative HCC patients from the Singapore cohort (three from HR T and three from LR T subgroups). Second, we analyzed liver tumor FFPE section images (Methods, Supporting information: Image based analysis of immunohistochemistry slides) using a training and cross-validation protocol for the SVR system (Smola and Vapnik, 1997). We tested all possible pairs of the HR T versus LR T patients (nine pairs) by measuring the SVR scores of the corresponding tumor tissue images. As expected, the average predicted DKK1 SVR score of the LR T patient images was significantly less than that of the HR T patient images of the nine pairs (Fig. S12). We proposed DKK1 as a potential prognostic factor and drug target for a preclinical study of anti-DKK1 therapy in HCC.
These results showed the superiority or independent prognostic power of the RGC compared to other proposed multigene prognostic signatures.

Discussion
Previous studies have identified the pathobiological pathways and clinically relevant HCC prognostic classifiers that utilized PT; PT with the AT serving as a negative control; or AT alone (Hoshida et al., 2008). However, increasing evidence indicates the presence of common co-regulatory processes, signaling pathways and molecular mechanisms that drive PT and AT compartments (Polyak et al., 2009), resulting in their interconnection and complex regulatory patterns.
In the present study, we developed a prognostic stratification approach to identify common oncogenic pathways and survival significant prognostic variables in the PT and AT of HCC patients with resectable PT. This methodological approach revealed common coexpression patterns for multiple PT-and AT-associated TER pathway genes that are reproducibly associated with aggressive HCC. Further analysis suggested that this might reflect the pathophysiological mechanics of the TER pathway in HCC progression.
We identified 24-ribosomal gene classifier (termed RGC) that displayed the significant PT-associated prognostic power in the studied HCC patient cohorts. The RGC was clinically (Fig. 2C,D) and genetically (i.e. identical enriched biological gene sets) (Fig. 2E) reproducible in a large independent HCC patient dataset. The RGC retained its significant and independent prognostic power in a multivariate analysis compared to the traditional clinicopathological parameters of HCC (Table 3). A comparison of RGC with the 65gene risk signature (Kim et al., 2012), the 16-gene G1-G6 signature (Boyault et al., 2007) and the vascular invasion gene signature (Minguez et al., 2011), previously proposed for prognosis in PT, demonstrated the outperforming prognostic power of the RGC. The RGC displayed comparable and independent prognostic potential (i.e. clinical reproducibility) compared to the 5-gene score signature of Nault et al. (2013) (Fig. S13A-C), which was also developed for HCC patients with curative resection. However, in contrast to the RGC, the genetic reproducibility of the 5-gene score signature remains unclear because it has not been tested (Nault et al., 2013). Notably, the alternative proliferative 41-gene cell cycle gene signature generated from DEGs between PT and AT failed in crosscohort validation (Fig. S6), indicating the limitations of cell cycle-based DEG approach(s) for generating a reproducible HCC prognostic classifier.
RGC could be considered for prognosis using gene expression information not only from PT, but also from AT in the same HCC patients (Fig. 4A-B). Comparison of the RGC with the 186-gene survival signature, previously developed for prognosis in AT (Hoshida et al., 2008), revealed a stronger and independent prognostic value for the RGC in the studied cohorts (see Materials and methods; Fig. S13; see also the Supporting information: RGC-based HCC stratification performance). Thus, the RGC could potentially be used for pre-surgery prognostic patient stratification using AT biopsy tissue samples. The AT-based RGC stratification could minimize the risks of post-biopsy tumor dissemination and intrahepatic metastasis. Interestingly, the combined use of the RGC with the 186-gene survival signature (Fig. S13F) or the RGC alone in both PT and AT (Fig. 3D) can significantly improve the post-surgical prognostic power of the RGC.
The tumor suppressor genes FBP1 and SPOP and their products acting as CPGs in HCC livers may also be important diagnostic and prognostic biomarkers and therapeutic targets.
The data analysis of DEGs in the RGC-derived HR patient subgroups in the PT revealed the deregulation of key WNT pathway-associated genes, including DKK1, MYC, CTNNB1 and LEF1. Therefore, the RGC prognostic stratification in the PT reflects the deregulation of the upstream regulatory molecular machinery towards the WNT-b-catenin-MYC axis.
DKK1 has previously been demonstrated as a promising diagnostic and prognostic factor of HCC (Shen et al., 2012;Yu et al., 2009). After RGC-based stratification, DKK1 was the most up-regulated DEG between the HR T and LR T subgroups (PT) in both LCI and Singapore cohorts (Table S8) and was found to be a pro-oncogenic prognostic factor (Fig. S12A, B). This finding was validated using quantitative RT-PCR (Fig. S11), strongly confirming the results of previous reports. Additionally, quantitative DKK1 immunostaining demonstrated a significant association between DKK1 overexpression and the RGC classification ( Fig. S12; see the Supporting information: Image based analysis of immunochemistry slides). Because DKK1 has been demonstrated to be a pathologically essential and strong prognostic factor in several studies, including ours, DKK1 could be a potential candidate for a future preclinical study of anti-DKK1 therapy in HCC, as described for myeloma (Fulciniti et al., 2009) and osteosarcoma (Goldstein et al., 2016).
The results of the present study suggest that the subset of TER pathway genes (including the RGC genes) and multiple DEGs associated with alterations of this pathway might simultaneously be controlled by MYC in the PT and AT cells of HCC patients (Fig. 5A-E). By contrast to PT, we did not observe a significant enrichment of deregulated WNT pathway genes in AT (Fig. S7B), indicating that other factors, such as progression of viral HBV and/or chronic liver disease (Chan et al., 2004), may deregulate AT-associated MYC expression and TER pathway genes during HCC progression. Interestingly, the HBx HBV viral protein can cooperate with MYC to support ribosome biogenesis in HCC cell lines (Shukla and Kumar, 2012).
CPG-based prognostic biomarker preselection resulted in the detection of enriched FA/GO terms, where the enrichment for TER genes substantially predominated over the enrichment for cell cycle/cell division genes (Figs 1F and S4; see also the Supporting information: MYC as a key regulator of ribosomal pathway in HCC PT and AT).
Furthermore, we found that HR patient HCC subgroups after RGC stratification were characterized by a predominance of up-regulated TER genes and a subset of down-regulated liver metabolism genes (Fig. 4). In this respect, an interesting study (Coulouarn et al., 2006) described a comparison of the global gene expression profiles of liver pre-neoplastic nodules in three transgenic mouse models with the over-expression of c-Myc alone, E2f1 alone, and both E2f1/c-Myc, respectively. Interestingly, the gene expression profiles for the liver pre-neoplastic nodules detected in the c-Myc over-expression mouse model were specifically characterized by induction of the genes involved in protein synthesis and the repression of the genes regulating liver metabolism. Noteworthy, the gene expression patterns observed in HR T and HR A subgroups (Fig. 4) displayed a remarkable similarity with that of the transgenic Myc mouse model of pre-neoplastic liver nodules. We suggest that the results and RGC-based prognostic model developed in the present study probably reflect the essential pathobiological processes in early-stage MYC-driven malignization in AT liver cells, which consequently affect tumor progression and poor OS of HCC patients.
Recent studies support this notion, highlighting many examples of the tumorigenic extra-ribosomal functions of the genes involved in translation (Kim et al., 2004;Wang et al., 2015b) and their active and important roles in cell cycle regulation and apoptosis (Stumpf et al., 2013). At certain stages of tumorigenesis, TER pathway genes could play a role in pro-oncogenic dysregulation, malignization and cancer progression in liver tissue. In this context, uncontrolled cell cycle and cell division, as major features of malignancy and progression drivers distinguishing PT from AT (Fig. 1F), may be considered just as another component of the entire pathobiology process.
Details of the molecular mechanisms underlying the co-expression of pro-oncogenic TER genes in PT and AT are not understood; this could be the subject of future studies. However, this does not exclude the possibility that the TER (and many other RGC-associated genes, some of which were described and considered in the present study) is responsible for the cross-tissue interactions between PT and AT during HCC progression. For example, TGFBR2, a key extracellular signaling regulator in multiple cancers (Morris et al., 2012), was significantly overexpressed in the HR RGC subgroups in both the PT and AT. The TER pathway genes were significantly associated with genes encoding extracellular vesicular and exosomal components, focal adhesions and adherence junctions (Fig. 4), assuming the hypothetical transfer of ribosomal mRNAs and proteins between the PT and AT as the potential cellular reprogramming/malignization factors. Notably, the mRNAs of multiple TER pathway-associated genes and rRNAs are often abundant in microvesicles/exosomes secreted by cancer cells (Jenjaroenpun et al., 2013;Van Deun et al., 2014).
Thus, we propose the HCC-liver tissue interaction model (Fig. 5F) in which the up-regulation of ribosomal CPGs expression could support tumor cell growth and proliferation in PT by enhancing protein synthesis and additionally inducing similar ribosome biogenesis pathways in AT via the extra-ribosomal functions. In addition, CPG products could be involved in tumorigenic signaling pathways through activated extracellular exosomal trafficking (Fig. 5F).
The HR patient subgroups identified using the RGC in PT and AT (Figs 2 and 3) would have wide-ranging targeted treatment options throughout a single molecular cascade (Figs 4 and S8), thus enabling the further identification of novel and efficient drugs for patient subgroups with poor clinical outcomes (see also the Supporting information: Potential therapeutic intervention strategies after RGC-based HCC risk stratification). Finally, the simultaneous targeting of common prognostic genes and pathways in PT and AT may be considered as an alternative approach for potentially reducing post-surgery disease relapse.
The RGC stratification is comparatively simple (24 prognostic genes) and robust, yielding biologically consistent and reproducible patient stratification (Fig. 2) with a thoroughly characterized pathway deregulation (Figs 4 and S8) for use as a practical HCC prognostic assay.
There are limitations associated with the present study. First, the tissues and clinical data were collected retrospectively. Thus, this prognostic model needs further validation in a prospective study. Second, the present study only included HCC patients who underwent surgical resection; they are a selected group and some of our results may be not automatically extrapolated to other clinical HCC groups. For example, the RGC was cross-cohort validated in the LCI cohort, which predominantly includes HBV-related HCC patients (91%) with liver cirrhosis (92%); the RGC prognostic utility for non-HBV-related HCC patients and/or those without liver cirrhosis needs to be confirmed in another study.
In summary, our predictor selection and survival prediction analysis identified the ribosome biogenesis genes co-expressed in PT and AT from 321 HCC patients. The 44 TER CPGs could be considered as perspective HCC malignancy and prognostic biomarkers and targets for therapeutic implementations. We proposed that the PT-inducted ribosomal biogenesis associated with the activation of TER pathways leads to extracellular signaling and 'assimilation' of a protumorigenic state in AT cells. In PT and AT, we introduced the 24-ribosome gene-based prognostic classifier suggesting a pathophysiological role of the ribosome biogenesis in both tissues.

Supporting information
Additional Supporting Information may be found online in the supporting information tab for this article: Fig. S1. 1D-DDg method stratifies patients onto LR and HR subgroups based on 'tumor suppressor-like' and 'pro-oncogenic' prognostic gene expression patterns. Fig. S2. Workflow of data analysis and validation. Fig. S3. Survival analysis for RPL3 (ribosomal protein L3) and SPOP (speckle-type POZ protein) CPGs. Fig. S4. FA/GO analysis of TER genes using PANTHER bioinformatics software. Fig. S5. Identification of TER gene set in the LCI cohort and comparison with the TER gene set from the Singapore cohort. Fig. S6. Training and cross-cohort validation of the 41-gene cell cycle gene signature. Fig. S7. Up-regulated DEGs after RGC-stratification: MetaCore Pathway Maps and GO localization analyses. Fig. S8. Deregulated WNT pathway in HR T subgroups revealed after the RGC stratification of HCC patients in Singapore and LCI cohorts. Fig. S9. MYC as a key regulator of genes involved in TER pathway in HCC PT and AT.
Fig. S10. Concordance analysis between expression data from microarray and quantitative RT-PCR experiments in 12 representative genes. Fig. S11. Quantitative RT-PCR validation of 20 representative genes either involved in the RGC or genes differentially expressed in HR T and LR T subgroups. Fig. S12. Results of the testing of DKK1 relative expression in IHC liver tumor tissue images (PT) in six representative HCC patients using the SVR approach. Fig. S13. Comparison of RGC prognostic power with other prognostic gene signatures. Table S1. The flow of HCC patients and variables used in the present study. Table S2. Checklist of REMARK guidelines in the present study. Table S3. GO term 'Translational elongation' genes and their prognostic characteristics in Singapore dataset. Table S4. GO term 'Translational elongation' genes and their prognostic characteristics in LCI dataset. Table S5. The 24-ribosome gene prognostic classifier and the 1-D DDg-derived characteristics used in validation analysis. Table S6. 41-gene prognostic signature defined by the cell cycle genes differentiating PT and AT samples. Data include Illumina and Affymetrix microarray probes annotation support. Table S7. Clinical phenotypes associated with the RGC clinical subgroups in PT. Table S8. The common differentially expressed genes between HR T and LR T prognostic subgroups derived by the RGC in PT. Table S9. Common differentially expressed genes between HR A and LR A prognostic subgroups derived by the RGC in AT. Table S10. Unique overlaps of ChIP-seq binding regions with gene proximal promoters in various gene sets. Table S11. Literature support of the selected genes involved in endosome functions. Table S12. Common differentially expressed genes in HR RGC prognostic subgroups in both PT and AT. Table S13. Oligoprimers used for quantitative RT-PCR validation of the selected genes.