A multi-omic study reveals BTG2 as a reliable prognostic marker for early-stage non-small cell lung cancer

B-cell translocation gene 2 (BTG2) is a tumour suppressor protein known to be downregulated in several types of cancer. In this study, we investigated the potential role of BTG2 in the survival of patients with early-stage non-small cell lung cancer (NSCLC). We analysed BTG2 methylation data from 1230 early-stage NSCLC patients from five international cohorts, as well as gene expression data from 3038 lung cancer cases from multiple cohorts. BTG2 hypermethylation in tumour tissue at three CpG probes (cg01798157, cg06373167, cg23371584) was associated with shorter overall survival. The methylation-based prognostic model distinguished patient survival in each of the four training cohorts [hazard ratio (HR) range, 1.51–2.21] and in the independent validation set (HR = 1.85). In the expression analysis, higher BTG2 expression was associated with better survival in each cohort (HR range, 0.28–0.68), which we confirmed by meta-analysis (HR = 0.61, 95% CI 0.54–0.68). Methylation at all three CpG probes was negatively correlated with BTG2 expression. Importantly, an integrative model combining BTG2 methylation, expression and clinical information showed better predictive ability than the methylation model alone or clinical information alone in both the training and validation sets. In conclusion, the methylation-based and integrated prognostic signatures based on BTG2 are stable and reliable biomarkers for early-stage NSCLC, and they may in future help guide the design of adjuvant therapy trials and personalized treatment.

Introduction
Lung cancer is the most commonly diagnosed malignancy and a leading cause of cancer-related death worldwide; non-small cell lung cancer (NSCLC) accounts for more than 85% of cases (Chen et al., 2014; Wood et al., 2016). Diagnosis often occurs at a late stage, when most patients have missed the optimal window for surgery, so prognosis is usually poor. However, genomic profiling of tumour tissues can identify biomarkers for predicting NSCLC survival and help develop targeted therapy. Patients diagnosed with early-stage disease have a considerably more favourable prognosis than those diagnosed with late-stage disease, although outcomes still vary among patients with similar clinical characteristics (Hirsch et al., 2017). This variation underscores the need for a better understanding of the genetic and molecular heterogeneity among these patients. In addition to traditional molecular biomarkers, DNA methylation has improved our understanding of tumour genomics by identifying key biomarkers for multiple cancers, and it has played an important role in the development of targeted therapy (Bock et al., 2016; Jones et al., 2016).
Recently, a number of studies have proposed lung cancer signatures for survival stratification based on different types of data, including gene expression (Der et al., 2014; Shedden et al., 2008), DNA methylation (Karlsson et al., 2014; Sandoval et al., 2013) and microRNA expression (Raponi et al., 2009; Tan et al., 2011). However, none has been incorporated into clinical practice, owing to issues such as insufficient validation, small sample sizes and overfitting. Moreover, each proposed signature was limited to a single type of omics data. Robles et al. (2015) proposed an integrated prognostic classifier for early-stage lung cancer, but its methylation and gene expression biomarkers involved different genes, which, together with the small sample size, made it difficult to propose a single therapeutic target. Large-scale multi-omics data integration is therefore needed to build a cross-platform prognostic signature for lung cancer.
Two recent studies reported that B-cell translocation gene 2 (BTG2) plays an important role in cancer progression (Dolezal et al., 2017; Stupfler et al., 2016). BTG2, also called PC3/APRO1/TIS21, was the first identified member of the BTG/TOB family (Buanne et al., 2000). It is located on chromosome 1q32.1 and encodes a protein of 158 amino acids (Lim, 2006). Several studies have reported that BTG2 expression is downregulated in some cancers, including laryngeal carcinoma (Liu et al., 2009), pancreatic cancer (Coppola et al., 2013) and renal cell carcinoma (Struckmann et al., 2004). BTG2 expression has also been associated with prognosis in bladder cancer (Wagener et al., 2013), breast cancer (Takahashi et al., 2011) and pancreatic cancer (Frampton et al., 2014). However, studies of BTG2 in lung cancer have been limited to cell lines (Sun et al., 2013; Wei et al., 2012). No study has focused on the role of BTG2 in lung cancer prognosis, and no lung cancer cohort has yet been used to validate its prognostic value.
In this study, using multi-centre cohorts with methylation and gene expression data, we carried out an integrative analysis to explore the prognostic role of BTG2 in early-stage (clinical stage I or II) NSCLC. The proposed prognostic signatures were validated in all cohorts and improved survival prediction for early-stage NSCLC. In addition, the BTG2 signature showed better predictive performance in patients who received adjuvant therapy, suggesting that BTG2 may be a novel therapeutic target in early-stage disease.

Harvard
All patients in the Harvard cohort were recruited at Massachusetts General Hospital (MGH) from 1992 onwards, and all had newly diagnosed, histologically confirmed primary NSCLC at the time of recruitment. Snap-frozen tumour samples were collected from NSCLC patients during curative surgery with complete resection. The 151 early-stage patients selected for this study had relatively complete survival information. Tumour DNA was extracted from 5-µm-thick histopathological sections. Each specimen was evaluated by an MGH pathologist for the quantity (tumour cellularity > 70%) and quality of tumour cells, and was histologically classified using WHO criteria. The study protocol was approved by the Institutional Review Board of MGH. All patients provided written informed consent.

Sweden
Tumour tissue specimens were collected from patients with early-stage lung cancer who underwent surgery at Skane University Hospital, Lund, Sweden (Karlsson et al., 2014). The study was approved by the Regional Ethical Review Board in Lund, Sweden (Registration nos. 2004/762 and 2008/702). All patients provided written informed consent.

Spain
Descriptions of this study population have been reported previously (Sandoval et al., 2013). In brief, tumours were obtained by surgical resection from patients who provided consent, with approval from the institutional review boards. The median clinical follow-up was 7.2 years. The study was approved by the Bellvitge Biomedical Research Institute institutional review board. All patients provided written informed consent.

Norway
As described previously (Bjaanaes et al., 2016), the participants were patients with operable lung tumours who were seen at Oslo University Hospital-Rikshospitalet, Norway, from 2006 to 2011. Only early-stage (stage I or II) patients were selected for the current study. The project was approved by the Oslo University institutional review board and the regional ethics committee (S-05307). All patients received oral and written information about the study and gave written consent before entering the study.

GDC
The Genomic Data Commons (GDC) Data Portal provided 332 early-stage lung adenocarcinomas (LUAD) and 285 early-stage lung squamous cell carcinomas (LUSC) with both survival and clinical information available for this analysis. In addition, 51 pairs (methylation) and 74 pairs (expression) of early-stage cases with both tumour and adjacent normal tissue data were used for the differential analysis. Level-1 HumanMethylation450 DNA methylation data (raw image data) for each patient were downloaded on 1 October 2015.
The study design is shown in Fig. 1. Data preprocessing details are provided in the Supporting Information. The demographic and clinical characteristics of early-stage lung cancer patients from the five international study cohorts are shown in Table 1. After data preprocessing, we extracted 13 CpG probes located in the BTG2 region from the microarray (Table S1): eight in the promoter region and five in the gene body or 3′ UTR.

Public GEO datasets
We collected 17 additional public gene expression datasets comprising 2209 early-stage NSCLC cases from the Gene Expression Omnibus (GEO) database (Table S2). Cases with available survival time, clinical stage and tumour tissue expression values were included.

Fig. 1. Flow chart of the study design. The study comprised three parts. First, we used the methylation data to compare tumour and normal tissue, build a prognostic model and validate it in the different cohorts. Second, we used the gene expression data to evaluate the association between BTG2 expression and overall survival by meta-analysis. Finally, we performed an integration analysis based on clinical information, methylation and expression data.

Statistical analysis
Continuous variables were summarized as mean ± standard deviation (SD), and categorical variables as frequency (n) and proportion (%). We used a paired Student's t-test to compare methylation and expression values between tumour and adjacent normal tissues. We used a linear model to explore the relationship between the different omics data types. False-discovery rate (FDR) q-values were used to correct for multiple comparisons. We performed meta-analysis of summary-level results using an inverse-variance-weighted fixed-effects model with the R package meta.
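The text does not name the FDR procedure. Assuming the widely used Benjamini-Hochberg step-up method (the one R's `p.adjust(p, method = "BH")` implements), the adjustment can be sketched in Python as follows; the p-values in the example are hypothetical, not results from this study.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    # Positions of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value down so that q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for four probes
print(bh_qvalues([0.01, 0.04, 0.03, 0.2]))
```

Each p-value is scaled by m/rank and then replaced by the smallest scaled value at or above its rank, which is what keeps a larger p-value from receiving a smaller q-value.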
In the survival analysis, associations between each BTG2 CpG probe and overall survival were evaluated separately using univariable Cox proportional hazards models. In the methylation prognostic model, effects were estimated per 1% increment in methylation (Shen et al., 2017). Kaplan-Meier survival curves were drawn and compared among subgroups using log-rank tests. In the multivariable Cox regression model, age, gender, clinical stage, smoking status, histology type and study site (if a cohort had two or more sites) were included as covariates. In the integration analysis, the coefficients of the integrated model were generated with a multivariable Cox regression model including age, stage, the BTG2 methylation signature and BTG2 gene expression. To evaluate the model's prediction accuracy, the concordance index (C-index) was estimated using the R package rms and compared using the R package compareC (Kang et al., 2015).
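The C-index values in this study were computed with the R package rms. To illustrate what the statistic measures, the following is a simplified Python sketch of Harrell's C for right-censored data: among usable pairs, it is the fraction in which the patient who died first also had the higher risk score. For simplicity the sketch skips pairs with tied follow-up times, so on real data it may differ slightly from rms.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up had an
    event. It is concordant when that subject also has the higher risk score;
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter (or equal) follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier subject was censored
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up times, event indicators (1 = death), scores
print(harrell_c_index([1, 2, 3], [1, 1, 0], [0.2, 0.8, 0.5]))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the clinical-only and integrated models are compared.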
Statistical analyses were performed using R version 3.4.0 (The R Foundation). P-values were two-sided, and P < 0.05 (or FDR q < 0.05 where multiple comparisons were corrected) was considered statistically significant.
Based on the three survival-related CpG probes, we built a multi-locus prognostic model. Using Cox regression coefficients estimated in the training set, the model is:

$$\text{score}_{\text{methylation}} = 0.0046 \times \text{cg01798157} + 0.0026 \times \text{cg06373167} + 0.0066 \times \text{cg23371584}$$

Increased DNA methylation levels of the three probes were associated with increased risk of death. Patients were divided into high-risk (above the median) and low-risk (below the median) groups using the median score in the training set (0.292). We then evaluated the model separately within each cohort of the training set. Compared with cases in the low-risk group, cases in the high-risk group had worse overall survival in the Harvard (log-rank test, P = 0.030), Sweden (P = 0.002), Spain (P = 8.71 × 10⁻⁵) and Norway (P = 0.017) cohorts (Fig. 2C-F). In the multivariable Cox regression model, the score retained significance in the Harvard (HR = 1.51; 95% CI 1.04-2.19; P = 0.031), Sweden (HR = 2.21; 95% CI 1.28-3.81; P = 0.004), Spain (HR = 2.12; 95% CI 1.41-3.17; P = 2.69 × 10⁻⁴) and Norway (HR = 2.09; 95% CI 1.05-4.18; P = 0.036) cohorts.
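Scoring a patient with the three-CpG model and assigning a risk group can be sketched in Python as below. The coefficients and the 0.292 cut-off are those reported in the text; the assumption that methylation levels are expressed in percent (0-100) follows from the per-1% scaling described in the Statistical analysis, and the patient values are hypothetical. Placing a score exactly equal to the cut-off in the low-risk group is also an assumption, since the text only defines "above" and "below" the median.

```python
import statistics

# Training-set Cox coefficients for the three-CpG model, as reported in the text
COEFFICIENTS = {
    "cg01798157": 0.0046,
    "cg06373167": 0.0026,
    "cg23371584": 0.0066,
}
TRAINING_MEDIAN = 0.292  # median training-set score, reused as a fixed cut-off


def methylation_score(methylation_pct):
    """Weighted sum of probe methylation levels (assumed to be in percent, 0-100)."""
    return sum(coef * methylation_pct[probe] for probe, coef in COEFFICIENTS.items())


def median_cutoff(training_scores):
    """Cut-off derived as the median score of the training set."""
    return statistics.median(training_scores)


def risk_group(score, cutoff=TRAINING_MEDIAN):
    """High risk above the cut-off; scores at the cut-off go to low risk (assumption)."""
    return "high" if score > cutoff else "low"


# Hypothetical patient; values are illustrative, not study data
patient = {"cg01798157": 20.0, "cg06373167": 30.0, "cg23371584": 25.0}
score = methylation_score(patient)
print(round(score, 3), risk_group(score))  # 0.335 high
```

Because validation cases are scored against the fixed training cut-off rather than their own cohort median, the high- and low-risk groups in a validation cohort need not be of equal size.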
To assess the reproducibility and validity of the three-CpG classifier, we performed an independent validation in the GDC cohort. The prognostic score for each patient was calculated with the same formula and dichotomized using the same cut-off value (0.292) as in the training set. Cases with lower risk scores generally had better survival than those with higher risk scores (log-rank test, P = 0.010) (Fig. 2G). After adjustment for the same covariates used in the training set, the methylation model remained an independent prognostic factor (HR = 1.85; 95% CI 1.26-2.72; P = 0.001) (Table S5).
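The log-rank comparisons between risk groups can be illustrated with a minimal two-group implementation in Python (the study used R, where `survival::survdiff` is the usual function for this test). At each distinct event time the sketch accumulates observed minus expected deaths in group 1 and the hypergeometric variance, then refers the statistic to a chi-square distribution with 1 degree of freedom. The toy data are hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test; `groups` holds 0/1 labels.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Group labels of subjects still at risk just before time t
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # number at risk in group 1
        # Deaths at time t, overall and in group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: group 1 dies at t = 1, 2; group 0 at t = 3, 4
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(round(chi2, 3), round(p, 3))  # 2.882 0.09
```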

BTG2 gene expression is also associated with survival
To compare BTG2 expression between tumour and normal tissue, we extracted 74 early-stage cases from the GDC cohort with gene expression data for both tumour and adjacent normal tissue. In a paired Student's t-test, BTG2 was significantly downregulated in tumour tissues (fold change = 0.55, P = 7.79 × 10⁻¹⁶) (Fig. 3A).
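The text does not state how the fold change was defined. Assuming expression values on a log2 scale, one common convention takes the fold change as 2 raised to the mean paired log2 difference; the Python sketch below computes that together with the paired t statistic. A p-value additionally requires the t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`), which is not reimplemented here. The expression values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(tumour, normal):
    """Paired Student's t statistic and degrees of freedom for matched samples."""
    diffs = [t - n for t, n in zip(tumour, normal)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1


def fold_change_from_log2(tumour_log2, normal_log2):
    """Tumour/normal fold change from paired log2 values (assumed convention)."""
    mean_diff = statistics.mean([t - n for t, n in zip(tumour_log2, normal_log2)])
    return 2 ** mean_diff


# Hypothetical log2 expression values for three tumour/normal pairs
tumour = [5.0, 6.0, 7.0]
normal = [6.0, 7.0, 9.0]
print(paired_t_statistic(tumour, normal))
print(round(fold_change_from_log2(tumour, normal), 3))
```

A fold change below 1, as reported for BTG2, indicates lower expression in tumour than in matched normal tissue.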
Gene expression data were available in four of the five cohorts (all except the Spain cohort). In the survival analysis, with expression levels dichotomized at the median within each cohort, high BTG2 expression was significantly associated with better survival in the Harvard (HR = 0.28, P = 0.036), Sweden (HR = 0.54, P = 0.023), Norway (HR = 0.44, P = 0.032) and GDC (HR = 0.68, P = 0.005) cohorts (Fig. S2).
Further, we performed a meta-analysis of the association between BTG2 expression and overall survival across the four consortium cohorts and 17 external public lung cancer cohorts. The analysis of these 3038 cases was consistent with a tumour-suppressor role for BTG2, with higher expression associated with longer overall survival (HR = 0.61, 95% CI 0.54-0.68, P = 1.87 × 10⁻¹⁸) (Fig. 3B,C). We also performed a sensitivity analysis using normalized continuous gene expression data (mean = 0, SD = 1) to test the robustness of the model. Meta-analysis again showed that continuous BTG2 expression was significantly associated with overall survival (HR = 0.79; 95% CI 0.74-0.84; P = 2.62 × 10⁻¹³) (Fig. S3).
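The pooled HRs come from an inverse-variance-weighted fixed-effects model fitted with the R package meta (its `metagen` function takes log hazard ratios and their standard errors). To show the arithmetic, here is a minimal Python sketch that recovers each study's standard error from a reported 95% CI and pools on the log scale. The input values are hypothetical, not the cohort estimates from this study.

```python
import math


def fixed_effect_meta(studies, z_crit=1.959964):
    """Inverse-variance weighted fixed-effect pooling of hazard ratios.

    `studies` is a list of (hr, ci_lower, ci_upper) tuples with 95% CIs.
    Returns (pooled HR, lower 95% limit, upper 95% limit, two-sided p-value).
    """
    weights, log_hrs = [], []
    for hr, lower, upper in studies:
        # Standard error of log(HR) recovered from the width of the 95% CI
        se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
        weights.append(1 / se ** 2)
        log_hrs.append(math.log(hr))
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return (math.exp(pooled),
            math.exp(pooled - z_crit * pooled_se),
            math.exp(pooled + z_crit * pooled_se),
            p_value)


# Hypothetical per-cohort estimates: (HR, lower 95% CI, upper 95% CI)
studies = [(0.55, 0.35, 0.86), (0.70, 0.50, 0.98), (0.62, 0.45, 0.85)]
hr, lower, upper, p = fixed_effect_meta(studies)
print(f"HR = {hr:.2f}, 95% CI {lower:.2f}-{upper:.2f}, P = {p:.2e}")
```

Cohorts with narrower confidence intervals receive larger weights, so the large public datasets dominate the pooled estimate; a fixed-effects model also assumes that all cohorts estimate the same underlying HR.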

Integration analysis of clinical information, expression and methylation
To improve the accuracy of clinical prognosis prediction, we built an integrated model of BTG2 expression, methylation and clinical information. In the multivariable analysis, the clinical variables age and clinical stage were independent prognostic factors (Table S5) and were therefore included in the integrated model. Expression data were treated as a binary variable (low vs. high). Using the Harvard, Sweden and Norway cohorts as the training set, we derived the integrated prognostic score:

$$\text{score}_{\text{integration}} = 0.027 \times \text{age} + 0.233 \times \text{stage} - 0.586 \times \text{BTG2 mRNA} + 48.15 \times \text{score}_{\text{methylation}}$$

Using the median risk score of the training set (2.36) as a cut-off, the integrated model distinguished prognosis better than the methylation model alone in both the training set (HR = 2.80, 95% CI 1.96-4.28, P = 1.21 × 10⁻⁵) and the GDC validation cohort (HR = 2.38, 95% CI 1.67-3.37, P = 1.40 × 10⁻⁶) (Fig. 4B). The integrated model also showed superior predictive performance compared with a model using clinical characteristics only (age and clinical stage) (training set C-index: 0.676 vs. 0.550, z = 4.06, P = 4.82 × 10⁻⁵; validation set C-index: 0.668 vs. 0.591, z = 2.48, P = 0.012) (Fig. 4C).
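A minimal Python sketch of the integrated linear predictor with the reported coefficients follows. The text does not specify how stage and BTG2 mRNA were coded or on what scale the methylation score enters this model; the sketch assumes stage coded as 1 (stage I) or 2 (stage II) and BTG2 mRNA coded as 1 for high and 0 for low expression. Because these codings are assumptions, the reported cut-off of 2.36 should only be applied to scores computed with the original study's coding, so the cut-off is left as an explicit argument. The inputs are hypothetical.

```python
def integrated_score(age, stage, btg2_high, score_methylation):
    """Integrated linear predictor with the coefficients reported in the text.

    Assumed codings (not stated in the source): stage = 1 for stage I and
    2 for stage II; btg2_high = 1 for high and 0 for low BTG2 mRNA.
    """
    return (0.027 * age
            + 0.233 * stage
            - 0.586 * btg2_high
            + 48.15 * score_methylation)


def integrated_risk_group(score, cutoff):
    """High risk above the training-set median cut-off, low risk otherwise."""
    return "high" if score > cutoff else "low"


# Hypothetical inputs for illustration only
print(round(integrated_score(age=60, stage=1, btg2_high=1, score_methylation=0.01), 4))  # 1.7485
```

A higher score means higher predicted risk: the negative BTG2 mRNA coefficient lowers the score for high-expressing tumours, consistent with the protective association of BTG2 expression reported above.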

Stratification analysis for the prognostic signatures
We assessed the effect of methylation and integration prognostic scores on overall survival in subgroups of patients with different clinical profiles. When stratified by clinical variables [age (divided by the median value), gender, histology, clinical stage, smoking status and adjuvant therapy], the models remained statistically significant (Figs 5A and S4A). Interestingly, the effect of the integration signature was more pronounced in patients who received adjuvant therapy (HR = 3.76, 95% CI 1.46-9.68) than in those who did not (HR = 1.57, 95% CI 1.24-1.99) (Fig. 5B).
The Kaplan-Meier curves of overall survival for each prognostic score category are shown in Figs 5C and S4B. The classifiers successfully separated patients into subgroups with significantly different clinical outcomes (methylation model, P = 1.66 × 10⁻⁷; integrated model, P = 4.86 × 10⁻¹³).

Discussion
Early-stage NSCLC patients are at substantial risk of recurrence and death, even after curative surgical resection. The use of adjuvant therapy in early-stage disease, particularly in stage I cases, remains controversial because previous randomized trials have not demonstrated a consistent survival benefit (Li et al., 2017). Stable and reliable prognostic biomarkers are urgently needed to identify the subgroup at higher risk of death. In this study, we developed prognostic signatures that combine traditional clinical information with DNA methylation and gene expression of a single gene, BTG2; this single-gene focus may make them practical for developing targeted therapy. The prognostic signatures distinguished patient survival and were validated in all cohorts, both in the whole set and in clinically defined subgroups (e.g. stage I and stage II, LUAD and LUSC). The integrated model added predictive value to the clinical information currently available.
BTG2 is one of the early growth response genes (Sukhatme et al., 1987) and is highly expressed in multiple organs and tissues, including lung, intestine, pancreas and prostate (Melamed et al., 2002). Several cancer-related biological functions have been attributed to this gene. First, over-expression of BTG2 inhibits cell proliferation and invasion in some tumours, including lung cancer cells (Wei et al., 2012), and BTG2 acts as an anti-proliferative gene in cooperation with PRMT1 (Dolezal et al., 2017). Second, BTG2 is involved in cell differentiation: it can promote retinoic acid-induced differentiation of haematopoietic cells (Passeri et al., 2006). Third, a previous study reported that BTG2 can promote or induce apoptosis and suppress invasion in triple-negative breast cancer cells (Zhang et al., 2013). Fourth, BTG2 is a p53 target gene involved in DNA damage repair; it acts through the p53-dependent Ras signal transduction pathway, and its expression increases markedly when DNA is damaged (Boiko et al., 2006). Thus, BTG2 plays important roles in cell proliferation, differentiation, apoptosis and DNA damage repair.
BTG2 is involved in several important cancer-related pathways (Fig. 6). As described above, it is a major downstream effector of p53 in suppressing Ras-induced transformation and is linked to the p53 pathway in human tumorigenesis (Boiko et al., 2006). Additionally, it inhibits the proliferation and metastasis of cancer cells by suppressing the PI3K/AKT pathway, an important pathway in the malignant progression of various tumours that mediates cancer cell proliferation, migration and invasion (Li et al., 2015). Moreover, BTG2 over-expression inhibits interleukin-6 (IL-6) expression by downregulating STAT3 signalling and inhibits reactive oxygen species (ROS) generation in the JAK2-STAT3 signalling pathway (Quy et al., 2013), thereby suppressing cancer cell growth. BTG2 expression is also upregulated by oxidative stress via the ROS-protein kinase C-NF-κB pathway, independently of p53 status (Imran and Lim, 2013). Hence, BTG2 participates in several pathways that are crucial for cancer development and progression.
Because BTG2 has been linked to cancer through various biological mechanisms, it is a potential target for precision treatment. In our stratification analysis, the prognostic signature was more effective and had better 5-year prediction performance in patients who received adjuvant therapy than in those who did not. In terms of clinical application, BTG2 has been shown to be one of the hypoxia-inducible proapoptotic targets of p53, which can modulate apoptosis and radiosensitivity via AKT inhibition (Leszczynska et al., 2015). Further, previous reports suggest that BTG2 expression enhanced the radiosensitivity of NSCLC and breast cancer cells by affecting cell cycle distribution, enhancing radiation-induced apoptosis and inhibiting the expression of DNA repair-related proteins (He et al., 2015; Hu et al., 2012), suggesting that BTG2 may be a novel target in radiotherapy for lung cancer. Whether BTG2 plays a role in chemosensitivity requires further investigation.
We note that the three risk CpG probes in the methylation prognostic model were all in the gene body or 3′ UTR, whereas most probes in the promoter region were not associated with survival. Recent studies have found that gene body methylation can also alter gene expression and that genes regulated in this way can serve as therapeutic targets (Ball et al., 2009; Jones, 2012; Yang et al., 2014), e.g. ITPKA (Wang et al., 2016). In addition, methylation at the three probes was strongly negatively correlated with BTG2 expression. Thus, the CpG sites identified here may mark epigenetic silencing of BTG2 and might be important regulators of its expression.
To our knowledge, this is the first multi-centre, large-scale integration analysis of BTG2 methylation and expression in early-stage NSCLC. We acknowledge some limitations. First, the sample sizes of some subgroups, such as patients who received radiotherapy, were small, which made some subgroup analyses difficult to perform; instead, we analysed together all cases that received any form of adjuvant therapy. Second, the histological subtypes were not balanced across the five cohorts; specifically, the Norway cohort included no LUSC cases. However, the prognostic signatures we identified were significant in both major histological subtypes, reducing concerns about bias. Third, because we focused on a single gene, the scope of this study is narrower than that of genome-wide studies.

Conclusions
The proposed methylation and integrated signatures based on BTG2 are stable and reliable prognostic biomarkers for overall survival in early-stage NSCLC. These prognostic signatures may in future help guide the design of adjuvant therapy trials and personalized treatment.

Supporting information
Additional supplemental material may be found online in the Supporting Information section at the end of the article.
Fig. S1. Boxplot depicting the distribution of the three CpG probes across the five cohorts.
Table S1. Annotation for 13 CpG sites located in the BTG2 gene region.
Table S2. Study characteristics of the 17 public lung cancer datasets.
Table S3. Cox regression analysis for the 13 probes in the training set.
Table S4. Differential analysis between tumour and adjacent normal tissues for the 13 probes.
Table S5. Multivariable Cox regression analysis for the methylation prognostic signature.