Integrative analysis of gene expression and DNA methylation through one‐class logistic regression machine learning identifies stemness features in medulloblastoma

Most human cancers develop from stem and progenitor cell populations through the sequential accumulation of various genetic and epigenetic alterations. Cancer stem cells have been identified from medulloblastoma (MB), but a comprehensive understanding of MB stemness, including the interactions between the tumor immune microenvironment and MB stemness, is lacking. Here, we employed a trained stemness index model based on an existent one‐class logistic regression (OCLR) machine‐learning method to score MB samples; we then obtained two stemness indices, a gene expression‐based stemness index (mRNAsi) and a DNA methylation‐based stemness index (mDNAsi), to perform an integrated analysis of MB stemness in a cohort of primary cancer samples (n = 763). We observed an inverse trend between mRNAsi and mDNAsi for MB subgroup and metastatic status. By applying the univariable Cox regression analysis, we found that mRNAsi significantly correlated with overall survival (OS) for all MB patients, whereas mDNAsi had no significant association with OS for all MB patients. In addition, by combining the Lasso‐penalized Cox regression machine‐learning approach with univariate and multivariate Cox regression analyses, we identified a stemness‐related gene expression signature that accurately predicted survival in patients with Sonic hedgehog (SHH) MB. Furthermore, positive correlations between mRNAsi and prognostic copy number aberrations in SHH MB, including MYCN amplifications and GLI2 amplifications, were detected. Analyses of the immune microenvironment revealed unanticipated correlations of MB stemness with infiltrating immune cells. Lastly, using the Connectivity Map, we identified potential drugs targeting the MB stemness signature. Our findings based on stemness indices might advance the development of objective diagnostic tools for quantitating MB stemness and lead to novel biomarkers that predict the survival of patients with MB or the efficacy of strategies targeting MB stem cells.

Most human cancers develop from stem and progenitor cell populations through the sequential accumulation of various genetic and epigenetic alterations. Cancer stem cells have been identified from medulloblastoma (MB), but a comprehensive understanding of MB stemness, including the interactions between the tumor immune microenvironment and MB stemness, is lacking. Here, we employed a trained stemness index model based on an existent one-class logistic regression (OCLR) machine-learning method to score MB samples; we then obtained two stemness indices, a gene expression-based stemness index (mRNAsi) and a DNA methylation-based stemness index (mDNAsi), to perform an integrated analysis of MB stemness in a cohort of primary cancer samples (n = 763). We observed an inverse trend between mRNAsi and mDNAsi for MB subgroup and metastatic status. By applying the univariable Cox regression analysis, we found that mRNAsi significantly correlated with overall survival (OS) for all MB patients, whereas mDNAsi had no significant association with OS for all MB patients. In addition, by combining the Lasso-penalized Cox regression machinelearning approach with univariate and multivariate Cox regression analyses, we identified a stemness-related gene expression signature that accurately predicted survival in patients with Sonic hedgehog (SHH) MB. Furthermore, positive correlations between mRNAsi and prognostic copy number aberrations in SHH MB, including MYCN amplifications and GLI2 amplifications, were detected. Analyses of the immune microenvironment revealed unanticipated correlations of MB stemness with infiltrating immune cells. Lastly, using the Connectivity Map, we identified potential drugs targeting the MB stemness signature. Our findings based on stemness indices might advance the development of objective diagnostic tools for quantitating MB stemness and lead to novel biomarkers that predict the survival of patients with MB or the efficacy of strategies targeting MB stem cells.

Introduction
Medulloblastoma (MB) is the most commonly diagnosed embryonal tumor of the central nervous system (CNS) in children. Despite being initially characterized based on histological features, it is now clearly accepted that MB mainly comprises four distinct molecular subgroups: wingless (WNT)-activated, Sonic hedgehog (SHH)-activated, group 3, and group 4, as reflected in the 2016 World Health Organization (WHO) classification of tumors of the CNS (Louis et al., 2016;Ramaswamy et al., 2016a,b). These four subgroups have divergent transcriptional profiles, somatic mutations, copy number aberrations, and clinical outcomes (Morrissy et al., 2016;Northcott et al., 2012;Ramaswamy et al., 2013Ramaswamy et al., , 2016a. WNT and SHH MBs are clearly separable and identifiable across the majority of studies based on transcriptional and DNA methylation profiling data, demonstrating minimal overlap with other MB subgroups . Group 3 and 4 MBs share several copy number alterations such as enrichment of isochromosome 17q, and the transcriptomes of group 3 and group 4 MBs are more similar to each other (Ramaswamy et al., 2016a,b;Taylor et al., 2012). The 2016 WHO classification of CNS tumors includes group 3 and group 4 MBs as provisional variants under the umbrellas of non-WNT/non-SHH MB (Louis et al., 2016). The survival rate of patients with MB largely depends on the molecular and clinical features of the cancer, varying from > 90% 5-year overall survival (OS) for WNT MB patients to < 50% 5-year OS for patients with metastatic group 3 or SHH MB with a TP53 mutation (Ramaswamy et al., 2016a,b). Aggressive yet nonspecific multimodal therapies (surgery, radiation therapy, and chemotherapy) have significantly improved the survival of MB patients, but survivors experience severe late-onset cognitive and neurological side effects, including secondary malignancies (Crawford et al., 2007;Diller et al., 2009;Packer and Vezina, 2008;Packer et al., 2013). The cause of most MB-related deaths is leptomeningeal metastases (Ramaswamy and Taylor, 2017). The relapse of MB is an almost uniformly fatal event, with no significant salvage rate (Ramaswamy and Taylor, 2017). It is essential to define the mechanisms of MB growth, metastasis, and recurrence to develop tailored therapies to selectively eradicate tumor cells responsible for MB expansion, metastasis, and relapse while sparing the developing brain (Vanner et al., 2014).
Stemness is defined as the potential for self-renewal and differentiation from the cell of origin and was originally attributed to normal stem cells that have the ability to give rise to all cell types in the adult organism (Malta et al., 2018). Cancer stem cells (CSCs) are cancer cells that possess characteristics related to normal stem cells, specifically the ability to give rise to all tumor cell types (Bjerkvig et al., 2005). CSCs are considered to be responsible for tumor growth and maintenance, are often resistant to conventional chemotherapy and radiation therapy, and are involved in tumor metastasis and recurrence. Tumors are composed of a diverse, complex, integrated ecosystem of relatively differentiated tumor cells, CSCs, infiltrating immune cells, tumor-associated fibroblasts, and endothelial cells, among other cell types (Malta et al., 2018). The tumor immune environment plays an important role in prognosis and response to therapy in various cancer types (Thorsson et al., 2018). MBs are cancers in which the majority of cells possess an undifferentiated stem-or progenitorlike appearance (Fan and Eberhart, 2008), and CSCs have been identified from MB (Read et al., 2009;Singh et al., 2004;Ward et al., 2009). However, an integrated understanding of MB stemness, including the interface between the tumor immune environment and MB stemness, is lacking.
In this study, we analyzed cancer stemness in a cohort of primary MB samples (n = 763). First, we applied a trained stemness index model based on the previously existing one-class logistic regression (OCLR) machinelearning method (Malta et al., 2018;Sokolov et al., 2016), which includes a mRNA expression-based signature and a DNA methylation-based signature, to quantify MB stemness to acquire two independent stemness indices. One index [gene expression-based stemness index (mRNAsi)] was reflective of gene expression; the other index (mDNAsi) was reflective of epigenetic features. Second, we assessed correlations between the two stemness indices and clinical and molecular features and identified a stemness molecular signature that might be helpful in guiding the prognostic status of MB patients. In addition, by applying CIBERSORT (Gentles et al., 2015) to profile immune cell types in MB, we gained insight into the interaction of the immune system with cancer stemness. Finally, using the Connectivity Map (CMap) (Subramanian et al., 2017), we discovered candidate compounds targeting the MB stemness signature.

Data collection and processing
In this study, we collected 763 primary MB samples, which all had genome-wide methylation and expression array data deposited in Gene Expression Omnibus (GEO) under the accession number GSE85218 (Cavalli et al., 2017), to analyze MB stemness. Microarray data from GSE85218 dataset were downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE85218). Demographic and clinical information for the GSE85218 dataset is summarized in Table S1. For gene expression array data, background correction was carried out using the 'backgroundCorrect' function of the R package 'limma' with default parameters (Ritchie et al., 2015), and normalization was implemented with the 'normalizeBetweenArrays' function of the R package 'limma' with default parameters (Ritchie et al., 2015). The log2-transformed normalized values of gene expression data were used to generate the mRNAsi. The DNA methylation level was represented using b values ranging from zero (no DNA methylation) to one (complete DNA methylation). For DNA methylation data, we excluded probes located on the sex chromosome and probes containing known single nucleotide polymorphisms. We performed normalization utilizing the SWAN method as part of the R package 'minfi' with default parameters (Maksimovic et al., 2012). b values were used to generate the mDNAsi.

Calculation of gene expression-and DNA methylation-based stemness indices for MB
To calculate the mRNAsi and the DNA methylationbased stemness index (mDNAsi), Malta et al. (2018) built a predictive model using an OCLR algorithm on pluripotent stem cell samples from the Progenitor Cell Biology Consortium dataset (Daily et al., 2017;Salomonis et al., 2016) to train two stemness signatures. The mRNA expression-based signature contains a gene expression profile comprising 11 774 genes, and the DNA methylation-based signature contains a set of 151 differentially methylated CpG sites. The work flow to generate the stemness indices (mRNAsi and mDNAsi) is available on https://bioinformaticsfmrp. github.io/PanCanStem_Web/. We applied the stemness index model to score the 763 MB samples using the same Spearman correlation (RNA expression data) or linear model (DNA methylation data) operators. The stemness indices were subsequently mapped to the [0,1] range via utilizing a linear transformation that subtracted the minimum and divided by the maximum. The MB samples stratified by the stemness indices were utilized for the integrative analyses.

Evaluation of associations between stemness indices and clinical outcomes in MB
We regarded the stemness index (mRNAsi or mDNAsi) as a single continuous covariate. The associations between the two stemness indices and OS in MB were assessed in three phases. First, we applied univariate Cox proportional hazard regression to calculate hazard ratios (HRs) for OS. The variables we included were mRNAsi, mDNAsi, age, sex, subgroup, tumor histology, metastatic status, and immune score. Immune score was calculated from gene expression data using the ESTIMATE algorithm (Yoshihara et al., 2013) and represents the level of infiltrating immune cells in any given MB sample. We found that only mRNAsi significantly correlated with OS for all MB patients. Therefore, mRNAsi was retained for subsequent analyses. Second, each subgroup of patients was split into lowand high-risk groups based on the optimal cutoff value for mRNAsi, which was determined by using the 'cutp' function of the R package 'survMisc' (https://cran.r-pro ject.org/web/packages/survMisc) with default parameters, and the survival difference between patients with high mRNAsi and low mRNAsi was compared by Kaplan-Meier (K-M) survival plots. Finally, the statistically significant survival difference between patients with high mRNAsi and low mRNAsi was limited to SHH subgroup patients. We split the SHH MB dataset randomly into a 70% training set and 30% validation set splits by using the 'createDataPartition' function of the R package 'caret' (https://cran.r-project.org/web/ packages/caret). The following nondefault parameters for the 'createDataPartition' function were used: P = 0.7 and list=FALSE. Distribution of clinical characteristics between the training and validation sets was compared with the Kruskal-Wallis test for continuous parameters and the chi-square test for categorical parameters. In the training set, the differentially expressed genes (DEGs) between SHH subgroup samples with high mRNAsi and low mRNAsi were computed using the 'lmFit' function of the R package 'limma' with default parameters (Ritchie et al., 2015). In total, 3800 DEGs with an adjusted P value of < 0.05 were considered for the univariate Cox regression. The adjusted P value for multiple testing was computed utilizing the Benjamini-Hochberg (BH) correction. The univariate Cox regression analyses were performed to investigate the association between the OS of SHH MB patients in the training set and the expression level of each DEG. By performing the univariate Cox regression, 83 genes whose parameter P values were less than 0.001 were selected for subsequent analyses. In the training set, we employed Lasso-penalized Cox regression analysis (Tibshirani, 1997) to further reduce genes for SHH MB patients. For the Lasso-penalized Cox regression analysis, we subsampled the training set at a ratio of 7 : 3 with 1000 replacements and selected the genes with repeat occurrence frequencies of more than 200. A 23-mRNA-based risk score staging model was built based on a linear combination of the regression coefficient derived from the multivariate Cox regression (coef i ) multiplied by its expression level (expr i ). The formula for calculating risk scores is described as follows: Sonic hedgehog subgroup samples were split into low-and high-risk score groups according to the optimal cutoff value generated by using the 'cutp' function of the R package 'survMisc' with default parameters, and the two patient cohorts were compared by K-M curves. To determine whether the predictive power of the 23-mRNA-based prognostic model could be independent of other clinical variables (including age, gender, histology, and metastatic status) for SHH MB patients, the multivariate Cox regression analyses were conducted. In the validation set, we applied the same risk score formula and cutoff point and divided the SHH MB patients into low-and high-risk groups to test the robustness of the 23-mRNA-based prognostic model. We also employed the 23-mRNA-based prognostic model to predict survival of patients with other MB subgroups. Furthermore, we used a random model that was built by a random subset of 23 genes using the multivariate Cox regression analysis to predict survival of patients with SHH MB, and constructing the random model in the training set was also repeated 1000 times. The area under the curve (AUC) of the time-dependent receiver operating characteristic (ROC) analysis was used to evaluate the predictive accuracy of the models.

Evaluation of associations between stemness indices and prognostic copy number alterations in SHH MB
The prognostic copy number alterations in SHH MB, including MYCN amplifications, GLI2 amplifications, and PTEN deletions, were calculated from DNA methylation arrays utilizing the R package 'conumee' with default parameters (http://bioconductor.org/pac kages/conumee) (Capper et al., 2018). P values for the associations between stemness indices and the prognostic copy number alterations of SHH MB were computed using Pearson's correlation coefficient tests followed by multiple testing using the BH method.

Evaluation of relationships between stemness indices and the MB immune microenvironment
By using CIBERSORT (a gene expression-based deconvolution algorithm) (http://cibersort.stanford.ed u/) (Gentles et al., 2015), we scored 22 immune cell types for their relative abundance in the MB samples. For any given MB sample, we computed the associations between mRNAsi/mDNAsi and the estimated fractions of individual immune cell types. By applying ESTIMATE (Yoshihara et al., 2013), we calculated the individual immune score to predict the level of infiltrating immune cells in any given MB sample. We calculated the correlations between mDNAsi/mRNAsi and immune score or PD-L1 expression.

Identification of potential compounds targeting the MB stemness signature
We employed the recently updated CMap (September 2017) (Subramanian et al., 2017), a data-driven, systematic approach for discovering correlations among genes, chemicals, and biological conditions, to search for candidate compounds that might target pathways correlated with MB stemness. In the CMap database, a total of 42 080 perturbagens were profiled to generate 473 647 reference signatures. The CMap workflow involves interrogating the CMap dataset of reference signatures with a query (a list of DEGs related to a biological state of interest) utilizing the pattern-matching algorithms. The query results are scored ranging from À100 to 100. The molecule compounds are ranked according to their scores to yield most similar and opposing compounds. The CMap data and tools are available on https://clue.io. The DEGs between SHH subgroup samples with high mRNAsi and low mRNAsi were calculated using the 'lmFit' function of the R package 'limma' with default parameters (Ritchie et al., 2015). A list of genes differentially expressed between SHH subgroup samples with high mRNAsi and low mRNAsi was obtained, and the top 300 genes (150 upregulated and 150 downregulated) were selected to query the CMap database. Compounds with an enrichment score ≤ À95 were recorded as potential therapeutic agents for SHH MB.

Statistical analysis
R software version 3.4.4 (R Core Team, R Foundation for Statistical Computing, Vienna, Austria) was used for all statistical analyses. The OCLR method was implemented with the R package 'gelnet' with default parameters (Sokolov et al., 2016). P values for the associations between stemness indices and the MB immune microenvironment were computed using Pearson's correlation coefficient tests followed by multiple testing using the BH method. P < 0.05 was considered statistically significant.

mRNA expression-and DNA methylationbased stemness indices in MB
We ranked the MB samples according to their mRNAsi or mDNAsi values (from low to high stemness index) and tested whether any demographic/molecular/clinical feature was correlated with either a low or high stemness index (Fig. 1A,B). We observed an inverse trend between mRNAsi and mDNAsi for subgroup and metastatic status ( Fig. 1C-L). Group 3 and group 4 samples had higher mRNAsi values than WNT and SHH samples (Fig. 1C), while WNT and SHH samples had higher mDNAsi values than group 3 and group 4 samples (Fig. 1H). Similarly, patients with metastatic MB had higher mRNAsi values than patients with nonmetastatic MB (P = 0.025, Fig. 1G), whereas the mDNAsi value was higher in patients with nonmetastatic MB than in patients with metastatic MB (P = 4.6 9 10 À6 , Fig. 1L). In patients with group 3 MB, patients with nonmetastatic MB had higher mDNAsi values than patients with metastatic MB (P = 0.014, Fig. 1I). In addition, in patients with nonmetastatic MB, the mRNAsi value was higher in patients with group 3 MB and patients with group 4 MB ( Fig. 1F), while the mDNAsi was highest in patients with SHH MB (Fig. 1K). In patients with metastatic MB, the mDNAsi was highest in patients with SHH MB (Fig. 1J).

Correlations of stemness indices with clinical outcome in MB
By using the univariable Cox regression analyses, we found that mRNAsi had a statistically significant effect on OS for MB (HR, 11.43; 95% CI, 2.79-46.76; P = 7.03 9 10 À4 ), whereas mDNAsi had no significant association with OS for MB (Table 1). For each subgroup of MB patients, the statistically significant OS difference between patients with high mRNAsi and low mRNAsi was restricted to SHH MB patients [HR, 2.36; 95% CI, 1.2-4.6; P = 0.0086; P (cutoff) = 0.0193] (Fig. 2). Distribution of clinical characteristics [age (P = 0.24), gender (P = 0.66), histology (P = 0.21), metastatic status (P = 0.83)] was balanced between the training and validation sets (Table S2). The genes differentially expressed between SHH MB samples with high mRNAsi and low mRNAsi were screened, and univariate, Lasso, and multivariate Cox analyses were conducted to construct a 23-mRNA-based prognostic model. The gene expression-based prognostic model was characterized by the linear combination of the expression values of the 23 genes weighted by their relative coefficients in the multivariate Cox regression analysis. Table S3 shows the multivariate Cox regression coefficients of the genes in the 23-mRNA-based prognostic model. In this 23-mRNA-based prognostic model, higher expression levels of ADAMTSL3, CPE, EFEMP2, FAM214A, FKBP4, FRZB, HIST1H2APS4, ITIH2, KCNG1, LPCAT3, MTRR, NLGN4Y, and TIMM50 were related to a lower risk of death (coefficient < 0). In contrast, higher expression levels of COLGALT1, KIAA0825, LDB3, PIP4K2A, PROSER1, TMEM185B, TMEM38B, TOMM40, TRIM28, and TRMT1 were associated with worse OS (coefficient > 0). By applying this prognostic model, each patient with SHH MB was given a risk score in connection with individual prognosis. Then, patients with SHH MB in the training set were classified into a high-risk group (n = 22) and a lowrisk group (n = 99) by the cutoff value for the 23-mRNA-based risk scores. The K-M OS curves of the two groups in the training set, based on the 23 genes, were significantly different (HR, 20.93; 95% CI, 7.5-58; P < 0.0001; Fig. 3A). The predictive capacity of the 23-mRNA-based prognostic model was assessed by calculating the AUC of an ROC curve. The AUCs of the 23-gene biomarker prognostic model in the training set were 0.769, 0.842, and 0.862 for the 1-, 3-, and 5-year survival times, respectively (Fig. 3B). We incorporated age, gender, histology, metastatic status, and the 23-mRNA-based prognostic model into the multivariate Cox regression analysis. Based on the multivariate Cox regression analysis, the 23-mRNA-based prognostic model was an independent prognostic factor correlated with OS (Table 2). Ultimately, in the validation set, patients with SHH MB were classified into a high-risk group (n = 9) and a low-risk group (n = 42). The K-M OS curves of the two groups in the validation set were significantly different (HR, 3.2; 95% CI, 1.2-8.7; P = 0.0159; Fig. 3C). The AUCs of the 23-gene biomarker prognostic model in the validation set were 0.827, 0.763, and 0.753 for the 1-, 3-, and 5-year survival times, respectively (Fig. 3D). When we applied the 23-mRNA-based prognostic model to predict the survival of patients with other MB subgroups, we found that there were no statistically significant differences in OS between the high-risk group and the low-risk group (Fig. 3E-G), indicating that the 23-mRNA-based signature is not applicable to other MB subgroups. The 23-mRNA-based signature had a much higher predictive accuracy than a random model based on a random subset of 23 genes (Table S4). Moreover, we found the positive correlations between mRNAsi and the prognostic copy number alterations in SHH MB, including MYCN amplifications and   GLI2 amplifications (Fig. 4A,C). However, we found no significant correlations between mDNAsi and the prognostic copy number alterations in SHH MB, including MYCN amplifications (Fig. 4B), GLI2 amplifications (Fig. 4D), and PTEN deletions (Fig. 4F), and there was no statistically significant association between mRNAsi and PTEN deletions in SHH MB (Fig. 4E).

Discussion
Leveraging a large cohort of primary MBs profiled based on combined DNA methylation and gene expression, we performed a comprehensive analysis of MB stemness. By employing a stemness index model-based OCLR machine-learning algorithm to the MB samples, we obtained two distinct molecular metrics of stemness and then applied these metrics to assess the epigenomic and transcriptomic stemness features of MBs based on their molecular and clinical information. Moreover, we identified a 23-mRNA-based prognostic model that could effectively predict the survival of SHH MB patients and revealed the positive correlations between mRNAsi and the prognostic copy number changes in SHH MB, including MYCN amplifications and GLI2 amplifications. Using CIBERSORT, we obtained insight into the interaction of MB stemness and the immune microenvironment. Taking advantage of CMap, we identified potential drugs targeting SHH MB stem cells. With regard to the association between stemness indices and prognosis in MB patients, we showed that mRNAsi had a positive correlation with MB subgroup and a significant association with OS, while mDNAsi had a negative correlation with MB subgroup and no significant association with OS, suggesting that mRNAsi could recapitulate prognostic molecular subgroups of MB. According to mRNAsi, only patients with SHH MB could be divided into two groups with distinct prognoses, indicating that the SHH subgroup might have a higher degree of intrasubgroup heterogeneity than other subgroups with respect to the stemness phenotype. Previous studies have shown that WNT, SHH, and group 4 MBs have different cellular origins (Gibson et al., 2010;Lin et al., 2016;Sch€ uller et al., 2008;Yang et al., 2008). Given that the cancer methylome can reflect the cell of origin (Fernandez et al., 2012;Hovestadt et al., 2014), different mDNAsi of MB subgroups may provide additional evidence for distinct cellular origins for MB subgroups.
Stem cell signatures shared by leukemia and hematopoietic stem cells predict clinical outcomes in acute myeloid leukemia patients (Eppert et al., 2011). Similarly, in colon, breast, and non-small-cell lung cancer, stem cell signature expression correlates inversely with patient survival (Liu et al., 2007;Merlos-Su arez et al., 2011;Zheng et al., 2013a,b). Moreover, a medulloblastoma-propagating cell signature defines SHH MB patients with a poor prognosis (Vanner et al., 2014). These studies revealed that for multiple tumors, including MB, patients whose cancer exhibits higher expression levels of stem cell genes experience significantly worse clinical outcomes. In the present study, we built and validated a 23-mRNA-based prognostic model associated with stem cell genes. To our knowledge, all predictive genes in this 23-mRNA signature have not been reported for MB and may provide some clinical indications for the development of novel prognostic factors for MB. One of the advantages of predictive genes is that they do not require the identification of somatic mutations in patients and reduce the cost of sequencing, which may make the application of panel testing based on specific mRNAs more routine. Additionally, when applied to single-cell transcriptomic profiles of MB, the stemness indices could reveal intratumor heterogeneity for the stemness of individual MB cells and identify the MB cells that exhibit greater proliferation and tumorpropagating potential.
We found that mRNAsi had a negative association with the immune score for all of the MB subgroups, suggesting that immune cells in MB may repress MB   stem cells by affecting the transcriptome of MB stem cells. In addition, the excellent prognosis of WNT MB may be explained in part by the result that the negative correlation between mRNAsi and immune score was stronger in the WNT subgroup than in the other subgroups. The absence of PD-L1 expression in MB (Aoki et al., 2016;Majzner et al., 2017;Vermeulen et al., 2018) might explain in part why the stemness indices had no significant associations with PD-L1 expression, indicating that the therapeutic potential of immunotherapy with PD-L1 inhibitors seems limited in MB. For all of the MB subgroups, the mRNAsi was associated positively with the fraction of activated NK cells, suggesting that NK cells may promote MB stem cell-associated phenotypes and that the added value of NK cell-based therapies in MB may be limited. We observed that the fractions of M2 macrophages in WNT, SHH, and group 4 MBs were

Correlation of the immune cells with mDNAsi
Correlation of the immune cell activity with mRNAsi  negatively associated with mRNAsi, indicating that M2 macrophages might suppress MB stem cells by impacting the transcriptome of MB stem cells. A recent study showed that M1 rather than M2 macrophages correlate more strongly with worse clinical outcome in SHH MB (Lee et al., 2018). These two results contradict the common view of tumor-promoting M2 macrophages and tumor-suppressing M1 macrophages. In many cancer types, M2 macrophage counts are associated with adverse outcomes Jensen et al., 2009;Kawachi et al., 2018;Medrek et al., 2012), and M1 macrophage infiltration is correlated with better prognosis (Ma et al., 2010;Mei et al., 2016). However, several studies suggest that the dichotomous M1/M2 classification of macrophages is oversimplified, and the role of tumor-associated macrophages is still controversial (Martinez and Gordon, 2014;Van Overmeire et al., 2014). Furthermore, our analyses showed only weak associations between mDNAsi and immune cells in MB. This result suggests that immune cells in MB are likely to have a weak effect on the methylome of MB stem cells. We interrogated CMap utilizing the gene expression signatures from SHH MB samples with high and low mRNAsi levels. The CMap analysis precisely identified some compounds that have been shown to specifically impact CSCs in other tumor types (Angeletti et al., 2016;Batsaikhan et al., 2014;Battula et al., 2017;Bonuccelli et al., 2017;Bozok Cetintas et al., 2016;Chen et al., 2015Chen et al., , 2016Cheng et al., 2017;Dominguez-Gomez et al., 2018;Garulli et al., 2014;Hong et al., 2011;Hou et al., 2018;Malkomes et al., 2016;Xiang et al., 2017;Xu et al., 2016;Yeh et al., 2013;Yin et al., 2018;You et al., 2009;Zhang et al., 2013;Zheng et al., 2013a,b). These compounds include the CDK inhibitors palbociclib and alvocidib, the AMPK inhibitor dorsomorphin, the IKK inhibitor BMS-345541, the smoothened receptor antagonist cyclopamine, the topoisomerase inhibitors topotecan and doxorubicin, the GABA receptor agonist ivermectin, the NF-jB pathway inhibitor auranofin, the MTOR inhibitor dactolisib, the AKT inhibitors MK-2206 and pyrvinium-pamoate, the HMGCR inhibitor simvastatin, the HDAC inhibitors apicidin, vorinostat, and givinostat, and the DNA synthesis inhibitor anisomycin. In addition, the survivin inhibitor YM155 (Brun et al., 2015), the AKT inhibitor pyrvinium (Li et al., 2014), and the RNA polymerase inhibitor triptolide  have been shown to exert anticancer effects on MB cells, although there were no results regarding effects on MB stem cells. More importantly, the CMap analysis identified the PI3K inhibitor GDC-0941, which has been demonstrated to target CD133-positive stem cell-like MB subpopulations (Ehrhardt et al., 2015). The mentioned compounds may present an avenue for the implementation of targeting MB stem cells. Given that the survival rates of MB patients treated with nonspecific multimodal therapies have reached a plateau (Ramaswamy and Taylor, 2017), targeting MB stem cells in parallel to nonspecific multimodal therapies may yield the most durable SHH MB remission. However, several limitations should be acknowledged for the current study. First, the ethnicities of populations in the GSE85218 dataset are primarily limited to Caucasian and African American, and the extrapolation of our findings to other ethnic groups needs to be further substantiated. Second, the 23-mRNA-based signature was not subjected to external validation because the appropriate independent cohorts with survival data were not available, and a robust signature should be validated externally in different datasets; thus, the prospective multicenter clinical trials are required to further validate the findings. Finally, the mechanisms underlying our findings have not been clearly elucidated here, and experimental studies on our findings should be carried out to facilitate our understanding of their functional roles in MB and their clinical application.

Conclusions
Taken together, our results provide a comprehensive characterization of MB stemness. The prognostic signature based on mRNAsi may contribute to personalized prediction of SHH MB prognosis and act as a potential biomarker for SHH MB prognostication and response to differentiation therapies in clinical practice.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Table S1. Clinicopathological features of patients in the GSE85218 dataset. Table S2. Comparison of distribution of clinical characteristics between the training and validation set. Table S3. The multivariate Cox regression coefficients of the genes in the 23-mRNA-based prognostic model. Table S4. Comparisons of the predictive value of the 23-mRNA-based prognostic model and the random model based on a random subset of 23 genes. Table S5. Compounds with an enrichment score ≤ À95 that could target pathways associated with MB stemness.