Individual and joint performance of DNA methylation profiles, genetic risk score and environmental risk scores for predicting breast cancer risk

DNA methylation patterns in the blood, genetic risk scores (GRSs), and environmental risk factors can potentially improve breast cancer (BC) risk prediction. We assessed the individual and joint predictive performance of methylation, GRS, and environmental risk factors for BC incidence in a prospective cohort study. In a cohort of 5462 women aged 50–75 from Germany, 101 BC cases were identified during 14 years of follow‐up and were compared to 263 BC‐free controls in a nested case–control design. Three previously suggested methylation risk scores (MRSs) based on methylation of 423, 248, and 131 cytosine‐phosphate‐guanine (CpG) loci, and a GRS based on the risk alleles from 269 recently identified single nucleotide polymorphisms were constructed. Additionally, multiple previously proposed environmental risk scores (ERSs) were built based on environmental variables. Areas under the receiver operating characteristic curves (AUCs) were estimated for evaluating BC risk prediction performance. MRS and ERS showed limited accuracy in predicting BC incidence, with AUCs ranging from 0.52 to 0.56 and from 0.52 to 0.59, respectively. The GRS predicted BC incidence with a higher accuracy (AUC = 0.61). Adjusted odds ratios per standard deviation increase (95% confidence interval) were 1.07 (0.84–1.36) and 1.40 (1.09–1.80) for the best performing MRS and ERS, respectively, and 1.48 (1.16–1.90) for the GRS. A full risk model combining the MRS, GRS, and ERS predicted BC incidence with the highest accuracy (AUC = 0.64) and might be useful for identifying high‐risk populations for BC screening.


Introduction
Breast cancer (BC) is the most commonly diagnosed cancer and the leading cause of cancer death among women worldwide, accounting for nearly 2.08 million new cases and 630 000 deaths in 2018 (Bray et al., 2018). Among women aged 50-69 years, the detection of early-stage disease through mammography has led to a decline in BC mortality (Independent, 2012). Although mammography is widely used for BC screening, it has limitations such as high rates of false-positive results and overdiagnosis (Independent, 2012;Pace and Keating, 2014). Furthermore, screening offers so far (except for women who had family history of BC or mutation in BRCA1 or BRCA2) do not take interindividual variation of BC risk into account (Winters et al., 2017).
DNA methylation markers detected in whole blood have emerged as potential candidates for the identification of high-risk populations for earlier or further specific BC screening in recent years (Guan et al., 2018). While these markers have been identified in diverse study populations through different approaches (Joo et al., 2018;van Veldhoven et al., 2015;Xu et al., 2013;Xu et al., 2019;Yang et al., 2019), their predictive value in prospectively collected samples needs to be further validated. Additionally, genome-wide association studies (GWASs) have identified an increasing number of single nucleotide polymorphisms (SNPs) that are independently associated with BC risk (Mavaddat et al., 2015;Michailidou et al., 2015;Michailidou et al., 2017). Although these SNPs confer small risk individually, their combined effect can substantially influence BC risk. Genetic risk scores (GRSs) based on multiple SNPs can be used to stratify women according to their risk of developing BC which may lead to more refined, personalized prevention strategies (Burton et al., 2013;Mavaddat et al., 2015). Recently, Mavaddat et al. (2019) constructed a GRS based on 313 SNPs which discriminated BC cases from controls with an area under the receiver operating characteristic curve (AUC) of 0.63 in prospective studies conducted among white European populations. Besides epigenetic and genetic factors, reproductive (e.g., menarche, pregnancy, and menopause), lifestyle (e.g., alcohol consumption), and anthropometric (e.g., body mass index) factors, as well as the use of hormone medications (Madigan et al., 1995;Peto, 2011), have long been identified to be related to BC risk. Risk prediction models combining these known risk factors in 'environmental risk scores' (ERSs) have been developed (Dierssen-Sotos et al., 2018;Gail et al., 1989;Maas et al., 2016;Novotny et al., 2006;Park et al., 2013;Rudolph et al., 2018;Wang et al., 2016), but their predictive accuracy was found to be modest (Anothaisintawee et al., 2012).
In this study, we aimed to simultaneously assess the individual and joint performance of whole-blood DNA methylation markers, GRS, and ERS for BC incidence in a prospective cohort study.

Study population and data collection
We performed a case-control study nested within the ESTHER (Epidemiologische Studie zu Chancen der Verh€ utung, Fr€ uherkennung und optimierten Therapie chronischer Erkrankungen in der € alteren Bev€ olkerung) cohort, a population-based study, conducted in Saarland, Germany. Details of the ESTHER cohort have been previously described (Schottker et al., 2013). As shown in Fig. 1, 9949 older adults aged 50-75 years of whom 5462 were women were recruited by their general practitioners during routine health checkups between July 2000 and December 2002, and followed up thereafter. The participants completed a standardized self-administered questionnaire (collecting information on sociodemographic, reproductive, and lifestyle factors) and donated blood samples at baseline. Prevalent and incident cancer was determined by self-report and record linkage with the Saarland Cancer Registry. DNA methylation measurements and genotyping were performed in the baseline blood samples of the ESTHER participants. Overall, we identified 101 women with incident BC and 263 women without BC at baseline or during the follow-ups for whom both DNA methylation and genetic data were available. The study was approved by the ethics committees of the University of Heidelberg and of the state medical board of Saarland, Germany. Written informed consent was provided by all participants.

Methylation assessments
Blood samples collected at baseline (available for 98.8% of participants) were stored at -80°C until further processing. DNA was extracted from wholeblood samples using a salting out procedure (Miller et al., 1988). DNA methylation levels of 866836 cytosine-phosphate-guanine (CpG) loci were quantified by the Infinium Methylation EPIC (850K) BeadChip Assay (Illumina Inc., San Diego, CA, USA) (Zaimi et al., 2018). Briefly, 1 µg DNA was bisulfite converted, and 250 ng bisulfite-treated DNA was applied to the EPIC BeadChips following the manufacturer's instruction. GenomeStudioÒ (version 2011.1; Illumina Inc.) was used to extract DNA methylation signals from the scanned arrays (module version 1.9.0; Illumina Inc.). Due to its straightforward biological interpretation, the methylation level of a specific CpG site was quantified as a b-value ranging from 0 (no methylation) to 1 (full methylation) (Du et al., 2010;Xie et al. 2019). Illumina normalization and preprocessing methods implanted in Illumina's GenomeStudioÒ were utilized. Data were normalized to internal controls provided by the manufacturer. All controls were checked for inconsistencies in each measured plate. Probes with a detection P-value > 0.01 or with missing value > 10% of our samples were excluded from analysis (Zhang et al., 2017). Leukocyte composition was estimated using the algorithms of Houseman et al. (Houseman et al., 2012).

Genotyping
Extracted DNA from whole blood was genotyped using the Illumina Infinium OncoArray BeadChip (Illumina). General genotyping quality control assessment was done as previously described (Anderson et al., 2010). Genotypes for common variants across the genome were imputed using data from UK10K-1000 Genomes Project (phase 3, October 2014) with IMPUTE v2.3.2 (https://mathgen.stats.ox.ac.uk/impute/ impute_v2.html#download) after prephasing with SHA-PEIT software v2.12 (https://mathgen.stats.ox.ac. uk/genetics_software/shapeit/shapeit.html#download). Thresholds were set for imputation quality to retain both common and rare variants for validation. Poorly imputed SNPs were defined by an information metric I < 0.70 and excluded for the subsequent analysis. All genomic locations are given in NCBI Build 37/UCSC hg19 coordinates. All SNPs with a minor allele frequency (MAF) < 1% were excluded. After imputation, the SNP set consisted of 9 198 808 genotyped and imputed SNPs. PLINK v1.90b5.4 was then used to extract SNPs for the required regions of interest (Chang et al., 2015). risk that were identified using genetic variants as instrument (Yang et al., 2019). The second set included 280 BC-related CpGs identified in previous prospective epigenome-wide association studies (Joo et al., 2018;van Veldhoven et al., 2015;Xu et al., 2013). The third set included 144 CpGs which was identified and validated in independent prospective studies (Xu et al., 2019). CpGs with missing value > 10% in our sample were excluded from subsequent analyses. Furthermore, CpGs only included in the Infinium Methylation 27K BeadChip Array or the Infinium Methylation 450K BeadChip Array but not the Infinium EPIC Array were excluded, which left 423 CpGs from the first set (423-CpGs), 248 CpGs from the second set (248-CpGs), and 131 CpGs from the third set (131-CpGs). The individual CpGs and their associations with BC risk (none of which was statistically significant after adjustment for multiple testing by the Benjamini-Hochberg method) are provided in Table S1. These CpGs were classified into hyper-and hypomethylated CpGs according to the previously reported relationship with BC (Joo et al., 2018;van Veldhoven et al., 2015;Xu et al., 2013;Xu et al., 2019;Yang et al., 2019). MRSs were calculated as the sum of hypermethylated CpGs with methylation levels in the upper quartile of the distribution among controls, and of hypomethylated CpGs with methylation levels in the lower quartile of the distribution among controls.

Genetic risk score (GRS)
A newly established set of 313 SNPs which discriminated BC cases from controls with a modest AUC = 0.63 (95% CI: 0.63-0.65) in recent GWAS was used to build GRS (Mavaddat et al., 2019). SNPs were excluded from further analyses if they were missing in > 10% of our sample after imputation, if they were in high linkage disequilibrium (D' ≥ 0.95 and r 2 > 0.8) with each other, or if the MAF was low (< 1%). Specifically, we searched the website (proxy SNP website: https://ldlink.nci.nih.gov/) to find surrogates for the missing SNPs. By setting the criteria of D' ≥ 0.95, r 2 > 0.8, MAF ≥ 1%, distance < 250K bp, and missing value < 10% of our sample, only six SNPs (rs6656241, rs3791976, rs3008281, rs28489579, rs521667, and rs965352) could be used as surrogates for the six missing SNPs (rs56168262, rs3791977, rs66823261, rs4774565, rs527616, and rs6030585) in the ESTHER study. This resulted in inclusion of 269 SNPs in the GRS. GRSs for all eligible participants were calculated using the formula: GRS = b 1 X 1 + b 2 X 2 + . . . b k X k . . . + b n X n, where b k is the perallele log odds ratio (OR) for BC associated with the risk allele for SNP k reported in the previous independent GWAS (Mavaddat et al., 2019), X k is the number of risk alleles for the same SNP (0, 1, or 2), and n is the total number of SNPs. SNPs and corresponding effect sizes for risk alleles reported in the derivation of the GRS (Mavaddat et al., 2019) are summarized in Table S2.

Environmental risk scores (ERSs)
For building ERSs, we conducted a literature search to identify previously published ERSs used for BC risk prediction. We identified and included seven risk scores (Dierssen-Sotos et al., 2018;Gail et al., 1989;Maas et al., 2016;Novotny et al., 2006;Park et al., 2013;Rudolph et al., 2018;Wang et al., 2016), including two scores from two multicenter studies (study populations were mainly from Europe, the United States, and Australia) and one each from the United States, Czech, South Korea, China, and Spain. Four of the seven risk scores were built according to the previously reported score prediction algorithms (Gail et al., 1989;Novotny et al., 2006;Park et al., 2013;Wang et al., 2016). The remaining three scores (Dierssen-Sotos et al., 2018; Maas et al., 2016;Rudolph et al., 2018) were derived from beta coefficients of the corresponding risk factors which were reported from multivariable logistic regression used in each study. The proportion of missing values for all variables including in the ERSs was below 5%, and the missing baseline values were imputed by mean value of each incomplete variable [age at menarche, age at first live birth, parity, menopausal status, age at menopause, current use of menopause hormone therapy (MHT)] within groups of cases and controls, respectively. In case the variables were not available in our data sets and could not be replaced, we built the risk scores without them. Algorithms applied to obtain the risk scores are summarized in Table S3.

Descriptive analyses
Main characteristics of cases and controls were described using frequencies for categorical variables, and means and standard deviations (SDs) for continuous variables. The correlations between the risk scores were estimated by Pearson's correlation coefficients.

Associations of MRS, GRS, and ERS with breast cancer risk
Crude associations of MRS, GRS, and ERS with BC risk were assessed by unconditional logistic regression with adjustment for leukocyte composition in case of MRS in Model I. In Model II, associations were adjusted for the complementary risk scores. Risk estimates for each risk score were included in the models either as quartiles (lowest quartile defined as the reference category) or as continuous variables (calculating ORs for an increase in risk scores by 1 SD). The individual association of previously identified CpGs with BC incidence was also estimated by unconditional logistic regression with adjustment for age, batch effects, and leukocyte composition. Areas under the receiver operating characteristic curves (AUCs) were estimated for evaluating the performance of the three types of risk scores in BC risk prediction. In addition to exploring the predictive value of risk scores over the entire period of followup, analyses were repeated with follow-up time restricted to the initial 7 years after recruitment or to subsequent years, respectively. Tenfold cross-validation was employed to correct for potential overoptimism in prediction models. In 10-fold crossvalidation, the original sample was randomly partitioned into 10 subsamples, the nine subsamples were then used as training data for the derivation of the prediction model and the remaining single subsample was used as the validation data for testing the model (Lanfear et al., 2017). This process was repeated 1000 times. Additionally, the goodness of fit of the combined model was tested using the Hosmer-Lemeshow test to evaluate the calibration of multiple risk scores and using the tailed-based test by Song et al (Song et al., 2015). All statistical analyses were carried out in SAS 9.4 (SAS Institute, Cary, NC, USA), and 2-sided P-values < 0.05 were considered statistically significant. Table 1 presents the baseline characteristics of the study population. Of 5462 women aged 50-75 years recruited between July 2000 and December 2002, a total of 101 BC cases and 263 controls were included for whom both methylation and genomic data were available for the current analysis (Fig. 1). The median time to diagnosis for cases, defined as the time between recruitment/sample collection and BC diagnosis ranged from 0.24 months to 13.6 years [median (interquartile range): 6.8 (3.7-8.9) years]. Mean age was about 61 years for both cases and controls. The distributions of participant characteristics were similar between cases and controls except for age at menopause and parity. Controls were more likely to have had an early menopause and to have a higher parity.  Table 2 provides an overview of risk factors included in the previously proposed ERS and the performance of these ERSs in our study. The most commonly included environmental risk factors were age, age at menarche, age at first live birth, BMI, parity, alcohol consumption, menopausal hormone therapy, and number of first-degree relatives (FDR) with BC. Factors such as light at night (Wang et al., 2016), physical activity (Park et al., 2013), and sleep quality (Wang et al., 2016) were rarely included. The predictive accuracy of the ERS was limited with AUCs ranging from 0.517 to 0.594 in our cohort. The score of Dierssen-Sotos (Dierssen-Sotos et al., 2018) which was derived from a case-control study in a Spanish population achieved a relatively good performance and was used for subsequent analyses. For the score of Wang et al. (2016) who had provided separate algorithms for premenopausal women and postmenopausal women, the predictive performance was estimated by the algorithm for postmenopausal women only since most of the women in our study population were postmenopausal. Figure S1 shows the pairwise correlations between the three types of risk scores. Very low, insignificant positive correlations were observed for each pair of scores.

Results
Individual associations for MRS, GRS, and ERS (Dierssen-Sotos Score) with BC risk are summarized in Table 3. For GRS and ERS, having a score in the top quartile was associated with a significantly increased risk of BC compared to the lowest quartile without adjustment. However, none of the current MRS presented a significant association with BC risk. These results did not materially change after adjusting for the complementary risk scores and leukocyte composition. For example, after adjusting for MRS, ERS, and leukocyte composition, women in the highest GRS quartile had a 2.21-fold increased risk of BC compared with women in the lowest quartile. Adjusted ORs (95% CI) per standard deviation increase were 1.07 ( (Table S4).

Discussion
To our knowledge, this is the first study evaluating and comparing the predictive performance of whole-blood DNA methylation, and genetic and ERSs for BC incidence in a prospective cohort with up to 14 years of follow-up. All three types of risk scores were predictive of BC risk. A GRS based on multiple common variants (269 SNPs) predicted BC incidence with much higher accuracy than a MRS based on previously identified CpGs and an ERS derived from previous studies. The combination of MRS, GRS, and ERS enhanced the risk prediction with an AUC of approximately 0.64. Similar predictive accuracies of either individual or combined risk scores were observed in specific subgroups defined by time to diagnosis.  Previous studies have demonstrated the diagnostic efficiency of aberrant DNA methylation for BC (Guan et al., 2018). In the current study, we constructed MRSs based on previously identified CpGs and evaluated its predictive performance for BC incidence. Although the discriminatory power of the MRS was limited, the poor performance should not be misinterpreted as implying that the methylation status of those CpGs does not play a role in BC development. A potential explanation for low predictive value could be differences between the study populations of the previous studies in which the CpGs were identified and our study population. For example, the majority of the 248-CpGs (227 out of 248 CpGs) and all of the 131-CpGs were both identified within a prospective cohort of women who were BC-free themselves at recruitment, but had a biological sister with BC (Xu et al., 2013;Xu et al., 2019). So, the association of methylation at these CpGs with BC may not be generalizable to the women from the general population (Xu et al., 2019). Interestingly, the MRS based on 423 CpGs, which was derived from genetically predicted rather than directly measured CpG methylation levels,  performed even better in our study than the MRS of the other two sets that had been derived from measured methylation levels. It is also worth noting that there was hardly any overlap between the genetically predicted CpGs and methylation measured CpGs (423-CpGs and 248-CpGs, 423-CpGs and 131-CpGs) which supports suggestions that there may still be much room for the improvement in the derivation of MRS.
The performance of our version of GRS score is generally consistent with the observation from a recent GWAS used to establish a GRS for risk stratification and early detection of BC (Mavaddat et al., 2019). In the current analysis, the performance of our version of GRS (269 SNPs) with an OR (95% CI, per SD increase) of 1.48 (1.16-1.90) and an AUC (95% CI) of 0.61 (0.59-0.63) was similar albeit slightly less favorable compared with performance previously reported for the original GRS containing 313 SNPs [OR = 1.61, (1.57-1.65), AUC = 0.63 (0.63-0.65)]. A potential reason for the slightly weaker performance in our study could be incomplete validation, as 29 out of 313 SNPs were excluded because of missing values in > 10% of our sample.
We also conducted a literature search to identify ERSs used for BC risk prediction and validated their predictive performance for BC incidence in our study sample. Although many researchers have put substantial efforts in developing models for risk prediction, the overall results are not promising (Anothaisintawee et al., 2012). In the current study, most of the ERSs yielded poor predictive accuracies with AUCs ranging from 0.52 to 0.59. The lack of predictive value of ERSs might be explained by heterogeneity of study populations used to derive the scores and missing information on some risk factors (i.e., numbers of previous breast biopsies, breast inflammation, and light at night) in several models (Gail et al., 1989;Novotny et al., 2006;Wang et al., 2016). However, the main reason for the low predictive values would probably be that the risk factors included in these scores were not good predictors of BC risk. Notably, improved predictive accuracy was observed when ERS was combined with GRS, suggesting that rather than developing more risk scores based on environmental risk factors, future studies should explore possibilities of enhancing predictive performance by combining risk factors with novel laboratory markers, such as MRSs or GRSs.
We performed time-to-diagnosis-specific analyses to explore whether the individual risk scores or score combinations performed better in cases who had shorter time to diagnosis. In a subgroup analysis, only the performance of ERS (Dierssen-Sotos) and a combination of three risk scores supported the hypothesis, but differences between shorter and longer time to diagnosis were small and did not reach statistical significance. However, power for such subgroup analyses was very limited due to sample size limitations.
The predictive values of the three individual risk scores or score combinations for subgroup (e.g., defined by age and follow-up time) or subtype-specific BC screening warrant further exploration.
Compared with GRS, MRS was less predictive of BC risk. This, however, does not imply that MRS does not hold potential for risk stratification for BC risk. Whereas large-scale GWASs have been conducted since more than 10 years (Easton et al., 2007), largescale epigenome-wide association studies have been initiated only recently (Xu et al., 2013) and may yield substantially better MRS in the future. In contrast to GRS, MRS and ERS are though not static over lifetime. Although this may be considered a disadvantage for straightforward risk stratification (e.g., determining a starting age of screening), MRS and ERS may have additional use in reflecting the merits of specific prevention efforts.
Specific strengths of our study include its longitudinal design in which blood samples for methylation and genotyping analysis, along with environmental risk factor data, were collected years before BC diagnosis. This allowed for simultaneous evaluation and comparison of the ability of three different types of risk variables for BC risk prediction. In addition, the selection of incident BC cases through linkage to the Saarland Cancer Registry ensured an almost complete ascertainment of incident cancer cases in the population from which our study participants originated. However, our study has several limitations that require careful discussion. First, the limited sample size restricted the study's power and precision of estimates, particularly in stratified analysis. For example, the AUC for score combination in cases with time to diagnosis ≤ 7 years was larger than that in cases with time to diagnosis > 7 years, but this difference did not reach statistical significance. Future studies with larger sample sizes should address predictive values of MRS, GRS, and ERS and their combination for BC risk prediction in specific population groups. Likewise, future studies with larger sample size should also address potential improvement of prediction by including interactions between the included factors such as gene-environment interactions (Barrdahl et al., 2014;Campa et al., 2011;Maas et al., 2016;Nickels et al., 2013;Rudolph et al., 2015;Schoeps et al., 2014). Second, it is difficult to compare the results between our study and previous studies, where different methylation analysis techniques were used to investigate different CpG sites. Although the EPIC assay covers more CpGs across the whole genome compared with the Illumina 27 or 450K used in previous studies, it cannot be ruled out that some key loci with powerful diagnostic performance were missed. Finally, although we paid careful attention to internal validation, further validation in external, independent cohorts would be highly desirable. In particular, our results pertain to a Caucasian study population from Germany; hence, the performance of our versions of MRS, GRS, and ERS needs to be validated and potentially adapted for ethnically diverse populations.

Conclusion
In summary, despite these limitations, our study provides new detailed insights into the individual and joint associations of MRS, GRS, and ERS with BC risk. Although the contribution of all three types of risk scores to risk stratification is still modest for the time being, with GRS so far slightly outperforming MRS and ERS, our findings demonstrated that combing MRS, GRS, and ERS can enable more precise risk prediction and therefore holds potential to improve risk stratification in BC screening. With further improvements of GRS and MRS by large-scale international GWAS and EWAS efforts, and incorporation of additional risk signatures, such as microRNA-based signatures (Hamam et al., 2017), substantial further improvement of risk stratification should become possible which will enable more targeted, risk-adopted approaches in BC screening.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Table S1. Individual association of the previously identified CpGs with BC risk in the ESTHER study and reported methylation direction. Table S2. SNPs used for genetic risk score construction. Table S3. Environmental risk scores from previous studies and availability/modification of related variables in ESTHER. Table S4. Performance of risk scores with respect to time to diagnosis in cases. Fig. S1. Correlation among risk scores.