Comparative genomics reveals that loss of lunatic fringe (LFNG) promotes melanoma metastasis

Metastasis is the leading cause of death in patients with advanced melanoma, yet the somatic alterations that aid tumour cell dissemination and colonisation are poorly understood. Here, we deploy comparative genomics to identify and validate clinically relevant drivers of melanoma metastasis. To do this, we identified a set of 976 genes whose expression level was associated with a poor outcome in patients from two large melanoma cohorts. Next, we characterised the genomes and transcriptomes of mouse melanoma cell lines defined as weakly metastatic, and their highly metastatic derivatives. By comparing expression data between species, we identified lunatic fringe (LFNG), among 28 genes whose expression level is predictive of poor prognosis and whose altered expression is associated with a prometastatic phenotype in mouse melanoma cells. CRISPR/Cas9‐mediated knockout of Lfng dramatically enhanced the capability of weakly metastatic melanoma cells to metastasise in vivo, a phenotype that could be rescued with the Lfng cDNA. Notably, genomic alterations disrupting LFNG are found exclusively in human metastatic melanomas sequenced as part of The Cancer Genome Atlas. Using comparative genomics, we show that LFNG expression plays a functional role in regulating melanoma metastasis.

Metastasis is the leading cause of death in patients with advanced melanoma, yet the somatic alterations that aid tumour cell dissemination and colonisation are poorly understood. Here, we deploy comparative genomics to identify and validate clinically relevant drivers of melanoma metastasis. To do this, we identified a set of 976 genes whose expression level was associated with a poor outcome in patients from two large melanoma cohorts. Next, we characterised the genomes and transcriptomes of mouse melanoma cell lines defined as weakly metastatic, and their highly metastatic derivatives. By comparing expression data between species, we identified lunatic fringe (LFNG), among 28 genes whose expression level is predictive of poor prognosis and whose altered expression is associated with a prometastatic phenotype in mouse melanoma cells. CRISPR/Cas9mediated knockout of Lfng dramatically enhanced the capability of weakly metastatic melanoma cells to metastasise in vivo, a phenotype that could be rescued with the Lfng cDNA. Notably, genomic alterations disrupting LFNG are found exclusively in human metastatic melanomas sequenced as part of The Cancer Genome Atlas. Using comparative genomics, we show that LFNG expression plays a functional role in regulating melanoma metastasis.

Introduction
Melanoma is an aggressive cancer that develops from the pigment-producing cells of the skin. In melanoma, as in other cancers, metastasis accounts for the majority of the mortality of patients with advanced disease (Chaffer and Weinberg, 2011;Damsky et al., 2010).
This complex multistep process requires melanoma cells to invade adjacent tissues, intravasate into the lymphatics or blood vasculature, extravasate at distant sites and ultimately colonise an organ or tissue. For this to happen, melanoma cells must evade the immune system and sculpt the host microenvironment (Fidler, 2003).
Several models of metastasis have been proposed including those that describe monoclonal and polyclonal seeding. It is also clear that once a cell has left the primary tumour, it may undergo further evolution (Turajlic and Swanton, 2016). This complex pattern of tumour cell dissemination and ongoing evolution complicates the identification of the genetic events that drive the metastatic process. Importantly, transcriptome profiling of primary tumours has identified expression changes shown to be predictive of metastasis (Paik et al., 2004;van de Vijver et al., 2002), and alterations found in metastases have been shown to be present in subclones in early primary lesions (Wardwell-Ozgo et al., 2013). These data support the idea that a proportion of cells within primary tumours may evolve, acquire or have intrinsic metastatic capabilities. Identifying those patients with tumours at high risk of metastasising could help identify individuals who may benefit from adjuvant therapies or more regular screening (Eggermont, 2016).
In this study, we set out to identify clinically relevant genes that confer enhanced metastatic capabilities upon melanoma cells. To do this, we used comparative functional genomics applied to gene expression predictors of patient survival, combined with expression data from murine cell line models that have different capabilities to colonise the lung, a major site of human melanoma metastasis. In this way, we identified a set of 28 genes associated with patient outcome that were also differentially expressed when weakly and highly metastatic mouse melanoma lines were compared. We focused on lunatic fringe (LFNG) that encodes for a glycosylating enzyme (O-fucosylpeptide 3-beta-N-acetylglucosaminyltransferase) that regulates NOTCH signalling (Moloney et al., 2000), and show an important role for this gene in controlling melanoma metastasis.

Survival analysis
Gene expression data generated using whole-genome cDNA-mediated annealing, selection, ligation and extension (DASL) arrays (Illumina Inc., San Diego, CA, USA) from 217 (Leeds) (Nsengimana et al., 2015) and 222 (Lund) (Jonsson et al., 2010) primary melanomas (209 cutaneous, 13 mucosal) were obtained. The Leeds data set (Leeds melanoma cohort, N = 204, and chemotherapy study, N = 13) was profiled on the human HT12.4 array, while the Lund cohort was profiled on the earlier HT8.3 version. Quality control and normalisation of these data sets has been published elsewhere (Jonsson et al., 2010;Nsengimana et al., 2015). Briefly, the HT8.3 version had a lower performance with only 7752 genes passing QC filters. The overlap between this set and the Leeds data set using HT12.4 was 7584 genes. Survival benefit of each gene (log 2 scale) was assessed in a Cox proportional hazards model using STATA v14.2 (STATACorp, Texas, USA) for melanoma-specific survival (MSS) in the Leeds data and overall survival (OS) using the data from Lund. Analysis of the Leeds data set was adjusted for patient age and sex. P-values were corrected for multiple testing (Benjamini-Hochberg false discovery rate, FDR). Kaplan-Meier curves were plotted comparing high to low gene expression relative to the median. Functional gene annotation and enrichment analyses of the genes that showed the same direction of association in both patient cohorts were performed using DAVID (Huang et al., 2009).

Cell lines
B16-F0 and B16-F10 cell lines were purchased from the American Type Culture Collection (ATCC), and the B16-BL6, K1735-P and K1735-M2 lines were obtained from the University of Texas M.D. Anderson Cancer Centre. All cell lines were screened for the presence of mycoplasma and other mouse pathogens (Charles River Laboratories, Wilmington, MA, USA). Cells were cultured at 37°C in 5% CO 2 in high glucose Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum, 29.2 mgÁmL À1 L-glutamine, 10 000 unitsÁmL À1 penicillin and 10 000 lgÁmL À1 streptomycin.

Nucleic acid extraction and sequencing
For whole-genome sequencing, DNA was extracted from cell pellets using the QIAGEN Puregene Core Kit A. Paired-end 75-bp libraries were prepared and sequenced using the Illumina HiSeq platform. Data have been deposited in the European Nucleotide Archive (ERP001691). RNA was extracted from cell pellets using the QIAGEN RNeasy Mini Kit. Five different vials were cultured and extracted per cell line, to obtain five independent biological replicates for each line. 1 lg of total RNA per sample was submitted for sequencing. Unstranded 75-bp paired-end barcoded libraries were prepared with the standard Illumina library preparation kit. RNA libraries were sequenced on the Illumina Hiseq platform and the data deposited in public databases (European Nucleotide Archive (ERP001690) and ArrayExpress (E-ERAD-94)).

Whole-genome data processing and somatic variant calling
Raw reads were mapped to the mouse reference genome (GRCm38p1) using bwa-mem (Li, 2013) v0.7.5 and PCR duplicates marked using Picard tools MarkDuplicates v1.72 (http://broadinstitute.github.io/ picard). Single nucleotide variants (SNVs) and short indels were called using Samtools mpileup (Li et al., 2009) v0.1.19-58-g3d123 cd, and the resulting variants were filtered using VCFTools (Danecek et al., 2011). Variants with variant quality QUAL < 20, or number of reads supporting the variant less than 5 (DP < 5) or SNPGAP < 10, were discarded. Due to the absence of a matched germline/normal sample from the exact mouse from which the cell lines were generated, we removed all variants reported by the mouse genomes project (Keane et al., 2011) for the genetic backgrounds of each cell line group. Similarly, variants located within AE 50 bp of structural variants reported by the mouse genomes project (Keane et al., 2011) were also discarded. Finally, functional consequences were predicted using ENSEMBL's variant effect predictor (v74) (McLaren et al., 2016).

Orthogonal validation of single nucleotide variants using Sequenom
A total of 262 SNVs (116 for the K1735 lines and 146 for the B16 lines) were selected for orthogonal validation using the Sequenom platform. These variants were randomly chosen using GATK's ValidationSiteSelector (v2.8-1-g932cd3a) from the set of variants that were identified to be present in all the cell lines from each group. All assays using the Sequenom platform were performed with three biological replicates of each line.

Somatic signature identification and comparison
Somatic mutational signatures were identified for each mouse cell line group using the filtered somatic single nucleotide variants (above). Signatures were identified using the non-negative matrix factorisation method from the SomaticSignatures R package (Gehring et al., 2015) (v 2.6.1). To compare these signatures to those reported in COSMIC (http://cancer.sanger.ac. uk/cancergenome/assets/signatures_probabilities.txt), we calculated cosine similarities as previously reported (Alexandrov et al., 2015).

Copy number calling
Copy number alterations were identified using Control-FREEC (Boeva et al., 2011) v6.7 with 50-Kb windows. Due to the lack of a matched normal for each cell line, CNVs were called relative to parental cell lines (B16-F0 and K1735-P); somatic CNVs for the B16-BL6 line were called using the BAM file for B16-F10.

Identification of differentially expressed genes
Raw paired-end reads were aligned to the mouse reference genome (GRCm38p1) using the splice-aware aligner Tophat2 (Kim et al., 2013) guided by ENSEMBL mouse annotation (v73). Subsequently, the number of uniquely mapped read pairs that were aligned to each gene within the annotation with a mapping quality > 10 were counted using htseq-count (Anders et al., 2015). Raw counts were normalised by calculating the fragments per kilobase per million (FPKM) values for each gene for each replicate. As a 'fit for use' quality control, blind pairwise comparisons across all the RNA-Seq samples were performed by calculating the Pearson's correlation coefficient based on the FPKM values of all protein-coding genes of the 25 sequenced samples. This information was used to group the samples using unsupervised hierarchical clustering using the package gplots in R (Gregory et al., 2013). To identify differentially expressed genes, all of the four possible paired comparisons between cell lines and their more metastatic derivatives were made (B16-F10 vs B16-F0, B16-BL6 vs B16-F10, B16-BL6 vs B16-F0 and K1735-M2 vs K1735-P) using DESeq2 (Love et al., 2014). Once dispersion estimates and normalised counts were calculated, genes with mean normalised counts < 10 were filtered out and P-values were re-adjusted using the Benjamini-Hochberg correction for multiple testing. All genes with P < 0.01 and a log 2 (foldchange) ≤À2 or ≥ 2 were considered as differentially expressed.

Mouse-human orthologue identification
To identify the human orthologues of mouse genes, the Compara module from the ENSEMBL Perl API was used (Herrero et al., 2016). In cases where a mouse gene had multiple orthologues in humans, the gene with the highest percentage of identity when comparing the human and mouse proteins was selected. Genes that had an ortholog classification of 'many2many' were not considered for further analysis. 2.10. Randomisation test to identify the expected number by chance of differentially expressed mouse genes overlapping and concordant with the list of genes associated with poor survival in melanoma patients Two independent randomisation tests were performed using two different sample sizes 1290 or 388. A total of 1000 samples with randomly selected mouse genes (out of the 15 412 genes with normalised fragment counts > = 10 expressed by B16-F0 or B16-BL6) of each sample size were generated. Genes were selected without replacement. For each randomly selected gene, a random direction of expression was assigned with the same probability as the one observed in the mouse data: underexpressed (0.4621429) or overexpressed (0.5378571). Then, each random sample was compared to the list of human genes associated with poor outcome with an FDR < 0.1 in our combined patient survival analysis, to identify the number of overlapping and concordant genes. Finally using the distribution of the number of overlapping and concordant genes across the 1000 samples, we calculated the probability of obtaining a number of overlapping genes or more as the one observed in the mouse cell line/human data comparison.

Cas9 gRNA selection
To select suitable gRNAs, we identified sequences in the exons of candidate genes in the ENSEMBL v71 annotation of the GRCm38 mouse reference genome ( 0 5-NNNNNNNNNNNNNNNNNNNNNGG-3 0 ). For each sequence, possible off-targets were identified using Cas-Offinder (Bae et al., 2014). We then used biomaRt (Durinck et al., 2009) to identify all possible off-targets with up to three mismatches whose expected cutting site overlapped an exon. Targeting sequences with zero exonic off-targets with up to three mismatches were selected. See Table S7.

Lfng cDNA rescue experiments
To confirm that the metastasis phenotypes we observed were due to the disruption of Lfng, we used plasmid rescue in the L1 cell line using the vector PB533A-2 carrying a flag-tagged full-length Lfng cDNA (synthesised by GeneArt) to generate the cell line L1-Lfng. L1-PB cells carrying the empty vector were used as a control.

Assessment of Lfng expression in cell lines by quantitative RT-PCR
For the comparison of Lfng expression levels between cell lines, RNA was extracted from 1 9 10 6 cells using the RNAeasy mini Kit (QIAgen, Manchester, UK) and cDNA was prepared using the SuperScript VILO Master Mix (Thermo) according to the manufacturers' instructions. RT-qPCR was performed using the Taq-Man  Fast  Advanced  Master Mix. Lfng (Mm01201988_m1) and B2m (Mm00437762_m1) assays were used for these studies. Reactions were performed in quadruplicate using the StepOnePlus system (Thermo Fischer Scientific, Waltham, MA, USA), and analysis was performed using the DDCt method (Schmittgen and Livak, 2008).

In vivo experimental metastasis assays
The experimental metastasis assay was performed as described previously (van der Weyden et al., 2017). For testing of the K1735-P and K1735-M2 cell lines, 1 9 10 5 cells were tail-vein-dosed into six-to eight-week-old wildtype C3H/HeJ mice. After 10 days, mice were humanely sacrificed and their lungs were collected into 10% neutral buffered formalin and then processed for histopathological analysis. For testing of the B16-F0, B16-F10, and B16-BL6 cell lines, 0.75 9 10 5 cells were tail-vein-dosed into six-to eight-week-old wild-type C57BL6/NTac mice and their pulmonary metastatic burden was determined 7 days later by macroscopic counting. For testing the Lfng-targeted g2d1 cells (and respective ca4 control cells), 4 9 10 5 cells were tail-vein-dosed, and for testing the Lfng-targeted L1 cells (and respective C1/C2 control cells), 5 9 10 5 cells were tail-vein-dosed; both into six-to eight-week-old wild-type C57BL/6NTac mice. cDNA rescue experiments, using the cell line L1-Lfng and the control L1-PB, were performed using 4 9 10 5 cells. The pulmonary metastatic burden was determined 10 days postdosing by macroscopic counting. In all cases, sexmatched mice were used. The care and use of all mice in this study was in accordance with the Animals (Scientific Procedures) Act 1986 Amendment Regulations 2012, and all procedures were performed under a UK Home Office Project licence (PPL 80/2562). All mice were housed in individually ventilated cages (Techniplast GM500) receiving 60 air changes per hour, in a specific pathogen-free environment with ad libitum access to autoclaved water and food (Mouse Breeders Diet, Laboratory Diets, 5021-3). Cages were filled with aspen bedding substrate, with a nestlet and fun tunnel for environmental enrichment. There was a 12-h light/dark cycle with no twilight period with a temperature of 21°C AE 2°C and a humidity of 55% AE 10%. Throughout the experiment, the welfare of the mice was monitored with daily visual checks.
2.18. Whole exome sequencing of the L1 cell line DNA from L1 cells was exome-sequenced using Agilent mouse whole exome baits. A 75-bp paired-end library was prepared and sequenced on the Illumina HiSeq2500 platform. Data were analysed as above and are available in the European Nucleotide Archive (ERP015062).

mRNA expression predictors of prognosis in primary melanoma
Both tumour depth (Breslow thickness) and ulceration are established predictors of melanoma metastasis (Nsengimana et al., 2015), but the underlying mechanisms that drive metastasis are unknown. We first set out to identify genes whose expression levels were associated with poor outcome. To do this, we analysed the expression profiles of primary melanomas from two previously published studies from Leeds (the Leeds Melanoma Cohort and chemotherapy studies, n = 217) and from the Lund Melanoma Research Group (n = 222) (Jonsson et al., 2010;Nsengimana et al., 2015). Demographic information for these cohorts is provided in Table S1. Tumours from both cohorts have been analysed using Illumina DASL arrays such that the expression of 7584 genes may be assessed. For the Leeds cohort, melanoma-specific survival (MSS) data were available, whereas overall survival (OS) was recorded for the Lund cohort. Survival analyses stratifying by gene expression were performed using the Cox proportional hazards model. In this way, we identified 976 genes whose expression levels were significantly associated with patient outcome in both cohorts (FDR < 0.1; Fig. 1A, Table S2). Of these genes, 78.17% (763/976) showed the same direction of association in both cohorts. These genes included SKP2 (Chen et al., 2011), TOP2A (Song et al., 2013), SOX4 (Jafarnejad et al., 2010), MAP2 (Soltani et al., 2005) and CTLA4 (Hannani et al., 2015), all of which have been associated with patient outcome in melanoma. Gene enrichment analysis found that biological processes including epidermis development, keratinocyte differentiation and immune response were overrepresented (FDR < 0.01; Table S3)biological processes previously reported to be important in the development of melanoma metastasis (Bald et al., 2014;Golan et al., 2015).

Genomic characterisation of murine melanoma cell lines with contrasting metastatic capabilities
To facilitate comparative analyses, we selected five mouse melanoma cell lines with different metastatic capabilities: B16-F0, B16-F10 and B16-BL6 derived from C57BL/6 mice and K1735-P and K1735-M2, derived from the C3H strain (Fidler, 1970(Fidler, , 1973Kripke, 1979;Kripke et al., 1978). Prior to genomic analysis, we validated the metastatic capabilities of these lines in vivo using an experimental metastasis assay ( Fig. 1B-C). Consistent with previous reports (Poste et al., 1980;Talmadge and Fidler, 1982), B16-BL6 cells were highly metastatic when compared to B16-F10 or B16-F0 cells and K1735-M2 cells were highly metastatic when compared to K1735-P cells. Spectral karyotyping of these cell lines showed high levels of polyploidy and multiple chromosomal aberrations (Figs. S1-S2 and Table S4). We sequenced each of these lines to 30-56x whole-genome coverage, using the Illumina HiSeq platform. To identify somatic mutations (SNVs), we mapped these data to the reference C57BL/6J genome (GRCm38) and filtered the calls using variants described by the Mouse Genomes Project (Keane et al., 2011) and for quality (as detailed in Fig. S3). The number of variants shared among the lines within the B16 and K1735 groups is shown in Fig. 1D. The B16 lines showed higher numbers of somatic SNVs and short indels (~< 50 bp) than the K1735 lines, with an average of 267 566 and 243 913 SNV, respectively (Fig. S4). A copy number analysis was also performed ( Fig. S5 and Table S5). To assess our variant calling, we randomly selected 262 variants for validation by Sequenom genotyping (146 identified from the B16 cell lines and 116 from the K1735 lines), obtaining an overall validation rate of 90.86% for the B16 lines and 76.72% for the K1735 lines (Fig. S6). In B16 cell lines, the predominant mutation type was T > G ( Fig. 2A) and the predominant mutational signature was Mmus-S1 (Fig. 2B), which shows highest similarity to human mutational signature, signature 17 (Alexandrov et al., 2013) (cosine similarity 0.872)a signature whose aetiology is currently unknown but has been observed in melanoma tumours (http://cancer.sanger.ac.uk/cosmic/signatures). In K1735 cell lines, the predominant mutational signature was Mmus-S2 (Fig. 2B), with similarity to the UV light signature reported by COSMIC (signature 7; cosine similarity 0.597), which is in keeping with the genesis of these lines following the combined administration of UV light and croton oil (Kripke, 1979;Kripke et al., 1978;Talmadge and Fidler, 1982). Further characterisation revealed that both K1735 cell lines carried a homozygous activating mutation in Nras (p.G13D) and deletion of the first exon of Cdkn2a (p19 gene), as previously reported (Melnikova et al., 2004) (Fig. S7), as well as an unreported Trp53 mutation (p.T74P) in K1735-P cells (Fig. 2C). B16 cell lines carried a deletion of the entire Cdkn2a locus, as previously reported (Melnikova et al., 2004) (Fig. S7), as well as an unreported heterozygous missense mutation in Braf (p.C263R; predicted to be deleterious by SIFT), a missense Trp53 mutation (p. N125D) and a mutation in Pten (p.T131P; Fig. 2C). Finally, we could observe that mutations in Rac1 and Nf1 were only present in the more invasive derivative lines. For instance, the Rac1 missense mutation Mmus−S2 (p.A59S) was present only in B16-F10 cells. Similarly, the splice site variant (Chr11.79408779T>G) within Nf1, was observed in B16-BL6 and K1735-M2.

Transcriptomic characterisation of murine melanoma cell lines
We generated RNA-seq data from five biological replicates for each of the five melanoma cell lines. We mapped the reads against the GRCm38 mouse reference genome, counted the number of read pairs and verified the correlation among biological replicates (r > 0.95; Fig. S8). In an effort to identify changes in RNA levels that associate with higher metastatic capabilities, we identified all genes that were differentially expressed between the parental lines (B16-F0, K1735-P) and their more metastatic derivatives (B16-F10, B16-BL6 and K1735-M2). To do this, we performed all possible paired comparisons within each group using DESeq2 (Love et al., 2014). Genes were classified as differentially expressed if their P-value, after multiple testing correction, was P-adj < 0.01, with an expression change of fourfold or more. In this way, we identified a total of 1430 genes that were differentially expressed (Table S6). qPCR was performed on selected genes for validation (Fig. S9). Notably, no genes were consistently differentially expressed across the comparisons of B16 and K1735 parental lines to their more metastatic derivatives (Fig. 2D-E), suggesting that different mechanisms confer metastatic potential in these cell lines series.

Identification of conserved putative regulators of metastatic colonisation in melanoma
To identify putative regulators of metastatic colonisation in melanoma, we next took a comparative genomics approach (Fig. 3A). We identified the human orthologues for all 1430 differentially expressed genes identified from the mouse melanoma cell line comparisons. For each mouse gene, we selected the orthologue with the highest protein sequence identity between mouse and humans. All paralogous genes were discarded. These criteria retained 1290 of the 1430 differentially expressed genes. We intersected these genes with the 7584 genes analysed in both human cohorts, which left 338 genes; 61 of which were significantly predictive of survival in both human cohorts (FDR < 0.1). Of these 61 genes, 28 genes showed the same direction of expression change (up-or downregulation) in relation to poor patient outcome and cell line metastasis phenotype (Table 1, Fig. 3B). A summary of the gene numbers obtained through each stage of our analysis is presented in Fig. S10. To assess the statistical significance of this result, we performed two independent randomisation tests revealing that the probability of obtaining 28 concordant genes by chance when intersecting a gene set with the human survival data was P (x ≥ 28) = 0.024 (when n = 1290) and P (x ≥ 28) = 0 (when n = 388) (Fig. S11).
Of the above-mentioned 28 genes, only one was upregulated in poor outcome patients, specifically MID1. Notably, 5 of 28 genes we identified have previously been reported to affect melanoma metastasis: CD82, NDRG2, RUNX3, CCL5 and HDC. For example, reports suggest that CD82 expression in melanoma cells inhibits tumour cell extravasation and lung metastasis formation in vivo (Khanna et al., 2014); upregulation of CD82 predicted better outcome in our analysis of two independent cohorts (Table 1). In addition to this, the tumour suppressor N-myc downstream-regulated gene 2 (NDRG2) is known to restrict melanomagenesis by regulating Mitf expression (Kim et al., 2008). Similarly, the tumour suppressor RUNX3 has been shown to be downregulated in metastatic melanoma lines when compared to primary melanoma or healthy skin (Kitago et al., 2009), while expression of the chemokine and leucocyte chemoattractant CCL5 in B16 cells strongly suppresses lung metastasis (Aravindaram et al., 2009). Finally, the histidine decarboxylase (HDC) gene encodes a protein whose function is to convert L-histidine to histamine. Histamine has been shown to play an important role in immune cell function (Hansson et al., 1999), and a histamine/IL-2 combination has been used to increase T-cell responses in stage IV melanoma patients (Asemissen et al., 2005). In addition to the five genes mentioned above, an additional nine genes (CCBE1, GPX3, PTK2B, LSP1, DDX60, PARP14, GABRE, JUP and MID1) have been associated with metastasis in other types of epithelial or solid cancers (Aktary and Pasdar, 2013;Bachmann et al., 2014;Hao et al., 2010;Jeltsch et al., 2014;Koral et al., 2015;Yue et al., 2015;Zhang et al., 2014b).
Next, we ranked the 28 identified genes based on their hazard ratios and overall patient survival, as well as their P-values after multiple test correction (Table 1). Based on these criteria, we selected the gene LFNG for further analysis.
Lunatic fringe (LFNG) is a glycosylating enzyme that post-translationally modifies Notch receptor proteins (LeBon et al., 2014). LFNG-mediated glycosylation of Notch receptors alters the binding affinity of Notch proteins with their ligands and activation of Notch receptors by delta-like ligands (Kakuda and Haltiwanger, 2017). In our analyses, low RNA levels of LFNG were shown to be associated with poor outcome in both patient cohorts (Fig. 3C-D). Moreover, LFNG is prognostic for melanoma-specific survival independent of Breslow thickness. Furthermore, B16-BL6 cells had decreased Lfng expression when compared to its B16-F0 parental line (log 2 (foldchange) = À2.185773, P = 1.069061 9 10 À6 , negative binomial Wald test with Benjamini-Hochberg correction), which was confirmed by qPCR (Fig. S12A).

Lfng disruption enhances the lung colonisation capabilities of CRISPR/Cas9-targeted melanoma cells
To test the effect of Lfng disruption on the metastatic capabilities of melanoma cells, we used CRISPR/Cas9 to target Lfng in order to determine whether this may confer enhanced metastatic capabilities upon the weakly metastatic B16-F0 cell line (Fig. S12B). We generated two independently targeted Lfng null B16-F0 clones, termed g2d1 and L1 (Fig. 4A-B). G2d1 cells carried a single base insertion resulting in a frameshift lossof-function mutation (Fig. S13) and L1 cells carried a 4.8-kb deletion encompassing exons 1-4 which we verified by exome sequencing (Fig. S14). Additionally, exome sequencing identified that L1 cells carried 413 variants (334 SNVs and 79 Indels) not present in B16-F0 cells. Of these variants, 120 were missense, two were nonsense (altering genes Aqp3 and Vmn2r115) and three were frameshift mutations (affecting Cyp7b1, Olfr657, and Vmn2r115). In an experimental metastasis assay, both g2d1 cells (Fig. 4C-D) and L1 cells (Fig. 4E-F) showed significantly increased numbers of pulmonary metastases when compared to control cells (transfected with an empty guide vector). These results directly demonstrate that Lfng loss enhances the metastatic capabilities of B16-F0 cells. It was notable that compared to clone g2d1, clone L1 reproducibly produced numerous smaller lung foci. To further confirm the role of Lfng in metastasis, we used a full-length Lfng cDNA to rescue the metastatic phenotype of L1 cells (Fig 4G-H).

Analysis of somatic LFNG mutations in human melanomas
Using two large patient cohorts (Jonsson et al., 2010;Nsengimana et al., 2015), we showed that reduced expression of LFNG is associated with poor patient outcome. We next evaluated the prevalence of inactivating somatic LFNG mutations in The Cancer Genome Atlas (TCGA) cutaneous melanoma collection (Cancer Genome Atlas Network, 2015) comparing primary vs metastatic melanoma. In this way, we identified five samples (5/481) with nonsilent mutations (four missense and one nonsense), all of which were in metastases. Due to short follow-up times reported for melanoma primary tumours by TCGA, assessment of the association between LFNG expression and survival is not possible.

Analysis of NOTCH pathway gene expression in B16-derived melanoma lines
To explore the role of LNFG in metastasis further, we next used our cell line transcriptome sequence data to examine the expression of Notch pathway components (Fig. 5). We observed significant changes in Notch pathway elements in B16-BL6 such as upregulation of Notch2, Jag1, Hes1, Esr1 and Rbpj, as well as downregulation of Rfng and Dll3, when compared to the B16-F0 line. This result is in keeping with a functional role for Lfng expression in the metastatic phenotypes observed.

Discussion
In this study, we combined expression data from two cohorts of melanoma patients with the analysis of a selection of mouse cell lines with different metastatic capabilities to identify LFNG as a regulator of metastasis. We validated these results using CRISPR genome editing to transform a weakly metastatic mouse cell line to a highly metastatic line by the disruption of Lfng. Although our focus here was LFNG, many of the other genes we discovered in our analysis may also regulate melanoma metastasis. For example, the DEXD/H box helicase 60 (DDX60) gene, a known regulator of the antiviral response and a DNA-/RNAbinding protein, has been reported to be associated with the development and prognosis of squamous cell  carcinoma (Fu et al., 2016). Similarly, PTK2B has been reported to be involved in the CCR7-mediated regulation of metastasis in squamous cell carcinoma (Yue et al., 2015) and was found to be downregulated in our study. Notch signal transduction occurs when a Notch ligand (Jag or delta-like) from a sender cell binds to a Notch receptor on an adjacent receiver cell (Bray, 2016). This event triggers the proteolytic cleavage of the Notch intracellular domain (NICD). Subsequently, NICD migrates to the nucleus and binds RBP-JK to transcribe target genes. Notch receptor-delta-like ligand binding affinity is regulated by the fringe glycosylating enzymes (LFNG, MFNG and RFNG) (Kakuda and Haltiwanger, 2017). In this study, we show that low RNA levels of LFNG predict worse outcome in patients with melanoma. Further, we find that Lfng is downregulated in B16-BL6 cells when compared to parental B16-F0 cells. Lfng ablation in B16-F0 cells using CRISPR enhanced the number of pulmonary metastases in support of its role in regulating metastasis. Previous work in pancreatic cancer has showed that deletion of Lfng in a Kras LSL-G12D mouse model upregulates Notch3 and Hes1, accelerating cell proliferation (Zhang et al., 2016). A role for LFNG in breast and prostate cancer has also been reported (Xu et al., 2012;Zhang et al., 2014a). In highly metastatic B16-BL6 cells, we observed overexpression of the notch effectors, Hes1 and Notch2 (Fig. 5) as well as Rbpj, Jag1, Maml2 and Mitf, relative to B16-F0 cells. Notably, delta-like ligand (Dll3), as well as the fringe genes, Lfng and Rfng, were underexpressed (Fig. 5) in comparison with the Jag ligands, Jag1 and Jag2. This suggests that B16 cells are more suited to activate the Notch signalling pathway via Jag ligands.

G G C G A T G A A G A C G T C G C G A G G G G PAM
In summary, this study shows how a cross-species approach provides a useful framework for the identification of clinically relevant genes that play a role in metastasis. Identifying genetic markers such as LNFG is important as it helps to identify those patients at greatest risk of disease spread and may help guide their management.

Supporting information
Additional Supporting Information may be found online in the supporting information tab for this article: Fig. S1. Spectral Karyotyping of the B16 cell lines. Spectral karyotype analysis of ten different metaphases from (A) B16-F0, (B) B16-F10 and (C) B16-BL6 cells.
High levels of polyploidy, multiple chromosomal aberrations and at least one event of whole genome amplification can be observed. Fig. S2. Spectral Karyotyping of the K1735 cell lines. Spectral karyotype analysis of ten different metaphases from (A) K1735-P and (B) K1735-M2. High levels of polyploidy, multiple chromosomal aberrations and at least one event of whole genome amplification can be observed. Fig. S3. Cell line somatic variant calling and filtering strategy. Diagram showing the multiple steps followed to call single nucleotide variants and short indels from whole genome data of the murine lines in the absence of a matched normal sample from the same mouse.    (Keane et al., 2011), the K1735-P and K1735-M2 cell line genomes. Fig. S8. Hierarchical clustering of the murine cell line RNA-seq data. Heat map showing the hierarchical clustering of different biological replicates sequenced based on the Pearson correlation coefficient obtained from all log2(TPM + 1) values across all the protein coding genes. The two groups of cell lines can be clearly observed. (C-F) Gene expression levels with DDCT value being relative to the respective parental line. Fig. S10. Summary of genes assessed to identify regulators of metastatic colonisation by comparative genomics. Flow chart showing the number of genes obtained throughout the different stages of our analysis to identify regulators of metastatic colonisation in melanoma. Fig. S11. Number of overlapping and concordant genes on random simulated samples. The null distribution of overlapping genes observed across 1000 samples in a set of randomisation tests with sample sizes of (A) n = 1290, (B) n = 388. The dashed red line shows the number of genes observed in our main analysis. The probability of obtaining the same number of overlapping genes as the ones observed in the real data is shown. Fig. S12. Validation of reduced Lfng expression in B16-BL6 cells and plasmid constructs used to generate Lfng-deficient B16-F0 cells. (A) Fold change in expression of Lfng in B16-BL6 cells against B16-F0 cells as measured by qPCR, whiskers shows the standard error and P-value was calculated using two tailed t test from 3 biological replicates. (B) Schematics of the different plasmids used. cells against control cells as measured by qPCR, whiskers shows the standard error and P-value was calculated using two tailed t test from 3 biological replicates. This frameshift mutation, although disrupting the gene, appears to cause an upregulation of Lfng mRNA expression although the expression difference is not statistically significant. (C) Pairwise alignment using CLUSTALX 2.1 between mouse Lfng protein (from Transcript ENSMUST00000031555) and the resulting predicted protein in clone (g2d1) mutated Lfng alleles. The single base insertion at the Lfng locus causes a frameshift that introduces a stop codon 36 amino acids downstream of the mutation site.  Table S1. Patient cohort demographic information. Demographic information for the two patient cohorts analysed where available. Table S2. Predictors of patient outcome in both melanoma patient cohorts. Gene name, hazard ratios, confidence intervals, P-value and corrected P-values for all the genes with a P-val < 0.1 after applying the FDR correction. Table S3. Survival predictors gene set functional annotation enrichment. Results of the functional annotation enrichment analysis performed with DAVID with the list of gene expression predictors of outcome .  Table S4. Summary of chromosomal aberrations identified by spectral karyotyping of the murine melanoma cell lines. Summary of chromosomal aberrations detected by spectral karyotyping in the different mouse melanoma cell lines. Table S5. Summary of Copy Number Variants identified on the mouse melanoma cell line genomes. Copy number variants (CNV) calls identified in the cell line genomes for the parental lines (B16-F0 and K1735-P). Somatic CNVs are reported for the metastatic lines (B16-F10, B16-BL6 and K1735-M2). Table S6. Differentially Expressed genes identified across the comparisons of all the murine melanoma cell lines. Information on the 1430 genes that were differentially expressed throughout all the comparisons. Table S7. Oligos and gRNA sequences.