The fusion landscape of hepatocellular carcinoma

Most cases of hepatocellular carcinoma (HCC) are already advanced at the time of diagnosis, which limits treatment options. Challenges in early‐stage diagnosis may be due to the genetic complexity of HCC. Gene fusion plays a critical function in tumorigenesis and cancer progression in multiple cancers, yet the identities of fusion genes as potential diagnostic markers in HCC have not been investigated. Here, we employed STAR‐Fusion and identified 43 recurrent fusion events in our own and four public RNA‐seq datasets. We identified 2354 different gene fusions in two hepatitis B virus (HBV)‐HCC patients. Validation analysis against the four RNA‐seq datasets revealed that only 1.8% (43/2354) were recurrent fusions. Comparison with the four fusion databases demonstrated that 19 recurrent fusions were not previously annotated to diseases and three were annotated as disease‐related fusion events. Finally, we validated six of the novel fusion events, including RP11‐476K15.1‐CTD‐2015H3.2, by RT‐PCR and Sanger sequencing of 14 pairs of HBV‐related HCC samples. In summary, our study provides new insights into gene fusions in HCC and may contribute to the development of anti‐HCC therapy.


Introduction
Abbreviations: HBV, hepatitis B virus; HCC, hepatocellular carcinoma; NSCLC, non-small-cell lung cancer; qRT-PCR, quantitative real-time polymerase chain reaction.
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide (Zhou et al., 2016). Although advances in early surveillance technology have improved the lives of patients diagnosed at an early stage, most patients are diagnosed with late-stage HCC. Furthermore, HCC patients do not show improved long-term disease-free survival or overall survival after surgical resection and auxiliary medication (Eggert et al., 2013; Kamiyama et al., 2009; Llovet et al., 2015; Ye et al., 2003). One of the main reasons for this may lie in the complexity of the genetic background of HCC (Llovet et al., 2015).
Fortunately, recent advances in high throughput sequencing technology have helped provide deeper insights into the genomic and transcriptome landscape of cancer. Using these sequencing technologies, researchers have identified large numbers of mutations, insertions, deletions, and fusions as well as chromosome rearrangements in different types of cancers (Gerlinger et al., 2012; Miao et al., 2014; Shibata and Aburatani, 2014; Xue et al., 2016). Previous studies have demonstrated that fusion genes play an important role in tumorigenesis and cancer progression (Mitelman et al., 2007; Soda et al., 2007) and represent one of the most promising therapeutic targets in human malignancy (Cortes et al., 2012; Kazandjian et al., 2014; Rutkowski et al., 2010; Shaw et al., 2014). For example, the first known fusion, the Philadelphia chromosome, was discovered in 1960 and approved as a therapeutic biomarker of chronic myeloid leukemia in 2001 (Cohen et al., 2002; Nowell, 1960; Topaly et al., 2001). In addition, several highly recurrent fusion genes in specific tumor types have been well characterized. For example, Soda et al. (2007) showed that nearly 6.7% of non-small-cell lung cancer (NSCLC) patients carry the EML4-ALK fusion. Approximately 55% of prostate cancers showed the presence of an ERG fusion (Hessels and Schalken, 2013). The DNAJB1-PRKACA fusion was found in 100% of fibrolamellar HCC (15/15) (Honeyman et al., 2014).
However, the identity of fusion genes in HCC has not been comprehensively investigated. A previous study re-analyzed RNA-seq data of normal liver tissue and HepG2 cells from the National Center for Biotechnology Information Sequence Read Archive database and identified 46 fusion genes (Lin et al., 2014a). Another study detected only five fusion genes from 11 HCC tissues and 11 paired portal vein tumor thrombus tissues. Owing to the limited numbers of samples and the different analysis strategies, these studies did not identify recurrent fusion genes.
Here we used RNA-seq data of multiple lesions of two HCC patients to explore gene fusions in HCC and validate potential fusions using publicly available RNA-seq datasets of HCC. Our efforts unveiled several novel and recurrent fusions in HCC, suggesting their potential as diagnostic markers or molecular therapeutic targets.

Patients and clinical samples
Two typical multi-focal hepatitis B virus (HBV)-HCC patients (Fig. 1A) were enrolled for the study, which was previously reported (Miao et al., 2014). The raw RNA-seq data were deposited at the European Genome-phenome Archive with accession number EGAS00001000372.

Analysis of transcriptome data
For raw RNA-seq data, we first assessed the quality of sequencing reads using FastQC software (Babraham Institute, Cambridge, UK) and then discarded low-quality reads with a quality score < 20 using the Trimmomatic tool. Next, for each sample, we aligned the cleaned reads to the hg19 reference genome using STAR v2.4.1 (The Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA), an ultrafast RNA-seq aligner. For each read, no more than two mismatches were allowed in the alignment process. Reads that mapped to distinct genes in the reference genome were output into the chimeric-reads file and used for detection of fusion genes.

Identification of fusion transcripts
To identify potential fusion genes, we used STAR-Fusion, which is integrated with the STAR software and performs adequately for fusion RNA prediction compared with other methods (Haas et al., 2017; Kumar et al., 2016; Nicorici et al., 2014; Stransky et al., 2014). Reads deposited in the chimeric-reads file indicated putative fusions. STAR-Fusion used reads that aligned to distinct genes to detect candidate fusion genes. To reduce the number of false-positive fusion genes, we required that the portion of a read aligned to each of the distinct genes be at least 15 bp long. Putative fusions between homologous genes were also discarded. Furthermore, we retained only fusions with at least three junction reads, which provide direct evidence of a fusion. We also removed fusions between the mitochondrial genome and autosomes.
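The filtering criteria above can be sketched as a simple predicate. This is only an illustrative sketch: the `FusionCandidate` record, its field names and the chromosome labels are assumptions made for the example, not the output schema of STAR-Fusion, and the homology flag is assumed to have been computed beforehand.

```python
from dataclasses import dataclass

# Hypothetical record layout; the field names are illustrative only.
@dataclass(frozen=True)
class FusionCandidate:
    left_gene: str
    right_gene: str
    left_chrom: str
    right_chrom: str
    junction_reads: int   # reads spanning the fusion junction
    min_anchor_bp: int    # shortest read segment aligned to either partner gene
    homologous: bool      # partner genes are homologs (assumed precomputed)

MIN_JUNCTION_READS = 3    # "at least three junction reads"
MIN_ANCHOR_BP = 15        # "not less than 15 bp"
MITO = {"chrM"}           # assumed label for the mitochondrial genome
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}

def is_mito_autosome(c: FusionCandidate) -> bool:
    """True if one partner is mitochondrial and the other is on an autosome."""
    return ((c.left_chrom in MITO and c.right_chrom in AUTOSOMES)
            or (c.right_chrom in MITO and c.left_chrom in AUTOSOMES))

def passes_filters(c: FusionCandidate) -> bool:
    """Apply the four filters described in the text to one candidate."""
    if c.junction_reads < MIN_JUNCTION_READS:
        return False
    if c.min_anchor_bp < MIN_ANCHOR_BP:
        return False
    if c.homologous:
        return False
    if is_mito_autosome(c):
        return False
    return True
```

A list of candidates would then be reduced with `[c for c in candidates if passes_filters(c)]`.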

Patients and clinical samples
Eleven pairs of frozen HBV-related HCC samples combined with their corresponding adjacent non-tumor liver tissues and three multiple lesions of patient samples were obtained from HBV-HCC patients who underwent hepatectomy at Peking Union Medical College Hospital (PUMCH). All patients had pathologically confirmed HCC and did not receive any anticancer treatment prior to surgery. Fresh tissue samples were collected in the operating room and processed within 15 min of resection. Snap-frozen tissues were stored at −80 °C for subsequent analyses. The experiments were undertaken with the understanding and written consent of each subject. The study methodologies conformed to the standards set by the Declaration of Helsinki and were approved by the ethics committee of PUMCH.

RT-PCR and Sanger sequencing
Total RNA was isolated using TRIzol reagent (Life Technologies, Carlsbad, CA, USA). First-strand cDNA was synthesized using a High Capacity cDNA Reverse Transcription kit (Life Technologies) according to the manufacturer's instructions. The amplified bands were gel-purified and later subjected to Sanger sequencing.

Quantitative real-time PCR (qRT-PCR)
The cDNA was synthesized from 1.5 μg of total RNA using the High Capacity cDNA Reverse Transcription Kit (Life Technologies). Real-time PCR was performed using Power SYBR® Green Master mix (Applied Biosystems, Foster City, CA, USA) and a 7500 Fast™ Real-Time PCR System (Applied Biosystems). GAPDH gene expression was included as an internal control. The relative expression levels of the fusion genes were calculated using 2^(−ΔΔCT) values. All statistical analyses were performed using SPSS 17.0 software (SPSS Inc., Chicago, IL, USA). For statistical comparisons, Student's t-test was performed. The fusion gene-specific primers and the primers for GAPDH are listed in Table S7.
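For readers unfamiliar with the 2^(−ΔΔCT) calculation, the arithmetic can be sketched as follows. The function name and the CT values in the usage note are hypothetical and serve only to illustrate the formula, with GAPDH assumed as the reference gene as described above.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^(-ddCT) method.

    ct_target_* : CT of the fusion transcript
    ct_ref_*    : CT of the internal control (GAPDH in this study)
    *_sample    : tissue of interest (e.g. tumor)
    *_control   : calibrator tissue (e.g. adjacent non-tumor liver)
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample     # normalize to reference
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control    # compare with calibrator
    return 2.0 ** (-delta_delta_ct)
```

With made-up CT values of 24 (fusion) and 18 (GAPDH) in a tumor and 26 and 18 in the paired non-tumor tissue, ΔΔCT is −2 and `relative_expression(24, 18, 26, 18)` gives a four-fold higher relative expression in the tumor.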

Landscape of fusion events in HCC
Transcriptome analysis has been widely used to identify gene fusions in human cancers (Bao et al., 2014; Stransky et al., 2014; Yoshihara et al., 2015). To gain insight into fusion events in HCC, we aligned transcriptome reads to the hg19 reference genome using STAR software (Dobin et al., 2013). An average of ~86 million reads per sample were uniquely mapped to the reference genome (Table 2). We found that 80% (15 302) of expressed genes were protein-coding genes. We also found that 14% (2651) of expressed genes were long noncoding RNA (lncRNA) and 5% were pseudogenes annotated in GENCODE (Fig. S1). We then used the parameters of Stransky et al. (2014) to identify fusion events.
As a result, 2354 different gene fusions with more than three junction reads were identified in the seven samples. Among them, HCC samples had more fusion events (Fig. 1B) and more genes involved in fusions than adjacent normal tissues. Furthermore, the number of gene fusions gradually increased across the PI_N, PI_P, PI_V and PI_M samples (Fig. 1B), which was consistent with the process of intrahepatic metastasis. The right lesion of patient PII showed 241 fusion events, compared with 193 in the left lesion. Among the 2354 fusions, we found 20 (0.9%) fusions appearing in more than two samples, and these clearly separated the samples by patient (Fig. 1E).
We next classified fusion events into eight types according to the gene type. As expected, most fusion events (on average 85%) were between protein-coding genes in all samples (Fig. 1D). There were fewer fusion types in adjacent normal tissues than in HCC samples, suggesting that more complex fusions were involved in HCC. Moreover, the proportions of distinct fusion types differed across samples, and the proportion of protein-ncRNA fusion events was higher in PI than in PII. Further, by analyzing the genomic positions of fusion genes, we found that more than 85% of fusion events were between two different chromosomes (Fig. 1C) and that a few genes fused with more than one partner gene (Fig. S2).
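The classification described above can be illustrated with a short sketch. The tuple layout, the biotype labels and the function names are assumptions for the example; the sketch labels a fusion by the unordered pair of partner biotypes and does not reproduce the exact eight categories used in Fig. 1D.

```python
from collections import Counter

def fusion_type(biotype_a: str, biotype_b: str) -> str:
    """Label a fusion by the unordered pair of partner gene biotypes."""
    return "-".join(sorted((biotype_a, biotype_b)))

def summarize(fusions):
    """Count fusion types and the fraction of interchromosomal events.

    fusions: iterable of (biotype_a, chrom_a, biotype_b, chrom_b) tuples
    (a hypothetical layout chosen for this example).
    """
    fusions = list(fusions)
    type_counts = Counter(fusion_type(a, b) for a, _, b, _ in fusions)
    n_inter = sum(1 for _, chrom_a, _, chrom_b in fusions if chrom_a != chrom_b)
    frac_inter = n_inter / len(fusions) if fusions else 0.0
    return type_counts, frac_inter
```

Sorting the two biotypes makes the label independent of which gene is the 5' partner, which is a simplification; an analysis that cares about orientation would keep the order.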

Recurrent fusion events in HCC
To obtain recurrent fusions, we conducted the same analysis in four available HCC RNA sequencing datasets. In the public datasets, we obtained 24 960 fusion events in total, including 23 795 fusions in GSE65485, 1132 fusions in GSE55758, 40 fusions in GSE33294, and 92 fusions in SRP007560. In our own data, we obtained 43 gene fusions that either occurred in at least two of our HCC samples or were present in one of our samples and also occurred in at least one of the 79 public samples (Table S1). The remaining 2311 gene fusions occurred just once in our seven samples, with no events detected in the public data (Fig. 2). Among the 2311 gene fusions, we found many kinase genes in fusion events in both HCC and adjacent non-tumor tissues. For example, MAP3K11 fusion was detected in 23.5% (4/17) of adjacent non-tumor tissues. Both BRD4 and NRBP2 fusions were detected in 25.8% (16/62) of HCC tissues. MET fusion was detected in only 4.8% (3/62) of HCC tissues and 5.9% (1/17) of adjacent non-tumor tissues (Fig. S3). We next asked whether recurrent gene fusions were significantly over-represented in public samples. We thus grouped the 2354 fusions into two classes: 20 recurrent fusions present in at least two of the seven HCC samples, and 2334 fusions that occurred in only one of the seven samples. We found that 15 recurrent fusions were present in at least one of the 79 public samples, whereas five recurrent fusions were detected only in at least two of the seven HCC samples (Table S1). Among the 2334 fusions, 23 were detected in at least one of the 79 public samples. We found that recurrent fusions were significantly supported by public datasets (Fisher's exact test, P-value < 2.2e-16), suggesting that recurrent fusions were likely functional.
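The comparison above corresponds to a 2 × 2 table built from the counts in the text: 15 of 20 recurrent fusions versus 23 of 2334 single-sample fusions were supported by public samples. A minimal sketch of a one-sided Fisher's exact test on that table is given below. The function name is an assumption, the text does not state whether the reported test was one- or two-sided, and this is not necessarily the software used in the study.

```python
from math import comb

def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric given the fixed margins,
    i.e. the probability of at least `a` supported fusions in the first row.
    """
    row1 = a + b              # recurrent fusions
    col1 = a + c              # fusions supported by public samples
    n = a + b + c + d         # all fusions
    upper = min(row1, col1)
    favorable = sum(comb(col1, k) * comb(n - col1, row1 - k)
                    for k in range(a, upper + 1))
    return favorable / comb(n, row1)

# Counts taken from the text: rows = recurrent / single-sample fusions,
# columns = supported / not supported by the 79 public samples.
p_value = fisher_exact_greater(15, 5, 23, 2311)
```

The exact integer arithmetic of `math.comb` avoids underflow for very small tail probabilities; the resulting `p_value` falls far below the 2.2e-16 threshold quoted in the text.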

Validating candidate recurrent fusion genes in clinical patients with HCC
We further employed RT-PCR and Sanger sequencing to validate the 43 recurrent fusion events (Table S6) in 11 pairs of HBV-related HCC samples combined with their corresponding adjacent non-tumor liver tissues and three multiple lesion patient samples (Patients II, A and B). Samples from Patient II included noncancerous liver and two distant HCCs located in the left and right lobes; from Patient A, noncancerous liver, tumor lobe and portal vein tumor thrombus; and from Patient B, noncancerous liver and two distant HCCs. We successfully obtained primer sequences for 26 fusion genes (Table S7). Six fusion genes were validated to exist in clinical samples (Table 3, Figs 3 and S4-S7). The detailed sequences of these fusion genes are listed in Data S1. Five of the six validated fusion genes (all except IGLV4-69-IGLJ3) were detected in many clinical samples, in both tumor samples and adjacent noncancerous samples.
We further analyzed the relative gene expression in tumor samples and adjacent noncancerous samples using qRT-PCR (Figs 3D,H and S4-S7). Among the 26 candidate recurrent fusion genes, six fusions were confirmed by RT-PCR and Sanger sequencing (Figs 3A,B,E,F and S4-S7). Though some fusions were detected more frequently in tumor samples than in adjacent noncancerous samples, many fusions were frequently detected in both clinical tumor and adjacent noncancerous samples. For instance, the newly identified fusion RP11-476K15.1-CTD-2015H3.2 was detected in our HCC samples PI-P and PII-R. For the validated fusions that occurred in both tumor and benign samples, relatively higher expression was observed in tumor samples than in noncancerous samples (Fig. 3A-C). RP11-476K15.1-CTD-2015H3.2 was identified in 71% (10/14) of patients, 29% (4/14) of noncancerous samples and 59% (10/17) of tumor samples (Fig. 3C). These findings suggest that RP11-476K15.1-CTD-2015H3.2 is a novel HCC-related fusion gene that may be a new therapeutic biomarker or therapeutic target. Another fusion, C15orf57-CBX3, was detected and showed a considerable expression level in both tumor samples and noncancerous samples, similar to another four fusions (Figs 3G,H and S4-S7). C15orf57-CBX3 was identified in 100% (14/14) of patients, 86% (12/14) of noncancerous samples and 76% (13/17) of tumor samples (Fig. 3G). We suspect that this is because these patients have had a history of HBV infection for several years and the adjacent tissue was taken from cirrhotic liver, which is not fully normal liver tissue. In the process of liver cirrhosis, the genome of liver tissue changes dramatically, resulting in fusion events.

Identification of candidate fusion events associated with HCC
We further used four known fusion databases, including ChiTaRS (Gorohovski et al., 2017), ChimerDB (Lee et al., 2016), FusionCancer and Mitelman (Mitelman et al., 2016), to examine whether the identified fusion events were associated with human diseases. Three recurrent fusions were annotated as disease-related fusion events. Sia et al. (2015) demonstrated that LOC9610-IGLJ3 fusion was associated with intrahepatic cholangiocarcinoma. C15orf57-CBX3, which was present in 18 public HCC samples and four normal liver samples (Table S2, Figs S8 and S10), was also associated with glioblastoma (Bao et al., 2014). Moreover, the C15orf57-CBX3 fusion was associated with cervical cancer, melanoma and Burkitt lymphoma in the ChiTaRS and FusionCancer databases. Thus, the C15orf57-CBX3 fusion may be involved in the development of HCC.

Discussion
Detection and characterization of fusion genes has been critical in understanding tumorigenesis, anticancer drug screening and clinical application (Hessels and Schalken, 2013; Mertens et al., 2015; Stransky et al., 2014; Yoshihara et al., 2015). However, few fusion events have been demonstrated to be recurrent. Fortunately, with the development of high throughput sequencing technology as well as bioinformatics algorithms, large numbers of fusion events have been detected (Kim and Salzberg, 2011; Li et al., 2013). In the present report, we detected 2354 candidate fusion events and only 1.8% (43/2354) of these events were recurrent. Similarly, Yoshihara et al. (2015) and Stransky et al. (2014) reported large numbers of kinase gene fusions in 13 and 20 types of cancer, respectively. However, only 7.4 and 12% were recurrent events, respectively.
In addition to recurrent gene fusions involving known fusion genes or fusion events, we found 19 novel and recurrent fusions that have not been previously annotated to diseases (Table S6). For instance, the recurrent fusion DCUN1D3-GSG1L occurred in PI-P, PI-V and PI-M without any public sample support (Table S4). Overexpressing the DCUN1D3 gene may promote mesenchymal to epithelial-like changes and inhibit colony formation in soft agar (Huang et al., 2014). GSG1L is a component of the inner core of the AMPAR complex, which modifies AMPA receptor gating. Furthermore, both fusion genes harbored the exact same breakpoint in three HCC samples (Fig. S9, Table S4). The breakpoint (chr16:20871370) in the third exon of DCUN1D3 is located in the DUF298 conserved domain that binds to cullins and Rbx-1, components of an E3 ubiquitin ligase complex for neddylation. The protein structure is therefore affected by the fusion. The breakpoint in the sixth exon of GSG1L (chr16:27802788) is located downstream of the protein's conserved domain without affecting any domains. Another novel recurrent fusion, SERPINA5-SERPINA9, occurred in PI-N and PI-M, as well as in nine public HCC samples (Table S5, Fig. S9). SERPINA5 is a serpin peptidase inhibitor with serine-type endopeptidase inhibitor activity. The breakpoint (chr14:95053889) in SERPINA5 and the breakpoint (chr14:94935978) in SERPINA9 were both located in conserved domains of each protein, suggesting that the domains are truncated in the fusion protein. The fusion can generate a chimeric protein that may be involved in the tumorigenesis of HCC and should be further validated. Thus, DCUN1D3-GSG1L and SERPINA5-SERPINA9 may be involved in the development of HCC and should be examined.
Noncoding genes also play an important role in human disorders, human pluripotency and cancers (Guarnerio et al., 2016; Xu et al., 2017; Yu et al., 2018). However, to the best of our knowledge, few studies have so far focused on noncoding gene fusion events. Lau et al. (2014) reported that HBx-LINE fusion, which functions as an lncRNA, affected β-catenin transactivity and was involved in liver cancer development and progression. Qin et al. (2016, 2017) showed that SLC45A3-ELK4 and D2HGDH-GAL3ST2 regulate cancer cell proliferation and cell motility in prostate cancer. Dong et al. (2015) found that HCC patients carrying the HBV-MLL4 fusion have a distinct gene expression profile. In our analysis, we identified many fusion events involving noncoding genes. Notably, a higher percentage of noncoding gene fusion events were detected in the advanced HCC patient. This supports the idea that noncoding gene fusions play key roles in the progression of cancer.
We also detected many recurrent fusion events in adjacent normal tissue. For example, 27.9% (12/43) of recurrent fusion events were detected only in adjacent normal tissues and 30.2% (13/43) were observed in both HCC and adjacent normal tissues. Some were detected at higher levels in adjacent normal tissue than in the paired HCC tissue. Similarly, the TEL-AML1 fusion gene was reported to occur 100 times more frequently in normal individuals than in leukemia patients, and contributes to initiation of childhood ALL (Mori et al., 2002; Zelent et al., 2004). Thus, fusion events in adjacent normal tissue may serve as biomarkers of hepatitis disease progression into HCC and should be pursued in future experimental studies. The successful application of drugs targeting the BCR-ABL1 fusion in hematological malignancy and ALK fusions in NSCLC has ignited enthusiasm for deep exploration of the landscape of gene fusions (Mertens et al., 2015). Moreover, several drugs, such as imatinib and crizotinib, have been approved for targeting gene fusions in human malignant diseases. These success stories support the notion that fusion events represent promising anticancer targets. Our present results provide insight into the landscape of gene fusions in HCC and might pave the way for anti-HCC therapy.

Conclusions
In our study, we analyzed RNA-seq data of 67 HCC tissues and 19 adjacent normal tissues to describe the fusion landscape of HCC. As a result, we identified 27 314 non-redundant fusion events. Among them, 43 recurrent fusions were identified. In addition to fusions between protein-coding genes, we found that many noncoding sequences could participate in gene fusion. Finally, we validated six of the novel fusion events by RT-PCR and Sanger sequencing. Our study provides new insights into gene fusions in HCC and could contribute to the development of anti-HCC therapy.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Table S1. The number of fusions occurring once and of recurrent fusions supported by public HCC samples.
Table S2. The breakpoint and junction reads of C15orf57-CBX3 across all samples where it occurred.
Table S3. The breakpoint and junction reads of AP3D1-SLC6A8 across all samples where it occurred.
Table S4. The breakpoint and junction reads of DCUN1D3-GSG1L across all samples where it occurred.
Table S5. The breakpoint and junction reads of SERPINA5-SERPINA9 across all samples where it occurred.
Table S6. Detailed information on the 43 candidate recurrent fusion genes.
Table S7. The primer sequences for candidate recurrent fusion genes and the internal control gene (GAPDH).
Data S1. The sequences of the validated recurrent fusion genes.