Identification of lncRNAs associated with early‐stage breast cancer and their prognostic implications

Breast cancer is the most common malignancy among women, with the highest incidence rate worldwide. Dysregulation of long noncoding RNAs during the preliminary stages of breast carcinogenesis is poorly understood. In this study, we performed RNA sequencing to identify long noncoding RNA expression profiles associated with early‐stage breast cancer. RNA sequencing was performed on six invasive ductal carcinoma (IDC) tissues along with paired normal tissue samples, seven ductal carcinoma in situ tissues, and five apparently normal breast tissues. We identified 375 differentially expressed lncRNAs (DElncRNAs) in IDC tissues compared to paired normal tissues. Antisense transcripts (~ 58%) were the largest subtype among DElncRNAs. About 20% of the 375 DElncRNAs were supported by typical split readings leveraging their detection confidence. Validation was performed in n = 52 IDC and paired normal tissue by qRT‐PCR for the identified targets (ADAMTS9‐AS2, EPB41L4A‐AS1, WDFY3‐AS2, RP11‐295M3.4, RP11‐161M6.2, RP11‐490M8.1, CTB‐92J24.3, and FAM83H‐AS1). We evaluated the prognostic significance of DElncRNAs based on TCGA datasets and report that overexpression of FAM83H‐AS1 was associated with patient poor survival. We confirmed that the downregulation of ADAMTS9‐AS2 in breast cancer was due to promoter hypermethylation through in vitro silencing experiments and pyrosequencing.

Breast cancer is the most common malignancy among women, with the highest incidence rate worldwide. Dysregulation of long noncoding RNAs during the preliminary stages of breast carcinogenesis is poorly understood. In this study, we performed RNA sequencing to identify long noncoding RNA expression profiles associated with early-stage breast cancer. RNA sequencing was performed on six invasive ductal carcinoma (IDC) tissues along with paired normal tissue samples, seven ductal carcinoma in situ tissues, and five apparently normal breast tissues. We identified 375 differentially expressed lncRNAs (DElncRNAs) in IDC tissues compared to paired normal tissues. Antisense transcripts (~58%) were the largest subtype among DElncRNAs. About 20% of the 375 DElncRNAs were supported by typical split readings leveraging their detection confidence. Validation was performed in n = 52 IDC and paired normal tissue by qRT-PCR for the identified targets (ADAMTS9-AS2, EPB41L4A-AS1, WDFY3-AS2, RP11-295M3.4, RP11-161M6.2, RP11-490M8.1, CTB-92J24.3, and FAM83H-AS1). We evaluated the prognostic significance of DElncRNAs based on TCGA datasets and report that overexpression of FAM83H-AS1 was associated with patient poor survival. We confirmed that the downregulation of ADAMTS9-AS2 in breast cancer was due to promoter hypermethylation through in vitro silencing experiments and pyrosequencing.

Introduction
Breast cancer is the most common cancer among women (ASR-43.1) with highest mortality rates (Ferlay et al., 2013). Breast cancer is broadly classified into noninvasive ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Understanding the mechanism of breast carcinogenesis at genetic and transcriptional level can aid in characterization of DCIS or early-stage IDC tumors. Gene expression signatures are used to classify IDC subtypes of hormone receptor-positive (estrogen and progesterone receptors), that is, luminal A and B, and hormone receptor-negative HER2 and basal-like (Perou et al., 2000;Sorlie et al., 2001) breast cancer subtypes. Next-generation sequencing has enabled global profiling of mRNAs and noncoding RNAs (ncRNAs) including long ncRNAs (lncRNAs) and microRNAs. lncRNAs have gained immense importance in gene regulation and are known to play an important role in cancer development and prognosis (Huarte, 2015;Prensner and Chinnaiyan, 2011;Rao et al., 2017). Understanding the divergent expression of lncRNAs in early-stage breast tumors can help elucidate its functional role in carcinogenesis.
Specific lncRNAs signatures are known to be associated with different molecular subtypes of breast cancer. DSCAM-AS1 was identified specifically in ER-positive breast tumors and shown to increase aggression and drug resistance (Niknafs et al., 2016). Similarly, AFAP1-AS1 was predominantly found to be dysregulated in HER2 and triple-negative breast cancers (TNBC) (Shen et al., 2015;Yang et al., 2016a). H19 was identified to be overexpressed in ER/PR-positive breast adenomas, and BC200 was implicated to be distinctly elevated in benign tumors and not in invasive subtypes and hence is of prognostic significance (Adriaenssens et al., 1998;Iacoangeli et al., 2004). HOTAIR was demonstrated to gain activity in BRCA1-mutated tumors. In a normal cell, BRCA1 competes with HOTAIR in binding to EZH2 of PRC2 (Wang et al., 2013). The functional characteristics of certain lncRNAs, such as UCA1, GAS5, and XIST, have established them as breast cancer-associated tumor suppressors, while HOTAIR, TINCR, and DSCAM-AS1 are known as oncogenic lncRNAs (Wang et al., 2017;Xu et al., 2017). Support vector machine-based prediction of breast cancer intrinsic subtype using lncRNA expression profile and PAM50 gene signature using TCGA datasets was recently proposed as an improved prediction model (Zhang et al., 2018).
Despite known association of lncRNA expression with molecular subtype, recently reported lncRNAs have emerging role in relevant signaling or druggable pathways. lncRNA CYTOR was reported to be associated with breast cancer progression through EGFR signaling pathway (Van Grembergen et al., 2016). NKILA was observed to promote heterotrimeric complex formation (p50/p60/IjB) and inhibit IjB phosphorylation, thus regulating NF-jB signaling . LINK-A was reported to aid in stabilizing HIF1a in normoxic conditions of TNBC. Through BRK/PTK6 activation and phosphorylation of HIF1a, LINK-A substantiates its kinase activation and cancer signaling potential (Lin et al., 2016). Alternatively, breast cancer-associated lncRNAs important in drug targeting pathways can also be useful prognostic biomarkers. In the present study, we have done RNA sequencing in early-stage tumors (stage I-IIA IDC, DCIS) and noncancerous breast tissue samples to identify lncRNAs that play a role in early-stage breast cancer. We speculate that aberrant expression of lncRNAs could be an early event in breast cancer development, and hence, the study was aimed to identify dysregulated lncRNAs and the mechanism of dysregulation in breast cancer.

Study population and sample classification
The study cohort includes patients diagnosed and treated for breast cancer at Cancer Institute (WIA), Chennai, Tamil Nadu, India. These patients were histologically confirmed of invasive ductal carcinoma (IDC-stage I-II A) and DCIS. Apparently, normal breast tissues were obtained from patients undergoing surgery for breast conditions other than malignancy. Samples having > 70% for cancer cells following histopathological examination were included in the study. Paired normal and apparently normal tissues completely free of tumor cells were selected and kept frozen (À80°C) until further processing. Total RNA sequencing was done for 24 samples, that is, tumor (n = 6), paired normal (matched normal; n = 6), DCIS (n = 7), and apparently normal (n = 5). Validation cohort of IDC (n = 52) and corresponding paired normal tissue were used to gauge candidate lncRNAs. The clinicopathological features of patients in the discovery and validation cohort are detailed in Table S1. All patients were informed about the study, and their written consent for participation was obtained. The Institutional Ethical Committee approved the study and the protocol duly conforming the guidelines set by the Declaration of Helsinki.

RNA isolation and library preparation
Total RNA was isolated from frozen tissues using TRIzol method and purification by NucleoSpin RNA Isolation Kit (Macherey-Nagel, D€ uren, Germany), which includes an on-column DNase treatment. The quality and quantity of total RNA was evaluated through 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Ribosomal RNA was depleted (EpiGentek, USA), and cDNA library was prepared using Illumina TruSeq Stranded Total RNA Library Prep Kit. The library profile was verified using 2100 Bioanalyzer (Agilent Technologies). Subsequent RNA sequencing of cDNA libraries with paired-end reads (2 9 100 bps reads) was performed according to the standard Illumina protocol using HiSeq 2500 sequencing platform.

RNA sequencing and data analysis
Raw reads were assessed for Phred quality using FASTQC (Andrews, 2010), and low bases and adaptor sequences were trimmed off using Fqtrim (Pertea, 2015) retaining reads of length ≥ 75 bases. Clean reads were aligned against human reference genome (GRCh38 assembly) with Gencode V24 annotation using HISAT2 (Baruzzo et al., 2017) with default parameters. Exon centric read counts were obtained from binary alignment map using HTSeq (Anders et al., 2015) using the script 'htseq count' for all samples independently. lncRNAs identified with ≥ 15 reads in at least three samples per cohort, that is, IDC, paired normal, DCIS, and apparent normal were further investigated for differential expression using DESeq (Anders and Huber, 2010). Read counts obtained from HTSeq were normalized using 'estimateSizeFactors' variance and were modeled using 'estimateDispersions'. The differentially expressed genes were computed using 'nbinomTest' functions of DESeq. Significant differential expression was defined if |log 2 (fold-change)| > 1 and qvalue (Bonferroni and Benjamini-Hochberg adjusted P value) < 0.1. Expression profile of lncRNAs from TCGA breast cancer dataset (TCGA-BRCA; n = 837 invasive tumors and n = 105 normal samples) was used for survival analysis . Kaplan-Meier plots for differentially expressed lncRNAs (DElncRNAs) were generated for tumor stages as well as molecular subtypes and evaluated using log-rank test.

lncRNA-mRNA co-expression network analysis
Pearson 0 s correlation coefficient (PCC) was used to determine linear correlation between mRNA and lncRNA expression profiles using R package. DElncRNAs DElncRNAs -mRNA pairs with |PCC| ≥ 0.9 were considered for network analysis using STRING v10 (Szklarczyk et al., 2015) with organism 'Human' as backend database and Cytoscape (Shannon et al., 2003).

Real-time quantitative PCR
Total RNA of 500 ng was used for preparing cDNA libraries using QuantiTect Reverse Transcription Kit (Qiagen, Hilden, Germany). Gene expression was estimated by QuantStudio 12K Flex Real-Time PCR System using TaqMan TM gene expression assays (Applied Biosystems, Thermo Fisher Scientific, Waltham, Massachusetts, USA) containing primers and probes specific for lncRNA and GAPDH. The expression values were calculated using the 2 DC t method (DC t = DC t target gene-DC t reference gene).

siRNA-mediated knock-down of DNMT1
Expression of ADAMTS9-AS2 was evaluated in MDAMB-231 and MCF7 cells. The cells were cultured in Dulbecco's modified Eagle's medium with 10% FBS at 37°C. Knock-down was carried out using Lipofectamine 3000 (Life Technologies, Carlsbad, CA, USA), siRNA targeting DNMT1 (Ambion, USA) with cells maintained in Opti-MEM (Life Technologies) during and after transfection. Transfected cells were collected after 48 and 72 h for total RNA and DNA isolation.

Statistical analyses
GRAPHPAD PRISM7 (GraphPad Prism Software Inc., San Diego, CA, USA) was used for evaluating qRT-PCR gene expression data. Student's t-test was used for pairwise analysis of tumor and paired normal samples. Welch correction was done if significant difference in variance was observed and Wilcoxon rank sum test was applied whenever non-Gaussian distribution was followed.

Results
3.1. Expression profile of lncRNAs in ductal carcinoma in situ and invasive ductal carcinoma RNA sequencing resulted in generation of~89 million reads per sample with~87.24% alignment against human genome build Hg38. We identified~2 689 lncRNAs and~18 132 mRNAs with ≥ 15 reads in at least three samples per cohort (Table 1, Table S2). In agreement with previous reports, lncRNAs were expressed at comparatively lower levels than mRNAs  Table S3-S6).

Prognostic lncRNAs in early-stage breast cancer
Survival analysis was done to investigate the prognostic potential of candidate lncRNA using TCGA datasets. We observed FAM83H-AS1 was significantly overexpressed by~4-fold in TN, TA, as well as DA pairs and its overexpression is associated with overall poor survival in luminal A, ER-positive tumors, stage 3 datasets, and overall breast tumor datasets irrespective of subtypes ( Fig. 5A-D). Overexpression of WDFY3-AS2 in luminal A, ER-positive tumors, and breast tumor datasets irrespective of subtypes (Fig. 5E, F,H) is significantly associated with adverse outcomes, whereas downregulation of RP11-161M6.2 in breast cancer and CTB-92J24.3 in stage 3 was observed significantly associated with poor overall survival (Fig. 5K). We observed significant association with overexpression of WDFY3-AS2 (Fig. 5G) and downregulation of RP11-161M6.2 in stage 2 of breast cancer based on TANRIC analysis indicating them as potential early prognostic markers (Fig. 5G,J).

Co-expression and pathway analysis
Guilt-by-association method was employed to speculate the putative functions of lncRNAs. This approach investigates the association of mRNA expression patterns with lncRNAs using Pearson's correlation analysis. A correlation analysis between DElncRNA-DEmRNA pairs was done, and only those with PCC ≥ |0.9| were considered significantly co-expressed. The co-expressed pairs were filtered for lncRNA with typical junctional read evidence which led to the identification of 2398 pairs consisting of 78 lncRNA and 1097 mRNA between IDC and paired normal samples and 385 pairs consisting of 24 lncRNA and 245 mRNA between IDC and apparent normal samples. Similarly, 26 pairs were co-expressed in DCIS vs. apparent normal samples consisting of 11 lncRNA and 26 mRNA and 10 co-expressed lncRNA-mRNA pairs in IDC compared to DCIS representing three lncRNA and 10 mRNA (Tables S7-S10 Co-expressed mRNAs were further analyzed using StringDB for network analysis. To augment guilt-byassociation concept, we further focused on mRNA network that is reported to co-express irrespective of lncRNA. We observe that partial sets of mRNAs from 22 DElncRNAs in IDC compared to paired normal samples were co-expressed according to StringDB analysis. After removing disconnected nodes and filtering high confidence nodes from the network, genes co-expressed with RP11-142C4.6 ( Fig. S5A) were found enriched for extracellular regions (red nodes) and overrepresented for extracellular matrix organization (green nodes) and disassembly (blue nodes) whereas genes co-expressed with RAMP2-AS1 were enriched on the cell membrane (red nodes; Fig. S5A, B). Genes co-expressed with RP11-701H24.4 were enriched for integral component of membrane (green nodes) and activation of cellular processes (blue nodes; Fig. S5C). In case of PSMB8-AS1, we observed overrepresentation of immune response and (red nodes) involved in type I interferon signaling pathway (blue nodes; Fig. S5D). We observed enrichment of biological process such as cell division (yellow nodes), cell cycle process (pink nodes), and microtubule cytoskeleton (red nodes) in genes positively coexpressed with TINCR and negatively co-expressed with LINC01359 (Figs S6 and S7). Interestingly, most genes co-expressed with PSMB8-AS1, TINCR, and LINC01359 are also known to co-express with each other according to StringDB. Using Cytoscape, we were able to segregate the subnetwork of 76 genes potentially governed jointly by TINCR (65 genes) and LINC01359 (55 genes), which resulted in submodules of genes with core histone protein domains (green nodes) and involved in pathways in cancer (blue nodes).

Discussion
Aberrant expression of lncRNAs is documented in various cancers (Huarte, 2015;Prensner and Chinnaiyan, 2011). In recent years, lncRNAs have gained importance in early detection and better prognosis of tumors (Chandra Gupta and Nandan Tripathi, 2017).
Although several lncRNAs associated with breast cancer have been reported previously, studying aberrantly expressed lncRNAs specific to early-stage breast cancer will provide insight into molecular mechanisms associated with breast cancer development. It will also result in identification of putative markers that might be useful in diagnosis or prognosis of breast cancer. Previous studies have associated altered expression of lncRNAs with specific breast cancer subtypes. For example, HOTAIR is a lncRNA that is highly expressed in HER2+ breast cancers whereas HOTAIRM1 is highly expressed in basal-like subgroup of breast cancers (Su et al., 2014). Luminal A types showed overexpression of LINC00160, and abundance of DSCAM-AS1 was reported in luminal B subtypes of breast cancer (Jonsson et al., 2015;Vu et al., 2016). MALAT, lncRNA-ATB, BC200, XIST, and H19 are some of other lncRNAs frequently associated with breast tumorigenesis and progression (Hansji et al., 2014). Functionally important lncRNAs in early-stage breast cancers are less reported. Our study evaluated the landscape of lncRNA expression in early-stage breast cancer [IDC (stage I-IIA) and DCIS breast tissues] to identify aberrantly expressed lncRNAs.   L   APN719  APN768  APN87  APN91311  APN93277   1123N  138N  148N  204N  272N  690N  1086DCIS  1102DCIS  1151DCIS  299DCIS  66DCIS  712DCIS  803DCIS  1123T  138T  148T  204T  272T

M
The DESeq analysis resulted in identification of 375 DElncRNAs in IDC compared to paired normal samples and 94 DElncRNAs in IDC compared to apparent normal samples. The analysis also identified 69 DElncRNAs in DCIS compared to apparent normal samples. We identified several antisense lncRNAs including ADAMTS9-AS2, EPB41L4A-AS1, WDFY3-AS2, FAM83H-AS1, ST8SIA6-AS1, CTB-92J24.3, and CTB-131K11.1 that were aberrantly expressed. Twelve candidate lncRNAs that showed significant differential expression were further validated in 52 paired tumor and normal breast samples. We observed significant downregulation of ADAMTS9-AS2, WDFY3-AS2, RP11-295M3.4, RP11-490M8.1, and CTB-92J24.3 and significant overexpression of FAM83H-AS1 in breast cancer. We found ADAMTS9-AS2 to be significantly downregulated in tumor compared to paired normal breast tissues. ADAMTS9-AS2 is an antisense transcript originating from the opposite stand coding for ADAMTS9 which is a known inhibitor of angiogenesis and is implicated to have a tumor-suppressive role. Functional importance of ADAMTS9 in nasopharyngeal and esophageal cancers has been described (Lo et al., 2010). ADAMTS9-AS2 like ADAMTS9 is downregulated in glioblastoma (Yao et al., 2014), colorectal cancer , bladder cancer, lung adenocarcinoma, and ER+ breast cancers (Li et al., 2017). Yao et al. have shown that promoter methylation regulates ADAMTS9-AS2 expression by knocking down DNMT1 in glioma cells. We found that methylation of ADAMTS9-AS2 controls its expression through correlative DNMT1 knock-down in MDAMB-231 and MCF7 cells. Similar results were observed when methylation levels at ADAMTS9-AS2 promoter were compared between tumors and paired normal tissues using pyrosequencing. We observed DNA methylation-mediated loss of ADAMTS9-AS expression in stage I breast cancer. Among other downregulated lncRNAs, WDFY3-AS2 has recently been reported with TGF-B-induced EMT of breast cancer cells through hnRNP-R modulated positive regulation of STAT3 and WDFY3 (Richards et al., 2016). Downregulation of WDFY-AS2 was found in diffuse glioma and strongly associated with poor prognosis (Wu et al., 2018). EPB41L4A-AS1 (also known as TIGA1) has been shown to be transcribed during growth arrest but has not been extensively studied in cancer to elucidate its role (Yabuta et al., 2006). RP11-161M6.2 was found to be overexpressed in ER/ PRÀnegative and HER2À positive breast cancers in our samples. The finding indicates an association of RP11-161M6.2 and estrogen receptor and is possibly downregulated in estrogen-mediated signaling. Similarly, MIAT was dominantly expressed in ER/PR/ HER2+ breast cancers samples. FAM83H-AS1 was consistently overexpressed in breast tumor samples and overall survival, analysis of TCGA datasets showed poor prognosis of the upregulated group which are in agreement with other studies in breast, colorectal, and lung cancers (Lu et al., 2018;Yang et al., 2016a,c;Zhang et al., 2017). Functional studies have demonstrated proliferative potential of FAM83H-AS1 through MET/EGFR signaling in lung adenocarcinoma and NOTCH1 signaling pathway in colorectal cancer. Overexpression of FAM83H-AS1 in luminal-type breast cancer was associated with good prognosis in patients (Yang et al., 2016a). Detection of FAM83H-AS1 expression levels in plasma could be a potential diagnostic and prognostic biomarker for breast cancer.

Conclusion
In summary, this study has shed light on novel lncRNA and substantiated several previous findings on lncRNA involved in early-stage breast cancers. We report 375 and 94 lncRNA differentially expressed in tumor samples compared to paired and apparent normal samples, respectively, and 69 DElncRNAs in DCIS compared to apparent normal samples. Seven downregulated and five upregulated lncRNA were further validated to discover significant lncRNA candidate with potential role in breast carcinogenesis. ADAMTS9-AS2 was one of the lncRNAs consistently downregulated in patient samples, and experimental evidence proved promoter methylation as major cause

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Fig. S1. Expression pattern of lncRNAs and protein coding genes in various pathological subtype and comparison of DElncRNAs in different groups. (A) Comparative histogram represents relatively lower expression of lncRNAs (blue bars) compared to protein coding genes (grey bars) based on raw read count profile of apparent normal samples (n = 5) (B) Comparative histogram represents relatively lower expression of lncRNAs (green bars) compared to protein coding genes (grey bars) based on raw read count profile of paired normal samples (n = 6) (C) Comparative histogram represents relatively lower expression of lncRNAs (yellow bars) compared to protein coding genes (grey bars) based on raw read count profile of DCIS samples (n = 7) (D) Comparative histogram represents relatively lower expression of lncRNAs (red bars) compared to protein coding genes (grey bars) based on raw read count profile of IDC samples (n = 6) (E) PCA using normalized read counts of protein coding genes. Color legend.   Fig. S6. High confidence interaction network (score: 0.7) representing differentially expressed mRNA that are known to co-express with each other as per String analysis and with lncRNA with Pearson correlation coefficient ≥ 0.9 with TINCR. Fig. S7. High confidence interaction network (score: 0.7) representing differentially expressed mRNA that are known to co-express with each other as per String analysis and with lncRNA with Pearson correlation coefficient ≤ À0.9 with LINC01359. Table S1. List of clinicopathological features of patients' tissue samples used in discovery and validation cohort in the study. Table S2. Read alignment statistics and number of genes identified in different samples. Table S3. Complete list of DElncRNAs identified to be differentially expressed in IDC (T) vs. paired normal (N) samples with adjusted P-values < 0.1 in this study along with normalized read counts from individual samples. Table S4. Complete list of DElncRNAs identified to be differentially expressed in IDC (T) vs. apparent normal (APN) with adjusted P-values < 0.1 in this study along with normalized read counts from individual samples. Table S5. Complete list of DElncRNAs identified to be differentially expressed in DCIS vs. apparent normal (APN) with adjusted P-values < 0.1 in this study along with normalized read counts from individual samples. Table S6. Complete list of DElncRNAs identified to be differentially expressed in IDC (T) vs. DCIS with adjusted P-values < 0.1 in this study along with normalized read counts from individual samples. Table S7. Complete list of dysregulated mRNA co-expressed with dysregulated lncRNAs supported by split reads in IDC vs. paired normal with Pearson correlation coefficient (PCC) ≥ 0.9. Table S8. Complete list of dysregulated mRNA co-expressed with dysregulated lncRNAs supported by split reads in IDC vs. apparent normal with Pearson correlation coefficient (PCC) ≥ 0.9. Table S9. Complete list of dysregulated mRNA co-expressed with dysregulated lncRNAs supported by split reads in DCIS vs. apparent normal with Pearson correlation coefficient (PCC) ≥ 0.9. Table S10. Complete list of dysregulated mRNA co-expressed with dysregulated lncRNAs supported by split reads in IDC vs. DCIS with Pearson correlation coefficient (PCC) ≥ 0.9. Table S11. List of gene expression assays