T-cell receptor profiling in cancer
Abstract
Immunosequencing is a platform technology that allows the enumeration, specification and quantification of each and every B- and/or T-cell in any biologic sample of interest. Thus, it provides an assessment of the level and distribution of all the clonal lymphocytes in any sample, and allows “tracking” of a single clone or multiple clones of interest over time or from tissue to tissue within a given patient. It is based on bias-controlled multiplex PCR and high-throughput sequencing, and it is highly accurate, standardized, and sensitive. In this review, we provide evidence that immunosequencing is becoming an important analytic tool for the emerging field of immune-oncology, and describe several applications of this approach, including the assessment of residual disease post therapy in lymphoid malignancies, the prediction of response to immunotherapeutics of solid tumors containing tumor infiltrating lymphocytes, the identification of clonal responses in vaccination, infectious disease, bone marrow reconstitution, and autoimmunity, and the exploration of whether there are population-based stereotyped responses to certain exposures or interventions.
1 Introduction
For much of the past fifty years, cancer has been viewed as a cell-autonomous state: a cell becomes malignantly transformed through alteration of its genome, and then proliferates in a dysregulated fashion. While this inherent genetic basis of malignant transformation has been thoroughly demonstrated, it has also become increasingly clear that the clinical impact of this cellular transformation is greatly influenced by host–tumor interaction factors in the context of the microenvironment in which the transformed cell exists. Within that microenvironment are cells and proteins that provide structure, vascular supply and “housekeeping” functions, including components of the innate and adaptive immune systems. In this review we will focus on the adaptive immune response.
Defining the influence of the adaptive immune response in cancer has been complicated and controversial. Pre-clinical models from 40 years ago together with studies of cancer incidence in humans with various types of immunodeficiencies have suggested a role for immune surveillance in keeping cancers at bay. However, being able to demonstrate that effect clinically (particularly in patients whose tumors have already developed) has until recently been marked by futility and frustration. The question of how and why a cell that was derived from the same host tissue as every other cell in the body would even be viewed as “foreign” and therefore subject to immune attack was widely debated. During this period, however, researchers isolated T-cells from a patient's growing tumor, expanded those tumor-infiltrating lymphocytes (TILs) ex vivo, and then re-infused them into the same patient. This “brute force” cellular immunotherapy approach was successful in debulking large tumor masses, at least for certain types of cancers (Rosenberg and Restifo, 2015). As a more refined understanding of the checks and balances that modulate both systemic and localized immune responsiveness has emerged, new therapeutics that aim to control the immune response at the levels of antigenic recognition, cellular activation, and target killing have been developed (Sharma and Allison, 2015).
2 The advent of “immunosequencing”
With the cloning and characterization of the genes that encode the immunoglobulin (Ig) and T-cell receptor (TCR) loci, “probes” for the analysis of the immune system became available. Early studies using these probes focused on basic properties of the immune receptors, such as Southern-blotting experiments that demonstrated the sequential lymphocyte-specific rearrangements of immune receptor loci during lymphoid development, or those that showed the inherent clonality of lymphoid malignancies with regard to the unique immune receptor, at the DNA level, found in any given lymphocyte and all of its clonal progeny (Bertness et al., 1985; Korsmeyer et al., 1983). More recently, advances in polymerase chain reaction (PCR), DNA sequencing, and large database interrogation have made it possible to explore questions that had previously been unapproachable because of the limited sensitivity, accuracy, and statistical power of earlier technologies.
The adaptive immune system generates a remarkable breadth of diversity in antigen-specific TCRs and Igs by combinatorial recombination of gene segments in lymphocytes. For example, the TCR is composed of two peptide chains, one encoded by the TCRA or TCRD genes, and the second encoded by the TCRB or TCRG genes, respectively. There are thus two types of T-cell receptors, αβ and γδ, that differ both by the TCR heterodimer type and their immune function, with the vast majority of T-cells carrying an αβ TCR. The existence of multiple V, D and J gene segments at these T-cell loci permits a large combinatorial diversity in receptor composition; while the non-templated insertion and/or deletion of nucleotides at the V-J, V-D, and D-J junctions further adds to the potential diversity of receptors that can be encoded. The antigenic specificity of T-cells is in large part determined by the amino acid sequence of the hypervariable complementarity-determining region 3 (CDR3) of the T-cell receptor (which, for most T-cells, consists of a heterodimer formed by an alpha (A) and a beta (B) chain). Because of the potential diversity of receptors, it is highly improbable to randomly converge on the same TCRA or TCRB nucleotide CDR3 sequence, effectively making each CDR3 sequence a unique tag for a T-cell clone. Similarly, B cells can be identified by their unique tags (see Section 3.2 below).
Immunosequencing is a multiplex PCR-based method that amplifies rearranged CDR3 sequences for a given immune receptor locus (Figure 1), and exploits the capacity of high-throughput sequencing (HTS) technology to enumerate and quantify hundreds of thousands of CDR3 sequences simultaneously (Calis and Rosenberg, 2014; Robins, 2013). The technology can be applied to both cDNA and genomic DNA. When applied to genomic DNA, the frequency of each CDR3 sequence identified is highly representative of the relative frequency of each B- or T-cell containing that CDR3 sequence in the biologic sample. Thus, the immunosequencing assay captures both specific individual clones and the full repertoire. Given the capacity of HTS, this approach is extremely sensitive and limited only by the amount of DNA that is analyzed. Routinely, if one million cells' worth of DNA is analyzed, the assay can detect clones at a sensitivity that approaches 1:1,000,000. This is about 100-fold greater than other current detection methods for finding any given B- or T-cell within a complex cellular mixture (e.g., flow cytometry) and less subject to the occurrence of “false positive” and “false negative” results than TCR PCR (i.e., PCR analyses of TCR loci using stock commercial primers followed by electrophoresis and signal to noise determination based on size of the amplified sequence) (Carlson et al., 2013; Ladetto et al., 2014; Wu et al., 2014). Thus, this technology provides a highly accurate and standardized method for assessment of lymphoid clonality in healthy, diseased, or malignant tissues, and for identifying and tracking the presence and frequency of common and rare clones within the total adaptive immune system.
The data generated by the immunosequencing assay is a combination of receptor sequences and their frequencies. For each sample, DNA (or RNA) is extracted and the relevant immune receptor CDR3 region is amplified and sequenced. In brief, bias-controlled V and J gene primers are used to amplify rearranged V(D)J segments for high throughput sequencing at ∼10× coverage. Next, sequencing errors in the raw sequence data are corrected via a clustering algorithm, and the primary nucleotide sequence of the amplified regions from the immune receptors' unique CDR3 segment is determined, quantified, and annotated according to the International ImMunoGeneTics collaboration (Yousfi Monod et al., 2004), identifying which V, D, and J genes contributed to each rearrangement.
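The two outputs described above, receptor sequences and their frequencies, can be illustrated with a minimal sketch. It is not the clustering algorithm used by the assay, whose details are not given here. It assumes a deliberately simple, hypothetical rule: a rare sequence is folded into an equal-length neighbour one mismatch away when that neighbour is at least `min_ratio` times more abundant. The function names and the threshold are illustrative assumptions.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_errors(reads, min_ratio=10):
    """Merge likely sequencing errors into their abundant parent sequence.

    Hypothetical rule (an assumption, not the assay's algorithm): a sequence
    is treated as an error if an already-kept sequence of the same length
    differs by exactly one base and is at least `min_ratio` times as abundant.
    """
    counts = Counter(reads)
    # Visit sequences from most to least abundant so parents are kept first.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    merged = {}
    for seq in ordered:
        parent = None
        for kept in merged:
            if (len(kept) == len(seq)
                    and hamming(kept, seq) == 1
                    and merged[kept] >= min_ratio * counts[seq]):
                parent = kept
                break
        if parent is None:
            merged[seq] = counts[seq]
        else:
            merged[parent] += counts[seq]
    return merged


def frequencies(clone_counts):
    """Convert per-clone counts into fractions of the total."""
    total = sum(clone_counts.values())
    return {seq: n / total for seq, n in clone_counts.items()}


if __name__ == "__main__":
    # Toy reads, not real CDR3 sequences.
    toy_reads = ["ACGT"] * 20 + ["ACGA"] + ["TTTT"] * 5
    clones = collapse_errors(toy_reads)
    print(clones)
    print(frequencies(clones))
```

A production pipeline would also need to handle insertions and deletions, base quality scores, and V/D/J annotation, none of which are modelled in this sketch.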
3 Defining an immune repertoire and its metrics
As previously stated, immunosequencing allows the enumeration, specification, and quantification of each and every B- and/or T-cell in any sample of interest. It provides information critical to appreciation of the four most important components of immune responsiveness: clonality, diversity, somatic allelic mutation, and tracking of any given clonal sequence or sequences between anatomic sites or over time in an individual in response to an internal or external “intervention”, or within populations that may share some common exposure or experience.
3.1 Clonality and diversity
Knowledge of the quality and quantity of a collection of lymphocytes from a diagnostic tissue sample can be a critical factor in addressing questions of health and disease. It can support an impression of normal immune diversity, development, or reconstitution. It can suggest inflammation, infection, vaccination, autoimmunity, or cancer. There are a number of analytic parameters that are used to assess the quality and quantity of a lymphoid infiltrate. Among these are metrics of diversity, richness, evenness, clonality, and entropy.
The diversity metric accounts for both “richness” and “evenness” components: while richness is a measurement of the number of different specificities in the sample (e.g., the number of T-cell clones with unique TCRs), evenness measures the relative abundance of these different specificities. Diversity can be measured in many ways; one of them uses Shannon's entropy (Shannon, 1948), in which higher values indicate a more diverse distribution of the receptor sequences.
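To make the distinction concrete, the following sketch computes richness and Shannon entropy from a list of per-clone template counts. The function names, the input format, and the choice of base-2 logarithms are assumptions made for illustration. Two repertoires with the same richness can have very different entropies when one is dominated by a single clone.

```python
import math


def richness(counts):
    """Number of distinct clones observed (clones with a non-zero count)."""
    return sum(1 for n in counts if n > 0)


def shannon_entropy(counts, base=2.0):
    """Shannon entropy of the clone frequency distribution.

    `counts` holds one template count per clone; `base` sets the unit
    (2.0 gives bits). Higher values indicate a more diverse repertoire.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one clone must have a positive count")
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p, base)
    return h


if __name__ == "__main__":
    even = [25, 25, 25, 25]      # four equally abundant clones
    skewed = [97, 1, 1, 1]       # same richness, one dominant clone
    print(richness(even), shannon_entropy(even))
    print(richness(skewed), shannon_entropy(skewed))
```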
The richness metric can be readily estimated in a given sample by counting the number of unique antigen receptor sequences. The estimation of richness in a population for which a small subsample is observed (such as in the case of trying to estimate the total richness of the peripheral blood from a small aliquot) can be accomplished using rarefaction techniques, such as those based on solutions to the ‘unseen species problem’ in ecology (reviewed in (Bunge and Fitzpatrick, 1993)).
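As one illustration of an “unseen species” approach, the sketch below implements the bias-corrected Chao1 estimator, a classical ecological lower bound on total richness. The text above does not name a specific estimator, so the choice of Chao1 is an assumption made for illustration. It extrapolates from the number of clones seen exactly once (`f1`) and exactly twice (`f2`).

```python
def chao1(counts):
    """Bias-corrected Chao1 estimate of total richness.

    S_chao1 = S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    where S_obs is the number of observed clones, f1 the number of clones
    seen exactly once, and f2 the number seen exactly twice.
    """
    observed = [n for n in counts if n > 0]
    s_obs = len(observed)
    f1 = sum(1 for n in observed if n == 1)
    f2 = sum(1 for n in observed if n == 2)
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


if __name__ == "__main__":
    # Toy per-clone counts from a small subsample.
    sample_counts = [1, 1, 1, 2, 2, 5]
    print(chao1(sample_counts))
```

Many singletons relative to doubletons imply that many clones remain unobserved, so the estimate rises well above the observed richness. When there are no singletons, the estimate equals the observed richness.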
The evenness or relative abundance metric can be calculated in different ways, such as the Gini index, originally developed to describe the inequality of a population's income (Gini, 1912), or the clonality metric, which is defined as 1 - Pielou's evenness (another measurement derived from ecology (Pielou, 1966)). This conversion is intended to generate values that agree with intuition, so that larger clonal expansions have a larger clonality score.
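Both evenness-type metrics can be sketched in a few lines. Pielou's evenness is the Shannon entropy divided by its maximum possible value for the observed number of clones, and clonality is one minus that ratio. The Gini index is computed here with the standard sorted-rank formula. The function names are illustrative, and the treatment of a single-clone sample (evenness set to 0, so clonality is 1) is a convention assumed for this sketch.

```python
import math


def pielou_evenness(counts):
    """Shannon entropy divided by log(number of observed clones)."""
    observed = [n for n in counts if n > 0]
    if not observed:
        raise ValueError("at least one clone must have a positive count")
    if len(observed) == 1:
        return 0.0  # assumed convention: a single clone is maximally clonal
    total = sum(observed)
    h = -sum((n / total) * math.log(n / total) for n in observed)
    return h / math.log(len(observed))


def clonality(counts):
    """Clonality defined as 1 - Pielou's evenness (0 = even, 1 = monoclonal)."""
    return 1.0 - pielou_evenness(counts)


def gini_index(counts):
    """Gini index of the clone-size distribution (0 = perfectly even)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("counts must contain a positive total")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    for repertoire in ([25, 25, 25, 25], [40, 30, 20, 10], [97, 1, 1, 1]):
        print(repertoire, clonality(repertoire), gini_index(repertoire))
```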
Finally, entropy corresponds to an information theoretic measurement of a probability distribution that captures a property of the distribution itself without the need of a particular model. The mathematics of this metric have been thoroughly described elsewhere. In terms of its application to the adaptive immune repertoire, each clonotype has an estimated probability of occurrence of its fraction in the repertoire of total adaptive immune cells.
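For reference, the standard form of this quantity can be written out. The notation is assumed for illustration: $n_i$ is the number of templates observed for clonotype $i$, $S$ is the number of distinct clonotypes, and $N$ is the total number of templates. The estimated probability of each clonotype is simply its fraction of the repertoire.

```latex
\[
  p_i = \frac{n_i}{N}, \qquad N = \sum_{j=1}^{S} n_j ,
\]
\[
  H = -\sum_{i=1}^{S} p_i \log_2 p_i , \qquad 0 \le H \le \log_2 S .
\]
```

The lower bound is reached when a single clonotype makes up the whole sample, and the upper bound when all $S$ clonotypes are equally frequent.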
3.2 Immunoglobulin-specific diversity mechanisms
Although the focus of this review is T-cell receptor profiling, we will briefly discuss two processes that happen during B-cell development. Heavy chain class-switch recombination (CSR) and somatic hypermutation (SHM) are unique to the immunoglobulin loci and further modify the Ig genes in B-cells. After the DNA breakage and rejoining event of V(D)J recombination described above, which generates the variable region of the immune receptors, the initial functional immunoglobulin heavy chain (IgH) locus that is expressed contains a μ constant region (C), so that the encoded antibody belongs to the IgM class. However, read-through combined with alternative splicing results in the co-expression of IgD isotype heavy chains, which contain the same variable region followed by a δ constant region instead. Upon the generation of functional IgM/IgD immunoglobulin following Ig light chain (IgL) rearrangement and antigen encounter in secondary lymphoid organs, mature B-cells undergo CSR, in which a second DNA breakage and rejoining event results in a switch of the constant region of the heavy chain to the γ, ε, or α class, resulting in IgG, IgE or IgA class antibodies with the same antigen specificities. In addition, as the immunoglobulin immune response progresses, SHM results in the accumulation of additional mutations in the expressed immunoglobulins. These mutations can increase the affinity of the expressed immunoglobulin for its target antigen, providing an additional selective advantage to that somatically mutated clone. Immunosequencing can reveal these mutations by comparing the observed sequences to the corresponding germline gene segment sequences (Kleinstein et al., 2003; Uduman et al., 2014; Yaari et al., 2013).
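The germline comparison can be illustrated with a minimal sketch. It assumes that the observed sequence has already been aligned to its assigned germline segment, that the two strings have equal length, and that there are no insertions or deletions. The published methods cited above are considerably more sophisticated, and the function names and toy sequences here are hypothetical.

```python
def somatic_mutations(observed: str, germline: str):
    """List (position, germline_base, observed_base) for every mismatch.

    Assumes the two sequences are pre-aligned and of equal length.
    Positions are 0-based.
    """
    if len(observed) != len(germline):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (i, g, o)
        for i, (g, o) in enumerate(zip(germline, observed))
        if g != o
    ]


def mutation_frequency(observed: str, germline: str) -> float:
    """Fraction of aligned positions that differ from the germline."""
    return len(somatic_mutations(observed, germline)) / len(germline)


if __name__ == "__main__":
    # Toy strings for illustration only, not real gene segment sequences.
    germline_segment = "ACGTACGT"
    observed_segment = "ACGTTCGA"
    print(somatic_mutations(observed_segment, germline_segment))
    print(mutation_frequency(observed_segment, germline_segment))
```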
3.3 Clone tracking
Having identified a particular sequence or sequences of interest through immunosequencing, the behavior of the associated clones can be tracked throughout the body, over time, or in different individuals. For example, a V(D)J sequence that arises in a lymphocyte that subsequently undergoes malignant transformation becomes a unique marker of that leukemia, lymphoma, or myeloma. Quantitative assessment of its frequency in response to therapeutic intervention can be of important prognostic significance to the patient undergoing treatment and to the clinical management of that patient. The detection limit of the immunosequencing assay depends on the frequency of the clone in the blood and on the size of the blood draw, and therefore the limit of sensitivity of the test is essentially a function of the amount of genomic DNA (and cellular equivalents) analyzed. If one million cells' worth of genomic DNA is assayed (approximately 6–7 μg), the limit of detection approaches one in a million. In CLIA-certified validation procedures, the limit of quantitation for an assay performed on one million cells' worth of DNA is approximately 1/100,000.
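The relationship between DNA input and sensitivity can be sketched with a simple sampling calculation. The figure of roughly 6.6 pg of DNA per diploid human cell is an assumption, chosen to be consistent with the 6–7 μg per million cells quoted above. The Poisson model below is an idealization: it assumes every template in the input is amplified and sequenced, and it ignores amplification efficiency and background error.

```python
import math

PG_DNA_PER_DIPLOID_CELL = 6.6  # assumed approximate value


def cell_equivalents(dna_micrograms: float) -> float:
    """Approximate number of cells represented by a mass of genomic DNA."""
    return dna_micrograms * 1e6 / PG_DNA_PER_DIPLOID_CELL  # 1 ug = 1e6 pg


def detection_probability(clone_frequency: float, cells_assayed: float) -> float:
    """Probability that at least one cell of the clone is in the input.

    Poisson approximation: the expected number of clone cells sampled is
    clone_frequency * cells_assayed.
    """
    expected = clone_frequency * cells_assayed
    return -math.expm1(-expected)  # 1 - exp(-expected), numerically stable


if __name__ == "__main__":
    cells = cell_equivalents(6.6)
    for freq in (1e-6, 1e-5, 1e-4):
        print(freq, detection_probability(freq, cells))
```

Under these assumptions, a clone present at one in a million has only about a 63% chance of contributing a template to a one-million-cell input, whereas a clone at one in 100,000 is sampled almost every time. This is one way to see why a limit of quantitation can sit roughly an order of magnitude above the limit of detection.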
For some malignancies, for example mycosis fungoides, a patient who presents with only skin patches or plaques can present a diagnostic dilemma (van Doorn et al., 2000). Demonstration of the same dominant TCR γ or TCR β clonal sequence in distinct anatomic sites from such patients can be an aid in making the diagnosis (Kirsch et al., 2015; Sufficool et al., 2015).
In tumors, identification of the sequence repertoire overlap between the systemic circulation and the tumor-infiltrating lymphocytic (TIL) population can provide information on the basic immunogenicity of the tumor as well as the likelihood of toxicity or response to an immunotherapeutic drug. The same type of comparative analysis when applied to different metastatic lesions can provide insight into tumor and tumor microenvironment heterogeneity (Emerson et al., 2013b). Also, specific sequences can be tracked over time in response to infection or vaccination. While the prediction of which specific immune receptor would be generated in response to a specific challenge is still in an exploratory phase (and may never be completely possible given the stochastic nature of V(D)J rearrangement and the enormous potential diversity of the sequences so created), progress has been made towards the identification of epitope-responsive T-cells post facto following infection, intervention, or tumor development (for example see (Gros et al., 2014; Klinger et al., 2013)). Tracking the movement of such clones from naïve to activated to memory T-cell compartments is also possible when combined with phenotypic characterization (DeWitt et al., 2015). There are some exposures that are now recognized to result in selection of identical or related TCR sequences at either the nucleotide or amino acid level. For example, a search for public TCR β sequences associated with chronic CMV infection can provide information on individual and population exposure to this virus (Emerson et al., 2015). More and more, potential public immune receptor domains for other viruses (e.g., EBV, HIV) or diseases (CLL, SLE, MS) are being searched for and in some cases, verified. At the cellular population level it has been possible to distinguish CD4+ from CD8+ T-cells based on variable gene usage and CDR3 length. 
While it is not possible to use the resulting algorithms to definitively categorize one or another T-cell subtype, the definition of the ratio of CD4+ to CD8+ T cells in a given sample has been achieved, using flow cytometry as a comparator (Emerson et al., 2013a). In addition, immunosequencing can be used in combination with flow cytometry to profile the repertoire of particular T-cell subtypes (e.g., see (Suessmuth et al., 2015)).
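Clone tracking and repertoire comparison both reduce to simple operations on per-sample clone counts. The sketch below represents each sample as a mapping from clone sequence to template count. The overlap measure shown, the fraction of all templates in both samples that belong to clones present in both, is one of several possible definitions and is assumed here for illustration; the cited studies may define overlap differently. The function names and clone labels are hypothetical.

```python
def shared_clones(sample_a, sample_b):
    """Clone sequences observed in both samples."""
    return set(sample_a) & set(sample_b)


def template_overlap(sample_a, sample_b):
    """Fraction of all templates (both samples) that belong to shared clones."""
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("samples contain no templates")
    shared = shared_clones(sample_a, sample_b)
    return sum(sample_a[s] + sample_b[s] for s in shared) / total


def track_clone(sequence, samples):
    """Frequency of one clone in each sample, in the order given.

    Returns 0.0 for samples in which the clone was not observed.
    """
    return [s.get(sequence, 0) / sum(s.values()) for s in samples]


if __name__ == "__main__":
    # Placeholder clone labels stand in for CDR3 sequences.
    blood = {"cloneX": 8, "cloneY": 2}
    tumor = {"cloneX": 5, "cloneZ": 5}
    print(shared_clones(blood, tumor))
    print(template_overlap(blood, tumor))
    print(track_clone("cloneX", [blood, tumor]))
```

The same `track_clone` call applies whether the samples are different tissues from one patient or serial time points, which is what makes a single rearranged sequence usable as a longitudinal marker.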
4 Specific applications of immunosequencing to immune-oncology
4.1 Simultaneous tracking of chimeric antigen receptor cells and residual leukemia
T-cells are being engineered to recognize specific targets of interest on a subset of malignancies (reviewed in (Maus et al., 2014)). Following infusion into patients, these engineered chimeric antigen receptor T cells (CART) bind to their targets and kill the tumor cells on which the targets reside. The greatest success to date has come from T-cells engineered to recognize the CD19 antigen expressed on the malignant cells of patients with acute and chronic lymphocytic leukemia (Gill and June, 2015). The recognition function is derived from single chain variable domain fragments (scFv), in which the heavy and light chain variable regions of an anti-CD19 immunoglobulin are linked together on a single molecule. After transfection or infection of a polyclonal T-cell population with this molecule, along with DNA encoding a set of T-cell co-stimulatory factors, the resulting T-cells express the scFv specific for CD19. Many of the engineered vectors utilize murine antibody fragments as the recognition moiety, and fortuitously some of the groups performing this type of work have chosen murine sequences that cross-react with the set of primers used in the human IgH immunosequencing assay described above (Davila et al., 2014). By so doing, they have allowed for the determination of residual disease in patients receiving this type of therapeutic intervention, while simultaneously generating information regarding the presence or absence, and general level, of the therapeutic T-cells that had been infused. This simultaneous “read-out” of residual disease and presence or absence of the therapeutic can be quite informative for clinical management (Figure 2).
In the past, one of the obstacles to the successful treatment of patients with this modality has been the relatively short half-life of the infused cells. New designs of these constructs are focused on increasing the durability and amplitude of the in vivo expansion of these infused T-cells, and immunosequencing can provide a means of titrating this effect. Furthermore, there is likely to be a modification of these CART-19s towards the use of scFv derived from the human immunoglobulin locus, in which case the measurement of the CARTs and the residual disease will both be robustly quantitative.
4.2 Exploration of the mechanism of action of immunomodulatory agents
With the advent of immunotherapy for cancer, over the past five years there has been an increasing focus on the identification of biomarkers that can be used to determine whether lymphoid infiltrates are correlated with prognosis, and whether the quality of those infiltrates can be somehow predictive of tumors that will respond to modulation of the resident tumor infiltrating lymphocyte population.
The TCR repertoire from circulating peripheral blood mononuclear cells has been profiled prior to and following administration of an anti-CTLA4 blocking antibody (Robert et al., 2014). In response to this therapeutic intervention, there was a marked increase in both the “richness” (number of unique TCRB sequences) of circulating T-cells and the diversity of the T-cell population (Figure 3). Interestingly, this increase appeared to be generalized, with no particular clone or subgroup of clones demonstrating a significantly greater increase than others. This observation suggests that clones that have been sequestered or “kept at bay” are somehow released by this therapeutic intervention. Of note, the degree of systemic toxicity associated with this form of therapy also correlated with increases in the richness and diversity metrics, suggesting that some of the clones being kept at bay are those that are capable of conferring more generalized inflammatory or autoimmune responsiveness.
While some immunomodulatory agents appear to mediate their effects systemically, the effect and “read-out” of others seem to occur more at the level of the actual tumor microenvironment. One such example is the class of biologic agents that block the action of the PD-1/PD-L1 pathway. Before determining immune repertoire changes within a tumor bed, however, one must consider whether the repertoire is homogeneous or heterogeneous. In other words, can the repertoire present across the entire tumor be identified through the immunosequencing of a biopsy of a segment of a primary tumor or metastatic lesion? To analyze this situation, multiple independent samplings were obtained from metastatic lesions from patients with ovarian carcinoma, and separately subjected to immunosequencing of the TCRB locus (Emerson et al., 2013b). When each of the independent samples was compared to a centrally located biopsy specimen, the overlap among all of the samples was, on average, 75%, whereas the overlap between the central biopsy and the primary tumor was 57% (Figure 4). The overlap among samples from the lesion matched the level of overlap obtained by two independent assessments of the same sample (which does not reach 100% due to under-sampling, leading to the lack of observation of T-cell clones that are present at a very small copy number, e.g., 1–3). Thus, the difference among samples across the lesion was essentially the same as the sampling error inherent in the procedure. In conclusion, this study showed that while the number of T-cells present at various sites across the lesion may vary, the fundamental repertoire appeared to be the same across the metastatic tumor. Moreover, the repertoire from the metastatic site was somewhat distinct from that of the primary tumor tissue, and quite distinct from that found in the systemic circulation (which only displayed a 19% overlap with the central punch biopsy from the metastatic site).
Thus, at least for certain tumors, it is possible to use a single biopsy sample to define the immune repertoire of tumor-infiltrating lymphocytes in a tumor specimen.
The analysis of tumor-infiltrating lymphocytes in the tumors of patients with melanoma has been an important precedent in the development of the field of immunotherapy. In one notable study, biopsies of skin lesions from patients with metastatic melanoma were obtained and subjected to TCRB immunosequencing analysis before treatment with an anti-PD-1 blocking monoclonal antibody (Tumeh et al., 2014). Patients whose tumors had the highest number of T-cells and the most clonal T-cell repertoire were most likely to respond to this therapy. Conversely, all of those patients whose total T-cell number and clonality measure fell below the median for each of these parameters had progressive disease. Moreover, biopsies obtained more than three weeks following the initiation of the anti-PD-1 therapy showed that patients whose tumors showed significant expansion of pre-existing T-cell clones in response to the therapy were most likely to have demonstrated a clinical response (Figure 5).
With the recent success of immunotherapy for some patients with some types of cancer there have been additional efforts to increase the population of patients with cancer who might benefit from an immunotherapeutic intervention. Attention has been focused on combining immunotherapeutics that target different aspects of the immunomodulatory pathway in order to achieve synergistic T-cell activation (while hopefully not equally increasing the generalized toxic effects). In addition, work is being done to assess whether the immunogenicity of tumors can be increased by the use of chemotherapeutic, molecularly targeted, biologic, or radiotherapeutic intervention. In one early exploration of this possibility, patients with melanoma where the tumor cells carried the BRAF V600E mutation were treated with a small molecule targeted inhibitor of BRAF (Cooper et al., 2013). It was noted that the T-cell clonality increased in tumors following treatment with the BRAF inhibitor, and that patients who had the most marked increase of those TILs that pre-existed prior to treatment were those most likely to have a response to the therapy.
5 Future directions
The aspirational goal of the immunosequencing approach is to be able to move from a primary nucleotide sequence to the target to which that T-cell is directed, or in other words, to link specific Ig or TCR protein sequences to their target epitope. One clear application of this method would therefore be the identification of tumor-antigen specific T cells, thus providing a focus for an anti-cancer therapeutic. Early, labor-intensive approaches to this goal are building a foundation for this possibility. For example, it has been possible to enrich for TILs that are activated by specific neoantigens (Schumacher and Schreiber, 2015) that are generated by the genetic changes occurring in melanoma tumors, and to identify the specific TCR sequences of the TILs that bind to them (Gros et al., 2014).
Profiling the immune receptor repertoire provides not only a record of creation of immune responsive cells, but also a record of selection of subsets of those cells presumably in response to the stimulatory drivers of antigenic exposure or neoplastic transformation. It is a reasonable assumption that any B- or T-cell that is found in more than one copy in a biologic sample of interest has undergone some level of selection, expansion, or hyperproliferation. If it were possible to relate a primary sequence of such a cell to the target recognized by the Ig or TCR that it carries, an “exposure history” of an individual could begin to be developed. Refined definition of this “history” may require significant input from structural biological modeling approaches. However, some early findings based only on sample selection and immunosequencing are raising hopes for the ultimate success of this endeavor. For example, a study of over 600 bone marrow transplant donors has allowed for the categorization of this population by either HLA type or status vis-à-vis chronic CMV infection, based solely on the analysis of the TCRB repertoire overlap among individuals in the cohort (Emerson et al., 2015).
With regard to the identification of cancer specific targets of TILs, as noted above (Gros et al., 2014), the subgroup of TILs most tumor-reactive and clonally expanded within a given patient's tumor were selected based on cell surface phenotype. These clones then had their TCRB loci sequenced and the reactivity of each unique clone to “neoantigens” formed by tumor specific genomic mutations was then assessed. In so doing a connection was made between the primary TCRB sequence and, at least in some cases, its presumed tumor target.
So far the studies just described have been completed by focusing on only one locus that encodes one chain of the heterodimeric proteins that make up the TCRs and Igs. It is likely that in many cases information derived from both chains of the receptor, combined with insight into how they cooperate to form the antigen recognition and binding site will be required for a more refined definition of receptor–target relationship. To date most attempts to define both partners of a dimeric TCR or Ig have been relatively low throughput and utilized more arduous techniques of single cell subcloning or emulsion-based “bridge amplification”. Recently, higher throughput methods have been explored, including an in situ combinatorial approach that has the potential to generate hundreds of thousands of paired sequences in a single experiment (Howie et al., 2015).