Next‐generation clinical trials: Novel strategies to address the challenge of tumor molecular heterogeneity

The promise of ‘personalized cancer care’ with therapies toward specific molecular aberrations has potential to improve outcomes. However, there is recognized heterogeneity within any given tumor‐type from patient to patient (inter‐patient heterogeneity), and within an individual (intra‐patient heterogeneity) as demonstrated by molecular evolution through space (primary tumor to metastasis) and time (after therapy). These issues have become hurdles to advancing cancer treatment outcomes with novel molecularly targeted agents. Classic trial design paradigms are challenged by heterogeneity, as they are unable to test targeted therapeutics against low frequency genomic ‘oncogenic driver’ aberrations with adequate power. Usual accrual difficulties to clinical trials are exacerbated by low frequencies of any given molecular driver. To address these challenges, there is need for innovative clinical trial designs and strategies implementing novel diagnostic biomarker technologies to account for inter‐patient molecular diversity and scarce tissue for analysis. Importantly, there is also need for pre‐defined treatment priority algorithms given numerous aberrations commonly observed within any one individual sample. Access to multiple available therapeutic agents simultaneously is crucial. Finally intra‐patient heterogeneity through time may be addressed by serial biomarker assessment at the time of tumor progression. This report discusses various ‘next‐generation’ biomarker‐driven trial designs and their potentials and limitations to tackle these recognized molecular heterogeneity challenges. Regulatory hurdles, with respect to drug and companion diagnostic development and approval, are considered. Focus is on the ‘Expansion Platform Design Types I and II’, the latter demonstrated with a first example, ‘PANGEA: Personalized Anti‐Neoplastics for Gastro‐Esophageal Adenocarcinoma’. 
Applying integral medium‐throughput genomic and proteomic assays along with a practical biomarker assessment and treatment algorithm, ‘PANGEA’ attempts to address the problem of heterogeneity towards successful implementation of molecularly targeted therapies.


Targeted therapies
Clinical outcomes have significantly improved for most cancers since the introduction of classic cytotoxic agents. Cytotoxic agents can be considered 'targeted' in that they inhibit DNA synthesis and the cell division apparatus, the 'bottleneck' steps required for cancer to manifest with morbidity and mortality. (Joensuu, 2008) Patients with some stage IV solid tumors, such as testicular cancer, even achieve long-term survival with this strategy alone, while in general most advanced solid tumors derive significant palliative benefit for an increased, albeit finite, period of time. Ultimately, solid metastatic tumors develop resistance to cytotoxics, and patients succumb to their illness. A 'benefit plateau' has been reached with these cytotoxics. Off-target 'collateral damage' of normal tissues is a well-recognized potential disadvantage of cytotoxics, necessitating a delicate balance between optimizing tumor control and limiting toxicity.
Genetic aberrations identified within various tumor types, including gene mutation, gene rearrangement, and gene amplification/deletion, led to an understanding of constitutive activation of oncogenes, or loss of function of tumor suppressors, all contributing to a sequential genomic carcinogenesis model. (Fearon and Vogelstein, 1990) The ensuing concept of an 'oncogenic driver' and 'oncogene addiction' ultimately shifted the course of therapeutics development; (Weinstein and Joe, 2008; Weinstein, 2002; Vogelstein et al., 2013) the era of targeted therapies towards a putative 'Achilles heel' was born. (Dancey et al., 2012) In addition to genomic events, abnormalities of protein expression not directly a consequence of a genomic event (i.e. abnormally increased protein expression in the absence of mutation, amplification, or translocation of that protein's gene) also received attention for therapeutic potential, as did key signaling 'nodes' within critical oncogenic growth and metastasis pathways. (Slamon et al., 1984; Harris et al., 1994; Islam et al., 2013; Bianco et al., 2006) Following this, pharmaceutical agents directly inhibiting the function of a 'culprit' protein could be engineered with high selectivity. (Lengauer et al., 2005) Thus, theoretically, these agents would inhibit only cancer cells possessing the dysfunctional (over-activated or over-expressed) protein, while sparing normal cells, consequently magnifying the therapeutic window. Attention to essential stromal components of tumors including immune cells, fibroblasts, and endothelial/vascular components also arose. (Devaud et al., 2013; Gimbrone et al., 1972; Kakarla et al., 2012; Bellou et al., 2013; Mueller and Fusenig, 2004; Zitvogel et al., 2006) Over the last decades, the premise of using molecularly targeted agents for targeted patient populations based on tumor/stromal molecular profiles and pathway dependencies gave rise to an array of novel drugs intended to abrogate malignant progression through these 'specific' drug-protein interactions. (Griffin, 2001; Mauro et al., 2002; Pegram and Slamon, 2000) Targets now include receptor tyrosine kinases (RTKs) (e.g. HER2, EGFR, MET), intracellular kinases (e.g. PI3K, MEK, AKT), transcription factors (e.g. STAT3), stem cell pathways (SHH/SMO, Notch), immunomodulators (e.g. CTLA4, PD1/PDL1, vaccines), and hormone receptors (e.g. estrogen, progesterone, androgen). Excluding classic cytotoxic inhibition of DNA synthesis and cell division, the main targeted therapy classes include 'biologics' (monoclonal antibodies with/without linked cytotoxics, the latter known as Antibody-Drug Conjugates (ADCs)) (Fauvel and Yasri, 2014), 'small molecules' such as tyrosine kinase inhibitors (TKIs) (Leary and Johnston, 2007; Faivre et al., 2006), and more recently, specific gene expression silencing by 'RNA interference', (Videira et al., 2014; Deng et al., 2014; Yan et al., 2014) each with their own properties, advantages and disadvantages (Table 1).

2. Inter-patient tumor molecular heterogeneity: the 'driver vs wheel' metaphor
As opposed to the several diverse examples above which targeted sub-populations for targeted therapy using potentially predictive biomarkers, other evaluations of novel molecularly targeted inhibitors have not been patient-selective. Among numerous examples (e.g. anti-EGFR, (Waddell et al., 2013; Lordick et al., 2013a) anti-mTOR, (Ohtsu et al., 2013) anti-Hedgehog (Cohen et al., 2013)), clinical trials for GEC based on a 'one-size-fits-all' strategy have in general been disappointing. For instance, applying an EGFR inhibitor to the entire GEC population, where genomic activation occurs in only ~5% of cases (EGFR gene amplification) and perhaps in another subset of ~15-20% of patients with true EGFR 'over-expression' (in the setting of an otherwise normal EGFR gene), was not successful (Waddell et al., 2013; Lordick et al., 2013a) (Table 2). Interestingly, the EXPAND trial subset analysis suggested that those patients with tumors within the highest EGFR expressing categories by immunohistochemistry (IHC) appeared to derive survival benefit (HR 0.41) from cetuximab compared to placebo (Lordick et al., 2013b) (Table 2). Other studies since, such as the second line TRANS-COG erlotinib study, (Petty et al., 2014) have shown similar results in these select patient subsets. (Zhang et al., 2013) However, when lowering the threshold definition of 'EGFR over-expressed', or to the furthest extreme of including all GEC patients, the benefits derived in the small 'EGFR-driven' subsets were seemingly diluted. It is clear that if a similar 'one-size-fits-all' strategy had been used for anti-HER2 therapy in GEC, (Bang et al., 2010; Hecht et al., 2013; Satoh et al., 2014) trastuzumab would likely have encountered the same fate as anti-EGFR agents for this disease (Table 2). This is evidenced within the ToGA trial where subset analyses showed that 'FISH+, IHC0/1+' patients derived no benefit from the addition of trastuzumab. (Bang et al., 2010).
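The dilution argument above can be made concrete with a back-of-the-envelope mixture calculation. The sketch below is illustrative only: the function name `diluted_benefit`, the assumption that patients outside the driver subset derive zero benefit, and the 40-percentage-point gain within the subset are hypothetical and are not results from any trial cited here. Only the ~5% and ~20% subset frequencies echo figures quoted in the text.

```python
def diluted_benefit(driver_fraction, benefit_in_driver, benefit_in_others=0.0):
    """Population-average absolute benefit under a simple two-group mixture.

    All inputs are illustrative assumptions, not trial results.
    """
    if not 0.0 <= driver_fraction <= 1.0:
        raise ValueError("driver_fraction must be between 0 and 1")
    return (driver_fraction * benefit_in_driver
            + (1.0 - driver_fraction) * benefit_in_others)


# Hypothetical: a 40-point absolute gain confined to the driver subset.
for fraction in (0.05, 0.20, 1.00):
    overall = diluted_benefit(fraction, 0.40)
    print(f"driver subset {fraction:.0%}: average gain {overall:.1%}")
```

Under these assumptions, a large effect confined to a 5% subset shrinks to a population-average gain of about 2 percentage points, which an unselected trial of conventional size would struggle to detect.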
The experience over the last decade with molecularly targeted agents has included more negative than positive trials. This has led to the growing acceptance that targeted therapies should be used for targeted patient populations. This is exemplified in Table 2 where GEC trials that have made efforts to select patients in some way (either prospectively or retrospectively) have demonstrated improved outcomes, whereas unselective trials generally have not. It is important to note that within a cancer type, several molecular subsets may be present (Catenacci et al., 2014a; Sehdev and Catenacci, 2013a) (Table 3). This high inter-patient molecular heterogeneity is certainly true for GEC. (Deng et al., 2012; Wang et al., 2011; Holbrook et al., 2011; Zang et al., 2012; Dulak et al., 2012; Co, 2014) In contrast, other cancers, such as CML, are quite homogeneous (~95% BCR-ABL translocation), (Druker et al., 2006) partially explaining the success of the 'one-size-fits-all' approach initially attempted with imatinib for CML. There is now increased recognition of inter-patient molecular heterogeneity for most solid tumors. (Bedard et al., 2013; Fedele et al., 2014; Longo, 2012).
Continuing with the EGFR example, a 'driver vs wheel' metaphor of a 'run-away 18-wheeler truck' can help to conceptualize the current appreciation of inter-patient heterogeneity of molecular 'oncogenic drivers' (the gas pedal) and loss of tumor suppressors (the brakes) (Figure 1A). When EGFR is the genomic 'driver' of a tumor (i.e. EGFR mutation or EGFR amplification; inappropriately 'pushing the gas pedal'), using targeted inhibition towards that driver generally has resulted in significantly improved clinical outcomes in that patient subset, (Petty et al., 2014; Zhang et al., 2013; Zhou et al., 2011) albeit until development of resistance with consequent progression (Figure 1C). On the other hand, in settings when EGFR is 'over-expressed' without genomic activation, or even less impressively merely 'expressed', similar to any of hundreds to thousands of other proteins in a tumor, EGFR may only be one of many wheels (downstream effectors) on the truck (cancer cell) (Figure 1A). Therefore, it is not critical and is easily expendable if neutralized, with other wheels (parallel escape signaling) taking up the slack. (Waddell et al., 2013; Lordick et al., 2013a).
However, certain wheels may be more important than other downstream wheels, acting as critical downstream 'hubs' (i.e. a front wheel flat tire will slow the truck more effectively than one in the back, Figure 1D). Targeted inhibition of critical non-genomically-activated downstream 'nodes' (e.g. ER/PR antagonists for ER/PR+ breast cancer, CD20 antagonists for lymphoma, HGF antagonists for MET+ GEC (awaiting phase III validation), and even classic cytotoxics inhibiting the bottleneck cell division apparatus) are examples that may represent this approach (Figure 1D). It is essential to recognize that a gene/protein can be a driver in some patients within a given tumor type (EGFR amplification) yet more commonly only a minor wheel (or even a bystander) in the majority of patients with that same tumor type. (Catenacci et al., 2013; Baines et al., 2011) Synthetic lethality may circumvent loss of tumor suppressors (the brakes) and/or oncogenic activating events that are not actionable, (Kaelin, 2005) and could be considered a critical hub. In contrast, inhibitory strategies directed at the tumor stroma (e.g. anti-angiogenesis) alter the environment (Figure 1B), and in the metaphor can be thought of as forcing the truck to go uphill, thus slowing it down. Anti-angiogenesis has no predictive biomarker identified to date, which may be due to a universal, albeit marginal, benefit across unselected patients (Table 2, yellow). Finally, immunomodulatory agents (e.g. anti-PD1/PDL1, anti-CTLA4, vaccines, adoptive transfer of tumor infiltrating lymphocytes) are an orthogonal approach that may not depend on genomic drivers and may re-establish the capability of the host immune system (police in the metaphor) to recognize, 'catch', and remove tumor cells in appropriately selected patients (Mellman et al., 2011) (Figure 1E).
In short, there are several instances where more substantial clinical benefit was observed when using targeted therapies for targeted populations (directed at the genomic driver or a critical downstream hub) than when using targeted therapies in untargeted, or the whole of, populations. However, a combination of one or more of these strategies (targeting the genomic driver(s), downstream hub(s), and/or the orthogonal strategies towards tumor stroma with anti-angiogenesis and/or immunomodulation) may provide an optimal approach to 'slowing down or stopping the truck'.
Working under the premise of 'driver' biology with matched targeted therapeutics, inter-patient heterogeneity of activated oncogenes has obvious implications on our targeted treatment strategies and clinical trial designs.

Figure 1 (caption, continued): Stopping the driver from pushing the gas pedal {targeted inhibition towards the function of the oncogenic genomic driver} may relieve the inappropriate acceleration {e.g. trastuzumab for HER2 gene amplification}, if only temporarily until another mechanism (inherent or acquired) to maintain the acceleration stimulus (oncogenic driver) moves to replace it. (D) Although loss of any back wheel (downstream effector) will likely not slow the truck given the presence of numerous wheels (redundant parallel escape signals), some wheels downstream can be critical, like when inducing a flat front tire (critical downstream hub) {e.g. inhibition of DNA synthesis: classic cytotoxics; or inhibition of key protein: estrogen/androgen receptor antagonists}. (E) Reversing mechanisms of police (immune) evasion can re-establish the ability to recognize and eliminate the abnormal 'speedy truck' {immunomodulation}. A combination of the strategies in (B-E) may be optimal to slow the truck with significant magnitude and duration.

3. Intra-patient tumor molecular heterogeneity
Despite the clinical gains realized following the matched "oncogenic driver/targeted therapeutic" strategy, other hurdles have prevented more substantial benefit. These hurdles include intra-patient molecular heterogeneity through space (within the patient) and over time (before and after therapy), both of which have been reported across solid tumors. (Yap et al., 2012; Gerlinger et al., 2012) Examples of intra-patient tumor heterogeneity through space, either within the primary tumor, or from primary tumor to an involved lymph node and/or distant metastatic site, are shown in Figure 2A-C. This is not a [...] (Nowell, 1976) In some series, rates of molecular evolution for a given biomarker through space at a given 'snapshot' in time are as high as 10-15%. (Swanton, 2012) Other studies have minimized the rate and significance of this evolution, usually when evaluating only a few select genes/proteins. (Vakiani et al., 2012; Vignot et al., 2013) This may be tumor specific. However, the ultimate evidence of tumor evolution through space was demonstrated using high-throughput next-generation genomic sequencing (NGS), offering a remarkable illustration of Darwinian evolution and natural selection at the cellular level in renal tumors. (Hull, 2005; Navin and Hicks, 2010) Clearly, misclassification of a tumor as 'HER2 negative' based on the primary site may have implications for that patient's outcome if the metastatic site has evolved to acquire HER2 amplification (Figure 2B). (Seol et al., 2012; Arena et al., 2013; Lee et al., 2013a; Yoon et al., 2012) Continued tumor evolution and adaptation through time, with consequent therapeutic resistance, has been extensively described pre-clinically, (Catenacci et al., 2011a; Turke et al., 2010; Corso and Giordano, 2013; Cepero et al., 2010; Engelman et al., 2007) and exemplified clinically via pre/post therapy tumor biopsies (Figure 2D and E). Building on the "Driver-Wheel" metaphor, the tumor mass is composed of large populations of cancer cells (a population of trucks). Developed resistance and disease progression on therapy may be due to i) inherent concomitant genomic resistance mechanisms within the majority of cancer cells (Figure 1C) rendering immediate resistance, ii) inherent reactive or 'adaptive' resistance mechanisms within the tumor/stroma DNA blueprint, in the absence of other genomic events, leading to a responsive/adaptive signaling pathway 'rewiring' rendering immediate or eventual resistance, (Wilson et al., 2012) and/or iii) clonal selection of the cancer cell sub-populations (certain trucks) possessing additional genomic drivers with/without the originally identified genomic event, rendering eventual resistance. In this third scenario, although some cells (trucks) are sensitive to (and eliminated by) the therapy, those possessing genomic events that provide mechanisms of resistance will persist.
Given this, trials matching a targeted agent towards a targeted genomic event may successfully derive benefit, with varying duration depending on the time to selection and full expansion of subclonal populations (resistant trucks) leading to eventual drug failure. Working under the premise of 'driver' biology with matched targeted therapeutics, even if appropriately matched at the onset, molecular evolution and selection through space and time also have apparent implications for our targeted treatment strategies and clinical trial designs.

4. The challenge of molecular heterogeneity in the design of clinical trials
4.1. Inter-patient tumor molecular heterogeneity
The ToGA trial evaluated trastuzumab for 'HER2 positive' GEC, (Bang et al., 2010) and screened 3803 patients internationally to obtain 810 eligible patients, of which 594 were otherwise eligible for randomization (Table 2). To be eligible, 'HER2 positive' was defined as a 'FISH+' ratio ≥2 with any IHC score (0-3+), or IHC3+ with 'FISH-'; 'IHC2+/FISH-' patients were ineligible. After excluding ineligible patients by these biomarker screening criteria, as well as those not meeting other trial entry criteria, only 15.6% of all-comers with stage IV GEC were eligible for therapy. Based on preplanned subset analyses, 'HER2 positivity' is now clinically defined with a more stringent threshold than even ToGA initially used for screening: (IHC2+/FISH+, IHC3+/anyFISH), which would exclude the 131 patient tumors with FISH+/IHC0-1+ scores who appeared to derive no benefit from the addition of trastuzumab. That leaves 463 patients from the original 3803 screened patients (12%), or 57% of the initially identified 'HER2+ patients' in the trial. By acknowledging the disappointments of applying targeted therapies in a 'one-size-fits-all' strategy, the ToGA trial illustrates the ongoing challenge when attempting to select patients for targeted therapies. This includes the extremely high numbers of patients required to screen when attempting to apply classic clinical trial designs, with frequentist statistical methods, (Simon and Maitournam, 2004) to subsets within a very molecularly heterogeneous disease such as GEC. Worse, the example of HER2, entailing ~10-15% of GEC, is one of the larger 'slices of the pie' (Figure 3, Table 3). The accrual numbers that were required for the ToGA trial demonstrate how profoundly inter-patient molecular heterogeneity challenges the application of novel targeted agents for specific subpopulations using traditional clinical trial designs.
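The ToGA screening funnel described above can be checked with simple arithmetic. The sketch below uses only the counts quoted in the text; the variable names and the helper `percent` are illustrative choices, not part of the trial report.

```python
# Counts as quoted in the text for the ToGA trial (Bang et al., 2010).
screened = 3803          # patients screened internationally
her2_positive = 810      # met the trial's 'HER2 positive' screening definition
randomizable = 594       # also met the other trial entry criteria
fish_pos_ihc_low = 131   # FISH+/IHC0-1+ tumors with no apparent benefit


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Patients remaining under the more stringent current definition
# (IHC2+/FISH+ or IHC3+/any FISH).
stringent_her2 = randomizable - fish_pos_ihc_low

print(f"eligible among screened:       {percent(randomizable, screened):.1f}%")
print(f"stringent HER2+ among screened: {percent(stringent_her2, screened):.1f}%")
print(f"stringent among HER2+ (810):    {percent(stringent_her2, her2_positive):.1f}%")
```

The subtraction gives 463 patients, and the three ratios round to the 15.6%, 12%, and 57% figures cited in the paragraph.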
Figure 3. Inter-patient tumor molecular heterogeneity. (Left panel) Genomic profiling using a ~240 gene next-generation sequencing (NGS) platform of a cohort of 50 stage IV GEC samples (upper panel) revealing few high frequency events (peak) and numerous low frequency events (tail); pie chart revealing profound inter-patient molecular heterogeneity (see Table 3) [...]

Selecting patients with MET amplified tumors at ~4% incidence within GEC for anti-MET therapy, (Smolen et al., 2006; Catenacci et al., 2011b, 2014b) which is based on sound preclinical and clinical evidence, is an even more difficult challenge than the HER2 ToGA example. A 'MET amplified' phase III selection trial would require >15000 total GEC patients with stage IV disease to be screened. When also considering that there are several redundant drugs adopting the same strategy for this limited patient cohort, a large randomized phase III trial is seemingly impossible (as is even a randomized phase IIb). Importantly, there is increasing recognition of multiple rare molecular subsets (including both genomic events and proteomic expression 'cut-offs' that are considered biologically vital) within solid tumors. Therefore, selection strategies within biomarker-driven trials using 'à la carte' low throughput companion diagnostic assays, such as IHC, PCR or FISH, result in sizeable screening delays. (Stricker et al., 2011) With each diagnostic test having its own central site, serial eligibility screening for each individual trial is required, all while the patient waits without therapy. Patient drop-out due to wait-time is unfortunately high in this setting. Moreover, in the stage IV setting, the tumor sample is most often from a small tissue biopsy that is ultimately exhausted via repeated serial analyses, precluding further screening for trial eligibility without a repeat tissue biopsy. 
Further, the odds of qualifying for a given trial are low given the relative infrequency of each aberration. Crucially, multiple concurrent events (genomic and proteomic) within a given tumor biopsy, often not fully appreciated when presented as pie charts or 'high-peak long-tail charts', further complicate treatment stratification (Table 3). (Catenacci et al., 2014a) For instance, focusing solely on genomic aberrations without considering proteomic profiling, one tumor sample can have anywhere from 0 to 18 identified 'actionable' events on medium-throughput NGS platforms (~250-300 genes). (Catenacci et al., 2014a; Sehdev and Catenacci, 2013b; Frampton et al., 2013) Which of the multiple genomic events should be targeted? How should the near-infinite possible drug combinations be navigated without established phase I data? What if the drug is not yet available commercially and no trial is immediately available? How do we test the hypothesis of each actionable molecular event and 'matched' drug with statistical power to rule out effect from random variation (which, taken to the extreme, ultimately approaches an 'N-of-1' trial)? (Parmigiani et al., 2009) The number of patients required, the limited amount of tissue available, and the length of time to results acquisition, along with the dilemma of multiple 'actionable' events in a given sample, all highlight serious challenges currently imposed by inter-patient tumor molecular heterogeneity.
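The screening burden for a rare biomarker, such as the >15000 figure quoted above for MET amplification, follows from dividing the enrollment target by the expected yield per screened patient. The sketch below is a minimal illustration under stated assumptions: the function `patients_to_screen` and its parameters are hypothetical, the target of 600 enrolled patients is an assumed phase III size chosen to be close to the 594 randomizable patients in ToGA, and screen failures for other reasons are modeled as a single multiplicative fraction.

```python
from fractions import Fraction
from math import ceil


def patients_to_screen(target_enrolled, prevalence, other_eligibility="1"):
    """Expected number of patients to screen to enroll `target_enrolled`.

    `prevalence` and `other_eligibility` are decimal strings (or Fractions)
    so the arithmetic is exact. Both are illustrative assumptions.
    """
    yield_per_screened = Fraction(prevalence) * Fraction(other_eligibility)
    if yield_per_screened <= 0:
        raise ValueError("expected yield per screened patient must be positive")
    return ceil(target_enrolled / yield_per_screened)


# Assumed target of 600 enrolled patients at ~4% MET amplification incidence.
print(patients_to_screen(600, "0.04"))
# Same target if only three quarters of biomarker-positive patients
# meet the remaining entry criteria (hypothetical fraction).
print(patients_to_screen(600, "0.04", "0.75"))
```

With these assumed inputs, the first call gives 15000 screened patients before any attrition, and any additional screen failure pushes the requirement higher, which is consistent with the order of magnitude stated in the text.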

Intra-patient tumor molecular heterogeneity
In addition to the risk of misclassifying tumors that have evolved through space when profiling primary disease site biopsies (Figure 2A-C), there are many other obstacles to successfully implementing targeted therapies for patients arising from intra-patient molecular heterogeneity. It is not uncommon that 'outdated' biopsies or previous curative intent surgical resections of primary tumors, often dating months to years earlier and with multiple intermediate lines of therapy, are used to molecularly profile a tumor and dictate treatment stratification in the present. The likelihood of tumor evolution through both space and time in this common scenario is considerable (Figure 2). (Yap et al., 2012; Gerlinger et al., 2012) Although the challenge of imposing a biopsy in 'real time' had been prohibitive over the last decade, (Dowlati et al., 2001) recently, this is becoming more accepted and even required for biomarker stratified trial enrollment. (Poplin et al., 2013) However, systematic acquisition of post-treatment progression biopsies to directly assess tumor evolution pre/post that particular therapy, often branded as "research" or "optional", (Lee et al., 2013b; Overman et al., 2013) has been largely unsuccessful owing to physician reluctance and high patient drop-out. This has led to lost opportunity to systematically evaluate reasons for immediate drug failure, or eventual resistance in the event of an initial response. Perhaps most importantly, there is lost opportunity to learn how to guide therapy for that patient going forward based on the molecular changes observed.

Biomarker-focused trials, heterogeneity, and next-generation clinical trials
With increased recognition of the relevance of molecular 'driver-biology' along with the obstacles posed by inter- and intra-patient heterogeneity, various biomarker-focused trial designs have been utilized to evaluate a biomarker's ability to predict treatment response (Table 4, Figure 4). A biomarker's ability to predict benefit can be assessed within two broad categories: retrospective (but prospectively evaluated) or prospectively planned studies. Retrospective-prospective approaches have been a mainstay, for example with KRAS (De Roock et al., 2010; Catenacci et al., 2011c) (and now NRAS (Douillard et al., 2013a, 2013b)) mutations demonstrating lack of benefit of anti-EGFR monoclonal antibody therapies in colorectal cancer (Figure 4A). Notably, this design requires sufficient numbers of tumor samples from the original trial to be assessed in order to limit selection bias. (Simon et al., 2009) It is also limited to higher incidence biomarker subgroups (KRAS mutation ~40% of colorectal cancers) such that there is adequate statistical power to identify a differential benefit based on the biomarker.
On the other hand, there are various types of prospective biomarker-driven trials (Table 4). (Mandrekar and Sargent, 2011; Sargent et al., 2005; Freidlin et al., 2010; Lai et al., 2012) Prospective population enriched integral designs, such as the ToGA trial mentioned earlier (Bang et al., 2010), predefine an eligible population by the presence of the biomarker (e.g. HER2 amplification) and test the targeted agent (e.g. trastuzumab) only in that population (Figure 4B). This design inherently requires preclinical validation of this strategy with respect to both the companion diagnostic, in order to accurately identify the biomarker, and the drug's unique benefit only in tumors with this integral biomarker. (Dancey et al., 2010) A more recent type of population enriched design ignores traditional tumor histologic origin and classification (e.g. gastric, lung etc.) by enriching for a certain biomarker, such as PIK3CA mutation or MET amplification, irrespective of tumor histology (Figure 4C). In an attempt to overcome the rarity of the aberration and enrich the cohort in early phase trials using this 'histology-agnostic' strategy, new sets of hurdles arise. Interpreting treatment outcomes in the context of heterogeneous tumor-specific outcomes is challenging, particularly without a placebo and histology-stratified control. Moreover, coordinating oncologists from differing sub-specialties, within current academic clinical trial infrastructure, to enroll to one such trial may be difficult. Finally, differing chemotherapy backbones for different cancer types (if combining cytotoxics with the targeted agent), and ultimately the lack of a clear path towards FDA approval (discussed below, Figure 8), are factors challenging this design approach.
Table 4 (continued)
Ideal to assist in identifying the best molecular subset for a drug, if this is previously unknown, in phase I-IIb trials
Can address inter-patient molecular heterogeneity with multiple drug 'bins', with efficient prospective biomarker testing
Adaptive statistical design to confirm early efficacy signals in later stages of the trial
Can theoretically spin-off 'winning combinations' of new biomarker-drug matches to confirm in a larger phase III trial, with clear path to FDA approval
Dynamic and iterative: add/remove drugs
Requires very high numbers of patients for adequate power from start to FDA approval of a drug
Difficult to accrue for follow up large phase III trials if biomarker is rare, as in 'Population Enriched' cohorts above
Initially not truly personalized (randomized to each drug bin) for many patients
Not ideal if a strong preclinical or clinical association between a biomarker and drug is already established (e.g. trastuzumab and HER2 amplification)
Ideally, biomarker subsets are chosen beforehand, so they must be known, but design is flexible to add newly identified molecular subsets
Requires multiple drug cohorts and therefore extensive coordination between various pharmaceutical collaborations

Can test defined biomarker subsets within a cancer with a drug (or drug combination) thought best matched to that biomarker cohort in an organized global approach for that specific tumor type
Each biomarker cohort is run as its own phase IIa or b trial (compartmentalized), likely with a separate principal investigator
Dynamic and iterative: can add/remove cohorts and matched drugs in real-time
Treatment has (or should have) a prioritized scheme, acknowledging multiple aberrations in a given tumor
Requires top-down coordination and centralization (feasible in centralized health care systems like the United Kingdom or in large cooperative groups/NCI-CTEP or large pharmaceutical companies with many drugs)
Requires very high numbers of patients as each cohort is considered its own separate trial with individual statistical endpoints; infrequent biomarker incidence is not adequately addressed, particularly for less common tumor types
Arguably, still requires a confirmatory phase III trial for each of the cohorts that have positive signals at the randomized phase IIb setting, requiring even more patients in the population enrichment phase III design
Treatment algorithm can be considered arbitrary and may not have consensus amongst investigators

Can test defined biomarker subsets in any tumor type with a drug (or drug combination) thought best matched to that biomarker cohort in an organized global approach for that specific tumor type
Each biomarker cohort is run as its own phase IIa or b trial (compartmentalized) with a separate principal investigator
Dynamic and iterative: can add/remove cohorts and matched drugs
Wide participation (including private oncology clinics), central IRB and screening can screen large numbers of patients
Requires top-down coordination and centralization (feasible in centralized health care systems like the United Kingdom or in large cooperative groups/NCI-CTEP or large pharmaceutical companies with many drugs)
Requires very high numbers of patients as each cohort is considered its own separate trial with individual statistical endpoints, i.e. infrequent biomarker incidence is not specifically addressed, particularly for less common tumor types
There is a trend of using the weaker primary endpoint of response rate in phase IIa trials (Signature)
Arguably, still requires a confirmatory phase III trial for each of the cohorts that have positive signals at the randomized phase II setting, requiring even more patients in the population enrichment phase III design (and decision whether or not to select for specific histology)
There is not a treatment algorithm and therefore tumors with multiple mutations are randomly selected to one of many possible biomarker groups
Assumes aberrations are identical across differing tumor histologies, which is not always confirmed (e.g. BRAF mt in melanoma vs colon)

Biomarker stratified (or marker-by-treatment interaction) integrated designs aim to evaluate both a new treatment and a biomarker within the same trial (Figure 4D). (Freidlin and Korn, 2014) These randomized phase IIb trials enroll all-comers within the disease-type, with a planned biomarker subset analysis. If the phase IIb shows a significant interaction of effect in patients with/without presence of the biomarker, then integral selection of only patients with the biomarker for the confirmatory phase III trial would follow. A recent example of this approach is the rilotumumab (AMG-102, anti-HGF) monoclonal antibody for GEC, (Iveson et al., 2014) which was non-selective in the randomized phase IIb, but based on the interaction with MET expression status, the ongoing phase III (NCT01697072, NCT02137343) currently selects only MET-positive tumors, as determined by IHC (Table 2). Larger sample sizes are needed statistically in the phase IIb biomarker stratified design in order to test the interaction, particularly if the biomarker is of low incidence, and a number of biomarker-negative patients are required to be enrolled, who are theoretically not expected to gain benefit based on the presumed link between the 'driver biomarker' and targeted therapy. 
However, adaptive randomization can limit drug exposure in biomarker-negative patients. In this design, clearly, the specificity of the inhibitor is important, as promiscuity of the drug, such as observed with numerous kinase inhibitors (Karaman et al., 2008), can confuse matters when responses due to off-target drug effects occur (Table 1). For instance, consider a biomarker stratified lung cancer trial evaluating crizotinib therapy and the interaction of ALK-translocated versus ALK wild-type tumors: an observed response in an ALK wild-type tumor harboring an unidentified MET amplification would bias the results towards the null (a response in the biomarker-negative group due to MET amplification).
Although each of the 'classic' biomarker trial designs above has potential advantages and disadvantages (Table 4), perhaps the most notable disadvantage is the 'à la carte' nature of both the companion diagnostic and the 'biomarker/drug matching', which neglects the numerous concerns raised earlier regarding immense inter-patient and intra-patient molecular heterogeneity.

'Umbrella' or 'platform' next-generation clinical trials
In recognition of the numerous challenges described above, posed by both inter- and intra-patient tumor molecular heterogeneity, newer trial designs have emerged, referred to herein as 'Next-Generation Clinical Trials'. The designs discussed below each can evaluate novel targeted agents as monotherapy or in combination with standard cytotoxics and/or targeted agents, and most importantly, they can also assess 'novel-novel' combination approaches/strategies and efficient ways to select and test these combinations (Yap et al., 2013).

[Table residue, continued]
A holistic approach to a specific cancer type within one trial, drastically reducing the total number of patients required
Treating one tumor type with tumor-specific cytotoxics, strategies, and diagnostics
All patients are eligible, given relegation tiers
One center can run pilot phase IIa trials
Randomized phase IIb iterations can be accomplished with small collaborative groups
A number of ongoing trials can be done at various centers, testing various aspects of the personalized approach (Table 5)
Positive phase IIb trials can move to the phase III setting to test the 'Holistic' approach OR positive cohorts within the phase IIb can spin-off to their own phase III trial
Multiple treatment arms within one trial, which is challenging to negotiate different companion diagnostics and drugs for each identified biomarker subset
Treatment algorithm can be considered arbitrary and may not have consensus amongst investigators, but given the low numbers required, the algorithm can be tested quickly with one/few sites, while other algorithms can be tested simultaneously within separate parallel Type IIA trials performed at other sites
Despite rationale for such a design, regulatory structure and FDA approval of a trial encompassing multiple molecular subsets each treated with a matched therapy towards one common statistical endpoint is uncertain currently, deterring Pharma and Companion Diagnostics company participation
Type IIB: Grass-Roots and Holistic *With 'Biologic Beyond Progression' (BBP) e.g. PANGEA-BBP
The only biomarker-driven trial to address intra-patient tumor heterogeneity over time due to resistance in sequential fashion
Sequential nature of BBP allows for less confounding of post-protocol therapies for overall survival endpoint, and also less selection bias at second or third line setting
A randomized phase IIb can evaluate overall survival of a 'personalized holistic approach' compared to standard therapy
Those positive phase IIb trials can move to the phase III setting to test the 'Holistic' approach OR positive cohorts within the phase IIb can spin-off to their own phase III
Multiple biopsies are required, a potential deterrent for some patients/physicians
Treatment algorithm can be considered arbitrary and may not have consensus amongst investigators, but given the low numbers required, the algorithm can be tested quickly with one/few sites, while other algorithms can be tested simultaneously within separate parallel Type IIA trials performed at other sites
Despite rationale for such a design, regulatory structure and FDA approval of a trial encompassing multiple molecular subsets each treated with a matched therapy towards one common statistical endpoint is currently uncertain, deterring Pharma and Companion Diagnostics company participation

Exploratory platform (e.g. 'BATTLE', 'I-SPY')
In particular instances where there is uncertainty as to the optimal molecular subset(s) to which a given therapeutic agent should be applied, the 'Exploratory Platform' design has emerged. Notable examples include the 'BATTLE' (Kim et al., 2011) and 'I-SPY' (Barker et al., 2009) trials (Figure 5A). Using a Bayesian adaptive statistical approach, drug arms (or 'bins') are pre-specified, and patients are randomized to each arm evenly (i.e. not 'personalized' initially). Molecular testing is performed on the tumor biopsy prior to randomization and the pre-specified biomarker subsets are stratified evenly between the drug cohorts (an integrated biomarker design, initially). This design is essentially an umbrella biomarker stratified design (Figure 4D), if a control arm is included (I-SPY 2 includes one; BATTLE did not), exploring several drug cohorts and a number of molecular biomarker subgroups simultaneously. Over time, if a significant efficacy signal is identified in a certain biomarker-drug pair at a planned interim analysis, then adaptive randomization of that now integral biomarker (e.g. molecular profile D) to that treatment arm (e.g. treatment Y) ensues (Figure 5A). (Blumenschein et al., 2013; Tsao et al., 2013; Po, 2014; Ne, 2014) Alternatively, drugs that do not demonstrate benefit in any of the molecular profiles are dropped. If molecular profile D and treatment Y continue to demonstrate the initially observed signal through the later trial stages, this newly recognized pairing of 'profile D / treatment Y' can be confirmed statistically in a traditional 'population enriched' phase IIb trial or may proceed directly to a phase III confirmatory trial (Figure 4B), with all the advantages/disadvantages that this design entails, as discussed above with population enriched trials (Table 4, Figure 8).
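The adaptive step described above can be illustrated with a deliberately simplified sketch. It assumes a Beta-Binomial model of response rate within a single molecular profile, allocation probabilities proportional to the posterior mean, and a hypothetical futility threshold (`drop_below`); the published trials may well have used more elaborate Bayesian models, so the names and numbers here are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass
class ArmResult:
    """Interim data for one drug arm within one molecular profile."""
    responders: int
    patients: int


def posterior_mean(result, prior_a=1.0, prior_b=1.0):
    """Posterior mean response rate under a Beta(prior_a, prior_b) prior."""
    return (result.responders + prior_a) / (result.patients + prior_a + prior_b)


def allocation_probabilities(results, drop_below=0.10):
    """Return randomization probabilities for the next patients of this profile.

    Arms whose posterior mean falls below `drop_below` (a hypothetical
    futility threshold) receive no further allocation; the remaining arms
    are weighted in proportion to their posterior mean response rate.
    """
    means = {arm: posterior_mean(r) for arm, r in results.items()}
    kept = {arm: m for arm, m in means.items() if m >= drop_below}
    total = sum(kept.values())
    return {arm: m / total for arm, m in kept.items()} if kept else {}


# Interim look for 'molecular profile D' after even randomization.
interim = {
    "X": ArmResult(responders=2, patients=10),
    "Y": ArmResult(responders=6, patients=10),
    "Z": ArmResult(responders=0, patients=10),
}
print(allocation_probabilities(interim))
```

In this toy interim look, arm Z is dropped for profile D and subsequent profile D patients are preferentially, but not exclusively, randomized to treatment Y.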

Expansion platform Type IA: Global, Compartmentalized & Histology Dependent (e.g. 'FOCUS-4')
The 'FOCUS-4' trial (Kaplan et al., 2013), recently described for advanced colon cancer, is a first example of this clinical trial design, referred to herein as an 'Expansion Platform Type IA' trial; this is a global, compartmentalized, and Histology Dependent trial, 'expanding' from the population enrichment design evaluating one biomarker (Figure 4B) into a platform consisting of multiple biomarker categories. This overarching 'umbrella' protocol design provides structure in which there can be global molecular characterization of patient tumors into specific integral biological cohorts. The 'FOCUS-4' trial includes: Cohort A, BRAF mt; Cohort B, PIK3CA mt and/or PTEN loss; Cohort C, RAS mutation; Cohort D, All wild type, and also includes a molecularly 'unclassified' or 'relegation' Cohort E (Figure 5B). In contrast to the Exploratory Platform discussed above, the Type IA Expansion Platform identifies integral molecular subsets prior to trial initiation (Dancey et al., 2010), and assigns therapy to only those within that biological subset. The example of the 'FOCUS-4' trial applies this Type IA design for maintenance therapy after obtaining stable disease (or better) on first line therapy for stage IV colorectal cancer. It also includes a treatment prioritization scheme to address the issue of multiple biomarkers found within one individual (i.e. Cohort A > B > C > D > E). The Expansion Platform Type IA design is differentiated from the Type IB design, below, which is histology-agnostic. The Type IA is differentiated from Type II designs, below, in that each cohort in the Type IA is run as a separate (compartmentalized) phase IIa (or ideally randomized phase IIb) trial with distinct Principal Investigators, and isolated statistical endpoints for each subgroup. This Type IA Expansion design requires a global 'top-down' organizational infrastructure, such as is possible for this large national study in the United Kingdom. This global approach requires the inclusion of multiple enrolling centers, to increase accrual rate to the rare molecular subsets in order to achieve statistical power within each subgroup.

[Figure caption, continued: After first progression (PD1) patients undergo repeat biopsy of a progressing lesion and undergo repeat molecular testing and treatment assignment, which may allow cross-over to a more appropriate biological group as directed by the prioritization algorithm (Figure 7). Patients on placebo remain on placebo at each progression point.]
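The 'FOCUS-4' prioritization scheme (Cohort A > B > C > D > E) amounts to an ordered rule list in which the first matching cohort wins. The sketch below is only an illustration of that idea: the dictionary keys, the use of `None` for a failed or missing assay, and the rule that Cohort D requires every marker to be confirmed wild type are assumptions made for this example, not details taken from the trial protocol.

```python
MARKERS = ("BRAF_mut", "PIK3CA_mut", "PTEN_loss", "RAS_mut")


def assign_cohort(tumor):
    """Assign a tumor profile to a cohort using the priority A > B > C > D > E.

    `tumor` maps each marker to True (aberration present), False (confirmed
    wild type) or None/missing (assay failed or not done). These encodings
    are hypothetical and chosen only for illustration.
    """
    if tumor.get("BRAF_mut") is True:
        return "A"  # BRAF mutation takes top priority
    if tumor.get("PIK3CA_mut") is True or tumor.get("PTEN_loss") is True:
        return "B"  # PIK3CA mutation and/or PTEN loss
    if tumor.get("RAS_mut") is True:
        return "C"  # RAS mutation
    if all(tumor.get(marker) is False for marker in MARKERS):
        return "D"  # all markers confirmed wild type
    return "E"      # molecularly unclassified ('relegation') cohort


# A tumor carrying both BRAF and RAS mutations is assigned by priority.
print(assign_cohort({"BRAF_mut": True, "PIK3CA_mut": False,
                     "PTEN_loss": False, "RAS_mut": True}))
```

Because every profile falls through to Cohort E at worst, no screened patient is left without an assignment, which is the role of the relegation cohort in this design.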
Cooperative Groups and NCI/CTEP in the USA could also design trials in this fashion, such as the planned 'MASTER PROTOCOL' for squamous cell lung cancer (Abrams et al., 2014), as could pharmaceutical companies (alone or in collaboration) able to embody multiple biological cohorts within their pharmaceutical portfolio.

[Figure caption, continued: The biomarker and treatment assignment algorithm is premised on optimizing the inhibition of 'driver-biology'. This 9-point algorithm serves to prioritize treatment assignment should multiple aberrations (genomic and proteomic) be observed in an individual sample. Should multiple aberrations be present, priority could be given to higher allele frequency (for mutations) or higher gene copy/expression. The algorithm acts as a filter to create 5 distinct biomarker categories (with 9 tiers) that will receive 5 specific and most-appropriately matched targeted therapies. Approximate hazard ratios (HR) anticipated for each categorized tier, as well as the aggregate HR (the primary endpoint of PANGEA), are indicated. This first iteration of the 'PANGEA' strategy is a compromise within the spectrum between the two extremes of 'one-size-fits-all' and completely individualized therapy or 'N-of-1' (bottom panel). Rather than being a 'tailored suit', PANGEA can be considered fitting to 'X-large, large, medium, small and X-small'. Future iterations could include more biomarker categories and treatment arms, consequently moving closer towards the 'N-of-1' limit.]

Figure 8. Applications of next-generation clinical trial designs, and total patients required, towards approval of 'personalized' treatment strategies that encompass both the drugs and companion diagnostics. Total numbers of patients required from phase II to phase III and FDA approval are approximated in the final right column, using a biomarker incidence of 20% and 7% as examples. For comparison purposes, the numbers reflect a median overall survival as the primary endpoint with target HR 0.67, two-sided alpha 0.05, 80% power, randomization ratio 2:1, 12 month accrual and 24 month follow up. Total numbers for each trial design include estimated numbers for serial phase IIa, phase IIb, and then phase III trials in

Expansion platform Type IB: Global, Compartmentalized & Histology Independent (e.g. 'NCI-MATCH' and the Novartis 'Signature' trials)
The Expansion Platform Type IB design is also an umbrella global design, with the main difference from the Type IA being that it is histology-agnostic (Figure 5C). Trials and concepts to date following this design include the 'NCI-MATCH' 'basket' protocol (currently planning to screen ~3000 patients and treat 1000) (Abrams et al., 2014), which will include multiple histology-agnostic phase IIa (or ideally phase IIb) arms each consisting of an identified genomic aberration(s) and 'matched' targeted therapy. A second trial example is the 'Signature' 'basket' trial by Novartis, currently with five molecular 'bins', each registered as an individual phase IIa (non-randomized) NCI trial. To date, neither of these two examples has specified a treatment prioritization algorithm for a patient who is eligible for more than one molecular bin; however, NCI-MATCH will putatively submit profiling results to the 'automated MATCHBOX rules engine', where aberrations will be matched to treatments based on a currently undisclosed prioritization scheme. A question remaining is whether follow up phase II/III randomized trials would be required for FDA approval of 'successful' single arm phase IIa trials that have primary endpoints of response rate or disease control rate. If so, it is obvious that the numbers of patients required from start (phase IIa) to the completion of the biomarker selective population-enriched phase III trial are vast (Figure 8). However, the recent example of crizotinib gaining FDA approval for ALK-translocated NSCLC without a traditional randomized phase III trial may have emboldened this single arm phase IIa approach.
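To make the scale of these numbers concrete, the following sketch applies Schoenfeld's standard approximation for the number of events needed in a log-rank comparison, together with simple screening arithmetic for a biomarker of given incidence. It uses the design parameters quoted for Figure 8 (HR 0.67, two-sided alpha 0.05, 80% power, 2:1 randomization), but it is not the calculation used to produce that figure: it returns events rather than patients, ignores accrual and follow-up time, and the enrollment target of 300 is an arbitrary illustrative value.

```python
from math import ceil, log
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.80, ratio=2.0):
    """Approximate number of events needed to detect hazard ratio `hr`.

    Schoenfeld approximation: d = (z_{1-alpha/2} + z_{power})^2
                                  / (p1 * p2 * ln(hr)^2),
    where p1 and p2 are the allocation fractions implied by `ratio`:1.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_treatment = ratio / (1 + ratio)
    p_control = 1 - p_treatment
    return ceil((z_alpha + z_beta) ** 2 / (p_treatment * p_control * log(hr) ** 2))


def patients_to_screen(n_enrolled, incidence):
    """Patients to screen so that `n_enrolled` are biomarker-positive on average."""
    return ceil(round(n_enrolled / incidence, 9))


events = schoenfeld_events(hr=0.67)
print("events needed:", events)
for incidence in (0.20, 0.07):
    # 300 is a hypothetical per-cohort enrollment target, not a figure from the text.
    print(incidence, patients_to_screen(300, incidence))
```

The screening burden scales inversely with biomarker incidence, so moving from a 20% to a 7% subset roughly triples the number of patients who must be profiled for the same cohort size.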

Expansion platform Type IIA: Grass-Roots, Holistic & Histology Dependent (e.g. 'PANGEA' trials)
The Expansion Platform Type IIA design, on the other hand, is neither global nor compartmentalized, and can be considered a 'Grass-Roots' or investigator-initiated approach. In fact, it can be performed in single institutions as a pilot trial (phase IIa non-randomized) or within smaller collaborative groups as randomized phase IIb trials (Figure 5D). The 'PANGEA' concept (Personalized Antibodies for Gastro-Esophageal Adenocarcinoma) is the first example of the Type II holistic design. (Catenacci et al., 2014a) As in the Type IA Expansion Platform, the Type IIA 'PANGEA' design identifies, prior to trial initiation, various integral molecular subsets within GEC that are tiered by level of priority and degree of anticipated benefit from targeted inhibition (Figure 7). Therapy is assigned specifically only to those within that biological subset. This treatment assignment is based on current understanding of 'driver' biology of that tumor type at the time of trial initiation, matching available targeted therapies thought to best suit each molecular subset. It also relies on a pre-specified treatment prioritization algorithm to address multiple 'drivers' and inter-patient heterogeneity (Figure 7, Table 3), which could be considered arbitrary, particularly if anticipated hazard ratios are similar between different potential biomarker-drug pairings. Should multiple aberrations be present, priority in 'PANGEA' will be given to higher allele frequency (for mutations) or higher gene copy/expression (Gomez-Martin et al., 2013), for example. However, the Type IIA design is executed as one uniform (holistic) umbrella trial, with one primary statistical endpoint testing the hypothesis that personalized therapy is better than the current standard therapy for that tumor type; it is ultimately testing the treatment strategy comprised of numerous companion diagnostics and their respective matched targeted therapies.
All patients screened are eligible irrespective of their molecular profiling result, due to relegation tier(s) within the treatment algorithm (Figure 7). The design can be applied at any line of therapy to evaluate a 'personalized strategy' compared to the standard treatment for that scenario. It can be performed as a phase IIa pilot compared to historical outcomes, as a randomized placebo-controlled phase IIb, or conceivably even as a large registration phase III trial (for the pre-specified treatment strategy), if warranted, based on promising phase IIb trial results (Table 4, Figure 8).

Expansion platform Type IIB: with Biologic Beyond Progression (e.g. 'PANGEA-BBP')
None of the biomarker directed trials discussed thus far systematically address (with action) intra-patient tumor heterogeneity through time. All are focused on one line of therapy, and none control post-progression treatment (in either the placebo or treatment arms), which may result in confounding when evaluating an overall survival outcome, as has been experienced in numerous trials, including the current debate surrounding discordance between the 'FIRE-3' and 'CALGB-80405' colorectal trials. (Catenacci et al., 2011c; ESMO, 2014) The Expansion Platform Type IIB design with Biologic Beyond Progression (BBP) builds on the Expansion Platform IIA by incorporating a sequential biopsy after failure of therapy (or through multiple lines of therapy), along with a reassigned targeted therapy as appropriate (Figure 5E). (Catenacci et al., 2014a) This is an attempt to address tumor evolution and resistance over time. The first iteration of the 'PANGEA' trial (IMBBP, NCT02213289) is an example of this Type IIB Expansion Platform Design, seeking to address the hurdles posed by both inter- and intra-patient tumor molecular heterogeneity and attempting to realize the full potential of targeted therapeutics (Figure 6A).

'PANGEA-BBP' and the 'Biomarker & Treatment Assignment Algorithm'
The 'PANGEA-IMBBP' trial (Figure 6A) is a pilot (phase IIa) trial for patients with GEC evaluating a personalized treatment algorithm that is compared to historical control survivals of approximately 12 months (NCT02213289). (Catenacci et al., 2014a) Ultimately, should the initial feasibility, toxicity, and early efficacy endpoints be met on this pilot, the trial will proceed to a placebo-controlled randomized phase IIb ('PANGEA-IIMBBP') to evaluate the primary endpoint of overall survival (Figure 6B). In this particular iteration of the Type IIB design, the targeted therapies chosen for 'PANGEA' are monoclonal antibodies, due to the benefit of antibody specificity, ADCC, and ease of combination with cytotoxics for this tumor type, facilitating use in first line setting (Table 1). Given the recognized inter-patient molecular heterogeneity (Figure 3, Table 3), choosing five or six biologic cohorts is largely a compromise between the feasibility of acquiring various targeted drugs for each group and the putative number of potential biological cohorts. Rather than being a 'tailored suit', PANGEA can be considered fitting to 'extra-large, large, medium, small and extra-small'. The 'PANGEA' 'Biomarker & Treatment Assignment Algorithm' prioritization scheme is the key variable being tested in this trial (Figure 7, Table 5). The BBP component of this Expansion Platform Type IIB design will mandate biopsies at each progression time-point to evaluate tumor molecular evolution (heterogeneity through time). In addition to altering the standard backbone cytotoxic therapy at each time of progression as is routinely done clinically, assessment will be made on the 'real-time' biopsy to either continue the previously assigned biologic (if the tumor has not evolved to a different biologic subgroup) or to change to a different biologic (if the tumor has evolved to a different biologic subgroup).

[Figure caption, continued: tandem. For the exploratory platform design, given the adaptive Bayesian statistics, a direct comparison is not possible. *The target total number for the ongoing BATTLE-2 trial. **The target total number for the ongoing ISPY-2 trial. ***Estimated numbers for a follow up randomized phase IIb trial for an identified biomarker/drug combination from either the phase IIa or phase IIb Exploratory Platform design, with statistical endpoints as set above, performed prior to a full phase III. Numbers in parentheses indicate the target biomarker population subset that would be required to be identified from the entire patient population.]
Evidence that maintenance of 'biologic inhibitory pressure' on an oncogenic 'driver' is effective despite first (or second) progression has been demonstrated both preclinically and clinically in various settings. (Cepero et al., 2010; Grothey et al., 2008; von Minckwitz et al., 2009; Verma et al., 2012) On the other hand, should there be clear 'migration' of 'driver' status within the prioritization algorithm to a new biomarker classification at any progression time-point, the patient will be treated with the new appropriate biologic therapy matched to the new molecular subgroup. In this pilot 'phase IIa' study, multiple phase-Ib-like safety lead-ins for each biologic/cytotoxic pairing will be done if full phase I data are not yet established, rather than separate large phase Ia trials; it is anticipated that the addition of antibodies to various cytotoxic regimens will not substantially alter safety profiles. At each disease progression time-point, the 'PANGEA-IMBBP' iteration in Figure 6A allows for treatment cross-over based on the 'Biomarker & Treatment Assignment Algorithm', but a trial could be designed without such allowance, with the previous biologic continued despite evolution while altering the chemotherapy backbone, or by continuing the original biologic but adding a second (Table 5). Also, the 'PANGEA-IMBBP' iteration maintains a constant 'Biomarker & Treatment Assignment Algorithm' at each progression time-point, but could alternatively test the application of a 'dynamic algorithm', in an attempt to ensure a different biologic therapy at each progression point. More complex designs with combinations of biologics could also be envisioned (Table 5), as it is possible that 'personalized therapeutic cocktails' may ultimately be required to optimally inhibit molecularly heterogeneous tumors.
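The difference between a constant and a 'dynamic' (fluid) algorithm at progression can be sketched as follows. The priority order, marker names, and relegation label are hypothetical placeholders rather than the actual 'PANGEA' tiers; the only behavior taken from the text is that a constant algorithm re-applies the same priority tree at each biopsy, whereas a dynamic one excludes biologic groups on which the tumor has already progressed.

```python
# Hypothetical priority order, highest priority first; not the trial's actual tiers.
PRIORITY = ("HER2", "MET", "EGFR")
RELEGATION = "relegation"


def assign(aberrations, excluded=frozenset()):
    """Return the highest-priority biomarker group present and not excluded."""
    for marker in PRIORITY:
        if marker in aberrations and marker not in excluded:
            return marker
    return RELEGATION


def reassign_at_progression(aberrations, prior_groups, dynamic=False):
    """Re-run assignment on the repeat biopsy taken at progression.

    Constant algorithm: the same priority tree is applied unchanged.
    Dynamic algorithm: groups already used (`prior_groups`) are skipped,
    which forces a different biologic at each progression point.
    """
    excluded = frozenset(prior_groups) if dynamic else frozenset()
    return assign(aberrations, excluded)


# Progression on anti-HER2 therapy with persistent HER2 and newly acquired MET amplification.
biopsy = {"HER2", "MET"}
print(reassign_at_progression(biopsy, ["HER2"]))                # constant
print(reassign_at_progression(biopsy, ["HER2"], dynamic=True))  # dynamic
```

Under the constant algorithm the patient continues in the HER2 group, while the dynamic variant moves the patient to the MET group; a design that adds a second biologic rather than switching would need a further variant not shown here.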
The challenges of a single institution or Principal Investigator contracting with several pharmaceutical companies for participation in any of the Expansion Platform Designs should not be underestimated (it is no overstatement that 'PANGEA' was attempting to 'move continents', taking almost five years from conception in 2009 to opening in 2014!), but this has become increasingly feasible. Similarly, issues surrounding validation of next-generation companion diagnostics prior to trial initiation are daunting, but surmountable, as discussed below. (Khoury and Catenacci, 2014) Statistical considerations of Type II Expansion Trials for phase IIa, phase IIb and phase III will be reported separately (other than the major clinical endpoints shown in Figures 6 and 8). The detailed statistical methods for 'PANGEA' include pre-specified statistics in the phase IIb and/or phase III settings to i) identify prognostic implications of each molecular group, as determined by placebo-controlled stratification of each biomarker subgroup; and ii) identify significant interactions on clinical outcome between and within the molecular subsets and matched therapies, to assist in determining future trial iterations and optimal treatment approaches.

Next-generation clinical trials: a comparison
Advantages and disadvantages of the discussed 'Next-Generation' trials are detailed in Table 4. Different settings will call for application of different trial designs (Figure 8). Different diseases will likely need to be approached differently, and diverse health care systems may dictate feasibility of one approach versus another. Retrospective-prospective trials remain reasonable for already-completed trials to explore translational concepts. Classic 'histology-dependent population-enriched' trials may be useful for biomarkers that are relatively prevalent, but will not be ideally suitable for low frequency molecular subsets. 'Histology-independent population-enriched' trials continue to be used for early signal detection in rare molecular subsets in 'phase Ic expansion' trials, which are essentially multiple parallel tumor-specific phase IIa trials (e.g. anti-PDL1 for ~40 patients of each tumor-type of interest within a 'phase I expansion') (Figure 8). A path to FDA approval for this strategy is not clear, yet if a substantial clinical benefit is observed compared to accurate historical controls when no alternative therapy exists, even in relatively small patient numbers, then approval may be possible (e.g. crizotinib and ALK translocation). Traditional 'biomarker stratified' designs performed 'à la carte' will likely become obsolete with the recognition of smaller and smaller molecular subsets, as they require too many negative screens and make inefficient use of precious tissue samples and screening time. (Warth et al., 2012) Molecular profiling from the peripheral blood may ameliorate this problem, as discussed below, but this is currently neither validated nor readily available for clinical application.
'Exploratory Platform Designs' require large numbers of patients, along with significant coordination, pharmaceutical participation, and funding, yet they may be useful to identify molecular profiles that predict benefit to specific drugs in cases where strong preclinical/clinical evidence establishing such associations is lacking. In general, for Exploratory Platform designs, performed in phase IIa settings, these will require spin-off of newly discovered biomarker-drug pairings through prospective placebo-controlled randomized phase IIb and/or III trials (requiring even more patients) for confirmation and ultimate approval (Figure 8). Type IA and IB Expansion Platform Designs will require extensive top-down coordination, pharmaceutical participation, and companion diagnostic validation, all of which can make this difficult to materialize. However, if in place, such as the first five modules of the 'Signature Trial' (now 8 trials just prior to manuscript publication), this umbrella screening protocol will funnel patients into similar biologic groups having distinct 'stand-alone' trials assessing relevant biomarker(s)-drug(s) pairing questions. However, the numbers of patients required to proceed from phase IIa/IIb/III confirmation and ultimately regulatory approval are drastically high (Figure 8). For the histology-agnostic Type IB trial, the assumption that a molecular aberration behaves identically across histologies may not be true, exemplified by BRAF mutation and differences in clinical benefit in melanoma and colorectal cancer with inhibitory BRAF monotherapy. Genomic events in the context of tumor histology may prove to be important going forward in more examples. The Type IB Phase IIa trials done with exploratory intent may help to elucidate this question. Backbone chemotherapies differ across tumor types, posing challenges for the Type IB trial design when attempting to evaluate molecularly targeted therapies in combination with cytotoxics in earlier lines of therapy.

there is question as to benefit of the investigational drug in the biomarker-negative patients), or it could proceed directly to a confirmatory phase III population enriched design.
d A phase III trial could be designed holistically testing 'personalized treatment' versus control, pooling the subgroups together towards the primary endpoint, with the advantage of requiring significantly fewer patients. The caveat is that all biomarker subgroups along with matched targeted agents chosen must contribute to the overall benefit observed (i.e. the HR for each subgroup, although not required to be equal, should all be < 0.8, and the aggregate HR must meet the primary overall endpoint; see Figure 7). The power to detect benefit of each subgroup will be limited, unless the benefit observed is large.
e Depending on the frequency of the biomarker subset within the population studied (Histology Dependent), the ability to identify the benefit in that subgroup, if small, in the exploratory platform design may not have adequate power, unless initial trial sizes are substantially larger. Moreover, once the second adaptive randomization phase establishes benefit (which requires more patients), the confirmatory phase III trial would still require very inflated numbers of patients screened to identify the infrequent biomarker-positive patients.
f Screening 2500 patients (after the initial 1375 patients in the phase IIb) may be plausible with global coordination (e.g. Research UK and Medical Research Council Clinical Trials Unit) for a high incidence tumor such as colorectal cancer. However, many tumor types do not have this 'luxury' of high incidence and would have difficulty with such high numbers required for screening/accrual.
The Type II Expansion Platform Design (with or without BBP) is ideal for small single institution Phase IIa pilot studies (n = ~70 for HR 0.66) and even Phase IIb studies (n = 100 to 200, depending on the HR, power, type II error, and randomization ratio desired) that assess one strategic approach of several relevant biomarker-drug pairings (Figure 8). It can even conceivably test the 'personalized treatment strategy' as a stand-alone phase III trial with intent for regulatory approval for all components (companion diagnostics, drugs, and treatment algorithm) within the strategy, or could be spun off to have individual phase III trials for each molecular subset should a signal be identified in one of the biomarker-drug pairings, with the caveat of substantially increasing the numbers of patients required to reach individual subgroup statistical endpoints (Figure 8). This Type II trial design is dependent on a host of factors including accurate biomarker-drug pairings, validated companion diagnostics, and most specifically a correct treatment algorithm. However, given the low numbers required for pilot trials, numerous iterations can be conducted across various centers altering any of a number of variables within the design of the trial (Table 5). This iterative process (Figure 9) along with consequent refining of biologic subgroups, better matched therapies (or use of other agents within a given drug class), and a whole host of other variables (Table 5) will allow for multiple centers, cooperative groups, and pharmaceutical companies to tackle these questions simultaneously.
Smaller pharmaceutical companies can also participate within limited budgets, as relatively fewer patients would be assigned to their actual treatment arm within the trial, and the umbrella screening costs would be shared by all trial pharmaceutical participants. Ultimately, the phase IIa and IIb Type II Expansion trials that identify the most promising holistic approaches could ideally 'expand' to a phase III Type II Expansion trial to test the treatment strategy as compared to the prevailing standard therapy. Should the new personalized treatment strategy meet pre-specified survival endpoints with statistical significance compared to control, this would be grounds to consider it as a new standard of care (Figure 8). Future iterations could then compare, for instance, ten refined molecular subgroups to the newly established personalized approach with five subgroups, in the example of 'PANGEA'. Smaller increments of progress (starting with 5 subgroups, then moving to 10, etc.), as opposed to attempting 10 or more right away, will limit the problems concerning uncertainty of benefit across each and every one of the included subgroups (Figure 7).

g Depending on the frequency of the biomarker subset within the population studied (Histology Independent), the ability to identify benefit in a phase IIb trial would require high patient numbers to identify infrequent biomarkers (despite searching across tumor types), and the ensuing confirmatory phase III would also need many patients. Also see point b above regarding histology agnostic designs.
h The treatment assignment algorithm is an effort to address inter-patient heterogeneity and multiple concurrent aberrations within the tumor sample. Despite best efforts to incorporate current biologic understanding and rationale, ultimately this algorithm is arbitrary.
i Intra-patient tumor heterogeneity over time (i.e. treatment resistance) can be assessed by repeat biopsy (or surrogate via serum/urine assays) of a progressing lesion, re-evaluating biomarker status, and re-assignment by the treatment algorithm. Allowing cross-over to the new biomarker/drug group as appropriate may enhance the personalized strategy.
j The treatment algorithm can be held constant at each progression time-point in the Type BBP design (e.g. if there is still HER2 amplification, but a newly acquired MET amplification is observed at the time of progression, continued anti-HER2 therapy would be indicated because HER2 is first in the priority tree), or the algorithm could be fluid (e.g. in the same scenario, if there has been progression on anti-HER2 therapy, a fluid algorithm could exclude eligibility from the prior biomarker group and proceed to the next groups, or a fluid algorithm could allow for continued anti-HER2 therapy with the addition of other targeted therapies directed at the newly acquired molecular aberration, in this case an anti-MET agent).
k FOCUS-4 includes a relegation cohort (E) that is negative for inclusion in cohorts A-D. However, other Type I designs do not necessarily include such a relegation category (e.g. NCI-MATCH), and therefore not all screened patients would be eligible.

Figure 9. The 'PANGEA' strategy addressing inter- and intra-patient tumor molecular heterogeneity. The expansion platform type II design with biologics beyond progression is testing the 'PANGEA personalized treatment strategy'. Obtaining baseline biopsies of metastatic disease and

Next-generation companion diagnostics
Attention to the regulation of companion diagnostics increased dramatically after the investigations regarding the 'Duke scandal' came to light. (Kurzrock et al., 2014) A recent opinion letter detailed the difficulties in achieving an optimal balance between imposing FDA regulatory oversight and facilitating advances in the development of innovative biomarker testing and trial designs, (Kurzrock et al., 2014) and many have called for a complete overhaul of the regulatory system for diagnostic tests. (Hirsch et al., 2014) Multiplex analysis of biomarkers will be required to overcome the hurdle of inter-patient molecular heterogeneity and limited tissue, as discussed above. (Stricker et al., 2011;Khoury and Catenacci, 2014;Collins and Hamburg, 2013) Multivariate analyses associating combined genomic and proteomic profiles with clinical outcomes will likely help determine the best use of each targeted therapy. (Catenacci et al., 2014b;Parkinson et al., 2012;Mendelsohn, 2013;Garraway et al., 2013;Hembrough et al., 2012) Clear paths to regulatory approval for pre-specified companion diagnostics and drugs within the Expansion Platform Type II trial design, such as a future 'PANGEA-IIIMBBP', are paramount, as they will facilitate pharmaceutical collaboration in such trials (Figure 8). (Khoury and Catenacci, 2014) The recent FDA marketing authorization of next-generation DNA sequencing is a first step. (Collins and Hamburg, 2013) We recently proposed a certification mechanism for laboratories offering next-generation companion diagnostics services, coined 'Certified Advanced Companion Diagnostics Facilities' (CACDF), under which laboratories would meet certain benchmarks, such as performance against a standardized sample set, in order to be granted certification by the College of American Pathologists (CAP) under commission from the FDA. (Khoury and Catenacci, 2014) However, identifying genomic variants that can be validated before being integrated into decision making for purposes of modulating drug response is critical. (Khoury and Catenacci, 2014;Collins and Hamburg, 2013;Catenacci et al., 2014c) As a future goal of next-generation companion diagnostics, non-invasive 'liquid biopsies' that assess circulating tumor cells or free serum/urine DNA are already being evaluated as potential surrogates for serial tumor biopsy (Figure 9). (Khoo et al., 2014;Diaz and Bardelli, 2014;Leary et al., 2012;Speicher and Pantel, 2014;Schwarzenbach et al., 2014) The same information may or may not be obtainable from the peripheral blood and/or urine as from the tumor itself, yet there is great potential for one or both of these assays, and this needs to be tested with scrutiny. (Pantel and Alix-Panabieres, 2013;Shimada, 2014;Alix-Panabieres and Pantel, 2014) Ultimately, the least-invasive strategy that best predicts therapeutic benefit of given targeted therapies at the lowest cost to the health care system will prevail.

9. Next-generation FDA regulation: companion diagnostics and drug approvals
An initial concern with novel and complex trial designs, particularly the Type II Expansion Platform, is the exceedingly high number of variables embedded within the personalized strategy (Table 5). However, consider a hypothetically completed phase III 'PANGEA-IIIMBBP', where the overall survival (OS) endpoint is met for the investigational arm of the schema in Figure 6B. Clearly, if the endpoint of OS is met in this hypothetical prospective, double-blind, placebo-controlled phase III trial, it can be strongly argued that the companion diagnostics, the biomarker subsets, their assigned treatments, and the pre-stated treatment algorithm have, collectively, shown superiority over the standard control with statistical scrutiny (and had also done so previously in an earlier phase IIb). This is especially true if the pre-specified subset statistical analyses verify that there are no identified 'duds'. 'Duds' are defined as biomarker-drug groups that do not provide benefit and do not contribute to the improved intention-to-treat survival of the holistic investigational arm (and may actually be detrimental). For example, what if the primary endpoint is met, yet the survival benefit is mostly due to only 2 of the 5 groups, while the other 3 biomarker-drug groups are noncontributing? As discussed earlier, pre-defined statistics are intended to identify ineffective biomarker-drug groups and prevent them from progressing through the phase IIb into the confirmatory phase III schema, using an iterative process for planning future trial treatment strategies (Figure 9). Moreover, further scrutiny of each of the subsets within the phase III would also be planned to identify any 'dud' that was not identified at the phase IIb checkpoint.
Certainly, the cross-over nature of the BBP component of the Type IIB Expansion trial (if included in that design) does significantly confound the subgroup assessment for overall survival, while PFS1 would remain distinguishable prior to any cross-over taking place.
serial biopsies at each progression time-point within the trial, with repeat molecular testing and treatment assignment to match targeted therapies in real-time, may improve clinical outcomes compared to a historical (phase IIa) or placebo-controlled (phase IIb) standard therapy. Upon completion of each trial, an iterative process will allow refinement of the treatment strategy (biomarker assays, molecular categories, treatment algorithms, and therapeutic agents) using knowledge gained from each previous trial and new technology and drugs developed in the interim.
Regardless, many standard treatments to date are in place based on 'one-size-fits-all' approaches of adding an investigational agent to standard therapy, where the benefit of adding the investigational agent was 'statistically significant', yet clearly more patients derived no benefit at all (Figure 10A); this is nonetheless considered 'sufficient' to set new treatment standards for all patients. For the Type II Expansion Platform Design, the personalized approach is hypothesized to derive higher absolute benefit collectively, since it is an aggregate of several targeted therapies used in targeted subgroups that are predicted to respond, prioritized in a manner to optimize clinical outcome (Figure 7). This may not be statistically evident when evaluating each molecular arm individually, due to lack of power (Figure 10B, left panel), but is expected to be apparent when evaluating the holistic approach in terms of aggregate benefit as the primary endpoint, the 'raison d'être' of the Type II Expansion Platform Design (Figure 10B, right panel).
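The power argument above can be made concrete with a rough calculation. The sketch below uses the standard Schoenfeld approximation for a two-sided log-rank test with 1:1 randomization, in which the required number of events is 4(z for 1 - alpha/2 plus z for power) squared, divided by the squared log hazard ratio. The hazard ratio of 0.6, the assumption that it is common to all subgroups, and the function and variable names are hypothetical choices for illustration; they are not taken from the PANGEA statistical plan.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio: float, alpha: float = 0.05,
                    power: float = 0.80) -> int:
    """Events needed by a two-sided log-rank test with 1:1 randomization
    (Schoenfeld approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(4 * z ** 2 / log(hazard_ratio) ** 2)


N_SUBGROUPS = 5    # five subgroups, as in the PANGEA example
ASSUMED_HR = 0.6   # hypothetical hazard ratio, assumed equal in every subgroup

# Each molecular group run as its own adequately powered stand-alone trial.
standalone_total = N_SUBGROUPS * required_events(ASSUMED_HR)

# One pooled comparison of the personalized strategy versus standard control.
pooled_total = required_events(ASSUMED_HR)

# A larger hypothesized effect (smaller hazard ratio) needs fewer events.
events_large_effect = required_events(0.5)
events_small_effect = required_events(0.8)

print(standalone_total, pooled_total, events_large_effect, events_small_effect)
```

Under these simplifying assumptions, the pooled comparison needs one fifth of the events that five separately powered subgroup trials would need, and the event requirement falls steeply as the hypothesized effect grows. The calculation counts events rather than patients and ignores accrual, follow-up, and cross-over.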

Conclusions
Tumor molecular heterogeneity is a hurdle to realizing the full potential of molecularly targeted therapeutics. To date, 'classic' biomarker-driven trial designs have attempted to address various aspects of molecular heterogeneity, but with recognized limitations. Next-generation clinical trial designs coupled with next-generation companion diagnostics are emerging as solutions to the increasingly recognized issue of molecular heterogeneity. A paradigm shift from 'one-size-fits-all' and 'à la carte' biomarker diagnostics and patient selection to next-generation clinical trial designs may allow us to break through the clinical benefit plateau experienced recently with molecularly targeted therapies. Immune therapy holds great promise in the upcoming years as another layer of therapeutics in the arsenal. Regulatory oversight should continue to reasonably assess novel companion diagnostics and next-generation clinical trial designs, and provide guidance and assurance for regulatory approval to encourage and stimulate innovative approaches, such as testing 'personalized treatment algorithms' within the Expansion Platform Type II Design.

Conflicts of interest
None.

Figure 10. Comparison of the accepted one-size-fits-all design strategy and the 'Expansion Platform Type II Holistic Design'. (A) In the classic clinical trial design, administering an investigational agent to all-comers versus placebo will lead to approval, should statistical endpoints be met. Often, statistical endpoints are met with only marginal clinical improvement in overall survival (~1-2 months). Approval of agents in this scenario leads to large numbers of patients treated with the new agent who do not derive any benefit (top and bottom bracket at any time-point (t) along the x-axis). Often targeted agents applied using this trial design fail, since only a small subset derives benefit, which is not recognized due to dilutional effects of the other biomarker-negative patients, along with too few patients within the subset analysis for adequate statistical power. (B) The Expansion Platform Design Type II (with/without biologics beyond progression) uses targeted agents for targeted populations (middle panel), in an attempt to improve (red line) over the natural outcome observed for each specific molecular group treated without the targeted agent (black line). Three of the 5 subgroups of PANGEA are shown here as hypothesized theoretical outcomes. Because a large number of patients would be required should each of the molecular groups within PANGEA be run as an individual compartmentalized stand-alone trial (i.e. an Expansion Platform Design Type IA or B), the advantage of the type II design is that all patients screened are placed in the group that is most appropriate for them within the one trial, reducing the total patients required. Results are pooled (right panel) for the primary endpoint of 'personalized treatment strategy' versus standard control, to limit exposure of any agent to any patient not expected to derive benefit while maximizing exposure to those who will (bracket). Since the total effect size is hypothesized to be large, particularly in the higher tiers of the algorithm (see Figure 7), fewer total patients are required for statistical endpoints.