The transcriptional regulation of normal and malignant blood cell development

Development of multicellular organisms requires the differential usage of our genetic information to change one cell fate into another. This process drives the appearance of different cell types that come together to form specialized tissues sustaining a healthy organism. In the last decade, by moving away from studying single genes toward a global view of gene expression control, a revolution has taken place in our understanding of how genes work together and how cells communicate to translate the information encoded in the genome into a body plan. The development of hematopoietic cells has long served as a paradigm of development in general. In this review, we highlight how transcription factors and chromatin components work together to shape the gene regulatory networks controlling gene expression in the hematopoietic system and to drive blood cell differentiation. In addition, we outline how this process goes astray in blood cancers. We also touch upon emerging concepts that place these processes firmly into their associated subnuclear structures adding another layer of the control of differential gene expression.


Introduction
Hematopoiesis is one of the best understood developmental pathways [1,2] and has been studied extensively in mice. In the mammalian embryo, blood cells originate from the mesodermal germ layer, and hematopoietic specification occurs in two waves: the first wave takes place in the extraembryonic blood islands of the yolk sac and gives rise to primitive progenitor cells with mostly erythroid and myeloid potential [3,4]; the second wave gives rise to definitive hematopoietic stem cells (HSCs) and takes place at the ventral part of the dorsal aorta in the aorta-gonad-mesonephros (AGM) region of the embryo [5]. Cells emerging during the second wave migrate first to the fetal liver and later to the bone marrow. Here, they are maintained in a specialized niche and are largely quiescent; when they do divide, they either self-renew or enter differentiation to sustain mature blood cell production throughout life. All HSCs are born from a specialized endothelial cell layer, the hemogenic endothelium (HE), which communicates with the dorsal mesenchyme. In response to signals, HE cells undergo a cellular shape transition, the endothelial-hematopoietic transition (EHT), forming intra-aortic clusters, which undergo several maturation steps before floating off into the bloodstream. Blood cell development therefore involves a carefully regulated cascade of gene expression changes that are controlled by molecular mechanisms linking genomic responses to a multitude of signals coming from the outside. It matters where a cell has been and who it has talked to. In turn, it matters whether a cell is responsive to an outside signal, making development and differentiation an intricate but highly robust balancing act that occurs in multiple cells at the same time.
This review will summarize seminal studies, which uncovered the players involved in this balancing act and highlight the notion that signaling-responsive transcription regulation and chromatin dynamics are at the heart of the mechanisms maintaining and changing cellular identity. We will also highlight that perturbing any of these mechanisms leads to a disturbance of differentiation and, in some cases, to the development of malignant cells that have opted out of normal growth and differentiation control.

Transcription factors control blood cell development and differentiation
Blood cell lineage-specific gene expression is under the control of specific transcription factors (TFs). A large number of studies employing genetically modified mice showed that the absence of factors expressed in a lineage-specific fashion leads to a perturbation of differentiation or a complete absence of the respective lineage. One of the first examples of a knockout removing an important TF was that of the erythroid-specific TF GATA1, which led to a complete absence of erythroid (and megakaryocytic) differentiation, while other lineages appeared to be unperturbed [6]. The underlying molecular mechanism of differentiation defects in the absence of a lineage-determining TF is the deregulation of genes carrying binding sites for this TF. Knockout experiments also highlighted the fact that TFs act in a hierarchical fashion. The elimination of earlier acting TFs such as TAL1 or RUNX1 affects HSC formation and thus the development of the entire hematopoietic system, whereas elimination of others such as PU.1 largely affects the myeloid and B-cell lineages. Recent studies using single-cell RNA-sequencing approaches have visualized the successive activation of specific developmental trajectories by performing 'nearest neighbor' analyses, which determine changes in the gene expression patterns of single cells and order them according to the direction of increased maturation. Such analyses highlight the different branches of the hematopoietic system and show that they deviate from each other earlier than previously thought [7,8]. Performing such studies with hematopoietic cells lacking a lineage-determining factor clearly showed the absence of specific branches in the trajectory [9]. Another hallmark of the hierarchical action of TFs is the finding that they are often only critically required at specific stages of development even if still expressed at other stages.
Examples of this notion are again RUNX1 and TAL1: removal of their genes from the germ line strongly blocks HSC development, but when they are removed conditionally after HSCs have formed, HSC maintenance is not affected [10][11][12].
However, such a clear-cut result is not seen with all TF knockouts, and the reason for this behavior is that TFs operate within large interacting protein assemblies, as explained in further detail below. TFs have a modular structure with different domains that fulfill different functions and interact with different proteins. Recent studies showed that crippling TFs by removing individual domains can have unexpected effects that shed light on their actual function in gene regulation and point to an amazing robustness of protein complex formation driving gene expression. An example is again the transcription factor TAL1. Deletion of the whole factor abolishes HSC emergence completely, but deletion of the DNA-binding domain alone has a much milder phenotype, and factor binding can still be detected at a subset of genomic targets [13]. A different result was observed after removing the DNA-binding domain from the ubiquitously expressed TF SP1. This mutation affects all developmental pathways, and a germ-line mutation is embryonic lethal [14]. However, in contrast to lineage-determining factors, removing it conditionally later in development had very little effect [15]. The explanation for this finding came with the analysis of the differentiation of mouse embryonic stem cells into blood precursors in vitro, which showed that the knockout still expressed a truncated protein and that the effect of the mutation on gene expression was cumulative. The full knockout of the Sp1 gene was incompatible with differentiation, and so was the full deletion of the SP1 paralogue SP3 in an SP1 hypomorphic genetic background, indicating that the truncated version of SP1 needed SP3 to function. During the differentiation of cells expressing a truncated SP1, bulk gene expression patterns of purified cells became more and more diverse.
Single-cell RNA-Seq experiments and the analysis of differentiation trajectories of such cells showed why this was the case: cells entered the correct gene expression trajectory, but seemed to do this at different time points, forming transcriptionally diverse cell populations. In essence, cells did not execute cell fate decisions as a cohort, meaning that robustness of differentiation was lost [15]. However, the system could only tolerate a certain level of deregulation: once past the progenitor stage, differentiation crashed, and mutant cells were unable to form terminally differentiated blood cells [16]. A breakdown of the robustness of differentiation can also be seen when another crucial level of control of differentiation is disturbed: the expression of correct TF levels. Many crucial TFs show haploinsufficiency phenotypes when one genetic copy is deleted or expression levels are reduced by the mutation of important cis-regulatory elements, with the kinetics of development being perturbed. This notion is true for GATA2, whose downregulation by the mutation of an essential enhancer [17] or by haploinsufficiency [18] causes various hematopoietic defects and predisposes to leukemia. The latter is also true for a cis-regulatory mutation of the gene encoding PU.1, SPI1 [19]. Last, but not least, RUNX1 needs to be expressed at carefully controlled levels to drive hematopoietic differentiation and specification [20,21].
Taken together, these studies show that the effects of crippling an essential TF or its gene on gene expression control have to be seen within the context of development being a dynamic and highly robust process. The system is composed of large interacting protein assemblies and partly redundant components, which compensate until they fall apart or malfunction, meaning that the defect occurs well before phenotypic alterations can be seen. The current challenge is to identify the point when this occurs. This notion will become important when trying to interpret how mutant transcription factors set differentiating cells on the path to cancer.

Transcription factors collaborate and respond to signals
TFs come in families that bind to specific DNA-binding motifs within cis-regulatory elements such as enhancers and promoters, which are responsible for determining how a gene is regulated and when and in which cell type it is expressed. Each regulatory region contains multiple TF-binding motifs, which often are highly conserved depending on the nature of a gene and whether its function is conserved in evolution. A good example is the 'Heptad', a consortium of co-localizing transcription factors such as GATA2, TAL1, RUNX1, and FLI1 and the bridging factors LDB1/LMO2 that specify the cis-regulatory elements of genes expressed in hematopoietic stem and progenitor cells [22,23]. The spatial arrangement of TF-binding sites is often not conserved [24], but there are exceptions with TFs that directly interact on DNA and whose binding is interdependent. Here, the spacing of binding motifs can be very precise, as, for example, seen with the pair AP-1/TEAD in the hemogenic endothelium [25], the pair RUNX1/ETS1 [26], or the GATA/E-Box motifs within the Heptad [27].
As exemplified by SP1, ubiquitously expressed TFs cooperate with tissue-specific factors to set up differential gene expression patterns, and thus, the binding patterns of TFs are highly specific for each cell type [28].
Importantly, binding patterns are highly dynamic and can be maintained in self-renewing cells [29] or change during development [30,31]. In this context, it is noteworthy that transcription is not a uniform process but occurs in bursts whose frequency is regulated, indicating that genes are in intricate contact with their environment [32,33]. The cell receives signals from various sources, which trigger developmental changes and are integrated within the genome by the action of inducible and signaling-responsive TFs. Many of these factors can be activated in all cells and include the AP-1 (JUN/FOS) factor families, which respond to MAP kinase signaling. As a result of the activation of such factors, enhancer elements can be activated de novo, or become more active, driving increased levels of gene expression. Note, however, that noninducible TFs can also be regulated in their activity by signaling-dependent post-translational modifications such as phosphorylation, with RUNX1 being a prominent example [40]. While we have a fairly good idea about what regulates the activity of inducible transcription factors, we know very little about how the different modes of signal transmission interplay with each other across the genome. This notion becomes important when different signals are being integrated at the genome level by regulating TF binding. For example, the abolition of AP-1 binding during hematopoietic specification by using a dominant-negative version of FOS led to a loss of binding of the Hippo signaling-responsive factor TEAD at the composite genomic sites described above [25]. Hippo signaling is activated by the onset of blood flow, which creates biomechanical forces stimulating Rho-GTPase signaling [41].
During T-cell activation, MAP kinase and Ca2+ signaling are integrated by a cooperation of AP-1 and NFAT, and at a specific subset of sites with composite binding motifs one factor cannot bind without the other [42,43], thus ensuring that genes respond only when both signals are present. These few examples show that we are only now starting to obtain a glimpse of the principles and staggering complexity of how the multitude of signaling inputs that a cell encounters are integrated within the genome and shape a genomic response. repress neuronal genes in other tissues, being a prominent example [44]. Other factors can activate or repress depending on the genomic context, which determines whether they recruit co-activators or co-repressors (see below) or interfere with the activity of lineage-determining TFs that set up alternate gene expression patterns. Examples of the latter are the B-cell commitment factor PAX5 and GATA2. PAX5 is required to activate the expression of B-cell-specific genes, but at the same time represses the expression of myeloid genes [45,46] and thus finalizes commitment. GATA2 activates multiple hematopoietic genes but is required for the repression of cardiac genes [47]. PU.1 and GATA1 form a similar antagonistic pair during erythropoiesis [48,49]. A very interesting example of how to turn an activator into a repressor during dynamic gene activation is provided by Mylona et al. [50], who showed that the type of response of the serum-responsive TF ELK1 depends on the timing of its post-translational modification. The protein contains multiple ERK kinase-dependent phosphorylation sites that are modified with different kinetics: fast, intermediate, and slow. After the fast sites are modified, co-activators and mediator are recruited and genes are activated; once the slower sites are modified, co-repressors are recruited and gene expression is switched off.
Many other TFs contain multiple phosphorylation sites as well, making it highly likely that such dynamic behavior of factors responding to signaling is widespread and ensures that gene expression does not overshoot.
An important feature of TF function is the fact that they can bind to genes encoding other TFs and form gene regulatory networks (GRNs), and this is also true for blood cells [31,51,52]. In order to be able to construct such a network, it is necessary to identify regulatory relationships between TFs and their target genes. This aim can be achieved by inference, a strategy by which the expression of individual putative regulators is perturbed followed by determining which genes respond and whether they are upregulated or repressed [53]. A more direct way is to identify the actual TF-binding events using in vivo footprinting [54] or chromatin immunoprecipitation (ChIP) assays and then link binding sites to their associated genes. GRNs consist of nodes (TFs or TF families) that are interconnected (edges), all of which bind to non-TF genes that actually specify a cell type (Fig. 1). It is now clear that highly interconnected nodes are important for the maintenance of a specific cell type. Moreover, it matters how the network is structured and how the different components are wired. Factors binding and regulating their own and other TF-encoding genes can form recursively wired circuits, thus carefully controlling their expression levels and binding patterns, and are a hallmark of self-renewing hematopoietic stem and progenitor cells [29]. However, when differentiation is kicked off by a signal or by the upregulation of expression of a specific TF, connections are altered, meaning that TF-binding patterns are altered as well, and factors move to different locations [31].
A striking example of how the expression of one factor can drive the rewiring of an entire GRN is provided by experiments that expressed an inducible version of RUNX1 in mouse embryonic stem cells with a RUNX1 null genetic background. Differentiation of such cells to blood cells in vitro is blocked at the hemogenic endothelium stage. Induction of RUNX1 allows differentiation to the hematopoietic progenitor stage and, importantly, leads to a genome-wide relocation of other factors, such as TAL1, LDB1, and FLI1, to different locations close to RUNX1-binding sites. Importantly, at least at the early stages of induction, relocation was reversible. Rewiring and RUNX1-dependent differentiation into blood progenitors require the chromatin reader BRD4, with the final complex recruiting mediator and CDK9 kinase to activate transcription [30,55]. The analogy of GRNs with differentially wired circuits and the availability of global binding and gene expression data have attracted the attention of computational biologists and mathematical modelers who strive to create models that could predict the behavior of GRNs in response to perturbation [56]. However, these efforts face formidable challenges, both experimentally and bioinformatically. For example, due to the signaling responsiveness of many TFs and their cofactors, their binding does not necessarily mean that a gene is expressed. So far, these models are therefore only capable of predicting simple subaspects of gene expression control, such as whether a gene is likely to be expressed or not [57]. Due to the multiple parameters feeding into the system, predicting gene expression patterns in a dynamic or even a developmental context is so far out of reach. However, such methodology is essential if we want to predict the response of a GRN to changes in transcription factor binding as a result of DNA sequence changes, changes in the signaling environment, or perturbation experiments such as drug treatment.
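The description of GRNs as nodes connected by activating or repressing edges, inferred by perturbing regulators and recording which genes respond, can be illustrated with a small signed directed graph. The Python sketch below is a minimal illustration only and not a method taken from the cited studies: the log2 fold changes are invented, the threshold is arbitrary, and the function names (infer_edges, out_degree, mutual_repressors) are hypothetical. It also shares the limitation of any perturbation-based inference, namely that it cannot tell direct from indirect regulation, which is why binding data are needed to confirm an edge.

```python
from collections import defaultdict

# Hypothetical log2 fold changes of target genes after knocking down each
# regulator. All numbers are invented for illustration, not measurements.
KNOCKDOWN_LOG2FC = {
    "RUNX1": {"SPI1": -2.1, "GATA2": 1.4, "CSF1R": -0.2},
    "SPI1": {"CSF1R": -2.5, "GATA1": 1.8},
    "GATA1": {"SPI1": 1.6, "HBB": -3.2},
}


def infer_edges(log2fc, threshold=1.0):
    """Map (regulator, target) to 'activates' or 'represses'.

    A target that falls when the regulator is removed is scored as
    activated by it; a target that rises is scored as repressed.
    Changes smaller than the (arbitrary) threshold are ignored.
    """
    edges = {}
    for regulator, responses in log2fc.items():
        for target, change in responses.items():
            if target == regulator or abs(change) < threshold:
                continue
            edges[(regulator, target)] = "activates" if change < 0 else "represses"
    return edges


def out_degree(edges):
    """Count how many targets each regulator controls (node connectivity)."""
    counts = defaultdict(int)
    for regulator, _target in edges:
        counts[regulator] += 1
    return dict(counts)


def mutual_repressors(edges):
    """Return pairs of factors that repress each other (antagonistic pairs)."""
    pairs = set()
    for (a, b), sign in edges.items():
        if sign == "represses" and edges.get((b, a)) == "represses":
            pairs.add(frozenset((a, b)))
    return pairs


edges = infer_edges(KNOCKDOWN_LOG2FC)
for (regulator, target), sign in sorted(edges.items()):
    print(f"{regulator} {sign} {target}")
print("out-degree:", out_degree(edges))
print("antagonistic pairs:", [sorted(pair) for pair in mutual_repressors(edges)])
```

In this toy data set, the only mutually repressing pair is SPI1 (the gene encoding PU.1) and GATA1, chosen to mirror the antagonism between PU.1 and GATA1 mentioned earlier. A realistic model would additionally need binding data, signaling state, and time, which is where the challenges described above begin.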

Transcription factors interact with a specific chromatin landscape
The most important feature of TFs is that they recognize specific DNA sequences and therefore are able to read the genetic code. However, within the eukaryotic nucleus they encounter a formidable obstacle to this process in the form of chromatin. Here, DNA is wrapped around nucleosomes, which are then packaged into higher-order structures of differential compaction, depending on whether the genes within these structures are active, potentially active, or stashed away in heterochromatin. In order for the genetic code to be accessed by TFs, chromatin needs to be remodeled and modified, which is achieved by different mechanisms. One mechanism is the opening of chromatin by pioneer factors, which are capable of binding to nucleosomal DNA and then cooperate with other factors to nucleate a transcription factor complex [58]. Other TFs interact with nucleosomes in different ways with most factors binding the nucleosomal linker regions [59]. All binding modes have in common that after a stable TF assembly is established, TFs recruit chromatin remodelers such as SWI/SNF complexes that use ATP to 'peel' DNA off the nucleosome and free up sequences for further binding [60]. A variant of the second mechanism is 'assisted loading' whereby an inducible factor binds, recruits chromatin remodelers that enable the binding of a second factor which cannot normally bind, and then leaves again, leaving stably remodeled and TF-bound chromatin behind [61,62]. During the assembly process, TF complexes recruit further cofactors such as histone acetyltransferases (HATs) that facilitate transcription by modifying the N-terminal tails of the surrounding nucleosomes and stabilize an open chromatin structure that is devoid of nucleosomes and exists as a nuclease hypersensitive site [63]. 
Given the importance of chromatin remodelers and modifiers in gene activation, it does not come as a surprise that these proteins are essential components of the regulatory machinery driving hematopoiesis [64][65][66].
The establishment of stable TF complexes and modified chromatin is not the only mechanism that is required to activate transcription. TF and cofactor complexes at the different enhancers and the promoter of a gene contact each other within nuclear space [67] and form large protein-DNA complexes on cis-regulatory elements that contain all the factors necessary to activate mRNA synthesis by RNA polymerase, and form a regulatory unit or chromatin hub [68]. The architecture of such units can be simple or complex, depending on the complexity of gene regulation during development and in different tissues [69,70]. The reason for such complexity is that during development, genes can be regulated by a relay of differentially/tissue-specifically active cis elements. A good example of this notion is the chicken lysozyme locus, which is expressed in the oviduct or in macrophages and uses different and shared tissue-specific elements and factors to drive different regulatory modes of gene expression [71]. Moreover, even genes that are expressed in every cell, that is, 'housekeeping genes', are regulated by a relay of different factors, thus keeping chromatin open and ensuring their sustained activity [31]. The latter mechanism highlights several important concepts in gene regulation. First, an active, transcriptionally permissive chromatin structure has to be actively maintained. In the absence of activators, an inactive chromatin structure is established by repressing factors such as DNA methyltransferases and histone deacetylases, which methylate DNA and remove the acetylation mark from histones. Secondly, an active chromatin pattern that is nuclease accessible and carries active histone marks is cell-type specific. Finally, it is not the promoters, but the nonpromoter elements that contain the information for tissue-specific gene expression and mirror tissue-specific gene expression patterns [72,73].
Each transcription cycle is regulated by the balance of activating and repressing factors responding to outside signals [74]. A large number of genes maintain their transcriptionally active structure throughout cell division. However, during mitosis, TF complexes are largely stripped off chromatin and the question arises how they reform. It is now clear that the parent set of modified histones is distributed to the two daughter strands. Modification patterns are therefore retained during mitosis and mark genes that are activated after mitosis [75]. It is also clear that certain TFs, such as FOXA1, are capable of binding to mitotic chromatin and form the basis of re-assembled TF complexes creating an active chromatin structure once the nuclear environment has been reformed [74,76]. This transcriptional memory is often dependent on signaling processes, as shown during the formation of T-cell memory: stable TF binding allowing rapid reactivation of genes by a second stimulus is dependent on the constant reinforcement of factor binding by cytokine signaling, employing inducible TFs. The absence of cytokine signaling leads to a loss of an active and transcriptionally permissive chromatin structure [77]. A similar transcriptional memory is also established in macrophages after a first inflammatory stimulus [78]. In a developmental context, this interplay of inducible and constitutive factors establishing an early memory of a previously received signal, also referred to as priming [79], plays a decisive role in changing or maintaining cell identities, as exemplified by neuronal development of C. elegans. The developmental timing of regulation of the lsy-6 miRNA locus is dictated by a NOTCH-responsive early enhancer. Those neuronal precursor cells receiving the signal upregulate the gene earlier than those that do not, with a strong impact on gene expression patterns.
The result is a functional left-right asymmetry in otherwise morphologically symmetric neurons [80]. Developing blood cells are embedded in a sea of signals that have a profound impact on gene expression. One of the challenges in the coming years will be to unravel the order of events by which genes are activated in hematopoietic development and how external signals such as soluble factors, mechanical forces, and spatial context regulate the ordered formation of HSCs, cells of the different hematopoietic lineages, and hematopoietic tissues such as the thymus and lymph nodes. Single-cell analyses of chromatin changes and expression patterns in developing cells together with spatial information will be crucial to answer these questions [9,81]. Such studies need to be combined with studies of surface molecule mapping [82] and the analysis of intracellular signaling processes using advanced imaging, which is a formidable task.

Gene regulatory processes take place in different parts of the nucleus
Gene regulation cannot be viewed without taking into account where it takes place: in the nucleus (Fig. 2). In recent years, it has become clear that this organelle displays a highly organized structure, with genes occupying different compartments depending on their activity state, the nature of their neighbors, and whether they are transiently or permanently silenced [83][84][85]. The latter distinction is important, because transcription can be rapidly switched off with genes remaining in a poised state ready for further activation or repression, which is mediated by polycomb-repressive complexes (PRCs). PRCs come as two general types, PRC1 and PRC2. PRC2 contains the EZH1/2 methyltransferase, which deposits methyl groups on lysine 27 of histone H3 (H3K27). H3K27me3 recruits the PRC1 complex, which then ubiquitinates histone H2A at target promoters, resulting in a block of transcriptional elongation by RNA polymerase II, with the nonelongating form of RNA polymerase still being associated with these sites [86,87]. PRCs at promoters interact with each other in nuclear space and form a long-range network of transcriptionally silent genes [88,89] that can intermingle with active genes [90] to rapidly switch from one state to another. In contrast, true heterochromatic regions such as centromeres, telomeres, repeat elements, and genes that are stably silenced display a highly compact chromatin structure, are not bound by RNA polymerase II, and are associated with the nuclear periphery and the nuclear lamina [91].
Another level of chromosomal organization in higher eukaryotes that is associated with differential gene expression is formed by topologically associated domains (TADs) [92,93]. TADs partition chromosomes into regulatory domains, inhibiting interactions between neighboring chromosomal regions. TADs are on average 100-200 kb in size, and their borders are bound by the CCCTC-binding factor (CTCF) [94]. TADs can contain both active and inactive genes displaying active and inactive chromatin features, whereby the cis-regulatory elements of active genes interact with each other inside, but not outside, the TAD boundaries, forming distinct subcompartments. The presence of CTCF is essential for forming the TAD structure, and the presence of the boundary is important for the insulation of genes from neighboring TADs. However, while CTCF depletion abolishes TAD boundaries and insulation, the organization into active and inactive genes and their interactions are largely unaffected [95], which is in line with the observation that TAD structures are not tissue-specific and gene expression patterns are programmed by transcriptional and epigenetic regulators. The vast majority of all TF and cofactor interactions within gene regulatory elements take place within the TAD boundaries, with both TFs and cofactors participating in mediating these contacts [96], bringing together large regions of DNA that are highly tissue-specific [97]. Such extruded DNA loops can be encircled and thus stabilized by the structural maintenance of chromosome (SMC) complex, which contains cohesin and condensin and uses ATP to reel in DNA [98]. It is likely that once formed, such structures are required for genes to be able to respond to outside signals with a burst of transcription without having to build up the entire 3D structure from scratch.
In the last few years, another feature within the nucleus has caught attention: that of nuclear speckles or membrane-less organelles (MLOs). Such structures can be formed by liquid-liquid phase separation (LLPS), which in biological systems is essentially a process that is based on an interaction of molecules that excludes water [99]. The nucleus contains a multitude of such structures [100]. The best known are nuclear speckles, which are the sites of splicing, and the nucleolus, which is the site of rRNA synthesis that originates from multiple repeats of rDNA genes. RNA itself is sufficient to nucleate the formation of a nucleolus, which is faithfully reformed after cell division [101]. Transcriptionally inactive heterochromatin containing the heterochromatin protein HP1 constitutes another nuclear compartment at the nuclear periphery, which protects the genome from mechanical stress [102]. Proteins such as HP1 are capable of forming condensates by themselves [103], and a tell-tale sign of their ability to do so is the presence of intrinsically disordered regions that appear to be devoid of structure but are essential for phase separation [104]. A large number of TFs, including those important for hematopoietic differentiation processes [105,106], contain such regions and are able to form large assemblies without having to be too selective and sprout specific domains for every possible interaction [107]. Under physiological salt conditions, unmodified chromatin undergoes phase separation in vitro or when injected into the nucleus, and this feature is modified by histone acetylation and protein binding [108]. A large number of factors contribute to a specific speckle type, indicating that such structures play a global role in organizing nuclear processes [109]. Transcription is no exception. Microscopic analysis showed many years ago that RNA polymerase II is organized in foci within the nucleus and that transcription appears to occur at fixed sites called 'transcription factories' [110].
More recently, the idea that transcriptional processes are partitioned into separate assemblies was revived with the advent of global chromatin immunoprecipitation assays, which uncovered that genes with complex regulatory regions (also termed 'super-enhancers') form large, DNA-dependent molecular assemblies. It was suggested that these assemblies are able to undergo phase transition, thus forming regulatory entities with their own rules [111,112]. Moreover, it was also suggested that RNA polymerase II can shuttle between a transcription and a splicing compartment depending on its phosphorylation status [113]. However, while it is clear that such protein-DNA assemblies containing TFs and their cofactors form condensates in vitro and speckles in vivo, there is still some controversy about whether DNA-dependent factor assembly represents true LLPS in living cells [114,115]. Nevertheless, it is now clear that compartmentalization is an essential part of regulatory processes within the nucleus, which drives the behavior of proteins in terms of their assembly kinetics and enzymatic activity. The challenge in the coming years will be to precisely define the role of each compartment and to identify the factors that drive compartmentalization and determine where genes are located.

The malignant state: differentiation going sideways
It is now clear that all of the mechanisms described in this review so far are important for normal development. Decades of research using knockout mice have shown that the machinery regulating differential gene expression is highly robust, with a high inbuilt level of redundancy. However, these studies also showed that defects do not always manifest themselves immediately but can appear later in the life of an organism in the form of cancer. This is exemplified by certain types of blood cancers occurring in families with inherited mutations in TF genes [116] that predispose patients to acute myeloid leukemia (AML). Most cancer-causing mutations, however, occur as somatic mutations in early hematopoietic precursor and stem cells. Recurrent mutations are seen in genes controlling gene regulation and epigenetic processes impacting cell fate decisions [117]. These include genes encoding TFs (e.g., RUNX1 or C/EBPα), chromatin remodelers and modifiers (e.g., CHD4, CBP), polycomb family members (e.g., EZH2), DNA methyltransferases (e.g., DNMT3A), and also demethylases such as TET1/2. Moreover, we also find mutations in genes encoding signaling molecules that control gene expression driving growth, such as RAS, in genes encoding architectural proteins such as CTCF, and in genes encoding splicing factors. AML is mostly a disease of the elderly, with mutations in genes encoding transcriptional and epigenetic regulators occurring first, followed by additional mutations in growth-promoting genes [118,119]. Such successive acquisition of mutations first generates progenitor cells with slightly impaired differentiation capacity, which manifests itself as clonal hematopoiesis, where the normally tightly regulated balance of differentiation is disturbed. One particular progenitor clone expands and contributes excessively to blood cell development without causing any overt disease phenotype.
However, the seed is thus sown for secondary mutations, which then lead to a complete block of differentiation and excessive malignant growth. It is now clear that different driver mutations have a different impact on the differentiation trajectory and the epigenetic landscape. As a result of a defect in an important regulator of cell fate driving a normal developmental trajectory, malignant cells adopt new identities distinct from normal cells and differentiation goes 'sideways' [120,121] (Fig. 3). The question now arises as to what the nature of these new cellular identities is and how they are maintained compared with normal cells. Normal cellular differentiation processes have been shaped and perfected by evolution over millions of years, whereas malignant cell differentiation is a product of patient-specific clonal selection that occurs in a much smaller time frame. If malignant cells are in this sense 'imperfect', why are they so difficult to eradicate?
The answer to this question lies in the robustness and plasticity of the differentiation process, that is, life itself. Similar to normal cells, malignant cells are maintained by distinct GRNs that drive common and AML subtype-specific signaling and metabolic pathways. It should be noted that while cancers come in different forms and can arise from many tissues, the rewiring of normal GRNs into ones that sustain a malignant phenotype is a hallmark of all of them. In AML, each mutation shapes the aberrant differentiation process in a different way, and even different mutations in a single TF-encoding gene such as RUNX1, which give rise to different aberrant versions of RUNX1, can lead to completely different disease outcomes and cellular identities with distinct chromatin landscapes [122,123]. Moreover, the inducible expression of different RUNX1 oncoproteins causes an immediate reprogramming of the chromatin and TF-binding landscape of the cells, which is specific for each aberrant protein [123,124]. These data suggest that once different epigenetic landscapes have been set up after the first oncogenic hit, cells on their way to malignancy tweak their GRNs to compensate for the weakness of one differentiation process by activating another, thereby maintaining a stable state that is compatible with growth. We find the aberrant activation of genes encoding lineage-inappropriate TFs, which then become an essential part of the network of abnormal but not normal cells [121,125,126]. We also find compensatory mechanisms whereby the mutation of one allele encoding a TF leads to a shift in the GRN so that it is now dependent on the function of the wild-type allele [127]. Compensatory mechanisms and rewiring of signaling pathways are also common and tend to appear during therapy with the development of different subclonal populations.
Examples of this phenomenon are the eradication of cells carrying a mutant FLT3 growth factor receptor after FLT3 inhibitor therapy and the appearance of RAS mutant cells, either from preleukemic cells carrying the original driver mutation or from mutated leukemic cells escaping therapy [118,128]. A glimmer of hope comes from studies that profiled the chromatin landscape and gene expression of prospectively isolated subclonal population pairs from different patients carrying different founder mutations. Each subclonal population displayed a different chromatin accessibility pattern, indicating that the acquisition of additional genetic changes led to the formation of different chromatin landscapes [129]. However, when subclonal pairs from different patients were compared, the chromatin accessibility patterns of each pair still clustered in a patient-specific way, demonstrating that epigenetic landscapes cannot drift apart in a disorderly way; that is, the cells still have much in common. Identifying the nature of these commonalities together with the differences will be crucial for the identification of patient-specific therapies.

Perspectives
In this review, we have only been able to give a glimpse of the complexity of the gene regulatory mechanisms that are encoded in our genome and that drive cell differentiation, and we still face significant challenges in our understanding of the molecular basis of developmental processes. We have deliberately left out the RNA world, and we have not mentioned how protein-protein interactions and metabolic processes impact on genome function, nor many other regulatory processes, many of which also play a part in multiple pathologies. For our understanding of cancer as described above, it becomes clear that (a) each type of cancer has to be seen as a different entity with an entirely unique underlying biology, (b) we need to understand this biology if we want to get away from therapeutic approaches that target unregulated growth only (chemotherapy) and are in themselves genotoxic, and (c) we need to start thinking about how we can reprogram GRNs without touching normal cells. Note that these statements are valid for a number of pathological processes. We need to directly target the gene regulatory machinery in a disease-specific way, and we need to block the compensatory escape routes that are used by cancer cells, be it the rewiring of signaling pathways or increasing genomic instability, which jumbles GRNs and speeds up evolution. With the development of drugs targeting TFs (MYC [130], RUNX [131,132]), chromatin regulators (BET [133,134], MLL [135]), and repair mechanisms (ATMi [135][136][137], PARPi [138]), we are starting to develop the right toolbox. However, what is clear is that neither our normal environment nor pathological processes can be understood without knowing the rules of gene regulation and cellular biology and the players dictating these rules. This review is a passionate appeal to keep studying how life operates in all its amazing complexity.