Volume 400, Issue 1 p. 2-8
Hypothesis

Vertebrate evolution by interspecific hybridisation – are we polyploid?

Jürg Spring

Institute of Zoology, University of Basel, Rheinsprung 9, CH-4051 Basel, Switzerland

Corresponding author. Fax: (41) (61) 267 34 57; E-mail: [email protected]
First published: 18 November 1998

Abstract

For the growing fraction of human genes with identified functions there are often homologues known from invertebrates such as Drosophila. A survey of well established gene families from aldolases to zinc finger transcription factors reveals that usually a single invertebrate gene corresponds to up to four equally related vertebrate genes on different chromosomes. This pattern has previously been widely noted for the Hox gene clusters but appears to be more general. Genome quadruplication by two rounds of hybridisation is discussed as a simple biological mechanism that could have provided the necessary raw material for the success of vertebrate evolution.

1 Introduction

It has been widely publicised that the homeobox genes corresponding to the homeotic complex HOM-C from Drosophila occur in a cluster in invertebrates from cnidarians [1] and Caenorhabditis elegans [2] to amphioxus [3], while they are found as four so-called paralogous Hox clusters on four different chromosomes in higher vertebrates [4-9]. The two rounds of duplication of the Hox cluster probably occurred close to the origin of vertebrates [10]. In addition, unrelated genes that code for proteins as functionally diverse as keratins, collagens or EGF-receptor-like tyrosine kinases are linked to the Hox clusters and are also duplicated [11]. A similar relationship has also been shown for the syndecan and myc gene families: a single invertebrate gene is found to be equally similar to the four vertebrate genes of its group, each of which is linked to a member of the other group on four different mouse chromosomes [12]. In fact, several extensive paralogous genomic regions containing gene families with various functions have been reviewed for mouse and man [13, 14], and Ohno [15] had already elaborated the theory of evolution by gene duplication by 1970. Polyploidy was discussed there as one of several possible explanations for the complexity of vertebrate gene families and appears to have become an acceptable working hypothesis [16]. The new ideas proposed here are that allopolyploidy by interspecific hybridisation would create more evolutionary potential than autopolyploidy and that an invertebrate gene and the corresponding multiple vertebrate members of gene families should be considered as a group. This even allows one to make predictions for the number and positions of homologues in the human genome from model organisms such as Drosophila or C. elegans. Of course, Hox genes are still special because of their transcription along the body axes according to the position of the genes within the cluster, which inspired the concept of the zootype [17].
However, the one to four relationship of invertebrate and vertebrate genes is not specific to Hox genes, but rather appears to be the normal case for well studied gene families.

2 Homologues, orthologues, paralogues and tetralogues

Homologous genes are all those that are derived from a common ancestor by duplication and divergence. Orthologues are equivalent genes of different species, e.g. human HOXA4 and murine HoxA4. Paralogues, in contrast, are homologous genes within one species. After tandem duplication such genes could be called cis-paralogues. However, this is only useful as long as they stay together, as in the case of HOXA4 and HOXA5. To distinguish trans-paralogues such as HOXA4, HOXB4, HOXC4 and HOXD4 from all other homologues and to make a connection to invertebrate orthologues such as Drosophila Dfd or amphioxus Amphihox4, I propose the term tetralogues. Tetralogous genes are groups of quadruplicated vertebrate genes at four different chromosomal localisations corresponding to a single invertebrate gene, which are all more similar to each other than to members of other tetralogy groups (Fig. 1). The ubiquity of this one to four relationship of invertebrate and vertebrate gene subfamilies suggests two genome-wide tetraploidisation events as the source for tetralogues.
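The terminology above can be illustrated with a minimal Python sketch. It is not part of the original analysis: the `Gene` record and the `relationship` function are hypothetical names, the function assumes that the two genes are already known to be homologous, and it treats any cross-species pair it is given as equivalent genes. The human chromosome numbers are those listed for the Hox clusters in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    species: str
    chromosome: str  # only compared between genes of the same species


def relationship(a: Gene, b: Gene) -> str:
    """Label a pair of genes that is assumed to be homologous.

    Assumption: a cross-species pair passed in here consists of
    equivalent genes, so it is labelled as orthologous.
    """
    if a.species != b.species:
        return "orthologues"
    if a.chromosome == b.chromosome:
        return "cis-paralogues"
    return "trans-paralogues"


hoxa4 = Gene("HOXA4", "human", "7")
hoxa5 = Gene("HOXA5", "human", "7")
hoxb4 = Gene("HOXB4", "human", "17")
# The chromosome of the fly gene is irrelevant for this comparison.
dfd = Gene("Dfd", "Drosophila", "not used here")

print(relationship(hoxa4, hoxa5))  # same cluster on chromosome 7
print(relationship(hoxa4, hoxb4))  # members of one tetralogy group
print(relationship(hoxa4, dfd))    # invertebrate counterpart
```

In this sketch, trans-paralogues that share a single invertebrate counterpart, such as HOXA4 and HOXB4 with Dfd, are the genes the text calls tetralogues.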

Fig. 1. Typical representation of the relationship within gene families where a single invertebrate gene corresponds to four vertebrate genes on four different chromosomes.

3 How many tetralogues?

While for many gene families only three and not four tetralogues are presently known in vertebrates, closer inspection of the four Hox gene clusters revealed that in most Hox gene tetralogy groups only three members are really maintained in the human genome as well. Only 2 groups consist of all four genes, 8 out of 13 groups have three and 3 groups have only two genes left (Fig. 2A). Also, corresponding linked genes coding for keratins, collagens or tyrosine kinases show a similar pattern [11]. A comparable analysis of the MHC class III region illustrates that here, too, an average of three vertebrate tetralogues and one invertebrate gene can be found for various gene families belonging to unrelated functional groups (Fig. 2B). The MHC class III region on human chromosome 6p21.3 is one of the best documented portions of the human genome and contains more than 30 genes located between the MHC class I and II clusters [18]. Much less is known about many of these genes than about the Hox clusters, and some gaps in this table might still be filled. However, the chance of finding three or four tetralogous genes or clusters of genes on different chromosomes is apparently no higher for regulatory genes like the Hox or myc genes than for many other gene families with a wide variety of functions.
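The average of about three retained Hox genes per tetralogy group follows directly from the counts given above. The short Python sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
# Number of human Hox tetralogy groups that retain 4, 3 or 2 genes,
# as stated in the text (13 groups in total).
groups_by_size = {4: 2, 3: 8, 2: 3}

total_groups = sum(groups_by_size.values())
total_genes = sum(size * count for size, count in groups_by_size.items())
mean_retained = total_genes / total_groups

print(total_groups, total_genes, round(mean_retained, 2))  # 13 38 2.92
```

Thirty-eight genes in 13 groups give a mean of roughly 2.9 genes per group, i.e. close to three of the four possible tetralogues.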

Fig. 2. (A) Hox gene organisation in Drosophila and on four tetralogous human chromosomes with EGF-receptor-like tyrosine kinases as examples of unrelated linked genes. Although four clusters of Hox genes persist in vertebrates, only three genes were maintained on average from each tetralogy group. (B) Tetralogous display of MHC class III genes, the region between MHC class I and II genes on human chromosome 6p21.3, with known vertebrate and invertebrate homologues. An average of three tetralogous genes in humans and a single orthologue from invertebrates can be found for the better studied genes from the MHC class III region. (a) Although many invertebrate members of the immunoglobulin family are known, a clear candidate corresponding to MHC class I, II or CD1 molecules is still missing. (b) VAV2 and a Vav homologue from C. elegans (GENBANK/EMBL U23520) were cloned only recently; a Drosophila homologue is still missing. (c) NOTCH2 was mapped to 1p13-p11, which could indicate a recent inversion; INT3 was also called ‘NOTCH3’ and mapped to a contig with PBX2 and TNXB1 (tenascin-X), which is as closely related to HXB (tenascin-C; TNC) as to TNR (tenascin-R) [29]. (d) An invertebrate homologue of C3, C4 and C5 could eventually be recognised in the course of the C. elegans sequencing project, but it might be difficult to recognise invertebrate members of the TNF family (e), as even the known vertebrate members have very little sequence similarity; tumour necrosis factor α (TNFA) is only 30% identical to the CD27 ligand (CD70), and the Fas ligand (APT1LG1) and OX40 ligand (TXGP1) are less than 20% identical. The other gene symbols are related to the common gene names and additional information is available on the World Wide Web in the genome data bases FLYBASE, MGD, GDB or OMIM; sequences were from SWISSPROT or translated from GENBANK/EMBL and analysed with BLAST, FASTA and PILEUP in GCG.

4 Tetralogues on all human chromosomes

Representatives of well studied gene families where one invertebrate gene is equally similar to three or four vertebrate tetralogues can now be found on all 23 human chromosomes (Table 1). For all these examples, linked genes that themselves belong to independent tetralogy groups are listed. When more than four related members of a gene family are known in a vertebrate, I found that they can be subdivided according to their sequence similarities, gene structures or chromosomal localisations into subgroups of up to four per corresponding invertebrate gene or into clusters of tandemly repeated genes. As an example, the recent cloning of a novel src-related gene from Drosophila [19] helped to divide this family with eight closely related human members into tetrapacks. The new Drosophila gene Src41A (Dsrc41) is most similar to the human subgroup with SRC, YES1, FGR and FYN. The previously known candidates for Drosophila src genes are Src64B, which might correspond to the human subfamily with LCK, LYN, HCK and BLK, and Src29A, which is clearly more similar to the group with Bruton's tyrosine kinase BTK, whose members are only distantly related non-receptor tyrosine kinases. 53 examples of tetralogy groups are listed in Table 1 and in a growing database on the World Wide Web, also including some interesting examples where the relationships could not yet be resolved completely or where homologous sequences are not yet available for Drosophila, such as for the myc, insulin or fibroblast growth factor families.
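The subdivision of the src family described above can be written down as a small lookup structure. The Python sketch below is only an illustration of the grouping named in the text; the dictionary layout and the helper `drosophila_counterpart` are assumptions made for this example, not the method used to build Table 1.

```python
# Human src-family subgroups keyed by the Drosophila gene they are most
# similar to, as described in the text.
src_tetralogy_groups = {
    "Src41A": ["SRC", "YES1", "FGR", "FYN"],
    "Src64B": ["LCK", "LYN", "HCK", "BLK"],
}

MAX_TETRALOGUES = 4  # at most four vertebrate genes per invertebrate gene


def drosophila_counterpart(human_gene: str) -> str:
    """Return the Drosophila gene whose subgroup contains human_gene."""
    for fly_gene, members in src_tetralogy_groups.items():
        if human_gene in members:
            return fly_gene
    raise KeyError(human_gene)


all_members = [g for members in src_tetralogy_groups.values() for g in members]
print(len(all_members))                # the eight closely related human genes
print(drosophila_counterpart("FYN"))   # Src41A
print(drosophila_counterpart("LCK"))   # Src64B
```

Each subgroup stays within the limit of four members, which is the pattern the tetralogy concept predicts for a family of eight genes with two invertebrate counterparts.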

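Table 1. Gene families with multiple human tetralogues for each Drosophila orthologue
Columns: tetralogy group (gene family); ratio D:H; Drosophila gene; human gene with chromosomal localisation (in parentheses if predicted from the mouse); tetralogous neighbour (gene family). Rows without a group name list further human members of the preceding group; where a second Drosophila gene exists, it opens such a row.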
Abl (non-receptor tyrosine kinases) 1:2 Abl ABL1 9q34.1 PBX3 (homeobox transcription factors)
ABL2 1q24-q25 PBX1
Aldolase (glycolysis enzymes) 1:3 Ald ALDOA 16q22.2 HSD17B2 (hydroxysteroid dehydrogenases)
ALDOB 9q22.3-q31 HSD17B3
ALDOC 17cen-q12 HSD17B1
Alzheimer β-amyloid (cell surface protease inhibitors) 1:3 Appl APP 21q21.2 ETS2 (Ets domain transcription factors)
APLP1 19q13.1
APLP2 11q23-q25 ETS1
Ankyrin (membrane skeleton proteins) 1:3 Ank ANK1 8p12-p11.2
ANK2 4q25-q27 NFKB1 (Ig-fold transcription factors)
ANK3 10q21 NFKB2
BMP/dpp (TGFβ-like growth factors) 1:2 dpp BMP2 20p12 CHGB (secretogranins)
BMP4 14 CHGA
BMP/60A (TGFβ-like growth factors) 1:4 Tgfbeta-60A BMP5 6(q12-q13)
BMP6 6(p23-p22) ID4 (inhibitory HLH factors)
BMP7 20 ID1
BMP8 ?
Bruton's tyrosine kinase (non-receptor tyrosine kinases) 1:3 Src29A BTK Xq21.33-q22 CDX4 (homeobox transcription factors)
ITK 5q31-q32 CDX1
TEC/TXK 4p12
Cadherin (cell adhesion molecules) 1:3 Dec CDH1/3/14 16q22.1 MT3 (metallothioneins)
CDH2 18q12.1 MTL3
CDH12 5p13-p14
Calmodulin (calcium-binding regulators) 1:3 Cam CALM1 14q32 CKB (creatine kinases)
CALM2 2p21
CALM3 19q13.3 CKM
Caudal (homeobox transcription factors) 1:3 cad CDX1 5q31-q33 ITK (non-receptor tyrosine kinases)
CDX3 13q12.3
CDX4 Xq13.2 BTK
Collagen type IV (network-forming collagens) 1:3 Cg25C/viking COL4A1/2 13q34
COL4A3/4 2q35-q37 GPC1 (PI-linked proteoglycans)
COL4A5/6 Xq22 GPC3
Cathepsin (cysteine proteases) 1:3 CysP-1 CTSL 9q22.1-q22.2 NTRK2 (receptor tyrosine kinases)
CTSS/K 1q21 NTRK1
CTSH 15q24-q25 NTRK3
Dlx (homeobox transcription factors) 1:3 dll DLX1/2 2q32 EN1 (homeobox transcription factors)
DLX4 ?
DLX5/6 7q22 EN2
E2A (bHLH transcription factors) 1:3 da TCF3 19p13.3 INSR (receptor tyrosine kinases)
TCF4 ?
TCF12 15q21 IGF1R
E2F (Rb-binding transcription factors) 1:3 E2f E2F2 1p36 ID3 (inhibitory HLH factors)
E2F3 6p22 ID4
E2F4 16q21-q22
EGF (epidermal growth factors) 2:6 spi EGF 4q25 FGF2 (fibroblast growth factors)
grk TGFA 2p13
HGL 8p21-p12
AREG/BTC 4q13-q21 FGF5
DTR 5q23 FGF1
TDGF1 3p21.3-p21.1
EGF receptor (receptor tyrosine kinases) 1:4 Egfr EGFR 7p12 HOXA@ (homeobox transcription factors)
ERBB2 17q11.2-q12 HOXB@
ERBB3 12q13 HOXC@
ERBB4 2q34 HOXD@
Egr/Krox-20 (zinc finger transcription factors) 1:4 sr EGR1 5q23-31
EGR2 10q21.1 PLAU (plasminogen activators)
EGR3 8p23-p21 PLAT
EGR4 2p13
Engrailed (homeobox transcription factors) 1:2 en/inv EN1 2q13-q21 IHH (secreted signalling factors)
EN2 7q36 SHH
Emx (homeobox transcription factors) 1:2 ems EMX1 2p14-p13 REL (Ig-fold transcription factors)
EMX2 10q26.1 NFKB2
Even skipped (homeobox transcription factors) 1:2 eve EVX1 7p15-p14 HOXA@ (homeobox transcription factors)
EVX2 2q34.3-q31 HOXD@
Ezrin (peripheral cytoskeletal proteins) 1:3 Moe VIL2 6q22-q27 ESR (steroid hormone receptors)
RDX 11q23 PGR
MSN Xq11.2-q12 AR
FGF receptor (receptor tyrosine kinases) 2:5 Fr1 FGFR1 8p12 EGR3 (zinc finger transcription factors)
btl FGFR2 10q25.3-q26 EGR2
FGFR3 4p16.3
FGFR4 5q33-qter EGR1
FGFR6 ?
Gli (glioblastoma family zinc fingers) 1:3 ci GLI 12q13 HOXC@ (homeobox transcription factors)
GLI2 2 HOXD@
GLI3 7p13 HOXA@
Glypican (PI-linked proteoglycans) 1:4 dally GPC1 2q35-q37 COL4A3/4 (network-forming collagens)
GPC2 ?
GPC3 Xq26 COL4A5/6
GPC4 ?
Hedgehog (secreted signalling factors) 1:3 hh SHH 7q36 COL1A2 (major fibril-forming collagens)
DHH (12q13) COL2A1
IHH 2(q35-q36) COL3A1
Hox gene cluster (homeobox transcription factors) 1:4 ‘HOM-C’ HOXA@ 7p15-p14 EGFR (receptor tyrosine kinases)
HOXB@ 17q21-q22 ERBB2
HOXC@ 12q12-q13 ERBB3
HOXD@ 2q31 ERBB4
Id (inhibitory HLH factors) 1:4 emc ID1 20q11 SDC4 (cell surface proteoglycans)
ID2 2p25 SDC1
ID3 1p36.13-p36.1 SDC3
ID4 6p22-p21.3
Insulin receptor (receptor tyrosine kinases) 1:3 InR INSR 19p13.3 MEF2B (MADS box enhancer factors)
INSRR 1 MEF2D
IGF1R 15q25-qter MEF2A
Integrin α-chain PS2 group (extracellular matrix receptors) 1:3 if ITGA2B 17q21.32 HOXB@ (homeobox transcription factors)
ITGA5/7 12q11-q13 HOXC@
ITGA4/V 2q31-q32 HOXD@
Integrin β-chain (extracellular matrix receptors) 2:6 mys ITGB3/4 17q11-qter HOXB@ (homeobox transcription factors)
betaIntn ITGB6 2 HOXD@
ITGB7 12q13.1 HOXC@
ITGB1 10p11.2
ITGB2 21q22.3
ITGB5/8 ?
Jak (non-receptor tyrosine kinases) 1:4 hop JAK1 1p32.3-p31.3 JUN (bZIP transcription factors)
JAK2 9p24
JAK3 ?
TYK2 19p13.2 JUNB/D
Laminin α-chain (extracellular matrix proteins) 1:3 LanA LAMA1 18p11.31 YES1 (non-receptor tyrosine kinases)
LAMA2/4 6q21-23 FYN
LAMA3 18q11.2
Laminin β-chain (extracellular matrix proteins) 1:3 LanB1 LAMB1 7q22 BRAF (serine/threonine kinases)
LAMB2 3p21.3-p21.2 RAF1
LAMB3 1q32
Mef2 (MADS box enhancer factors) 1:4 Mef2 MEF2A 15q26 IGF1R (receptor tyrosine kinases)
MEF2B 19p12 INSR
MEF2C 5q14
MEF2D 1q12-q23 INSRR
MyoD (bHLH transcription factors) 1:3 nau MYOD1 11p15.1 INS/IGF2 (insulin-like growth factors)
MYOG 1q31-q41
MYF5/6 12q21 IGF1
Myosin heavy chain (smooth/non-muscle myosins) 1:3 zip MYH9 22q12.3-q13.1 PRKM1 (MAP kinases)
MYH10 17p13
MYH11 16p13.1 PRKM3
NFκB/Rel/dorsal (Ig-fold transcription factors) 2:5 dl NFKB1 4q24 FGF2 (fibroblast growth factors)
Dif NFKB2 10q24 FGF8
REL 2p13-p12
RELA 11q13 FGF3/4
RELB ?
NOS (nitric oxide synthases) 1:3 Nos NOS1 12q24 COL2A1 (major fibril-forming collagens)
NOS2A/B/C 17q11-q12 COL1A1
NOS3 7q35-q36 COL1A2
Notch (cell-cell interaction receptors) 1:4 N NOTCH1 9q34.3 COL5A1 (minor fibril-forming collagens)
NOTCH2 1p13-p11 COL11A1
NOTCH3 19p13.2-p13.1
INT3 6p21.3 COL11A2
Otx (homeobox transcription factors) 1:2 oc OTX1 2p13 CALM2 (calcium-binding regulators)
OTX2 14q21-q22 CALM1
Pbx (homeobox transcription factors) 1:3 exd PBX1 1q23 RXRG (nuclear receptors)
PBX2 6p21.3 RXRB
PBX3 9q33-q34 RXRA
Raf (serine/threonine kinases) 1:3 phl RAF1 3p25 IL5RA (interleukin receptors)
ARAF1 Xp11.3-p11.23 IL3RA
BRAF 7q34
Ral (GTP-binding oncogenes) 1:2 Rala RALA 7p HOXA@ (homeobox transcription factors)
RALB 2cen-q13 HOXD@
Ras (GTP-binding oncogenes) 1:3 Ras85D HRAS 11p15.5 BDNF (nerve growth factors)
KRAS2 12p12.1 NTF3
NRAS 1p13 NGFB
Retinoblastoma (tumour suppressors) 1:3 Rbf RB1 13q14.3
RBL1 20q11.2 MMP9 (gelatinases)
RBL2 16q12.2 MMP2
Retinoic acid receptor type X (nuclear receptors) 1:3 usp RXRA 9q34 PBX3 (homeobox transcription factors)
RXRB 6p21.3 PBX2
RXRG 1q22-q23 PBX1
Src (non-receptor tyrosine kinases) 1:4 ‘Src41A’ SRC 20q11.2 COL9A3 (type IX collagens)
YES1 18p11.31-p11.22
FGR 1p36.2-p36.1 COL9A2
FYN 6q21 COL9A1
Src-related (non-receptor tyrosine kinases) 1:4 Src64B LCK 1p35-p34.3 SDC3 (cell surface proteoglycans)
LYN 8q13 SDC2
HCK 20q11-q12 SDC4
BLK 8p23-p22
Stat (signal transducers and activators) 1:3 mrl STAT1/4 (2q12-q33) HOXD@ (homeobox transcription factors)
STAT2/6 (12q13-q14.1) HOXC@
STAT3/5A/B (17q11-q22) HOXB@
Syndecan (cell surface proteoglycans) 1:4 Syd SDC1 2p(24-p23) MYCN (bHLH transcription factors)
SDC2 8q22-q23 MYC
SDC3 (1p36-p32) MYCL1
SDC4 20q12-q13
Tenascin (extracellular matrix proteins) 1:3 Ten-m HXB 9q32-q34 PBX3 (homeobox transcription factors)
TNXB1 6p21.3 PBX2
TNR 1q25-q31 PBX1
Wnt (wingless/int-1 signalling factors) 1:3 wg WNT1 12q13 COL2A1 (major fibril-forming collagens)
WNT2 7q31 COL1A2
WNT3 17q21-q22 COL1A1

53 representative gene families are listed that include all 22 human autosomes and the X chromosome and a wide variety of functions. Only one example of linked tetralogous genes is shown per group. Most tetralogy groups are subfamilies of larger gene families. Additional members belong to an independent tetralogy group if duplication occurred before the divergence of the lineages leading to Drosophila and man such as in the case of Src41A [19], Src64B and Src29A. For the ratio D:H the number of Drosophila and human gene clusters rather than individual genes was used. Lineage specific tandem duplications appear to be common in vertebrates while en/inv is the only example in Drosophila listed here. Gene families with a ratio of 2:5 or 2:6 could not yet be resolved into tetralogy groups. Localisations shown in parentheses were predicted from mapping data in the mouse. Data were collected and analysed as described in Fig. 2, especially from FLYBASE and the human genome data base GDB. Additional information can be found in TetraBase, a continuously upgraded data base at the URL: http://www.unibas.ch/dib/zoologie/research/spring.html.
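The D:H ratio counts gene clusters rather than individual genes. The following Python sketch shows one way to compute it for two rows of Table 1. It assumes that a slash inside a table entry (e.g. CDH1/3/14 or en/inv) joins tandemly linked genes that count as one cluster; the function names are hypothetical.

```python
def count_clusters(entries):
    """Each table entry counts once, even if it names several tandem genes."""
    return len(entries)


def count_genes(entries):
    """Count individual genes, splitting slash-joined tandem members."""
    return sum(len(entry.split("/")) for entry in entries)


def dh_ratio(drosophila, human):
    """Ratio of Drosophila to human gene clusters, formatted as in Table 1."""
    return f"{count_clusters(drosophila)}:{count_clusters(human)}"


# Two rows copied from Table 1.
cadherin = (["Dec"], ["CDH1/3/14", "CDH2", "CDH12"])
engrailed = (["en/inv"], ["EN1", "EN2"])

print(dh_ratio(*cadherin), count_genes(cadherin[1]))    # 1:3 5
print(dh_ratio(*engrailed), count_genes(engrailed[0]))  # 1:2 2
```

Counting clusters keeps lineage-specific tandem duplications, such as the three linked cadherin genes or en/inv in Drosophila, from obscuring the underlying one to four pattern.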

5 Genome quadruplication through hybridisation

The pattern of up to four vertebrate tetralogues for each invertebrate gene could provide us with new clues about the evolution of vertebrates and genomes in general. Many aspects of genome duplication had been discussed extensively [15] and even probabilities for finding existing patterns had been calculated [11, 13]. However, the distribution of all these gene families in groups of two, three and apparently maximally four could simply suggest that all genes were first duplicated to the four-fold stage. Considering not only the statistics but also the biology of this problem, a single but simple mechanism that worked in many plants and invertebrates, and even in vertebrates like Xenopus, could explain the observed picture: allotetraploidisation. Two such rounds of interspecific hybridisation with the concomitant genome duplications of amphioxus-like animals could have created primitive vertebrates close to the Cambrian explosion 530 million years ago (Fig. 3). Hybridisation is not an efficient mode of evolution in higher vertebrates. It was therefore often generalised that, in contrast to plants, hybridisation is not important for animals. However, in many invertebrates and even lower vertebrates such as fish and amphibians hybridisations are widespread. Immediately after speciation, hybridisation leading to allopolyploidy is not much different from autopolyploidisation and probably has few advantages, since one of the gene copies would continue to function while the other should accumulate mutations and disappear quickly [15]. Hybridisation in modern, highly adapted species probably has few advantages too and became rare in animals, possibly also due to the involvement of behaviour in species specific fertilisation mechanisms. Exceptions like Xenopus, salmon, trout or goldfish show that vertebrates can still undergo further polyploidisation, but additional constraints such as the increasing chromosome number might then become limiting.
There could have been a very narrow hybridisation window when allopolyploidy really permitted evolutionary jumps through the combination of advantageous traits that had evolved previously in separate lineages. Candidates that resemble putative amphioxus-like founder species already lived in the Cambrian, for example Pikaia gracilens and Yunnanozoon lividum [20]. Modern hagfish and lampreys could be descendants of the proposed intermediate allotetraploids. As hagfish are so different from lampreys and all other, extinct jawless fish [21], they could be independently derived allotetraploids AB and AC or even CD (cf. Fig. 3). Alternatively, they might also be allohexaploids ABC or ABD, i.e. hybrids between an allotetraploid AB and a diploid C or D.
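The hybrid genome combinations mentioned above can be enumerated with a short Python sketch. It assumes four diverged diploid founder lineages labelled A to D, as in the text, and that each contributing lineage adds one diploid genome; the function names are illustrative, and scenarios with diverged copies of the same lineage (such as ABA′B′) are not covered.

```python
from itertools import combinations

DIPLOID_LINEAGES = "ABCD"  # hypothetical founder lineages, labelled as in the text


def hybrids(n_genomes):
    """All unordered combinations of n_genomes distinct founder lineages."""
    return ["".join(c) for c in combinations(DIPLOID_LINEAGES, n_genomes)]


def ploidy(hybrid):
    """Each contributing diploid lineage adds two chromosome sets."""
    return 2 * len(hybrid)


print(hybrids(2))  # allotetraploids, including AB, AC and CD
print(hybrids(3))  # allohexaploids, including ABC and ABD
print(hybrids(4), ploidy("ABCD"))  # the single allooctoploid
```

With four founder lineages there are six possible allotetraploids and four allohexaploids but only one allooctoploid, ABCD, which carries up to four copies of each ancestral gene.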

Fig. 3. A phylogenetic view of quadruplicated genome parts in vertebrate evolution. Hybridisation events are indicated by stippled lines connecting the involved lineages. Other scenarios can be imagined such as the formation of an allooctoploid ABA′B′ from two diverged allotetraploids AB and A′B′, respectively. Amphioxus is a good candidate for a direct descendant of a diploid ancestor. The jawless hagfish and lampreys might be allotetraploids while jawed vertebrates from fish to man would be allooctoploids. Around the so-called Cambrian explosion, 530 million years ago, hybridisation might have been common in little diverged ancestors of vertebrates. But immediately after speciation, hybridisation leading to allopolyploidy is not much different from autopolyploidisation and has few advantages. With increasing differentiation, the chance of diverged species producing successful hybrids declines. Allopolyploidisation of closely related modern species or autopolyploidisation might still be possible but should have little evolutionary impact; tetraploid Xenopus still look like diploid or octoploid Xenopus. During a narrow hybridisation window allopolyploidy of rather primitive animals could have been more advantageous: allotetraploid lineages evolved and gave rise to an allooctoploid combining in a short period of time the advantages from previously separated lineages.

6 Partial redundancy of allooctoploids

If genome duplication was the result of hybridisation of rather different species, by allotetraploidisation, the faster evolving genes would already be quite different at the time of hybridisation and thus could serve as an only partially redundant pool for further divergent evolution of gene families. According to this idea, highly conserved genes are more likely to be perfectly redundant at the time of such hybridisation and are therefore more likely to be reduced to a single copy than rapidly diverging genes. Regulatory regions of genes can mutate even faster and with fewer constraints than coding regions and can thus lead to at least partial tissue specificity of expression of functionally still redundant genes. We have, for example, three calmodulin genes on three different chromosomes coding for identical proteins [22]. Could their survival be due to differences in their regulatory sequences, as suggested for three otherwise redundant paired-box containing genes in Drosophila [23]? Similarly, the homeobox gene En-2 can rescue En-1 knock-out mice when the En-2 coding sequence is brought under the control of the regulatory sequences of its tetralogue En-1 [24]. The close relationship, not only of the coding sequences, but also of the regulatory sequences of tetralogous genes could also help to explain why so many of the knock-out mice have much milder phenotypes than expected from the expression patterns of the individually investigated genes. Therefore, tetralogues should be investigated simultaneously whenever possible.

7 Concluding remarks

Random gene, chromosome or genome duplications would be expected to result in complicated patterns of genome complexities. The simple one to four relationship observed for many invertebrate and vertebrate genes, developmental control genes as well as household enzymes or structural proteins, argues for unspecific quadruplication. A set of roughly 10 000 primitive metazoan genes is only slightly varied by tandem duplications or deletions within invertebrate genomes from worms to amphioxus; e.g. C. elegans has fewer genes than amphioxus in the Hox cluster and probably also in the whole genome. This set of primitive metazoan genes is represented up to four times on different vertebrate chromosomes or chromosomal regions, often with additional gene copies due to higher numbers of tandem duplications in vertebrates. More than 100 chromosomal rearrangements have visibly scrambled the genomes of the mouse and man since the divergence of their lineages about 70 million years ago [25]. In vertebrate evolution, this rate of rearrangements could still have left some genes next to each other purely by chance, without any functional implications. Conservation of gene linkage in Drosophila or C. elegans and vertebrates, however, could indeed point towards functional constraints [11]. Analysis of the genome of amphioxus, or even more conveniently an urochordate with a smaller genome, might combine the advantages of close relationship to vertebrates and a four-fold reduction of complexity as compared to vertebrate genomes. Similarly, the pufferfish (fugu) was chosen as a model vertebrate simply based on its small genome size of only 400 Mb [26], which is just four times the size of the C. elegans genome. Comparison of characteristic regions of model genomes from urochordates or amphioxus, jawless fish and vertebrates from pufferfish to mouse and man could further clarify the phylogeny of tetralogous genome parts and the time points of duplications [27]. 
Changes in genome complexities are also associated with other major evolutionary transitions such as from prokaryotes to eukaryotes or from protozoa to metazoa [28], which, therefore, should be compared to the transition from invertebrates to vertebrates. Short-term benefits of the recognition of the four-fold complexity of vertebrate genomes might include a unified and phylogenetic nomenclature for invertebrate and vertebrate gene families and immediate help in sorting our roughly 80 000 genes into 4×20 000 groups on the quadruplicated parts of the human genome.

Acknowledgements

I would like to thank V. Schmid, P. Flook and M. Šuša for constructive comments. Additional information and the hundreds of references for gene localisations, sequences and methods may be obtained on the World Wide Web or from the author.