Tracing the evolution of fatty acid‐binding proteins (FABPs) in organisms with a heterogeneous fat distribution

The distribution of fat among both invertebrate and vertebrate groups is heterogeneous. Studies have shown that fatty acid-binding proteins (FABPs), which mainly bind and transport fatty acids, play important roles in the regulation of fat storage and distribution. However, the systematic and genome-wide investigation of FABP genes in organisms with a heterogeneous fat distribution remains in its infancy. The availability of the complete genomes of Caenorhabditis elegans, Callorhinchus milii, and other organisms with a heterogeneous fat distribution allowed us to systematically investigate the gene structure and phylogeny of FABP genes across a wide range of phyla. In this study, we analyzed the number, structure, chromosomal location, and phylogeny of FABP genes in 18 organisms from C. elegans to Homo sapiens. A total of 12 types of FABP genes were identified in the 18 species, and no single organism possessed all 12 FABP genes. The absence of a specific FABP gene from a genome may be related to the absence of fat storage in the corresponding tissue. The genomic loci of the FABP genes were diverse, and their gene structures varied. The phylogenetic analysis and the conserved synteny of FABP family genes/proteins suggest that all FABP genes may have evolved from a common ancestor through tandem duplication. This study not only lays a theoretical foundation for the study of fat deposition in different organisms, but also provides a new perspective on metabolic disease prevention and control and on the improvement of agricultural product quality.

From Caenorhabditis elegans to Homo sapiens, animals store excess energy in the form of fat for future needs. Worms (C. elegans) store fat in the intestine [1,2], and sharks store fat in the liver [3,4]. In most species, however, fat is stored mainly in white adipose tissue (WAT), which provides energy during periods when energy demands exceed caloric intake [5]. The location of WAT varies among species. In most amphibians and reptiles, WAT is located in the intra-abdominal region, its most efficient site; in nearly all mammals except pinnipeds (seals) and certain small cetaceans (whales and dolphins), and in many birds, adipose tissue is partitioned into a dozen or more discrete depots that are widely distributed around the body [6]; fish mainly store fat in the liver, muscle, and mesentery; and the platypus stores fat in its tail [7]. Fat distribution also affects the risk of developing metabolic syndrome: increased intra-abdominal (visceral) fat is associated with a high risk of obesity, diabetes, and other metabolic diseases, whereas increased subcutaneous fat in the thighs and hips confers little or no risk [2]. In addition, the adipose tissue present in different body parts of poultry and livestock plays an important role in product quality [8].
Many studies have shown that fat storage and distribution are related to age, sex, hormones, fat synthesis, fat decomposition and transport, and other factors [9][10][11][12]. Among these factors, fatty acid-binding proteins (FABPs), which bind fatty acids and other lipid ligands, are critical mediators of fat storage and distribution [13][14][15][16]. Epidermal FABP (EFABP) is differentially expressed between human omental and subcutaneous adipose tissue [11], and elevated levels of adipocyte FABP (AFABP) have been found in pericardial fat tissue and are associated with cardiac dysfunction in obese people [17]. AFABP and HFABP (muscle and heart FABP) have been considered candidate genes for fatness traits in pigs, and AFABP is involved in the regulation of intramuscular fat accretion in animals such as Duroc pigs, mice, chickens, and rabbits [18][19][20].
Fatty acid-binding proteins are a family of small cytosolic proteins that noncovalently bind hydrophobic ligands, mainly fatty acids. They are 14- to 15-kDa proteins of 126-134 amino acids and are named after the tissue from which they were first isolated or identified [12]. Twelve FABPs are known in vertebrate and invertebrate animals. In mammals, nine different FABPs with tissue-specific distributions have been identified: FABP1 (LFABP, liver), FABP2 (IFABP, intestinal), FABP3 (HFABP, muscle and heart), FABP4 (AFABP, adipocyte), FABP5 (EFABP, epidermal), FABP6 (IlFABP, ileal), FABP7 (BFABP, brain), FABP8 (MFABP, myelin), and FABP9 (TFABP, testis) [21]. In teleost fish, two further members of the FABP family, FABP10 (Lb-FABP, liver basic FABP) and FABP11, have been discovered [22]. In addition, FABP12 has been identified in humans, rats, and mice [23]. Although their amino acid sequence similarity ranges from only 20% to 70%, all FABPs share a conserved tertiary structure composed of 10 antiparallel β-strands and two α-helices. The strands form two orthogonal β-sheets that wrap around a large, solvent-accessible ligand-binding cavity; at one end of the resulting barrel, a helix-turn-helix motif is proposed to act as a 'portal' that allows ligand entry [24].
Evolutionary studies have shown that FABPs evolved via successive gene duplications, generating a large number of tissue-specific homologs [24][25][26]. However, the systematic investigation of FABP genes in organisms with a heterogeneous fat distribution, in which fat is located in various organs, remains in its infancy, and the number, structure, sequence identity, and phylogeny of FABP genes in these organisms are unclear. Thus, based on the complete genomes of C. elegans, Callorhinchus milii [27], and other organisms, we systematically analyzed FABP gene organization in 18 species, including organisms with a heterogeneous fat distribution and important domestic economic species (Table 1). We found that although 12 FABP genes have been identified across the selected species, no single organism possesses all of them. In addition to the 'canonical' fabp gene structure of four exons and three introns, some atypical protein-coding genes exist, with seven exons and six introns, five exons and four introns, or three exons and two introns. Alternative splicing, a common phenomenon in humans and other species, greatly increases the diversity of FABP proteins. Based on the phylogenetic and synteny analyses of FABP family genes/proteins, we inferred that all FABP genes could have evolved from an ancestral FABP gene through gene duplication. We also considered the differences in fat deposition sites among species from the perspective of fatty acid transport. This work provides a new research hypothesis and direction for the study of ectopic fat deposition in humans and domestic economic species.

Results and Discussion
FABP family gene identification from C. elegans to Homo sapiens
Fatty acid-binding proteins are members of the lipid-binding protein (LBP) superfamily. The primary role of all FABP family members is the regulation of fatty acid uptake and intracellular transport. We therefore asked how many fabp genes regulate fat transport and storage in species with different adipose tissue depots. We found that the presence or absence of the 12 FABP genes (FABP1-FABP12) varies from C. elegans to H. sapiens (Table 1). Among these genes, FABP4, FABP5, and FABP8 have been identified in humans, mice, birds, amphibians, and reptiles but not in fish. FABP9 and FABP12 are found only in mammals. Whereas FABP10 is present in chickens, amphibians, reptiles, and fish, FABP11 has been identified only in teleosts. Conversely, some species possess only a limited subset of FABP genes, such as fabp1, fabp2, fabp5, fabp6, fabp7, and fabp8. For example, four FABPs (FABP1, FABP2, FABP5, and FABP8) are found in the genome of the platypus (Ornithorhynchus anatinus). According to Ensembl, C. intestinalis has liver-like, intestinal-type, brain-like, and myelin P2 protein-like fabp genes.
In Drosophila melanogaster, only one fabp gene can be found. In C. elegans, there are nine lbp genes, lbp-1 to lbp-9. lbp-1, lbp-2, lbp-3, and lbp-4 are exclusive to C. elegans, and their products localize to the extracellular region; these four genes are not orthologs of FABP1, FABP2, FABP3, and FABP4 in mammals. According to the NCBI HomoloGene annotation (https://www.ncbi.nlm.nih.gov/homologene), lbp-5, lbp-6, lbp-7, and lbp-8 are orthologs of mammalian FABP4, and lbp-9 is an ortholog of human FABP8 and fish fabp11a/fabp11b. Using the updated shark (C. milii) genome, we identified five fabps (fabp1, fabp2, fabp3, fabp7, and fabp10) in cartilaginous fish. In the genomes of teleost fish, fabp1, fabp2, fabp3, fabp6, fabp7, fabp10, and fabp11 have been identified. Teleosts have acquired additional gene copies, which subsequently diverged. For example, zebrafish has only one copy each of fabp2, fabp3, and fabp6, but three copies of fabp1 (fabp1a, fabp1b.1, and fabp1b.2) and two copies each of fabp7, fabp10, and fabp11. In total, 142 FABPs were identified in the 18 collected organisms (Tables 1 and S1). Table S1 lists the gene annotations, including gene ID, transcript ID, number of splice variants, RNA/protein length, exon/intron number, and genomic location. Considering the relationship between FABP genes and fat deposition tissues, we propose that the absence of the FABP5 (EFABP, epidermal) gene in Xenopus and fish may be related to the absence of subcutaneous WAT in these species. In addition, the lack of the FABP4 (AFABP, adipocyte) gene in fish may account for the deposition of fat in the liver, muscle, and mesentery, rather than in adipocytes, in most fish. Indeed, in zebrafish, fabp11a, and not fabp4, has been shown to play a key role during adipogenesis [28,29].
However, the mechanism regulating the differential distribution of fat is complex, and for many FABPs, the precise physiological function is not completely understood; thus, how FABPs regulate the fat deposited in different tissues also requires further study.

Diversity of the gene structure of FABPs
As reported previously, most functional FABP genes exhibit a 'canonical' structure of four exons and three introns [26]. However, some protein-coding genes have an atypical structure (Figs 1 and S1; Table 2). For example, FABP6 of the dolphin (Tursiops truncatus) has seven exons and six introns; FABP2 of Rattus norvegicus and FABP10 of Anolis carolinensis each have five exons and four introns; FABP5 of T. truncatus, fabp1 of Xenopus tropicalis, fabp10b of Takifugu rubripes, fabp7a of Oryzias latipes, and FABP1 of C. intestinalis have three exons and two introns; and in C. elegans, lbp-5, lbp-6, lbp-7, and lbp-8 consist of two exons and one intron. In addition, alternative splicing produces proteins encoded by transcripts with an atypical gene structure from two human genes (FABP6 and FABP7), two pig genes (FABP5 and FABP7), one medaka gene (fabp6), three shark genes (fabp1, fabp2, and fabp10), and one C. elegans gene (lbp-9) (Fig. 1 and Table 2). Some transcripts derived from alternative splicing of FABP genes are annotated as lncRNAs, retained-intron transcripts, or nonsense-mediated decay transcripts (Figs 1 and S1; Table S1).
Fig. 1 shows the gene structures of all 142 fabps. In almost all protein-coding FABP genes with four exons and three introns, the second and third exons are conserved. Although exon/intron positions are similar in all genes, their lengths, especially those of the introns, vary. For example, the first intron of FABP1 ranges from 74 bp (medaka) to 5239 bp (platypus), reflecting differences in genome size among species. In contrast, exon lengths vary relatively little. In almost all FABPs, the second exon is 173 bp long; the exception is FABP11, whose second exon is 176 bp. The third exon of most FABPs (FABP3, FABP4, FABP5, FABP7, FABP8, FABP9, FABP11, and FABP12) is 102 bp long, whereas the third exons of FABP1, FABP2, FABP6, and FABP10 are 93, 108, 90, and 90 bp long, respectively (Fig. 1).
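The exon and intron lengths compared above follow directly from exon coordinates such as those listed in Ensembl. As a minimal sketch, assuming 1-based, inclusive coordinates, the hypothetical helper `exon_intron_lengths` below derives both sets of lengths for one transcript; the example coordinates are invented and do not correspond to any FABP gene.

```python
from typing import List, Tuple


def exon_intron_lengths(
    exons: List[Tuple[int, int]], strand: str = "+"
) -> Tuple[List[int], List[int]]:
    """Return (exon_lengths, intron_lengths) in transcript order.

    `exons` holds (start, end) pairs in 1-based, inclusive genomic
    coordinates, in any order. For minus-strand genes the lists are
    reversed so that index 0 is always the first exon/intron of the
    transcript.
    """
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    # An intron covers the bases strictly between two consecutive exons.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    if strand == "-":
        exon_lengths.reverse()
        intron_lengths.reverse()
    return exon_lengths, intron_lengths


# Hypothetical four-exon, plus-strand gene (coordinates are invented).
exons = [(1000, 1120), (1500, 1672), (2000, 2101), (2600, 2700)]
print(exon_intron_lengths(exons))  # ([121, 173, 102, 101], [379, 327, 498])
```

Sorting the exons first keeps the intron arithmetic identical for both strands; only the reporting order depends on `strand`.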

Alternative splicing of FABP genes
Alternative splicing is a common phenomenon in eukaryotes that greatly increases the diversity of proteins encoded by genes [30]; for example, ~95% of human multiexon genes undergo alternative splicing [31]. According to the Ensembl annotation, 33 of the 142 FABPs exhibit splice variants (Fig. 1 and Table S2), whereas none of the FABPs in the genomes of Bos taurus, T. truncatus, Gallus gallus, X. tropicalis, Oreochromis niloticus, and C. intestinalis do. Notably, in humans, eight FABP genes (all except FABP2 and FABP9) are alternatively spliced: FABP1 and FABP3 to FABP6 each have four splice variants, whereas FABP7, FABP8, and FABP12 each have two. In D. melanogaster, the single FABP gene has three splice variants, all of which retain the second exon (Fig. 1). Mapping these variants onto the mammalian gene structure showed that the second and third exons of mammalian FABPs correspond to a single second exon (272 bp) in D. melanogaster. Together, these observations suggest that exon skipping and intron retention are involved in the post-transcriptional splicing of fabp genes.
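Counting splice variants per gene, as summarized in Table S2, reduces to grouping transcript records by gene. The sketch below assumes records of the form (gene, transcript ID, biotype), using Ensembl-style biotype labels such as `protein_coding` and `retained_intron`; the function names and the records themselves are hypothetical placeholders, not the annotation used in this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def splice_variant_counts(
    records: Iterable[Tuple[str, str, str]], biotype: Optional[str] = None
) -> Dict[str, int]:
    """Count transcripts per gene from (gene, transcript_id, biotype) records.

    If `biotype` is given (e.g. "protein_coding"), only transcripts of
    that biotype are counted.
    """
    counts: Dict[str, int] = defaultdict(int)
    for gene, _transcript_id, tx_biotype in records:
        if biotype is None or tx_biotype == biotype:
            counts[gene] += 1
    return dict(counts)


def alternatively_spliced(counts: Dict[str, int]) -> List[str]:
    """Genes annotated with more than one transcript."""
    return sorted(gene for gene, n in counts.items() if n > 1)


# Hypothetical records; gene and transcript names are placeholders.
records = [
    ("geneA", "txA1", "protein_coding"),
    ("geneA", "txA2", "retained_intron"),
    ("geneB", "txB1", "protein_coding"),
    ("geneC", "txC1", "protein_coding"),
    ("geneC", "txC2", "protein_coding"),
    ("geneC", "txC3", "nonsense_mediated_decay"),
]
counts = splice_variant_counts(records)
print(counts)                         # {'geneA': 2, 'geneB': 1, 'geneC': 3}
print(alternatively_spliced(counts))  # ['geneA', 'geneC']
```

Filtering by biotype matters here because, as noted above, some FABP transcripts are lncRNAs, retained-intron, or nonsense-mediated decay transcripts rather than protein-coding splice variants.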

The conserved synteny of FABP genes
Conserved synteny (the colocalization of genes on chromosomes) is sometimes used to describe the preservation of the precise order of genes on a chromosome passed down from a common ancestor [32], although many geneticists reject this use of the term [33]. Many studies have indicated that all FABPs are likely to have arisen from common ancestral genes through duplication and diversification [25,34,35]. We therefore analyzed the synteny of FABP family genes. The genetic linkage maps (Fig. 2) [36] show that in fish, fabp3 is clustered with fabp10 and/or fabp11. For example, fabp3/fabp10 are colocalized on scaffold KI635890.1 in sharks; fabp3/fabp11a are located on chromosome 19 in zebrafish and on scaffold GL81161.1 in tilapia; and fabp3/fabp10b/fabp11a are located on chromosome 12 in takifugu and on chromosome 11 in medaka. In zebrafish, the duplicated fabp1 genes fabp1b.1 and fabp1b.2 form a gene cluster on chromosome 8. In C. elegans, the lbp-5/lbp-6 and lbp-7/lbp-8/lbp-9 gene clusters are located on chromosomes I and V, respectively (Fig. 2), suggesting that they might have arisen via tandem gene duplication.
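Clusters such as lbp-7/lbp-8/lbp-9 can be flagged computationally by grouping genes that lie on the same chromosome or scaffold within a distance cutoff. This is a simplified sketch under stated assumptions: the 50-kb cutoff, the `GeneLocus` record, and the coordinates are all hypothetical, and a real synteny analysis would also consider the intervening and flanking genes.

```python
from itertools import groupby
from typing import List, NamedTuple


class GeneLocus(NamedTuple):
    name: str
    seqid: str  # chromosome or scaffold name
    start: int  # 1-based, inclusive
    end: int


def tandem_clusters(loci: List[GeneLocus], max_gap: int = 50_000) -> List[List[str]]:
    """Group genes on the same sequence separated by at most `max_gap` bp.

    Returns only groups with two or more genes, ordered by sequence name
    and position. Intervening non-target genes are ignored.
    """
    clusters: List[List[str]] = []
    ordered = sorted(loci, key=lambda g: (g.seqid, g.start))
    for _seqid, genes in groupby(ordered, key=lambda g: g.seqid):
        current: List[GeneLocus] = []
        cluster_end = 0
        for gene in genes:
            # Close the current group if this gene starts too far downstream.
            if current and gene.start - cluster_end > max_gap:
                if len(current) > 1:
                    clusters.append([g.name for g in current])
                current = []
            if not current:
                cluster_end = gene.end
            current.append(gene)
            cluster_end = max(cluster_end, gene.end)
        if len(current) > 1:
            clusters.append([g.name for g in current])
    return clusters


# Invented coordinates for illustration only.
loci = [
    GeneLocus("gene1", "chrV", 10_000, 11_000),
    GeneLocus("gene2", "chrV", 15_000, 16_000),
    GeneLocus("gene3", "chrV", 30_000, 31_000),
    GeneLocus("gene4", "chrV", 500_000, 501_000),
    GeneLocus("gene5", "chrI", 1_000, 2_000),
    GeneLocus("gene6", "chrI", 40_000, 41_000),
]
print(tandem_clusters(loci))  # [['gene5', 'gene6'], ['gene1', 'gene2', 'gene3']]
```

The result is sensitive to `max_gap`; with a 10-kb cutoff, only gene1 and gene2 would be grouped.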

Phylogenetic relationship between the FABPs of organisms with a heterogeneous fat distribution
Thus far, FABP genes have been found only in animals (vertebrates and invertebrates), and evolutionary studies have distinguished major subfamilies that could have been derived from a single ancestral gene close to the time of the vertebrate/invertebrate split [12]. To understand the evolutionary relationships among the FABPs of organisms with a heterogeneous fat distribution, we downloaded the protein sequences encoded by all selected genes, which ranged in length from 108 to 189 amino acids. Multiple sequence alignment using ClustalW showed that pairwise protein identities ranged from 8.8% (fabp1 in shark vs. fabp5 in dolphin) to 96.9% (FABP7 in mouse vs. FABP7 in rat). FABPs of the same type in different organisms are more similar to one another than FABPs of different types in the same organism. For example, FABP1 sequences from 18 different species consistently share more than 60% identity, whereas the 10 FABPs in the human genome (FABP1-FABP9 and FABP12) share only 23% identity on average (Table S3). We further constructed a protein neighbor-joining (NJ) tree using MEGA6 (Biodesign Institute, Tempe, AZ, USA) [37]. In the NJ tree (Fig. 3), all FABPs were split into two clades: FABP1, FABP6, and FABP10 form one clade, and FABP2, FABP3, FABP4, FABP5, FABP7, FABP8, FABP9, FABP11, and FABP12 form the other. Moreover, the invertebrate FABPs of Nematoda and Drosophila cluster within the FABP2/FABP3/FABP4/FABP5/FABP7/FABP8/FABP9/FABP11/FABP12 clade, which suggests that the FABP1/FABP6/FABP10 cluster may have diverged from the other cluster before the vertebrate/invertebrate divergence. In teleost fish, the fabp7a/fabp7b and fabp11a/fabp11b duplicates share a common node, but fabp1a/fabp1b and fabp10a/fabp10b do not, although all of these sequences fall in the same clade as the corresponding fabps of other fish and invertebrates. However, fabp7 and fabp8 of C. intestinalis do not cluster with the corresponding clades of other species. The topology of the maximum-likelihood (ML) tree (Fig. S2) is similar to that of the NJ tree. Together with the conserved synteny of FABP genes, these results suggest that the current FABP family gene/protein set may have resulted from multiple rounds of duplication followed by divergence, including divergence in splicing, during evolution. These results regarding FABP evolution are consistent with the reports of Schaap [25] and M. Wright [35].
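Identity values such as those in Table S3 are computed from pairs of aligned sequences. Tools differ in how they treat gaps, so the sketch below assumes one common convention (identical residues divided by gap-free columns) and may not reproduce the exact percentages reported by ClustalW or MEGA6; the function names are hypothetical and the sequences are toy fragments, not FABP data.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences taken from the same alignment.

    Convention assumed here: identical residues divided by the number of
    columns in which neither sequence has a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def identity_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise identities for every pair of named, aligned sequences."""
    return {
        (x, y): percent_identity(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


# Toy aligned fragments (not real FABP sequences).
aligned = {"seq1": "MVEAF-LG", "seq2": "MVDAFKLG", "seq3": "MVEAF-LA"}
for pair, ident in identity_matrix(aligned).items():
    print(pair, round(ident, 1))
```

Dividing by alignment length instead of gap-free columns would give lower values for gapped pairs, which is one reason identity figures from different programs are not always directly comparable.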
As shown by the studies of M. Wright and his colleagues [35][36][38][39][40][41], teleost fishes possess many copies of fabp genes owing to a whole-genome duplication event that occurred early in the teleost radiation. These authors also proposed that the duplicated fabp7a/fabp7b, fabp10a/fabp10b, and fabp11a/fabp11b genes in teleost genomes may have resulted from a fish-specific duplication event [29,39], whereas the zebrafish fabp1b.1 and fabp1b.2 genes, which are tandemly arrayed on chromosome 8, are paralogs that were presumably duplicated by unequal crossing-over during meiosis [35].

Conclusions
To clarify the relationships between FABPs and the tissues in which fat is deposited, we systematically studied the number, gene structure, conserved synteny, and evolution of FABP family genes in 18 species. Mammals, chickens, Xenopus, anole lizards, teleosts, sharks, Ciona, fruit flies, and worms possess ten, nine, eight, eight, seven, five, five, one, and five types of FABP family genes, respectively (Table 1). In total, 142 FABPs were identified in the selected species (Tables 1 and S1). We propose that the loss of a particular FABP may be related to the lack of fat storage in the corresponding tissue. However, because the FABP family has many members, how they individually or interactively regulate the distribution of fat among tissues requires further study.

Materials and Methods
FABP orthologs identified from C. elegans to Homo sapiens
In this study, we chose humans, rats, mice, dolphins, platypuses, chickens, Xenopus, anole lizards, teleost fish, sharks, fruit flies, and worms as representative organisms with a heterogeneous fat distribution, and cows, pigs, tilapia, fugu, and medaka as representative domestic economic species. FABP orthologs were identified on the basis of reciprocal best hits (RBHs) between pairs of genomes using Ensembl BioMart (http://asia.ensembl.org/biomart/martview/1aaa80b6fabd0febf1bc9c3d3c2fb519). Using human FABP1, … As previously reported, FABPs evolved through successive gene duplications, generating a large number of tissue-specific homologs. However, extensive gene duplication, particularly in distantly related organisms, complicates ortholog identification by different methods. To avoid false ortholog assignments from Ensembl Compara, we also performed NCBI BLASTP and TBLASTN searches with default parameters. We then manually verified the missing genes using OrthoDB (https://www.orthodb.org/), ZFIN (Zebrafish Information Network, http://zfin.org/), and WormBase (https://wormbase.org/#01-23-6).
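The reciprocal-best-hit criterion itself can be written in a few lines. This sketch assumes BLAST results in the default 12-column tabular format (`-outfmt 6`, bit score in the last column) and picks each query's best hit by bit score; it illustrates RBH in general, not the Ensembl BioMart or Compara procedure, and the helper `hit` and all identifiers are hypothetical placeholders.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(blast_tabular: Iterable[str]) -> Dict[str, str]:
    """Best-scoring subject per query from BLAST `-outfmt 6` lines.

    Assumes the default 12-column layout, in which column 12 is the bit score.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for line in blast_tabular:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(
    a_to_b: Dict[str, str], b_to_a: Dict[str, str]
) -> Set[Tuple[str, str]]:
    """(gene_a, gene_b) pairs in which each gene is the other's best hit."""
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


def hit(query: str, subject: str, bits: float) -> str:
    """Hypothetical helper: a fake outfmt 6 line; only columns 1, 2, 12 matter."""
    return "\t".join([query, subject] + ["0"] * 9 + [str(bits)])


# Placeholder identifiers for two genomes, A and B.
a_lines = [hit("A1", "B1", 220), hit("A1", "B6", 50), hit("A4", "B5", 140)]
b_lines = [hit("B1", "A1", 218), hit("B5", "A5", 200), hit("B5", "A4", 139)]
print(reciprocal_best_hits(best_hits(a_lines), best_hits(b_lines)))  # {('A1', 'B1')}
```

A4 is not paired with B5 because B5's own best hit is A5; this asymmetry is exactly the case in which gene duplication makes one-directional best hits misleading.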

Gene structure and alternative splicing analysis
Gene structure and splice variant information was obtained from the Ensembl database (http://asia.ensembl.org/index.html). We manually searched for and downloaded information on the gene structure and splice variants of each FABP, and then drew the structures of all fabp transcripts in PowerPoint.

FABP genetic linkage map
The chromosomal or scaffold locations of the FABP genes were derived from Ensembl genome databases. Based on the provided genetic linkage data, we drew genetic linkage maps using MapDraw 2.1 [42] in Excel.

Sequence alignments and reconstruction of gene/protein trees
The FABP protein sequences were aligned using the ClustalW algorithm with default parameters and then checked manually. In addition to all the FABP amino acid sequences identified in this study, 144 FABP sequences were also included in the phylogenetic analysis. LCN1 (accession number NP_002288), a member of the lipocalin family of calycins with a size (158 amino acids) comparable to those of intracellular lipid-binding proteins (iLBPs), was used as an outgroup. Phylogenetic trees were constructed using both the NJ and ML methods implemented in MEGA6 software with 1000 bootstrap replicates [37]. Evolview v3 [43] was used to visualize and annotate the NJ tree.
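For readers unfamiliar with the NJ method, the compact implementation of the Saitou and Nei algorithm below shows how a tree is built from a distance matrix. It is a teaching sketch, not MEGA6's implementation: it assumes a precomputed distance matrix (for example, p-distances derived from pairwise identities as 1 - identity/100), performs no bootstrapping, and returns an unrooted Newick string; rooting with an outgroup such as LCN1 would be a separate step. The five-taxon matrix in the usage example is a small additive example, not FABP data.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Unrooted neighbor-joining tree (Saitou and Nei) as a Newick string.

    `dist` is a symmetric matrix with zeros on the diagonal. Ties in the
    Q criterion are broken by taking the first pair found. Branch lengths
    can come out negative for non-additive data; they are not corrected.
    """
    if len(names) < 3 or len(dist) != len(names):
        raise ValueError("need at least 3 taxa and a matching square matrix")
    nodes = list(names)              # Newick text of each current subtree
    d = [list(row) for row in dist]  # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - R_i - R_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and j.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distance from u to each remaining node k.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[r][s] for s in keep] + [new_row[idx]] for idx, r in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the last three nodes around a single central node.
    l0 = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    l1 = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    l2 = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{l0:g},{nodes[1]}:{l1:g},{nodes[2]}:{l2:g});"


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because this example matrix is additive, NJ recovers the generating tree exactly; for real protein distances the tree is only an estimate, which is why bootstrap support is assessed.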

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Fig. S1. High-resolution gene structure in PDF format. Fig. S2. ML tree of FABP proteins. Table S1. 142 FABP gene locations and transcripts. In the table, we list the gene ID, the location on chromosome or scaffold, splice variant numbers, transcript names, transcript IDs, gene length, protein length, biotype and the numbers of exons/introns. Table S2. List of FABP genes subjected to alternative splicing. Table S3. Sequence identity matrix of FABP proteins.