Identification of a novel botulinum neurotoxin gene cluster in Enterococcus

The deadly neurotoxins of Clostridium botulinum (BoNTs) comprise eight serotypes (A–G; X). The neurotoxin gene cluster encoding BoNT and its accessory proteins includes an operon containing an ntnh gene upstream of the boNT gene. Another operon contains either ha (haemagglutinin) or orfX genes (of unknown function). Here we describe a novel boNT gene cluster from Enterococcus sp. 3G1_DIV0629, with a typical ntnh gene and an uncommon orfX arrangement. The neurotoxin (designated putative eBoNT/J) contains a metallopeptidase zinc‐binding site, a translocation domain and a target cell attachment domain. Structural properties of the latter suggest a novel targeting mechanism with consequent implications for application by the pharmaceutical industry. This is the first complete boNT gene cluster identified in a non‐clostridial genome.

Neurotoxins produced by Clostridium botulinum and occasional strains of C. baratii and C. butyricum (BoNTs) cause a severe and potentially fatal neuroparalytic disease of humans and animals (botulism) [1]. Currently there are eight recognised serotypes of the BoNT protein (A-G, and recently X [2,3]). Biologically active BoNT is complexed with several accessory proteins, all encoded by a neurotoxin gene cluster. This gene cluster is often associated with mobile elements or is located on a plasmid or bacteriophage, indicating that it is capable of horizontal gene transfer between bacteria sharing a common environment [4]. The gene for the coexpressed protein NTNH is always located upstream of the gene that encodes BoNT. A further operon typically encodes three genes that fall into one of two categories: ha genes (haemagglutinin) or orfX genes (of unknown function) [2]. Accessory gene products are needed to ensure survival of the BoNT toxin complex during its passage through the gastrointestinal tract, and for transfer through the gut wall into the circulatory system with subsequent delivery to the target nerve cell, although the exact mechanism for translocation across the gut epithelium has only been shown for the HA proteins [2,5]. BoNT is the most potent toxin known [1]. It is a zinc metallopeptidase with an extreme specificity for its target, the SNARE docking proteins of cholinergic nerve cells. BoNT activity destroys the function of these SNAREs, preventing exocytosis of the neurotransmitter acetylcholine with subsequent flaccid paralysis of associated muscle tissue [6]. BoNT is used in both the cosmetic and pharmaceutical industries. As such, there is great interest in the discovery of new forms of BoNT, in the hope that these will increase the range of medical conditions that can be alleviated.
Here we describe the discovery of a novel boNT gene cluster that exists not in the C. botulinum, C. baratii or C. butyricum genome but within the genome of a species of Enterococcus. The Enterococcus sp. 3G1_DIV0629 genome contains a botulinum-like neurotoxin gene cluster with a typical ntnh gene and an uncommon orfX arrangement. The predicted neurotoxin gene product from this cluster (designated herein putative eBoNT/J) contains all the functional domains characteristic of a typical neurotoxin, including a metallopeptidase zinc-binding site, a translocation domain and a target cell attachment domain [7]. Structural properties of the latter domain suggest a novel targeting mechanism with consequent implications for application by the pharmaceutical industry. This is the first report of a complete new botulinum-like neurotoxin gene cluster outside of the Clostridium species.

Methods
Putative eBoNT/J was identified using known BoNT protein sequences to search the whole genome sequence (WGS) database (visited October 2017) at the National Center for Biotechnology Information (NCBI). An unrooted NeighborNet phylogenetic network of clostridial neurotoxin proteins was computed using the SplitsTree4 application [8]. Parameters used for preliminary sequence alignment in Geneious [9] using CLUSTALW were: cost matrix blosum; gap open cost 10; gap extend cost 0.1. The programme SIMPLOT [10] from the DAMBE software suite [11] was used to compare putative eBoNT/J and its associated NTNH with representative examples of other neurotoxin amino acid sequences; we performed this analysis to identify possible mosaic sequences. Parameters used were: window size 100, step length 20, genetic distance PoissonP, using either putative eBoNT/J or NTNH as seed. An iterative search method (JACKHMMER) was used to compare the predicted gene product of putative eBoNT/J with reference proteomes of C. botulinum until convergence. Functional protein domains were identified using HMMER [12] and Pfam [13].
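The sliding-window comparison described above can be illustrated with a minimal Python sketch. It is not the SIMPLOT/DAMBE implementation: it assumes two sequences that have already been aligned to equal length, skips gapped columns, and uses the standard Poisson correction d = -ln(1 - p) of the proportion of differing residues p. The function name `window_distances` is hypothetical; the window size (100) and step length (20) are taken from the text.

```python
import math

def window_distances(seed, other, window=100, step=20):
    """Return (start, p, d) for each window over two pre-aligned sequences.

    p is the proportion of differing residues among ungapped columns;
    d is the Poisson-corrected distance -ln(1 - p).
    """
    if len(seed) != len(other):
        raise ValueError("sequences must be aligned to equal length")
    results = []
    for start in range(0, len(seed) - window + 1, step):
        # Keep only columns where neither sequence has a gap.
        pairs = [
            (a, b)
            for a, b in zip(seed[start:start + window], other[start:start + window])
            if a != "-" and b != "-"
        ]
        if not pairs:
            results.append((start, None, None))
            continue
        p = sum(a != b for a, b in pairs) / len(pairs)
        # The correction is undefined when every residue differs (p = 1).
        d = -math.log(1.0 - p) if p < 1.0 else math.inf
        results.append((start, p, d))
    return results
```

Plotting d (or 1 - p) against the window start for each reference sequence gives a similarity plot in which an abrupt change of closest relative along the sequence would point to a possible mosaic.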
To determine the closest relative of strain Enterococcus sp. 3G1_DIV0629, we used the web-based tool PGAdbbuilder, which uses a pipeline based on whole genome multilocus sequence typing (wgMLST; [14]). Genome sequences were downloaded in FASTA file format as contigs or complete genome sequences from the NCBI website. For phylogenetic analysis, representative enterococcal genome sequences [15] were reannotated using Prokka [16], and comparative genomics performed using Roary [17]. The phylogenetic tree was produced (UPGMA) with the PHYLIP programme using the constructed allelic sequences and bootstrap values calculated by the ETE tool [18]. DNA G + C content of complete genomes was taken from the appropriate NCBI Genome Assembly and Annotation report pages. ISLANDVIEWER 4 [19] was used for identification and visualisation of genomic islands (GIs).
The amino acid sequence of putative eBoNT/J was further analysed using the programme Phyre2 [20], which predicts 3D protein structure. Further analysis and comparison of putative eBoNT/J toxin and its associated NTNH to BoNT/A and BoNT/A complexed with its NTNH was performed using the I-TASSER simulation [21], which selects the best structural model that fits the query sequence. Estimated accuracy of the predicted model using I-TASSER was: TM score 0.86 ± 0.07, C-score = 1.07 and RMSD = 7.0 ± 4.1 Å (eBoNT/J) and TM score 0.87 ± 0.07, C-score = 1.17 and RMSD = 6.6 ± 4.0 Å (NTNH).

Results and Discussion
A bioinformatics search of the WGS database (visited October 2017) at the NCBI, using the predicted protein translation product of boNT genes, scored a hit with the product of a gene from a recently deposited (Earl A. et al. May 2017) genome of Enterococcus sp. 3G1_DIV0629 (NCBI accession number NGLI01000004.1; this refers to contig 4 from the sequencing assembly). The Enterococcus putative neurotoxin gene product shared 39% identical residues with its closest relative, BoNT/X, with 58% residues exhibiting conservative changes. The contig containing the boNT-like gene was further examined, to reveal a set of genes upstream that were similar to the orfX1, orfX2, orfX3, p47 and ntnh genes of other orfX-type boNT neurotoxin gene clusters [2,3], although distantly related (26-36% amino acid sequence identity to the closest relative, BoNT/X). In all other examples of orfX neurotoxin gene clusters, only the p47 gene respects the direction of expression of ntnh and boNT, with the three orfX genes facing in the opposite direction (Fig. 1). Although the recently discovered botulinum neurotoxin homologue in Weissella oryzae SG25 [22] has been tentatively named boNT/Wo [23], until neurotoxicity studies have been performed, we propose to call this new homologue putative eboNT/J. As with many boNT gene clusters, that of putative eBoNT/J is bordered by IS elements [2,24], evidence that it may have been acquired by horizontal gene transfer (Fig. 1).
A NeighborNet phylogenetic network of clostridial neurotoxin proteins (including the putative neurotoxin of W. oryzae SG25) was estimated using the SPLITSTREE programme. The output can be compared directly with that used to demonstrate the discovery of BoNT/X [3]. Predicted gene products of the adjacent ntnh gene were similarly analysed (Fig. 2A,B). As shown by the position and length of their branchpoint, putative eBoNT/J is most closely related (38% identity) to BoNT/X, and all other neurotoxins are equally distant (23-25% identity), apart from the putative neurotoxin of Weissella, which shares the least protein identity (13%). A similar result was obtained with the putative eBoNT/J NTNH protein. An iterative search (JACKHMMER) was used to compare the predicted gene product of putative eBoNT/J with reference proteomes of C. botulinum until convergence. The resulting 232 matches to botulinum toxin showed that putative eBoNT/J possesses all domains known to be required for BoNT activity; furthermore, these are located in their correct positions [7] (Fig. 2C). These include a light chain containing the zinc binding site of an M27 peptidase (HELCH) at positions 225-229 [25], cysteine residues at positions 424 and 438 required for the disulphide bridge between heavy and light chains following proteolytic cleavage and activation [7] (putative eBoNT/J lacks the extra C residue in this linker region that is present in BoNT/X), a translocation domain (residues 529-843) at the N terminus of the heavy chain containing a version (PYLGNIL, residues 622-628; in BoNT/X this is PYIGPLL) of the conserved PYxGxAL motif required for toxin translocation from the endosome into the target nerve cell cytoplasm [26], and N and C termini (HCN, HCC) of the C-terminal-binding domain of the heavy chain that are required for binding to the target cell and initiation of endocytosis.
With BoNT/A, B, E, F and G, this appropriation of the normal host synaptic vesicle recycling pathways involves a dual host-receptor mechanism comprising a synaptic vesicle protein and a ganglioside [27][28][29][30][31]. These BoNT-ganglioside interactions are facilitated by a SxWY motif located in the C terminus of the heavy chain binding domain; in putative eBoNT/J, this motif is SAWY (residues 1250-1253) and is identical to that of BoNT/X. Similarity plots were used to further analyse putative eBoNT/J and its accompanying NTNH for relatedness to other BoNT and NTNH proteins (Fig. 2C,D). As indicated by SplitsTree, BoNT/X remained the closest relative throughout its entire length (Fig. 2C); however, all NTNH sequences seemed to be approximately equally distant from that of putative eBoNT/J (Fig. 2D), except for the NTNH-like peptide associated with the putative neurotoxin of Weissella, which is a clear outlier.
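As a rough illustration of how the short motifs discussed above could be located in a protein sequence, the following Python sketch scans for them with regular expressions. It is not the method used in this study (domains were identified with JACKHMMER, HMMER and Pfam). The patterns are assumptions: HExxH is written as `HE..H`, SxWY as `S.WY`, and the translocation motif is deliberately relaxed to `PY.G..L` so that variants such as PYLGNIL and PYIGPLL are both reported. The function name and the toy sequence in the usage note are hypothetical.

```python
import re

# Hypothetical, simplified patterns for the motifs named in the text.
MOTIFS = {
    "zinc_binding_HExxH": r"HE..H",
    "ganglioside_SxWY": r"S.WY",
    "translocation_PYxGxxL": r"PY.G..L",  # relaxed form of PYxGxAL (assumption)
}

def find_motifs(seq):
    """Return {motif name: [(start, end, match), ...]} with 1-based inclusive positions."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        hits[name] = [
            (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
            for m in re.finditer(f"(?=({pattern}))", seq)
        ]
    return hits
```

For a toy sequence such as `"MKKHELCHGGPYLGNILTTSAWY"`, the function reports HELCH at positions 4-8, PYLGNIL at 11-17 and SAWY at 20-23. Short patterns like these match by chance in long proteins, so hits are only meaningful when they fall in the expected domain.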
The amino acid sequence of putative eBoNT/J was further analysed using the programme Phyre2 [20], which predicts 3D protein structure. This predicted a protein structure for putative eBoNT/J that matched a BoNT with 100% confidence. This indicates that putative eBoNT/J shares not only amino acid sequence conservation with other BoNTs along its entire length but also their overall structure. Another modelling programme, I-TASSER, showed that the predicted structure for putative eBoNT/J most closely matched that of a BoNT/A molecule; this match was also mirrored when the NTNH associated with eBoNT/J was superimposed with the structure determined for BoNT/A complexed with its own NTNH, suggesting that if expressed, the putative eBoNT/J could form a similar complex (Fig. 3).
Electron microscopy studies of progenitor BoNT/A complex have shown that its NTNH moiety carries the binding site for HA70, occupied during formation of the active botulinum toxin complex. This binding site is conserved in other BoNTs that form complexes with HA moieties. Those BoNTs that derive from an orfX neurotoxin gene cluster lack this site, which has been mapped to a 33-residue region ~120 residues from the N terminus, termed the nLoop [32,33]. Both BoNT/X and putative eBoNT/J share this deletion, which is evidence that they have evolved alongside their OrfX accessory proteins. Species of Enterococcus are Gram-positive bacteria of the phylum Firmicutes, order Lactobacillales, that can be commensal in the gastrointestinal tract of humans and animals, but may also be pathogenic, causing diseases such as neonatal meningitis or endocarditis [34]. Enterococcus sp. 3G1_DIV0629 was isolated from cow faeces in South Carolina, USA. It is distinct from currently recognised species, so has not yet been given a specific name (see NCBI project number PRJNA313452). To determine the closest relative of this strain, we used wgMLST to establish that Enterococcus sp. 3G1_DIV0629 was most closely related to the probiotic strain Enterococcus faecium T-110 [35] (Fig. 4). A search of all publicly available (> 1000) Enterococcus genomes failed to identify another strain containing a botulinum neurotoxin gene cluster.
To determine whether the putative eboNT/J neurotoxin gene cluster has been acquired via horizontal gene transfer, its DNA G + C content was analysed. The G + C content of all sequenced genomes of E. faecium ranges between 36.7% and 42.8%; that of a group of 77 uncharacterised isolates of Enterococcus (of which strain 3G1_DIV0629 is a member) is 33.1-43.3%. The average G + C content of the Enterococcus sp. 3G1_DIV0629 genome is 37.2%, which falls within these ranges. However, contig 4, which contains the putative eboNT/J neurotoxin gene cluster, has an unusually low G + C content (31.8% as compared to 38.0-39.3% for the other main (> 50 kb) contigs). The G + C content of a 20 kb region of contig 4 encompassing the putative eboNT/J neurotoxin gene cluster, including the IS elements upstream and downstream, was also 31.8%, identical to that of the rest of the contig. Using the DNA sequence of contig 4 as a query, the NCBI WGS database of all sequenced members of phylum Firmicutes was interrogated. All regions of contig 4 that generated a match (~30%) matched enterococcal plasmid sequences, particularly within a ~52 kb region in the centre of the contig, mapping immediately upstream of the neurotoxin gene cluster. However, isolated matches to enterococcal plasmid sequences were scattered throughout contig 4. Considering that the size of contig 4 matches that of several examples of enterococcal plasmids, it is possible that this entire contig represents a plasmid sequence. However, it is equally possible that contig 4 represents a horizontally acquired GI, the G + C content of which is typically lower than that of its host genome [36]. Using ISLANDVIEWER 4, a software programme for identification and visualisation of GIs [19], a comparison between Enterococcus sp. 3G1_DIV0629 and its closest relative E. faecium T-110 shows that the putative eboNT/J toxin gene cluster lies between two predicted pathogenicity islands (a type of GI); however, comparison of the G + C plot of this region with that of the entire chromosome of strain T-110 indicates that the entire contig may be a GI (data not shown). More sequencing work is needed to confirm this speculation, as the insertion sites for GIs are often found at the switch sites of GC-skew [37]. The DNA G + C content of 203 sequenced C. botulinum genomes in the NCBI genome database exhibits a narrow range of 27.0-29.8%, somewhat lower than that for Enterococcus. This probably reflects the fact that the last common ancestor of the Lactobacillales and the Clostridiales existed ~2.8 billion years ago [38] and suggests that if the putative eboNT/J neurotoxin gene cluster has been acquired horizontally (as suggested by the presence of the two IS elements, Fig. 1), then the donor organism may not be C. botulinum.
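The G + C comparison above rests on a simple calculation, sketched here in Python for illustration only. In this study, genome-level values were taken from NCBI report pages, so the code is not the procedure actually used. The function names and the default window and step sizes are hypothetical; ambiguous bases such as N are excluded from the denominator.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted

def gc_windows(seq, window=20000, step=5000):
    """Return (start, G + C %) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Comparing `gc_content` of a contig with the genome-wide value, or scanning a contig with `gc_windows`, is one way to flag regions whose composition departs from the host background, as reported here for contig 4 (31.8% against 37.2% for the whole genome).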

Conclusion
In summary, this work reports the bioinformatic discovery of the first complete boNT toxin gene cluster located in a non-clostridial genome. The organisation and sequence identity of this gene cluster show that its closest relative is the recently published boNT/X cluster from C. botulinum strain 111. Although amino acid sequence identity with BoNT/X is only 38%, 3D structure modelling shows that putative eBoNT/J closely mimics the structure of the most potent neurotoxin, BoNT/A. Significantly, as with BoNT/X, variation in the relevant region of the C terminus of the heavy chain indicates that it may possess a novel cell-binding domain. Further work will be required to investigate whether this structural variation will have important implications for the potential use of putative eBoNT/J as a therapeutic agent. Finally, as this work is purely a bioinformatics study, with no access to the bacterial strain, there is no information available regarding whether the putative eboNT/J toxin gene cluster is expressed by its host. However, the fact that all open reading frames for the putative toxin cluster genes are intact suggests that expression is likely. Associated metadata with the genome sequence do not indicate that the herd from which the faecal sample was taken had suffered from symptoms of botulism. Both questions are intriguing and will be the subject of future work.