Abstract
Introduction
Gene duplication has been an important mechanism for shaping immune defenses against the high diversity of pathogens faced by plants, invertebrates, and vertebrates.1,2 This is evident in the gene clusters of Toll-like receptors, 3 major histocompatibility complex class I and II, 4 and the antimicrobial peptides (AMPs), ie, defensins 5 and cathelicidins. 6 Cysteine-rich AMPs are abundant in animal and plant tissues involved in host defense. In insects, most AMPs are synthesized in the fat body, an organ analogous to the vertebrate liver. Cysteine-rich AMPs are summarized in Supplementary Table 1. The number of disulfide linkages varies from one to four. Beta-defensins contain three pairs of disulfide-linked cysteines, whereas plant AMPs often contain eight cysteines that form four disulfide linkages. This is consistent with the hypothesis that the main function of the disulfides may be to protect the backbone from proteolysis rather than to maintain the microbicidal activity of AMP molecules. 7
Table 1. Log-likelihood values and positively selected sites for β-defensin genes under site models.
Defensins are a group of small cationic peptides that form a first line of host defense against pathogenic infections. 8 They have a broad spectrum of antimicrobial activity against bacteria, viruses, fungi, and protozoan parasites.9–12 On the basis of the cysteine pairing that forms intramolecular disulfide bonds, vertebrate defensins can be classified into three subfamilies: α, β, and Θ.13,14 All three types of defensins have six conserved cysteines. Specifically, the six cysteines of α-defensins are disulfide-linked C1-C6, C2-C4, and C3-C5, whereas in β-defensins they are connected C1-C5, C2-C4, and C3-C6. 15 Theta-defensin, a cyclic peptide that also contains three disulfide bonds, is believed to arise from the splicing of two hemi-Θ-defensin peptides. 11 Alpha-defensins are specific to mammals and are mainly produced by leukocytes of myeloid origin and by Paneth cells of the small intestine.11,16 Beta-defensins have been found in most vertebrate species, including fish, 8 amphibians,17,18 lizards, 19 birds, 20 and mammals,21,22 and are expressed in a much wider range of tissues. Theta-defensins were first isolated from the leukocytes of rhesus macaques 23 and are the only backbone-cyclic peptides known in animals to date; 24 they are believed to have arisen from α-defensins. 25 Comparisons of α- and β-defensins in vertebrates have provided more evidence for a closer relationship between vertebrate β-defensins and insect defensins.21,26
In recent years, β-defensins have been discovered in various vertebrate species, including teleost fish, 8 Chinese brown frog, 18 salamander, 17 chicken, 20 zebra finch, 20 duck, 27 lizard, 19 cattle, 28 mouse, and human. 29
To date, no β-defensin family members have been described in dolphin, panda, manatee, or platypus. How the β-defensin family evolved through birth-and-death processes during vertebrate evolution also remains unknown. In this study, we took advantage of the available genomes of fish, amphibians, reptiles, birds, and mammals to identify multiple intact β-defensin-like peptides in 29 vertebrates. We use them to provide a comprehensive view of the birth-and-death processes involving β-defensin genes during vertebrate evolution.
Materials and Methods
Identification of novel β-defensin genes
To identify potential β-defensin sequences in vertebrates with whole-genome sequences available at Ensembl (http://www.ensembl.org/index.html), the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/), and NCBI (http://www.ncbi.nlm.nih.gov/), we retrieved all 86 previously reported intact β-defensin genes from GenBank. These comprised 24 human, 10 mouse, 27 cattle, 15 chicken, 1 salamander, 1 Chinese brown frog, 3 lizard, and 5 fish sequences. They were used as query sequences to conduct a TBLASTN search of each genome sequence and cDNA sequence using the
The topology of the species tree was downloaded from the UCSC Genome Browser (http://hgdownload-test.cse.ucsc.edu/goldenPath/mm10/multiz60way/mm10.60way.common-Names.nh, last accessed May 4, 2014).
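The TBLASTN search described in this subsection reports hits for each query against each genome. The authors' exact search parameters and post-processing are not stated, so what follows is an illustrative sketch only. The Python below reads BLAST+ tabular output (-outfmt 6, default 12 columns) and keeps hits under a hypothetical e-value cutoff. It then merges overlapping hits from different queries into candidate loci on each strand. The cutoff (1e-5), the max_gap merge tolerance, and the names parse_hits, merge_hits, and Locus are all assumptions.

```python
from dataclasses import dataclass, field

# Column order of BLAST+ tabular output (-outfmt 6 with its default 12 fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Locus:
    sseqid: str          # genome sequence (chromosome/scaffold) name
    start: int           # 1-based inclusive start on the subject sequence
    end: int             # 1-based inclusive end
    strand: str          # "+" if sstart <= send in the hit, else "-"
    best_evalue: float   # smallest e-value among merged hits
    queries: set = field(default_factory=set)  # query genes that hit this locus


def parse_hits(lines, max_evalue=1e-5):
    """Yield hit records whose e-value passes a hypothetical cutoff."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        if float(hit["evalue"]) <= max_evalue:
            yield hit


def merge_hits(hits, max_gap=0):
    """Merge overlapping (or nearly adjacent) hits on the same sequence and strand."""
    grouped = {}
    for h in hits:
        s, e = int(h["sstart"]), int(h["send"])
        strand = "+" if s <= e else "-"
        grouped.setdefault((h["sseqid"], strand), []).append(
            (min(s, e), max(s, e), float(h["evalue"]), h["qseqid"]))
    loci = []
    for (sseqid, strand), intervals in sorted(grouped.items()):
        intervals.sort()
        current = None
        for lo, hi, evalue, query in intervals:
            if current is not None and lo <= current.end + max_gap:
                # Overlaps the current locus: extend it and record the query.
                current.end = max(current.end, hi)
                current.best_evalue = min(current.best_evalue, evalue)
                current.queries.add(query)
            else:
                current = Locus(sseqid, lo, hi, strand, evalue, {query})
                loci.append(current)
    return loci
```

Each merged locus would still need gene-model prediction and a manual check of the six-cysteine motif before being counted as an intact gene.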
Prediction of full-length coding sequences of β-defensins
Most mammalian β-defensins are encoded by two exons separated by a short intron of less than 2 kb. One exon encodes the signal peptide/prosegment, and the other encodes the mature peptide containing the six-cysteine β-defensin motif. 34 Unlike most mammalian β-defensin genes, chicken β-defensin genes are composed of four short exons separated by three introns whose lengths range from 117 bp to 3,322 bp. 14 We used a combination of GenomeScan 35 and GENSCAN 36 to identify the full-length coding sequences of β-defensin genes.
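GenomeScan and GENSCAN report predicted exon coordinates rather than protein sequences. The sketch below assumes that the exon boundaries have already been parsed into 1-based inclusive (start, end) pairs; the parsing itself is not shown. Under that assumption, it shows how a two-exon (or, as in chicken, four-exon) β-defensin model can be spliced and translated with the standard genetic code. It does not reimplement either predictor, and the function names are hypothetical.

```python
# Standard genetic code; codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def splice_cds(genome, exons, strand="+"):
    """Join exons given as 1-based inclusive (start, end) genomic coordinates.

    Exons are concatenated in genomic order; for minus-strand genes the
    joined sequence is reverse-complemented to give the 5'->3' coding strand.
    """
    cds = "".join(genome[s - 1:e] for s, e in sorted(exons)).upper()
    return cds if strand == "+" else revcomp(cds)


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon ('X' = unknown codon)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```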
Molecular phylogenetic analyses of identified β-defensins in vertebrates
The deduced β-defensin sequences were aligned with ClustalX 37 and manually adjusted where appropriate. A neighbor-joining tree of β-defensin amino acid sequences from the 29 vertebrates was constructed using the p-distance method. 38 The reliability of the estimated tree was evaluated by the bootstrap method 39 with 1,000 replications. The zebrafish preprohepcidin1 (GenBank: AY363452.1) and preprohepcidin2 (GenBank: AY363453.1) genes were used as outgroups because, among zebrafish AMPs, preprohepcidin genes are relatively close to β-defensin genes. 40 Phylogenetic trees were constructed in MEGA 5, 41 and the multiple sequence alignment was visualized with GeneDoc. 42
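The distance-based tree building described above can be illustrated with a small sketch. The Python below computes p-distances, that is, the proportion of differing sites. Gapped columns are dropped pairwise here, which is an assumption because the gap treatment is not stated. Sequences are then joined with the standard neighbor-joining criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the row sum of distances. The sketch also shows how one bootstrap pseudo-alignment is drawn by resampling columns with replacement. It is not MEGA's implementation, outgroup rooting is omitted, and the function names are hypothetical.

```python
import random


def p_distance(s1, s2, gap="-"):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no gap-free sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(seqs):
    return [[p_distance(x, y) for y in seqs] for x in seqs]


def bootstrap_alignment(seqs, rng=None):
    """Draw one pseudo-alignment by resampling columns with replacement."""
    rng = rng or random.Random()
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in seqs]


def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string."""
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Choose the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        dij = d[bi][bj]
        # Branch lengths from the new internal node to the two joined nodes.
        li = 0.5 * dij + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = dij - li
        joined = f"({nodes[bi]}:{li:.4f},{nodes[bj]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from the new node to every remaining node.
        new_row = [0.5 * (d[bi][k] + d[bj][k] - dij) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Two nodes remain; split the last edge in half only to write valid Newick.
    half = d[0][1] / 2
    return f"({nodes[0]}:{half:.4f},{nodes[1]}:{half:.4f});"
```

Bootstrap support values of the kind reported for the trees come from three steps. Repeat the resampling step 1,000 times, rebuild the tree from each pseudo-alignment, and count how often each clade reappears.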
Selection pressure analyses
Maximum likelihood methods were used to study the selective pressure acting on β-defensin genes. All tests were conducted using the CODEML program of the PAML 4.7 package. 43 Natural selection was examined using site models that allow heterogeneous selection pressure among sites. Potential positive selection was tested based on the ratio (ω) of nonsynonymous (dN) to synonymous (dS) substitution rates (ω = dN/dS). 7 Generally, ω = 1 indicates that amino acid substitutions are largely neutral, ω > 1 is evidence of positive selection, and ω < 1 is consistent with purifying selection. Five models, M1a, M2a, M7, M8, and M8a, were used to test for positively selected sites, and the M0 and M3 models were also fitted. Parameters were calculated for the models M0 (one ratio), M3 (discrete), M1a (neutral), M2a (selection), M7 (beta), M8 [beta and ω (equivalent to Ka/Ks)], and M8a (beta and ω = 1). The M0 model assumes a uniform selective pressure among sites. The M3 model allows ω to vary among sites in a small number of discrete classes. The M1a model assumes a variable selective pressure but no positive selection. The M2a model assumes a variable selective pressure with positive selection. The M7 model assumes a beta-distributed variable selective pressure without positive selection. The M8 model assumes a beta-distributed variable selective pressure plus positive selection. The M8a model assumes a beta-distributed variable selective pressure plus a class with ω fixed at 1, and therefore no positive selection. Four likelihood ratio tests were carried out: M0 vs. M3, M1a vs. M2a, M7 vs. M8, and M8a vs. M8. 43
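Each likelihood ratio test above compares twice the log-likelihood difference, 2ΔlnL = 2(lnL_alt - lnL_null), against a chi-square distribution. The sketch below computes the p-value from the closed-form chi-square upper tail for integer degrees of freedom. It uses the conventional df for each pair: 4 for M0 vs. M3 with three ω classes, 2 for M1a vs. M2a and M7 vs. M8, and 1 for M8a vs. M8. The function names are hypothetical. Any log-likelihood values passed in are placeholders; the study's actual values are those in Table 1.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, nu = 0.0, 0                                # Q(x; 0) = 0
    else:
        q, nu = math.erfc(math.sqrt(half)), 1         # Q(x; 1) = erfc(sqrt(x/2))
    # Recurrence: Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += math.exp((nu / 2.0) * math.log(half) - half - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q


# Degrees of freedom for the nested site-model comparisons (M3 with K = 3 classes).
LRT_DF = {("M0", "M3"): 4, ("M1a", "M2a"): 2, ("M7", "M8"): 2, ("M8a", "M8"): 1}


def site_model_lrt(null, alt, lnl_null, lnl_alt):
    """Return (2*delta lnL, p-value) for a nested pair of CODEML site models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, LRT_DF[(null, alt)])
```

For M8a vs. M8, a 50:50 mixture of a point mass at zero and χ²(1) is sometimes used instead. The plain χ²(1) p-value computed here is the more conservative of the two. With the 2ΔlnL of 4,657.482288 reported for M0 vs. M3, chi2_sf returns a value that underflows to zero.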
The peptide sequence logo was based on the alignment of all intact β-defensin peptide sequences identified in this study, with gap positions slightly adjusted by visual inspection. The logo was generated with WEBLOGO (http://weblogo.berkeley.edu/logo.cgi).
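A sequence logo stacks letters at each alignment column. The total height of each stack is the column's information content, R = log2(20) - H, where H = -Σ p_a log2 p_a is computed over the observed residue frequencies. The minimal sketch below computes R and per-residue letter heights from an aligned set of peptides, ignoring gaps. WebLogo additionally applies a small-sample correction, which this sketch omits. The function name is hypothetical.

```python
import math
from collections import Counter


def column_information(alignment, gap="-", alphabet_size=20):
    """Per-column information content (bits) and letter heights for a protein logo.

    Gaps are excluded when computing residue frequencies, and no small-sample
    correction is applied.
    """
    max_bits = math.log2(alphabet_size)
    columns = []
    for i in range(len(alignment[0])):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            columns.append((0.0, {}))
            continue
        n = len(residues)
        freqs = {aa: c / n for aa, c in Counter(residues).items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        info = max_bits - entropy
        # Each letter's height is its frequency times the column's information.
        columns.append((info, {aa: p * info for aa, p in freqs.items()}))
    return columns
```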
Structural analyses
To infer structural characteristics of vertebrate β-defensin peptides, we predicted the folding pattern of several sequences using the Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) 44 and SWISS-MODEL (http://swissmodel.expasy.org) 45 web servers. Electrostatic potential was calculated using the PBEQ-Solver web server (http://www.charmm-gui.org/?doc=input/pbeq-solver). 46 All figures were generated with PyMOL software (DeLano Scientific; http://pymol.org).
Results
Number of intact β-defensin genes in 29 vertebrate species
We used published data from the NCBI, Ensembl, and UCSC databases to identify β-defensin genes in the genome sequences of 29 vertebrates. The queries were known full-length β-defensins from human, mouse, cattle, chicken, amphibians (salamander and Chinese brown frog), lizard, and fish. A total of 490 β-defensin genes were identified in this study (Fig. 1). The nucleotide sequences of all intact β-defensins are provided in Supplementary File 1. The results reveal remarkable diversity in the number of β-defensins among vertebrates: the number of putative β-defensin genes varies from 1 in frog to 42 in cattle. Cattle β-defensin genes represent a large expansion compared with other mammals, whereas marine mammals have fewer β-defensin genes (dolphin and manatee have 8 and 17 β-defensins, respectively). Overall, β-defensins show a distinct tendency toward gene expansion along the evolutionary path from reptiles to mammals. 34

Figure 1. Intact β-defensin genes of the 29 vertebrates identified in this study.
Phylogenetic relationships of vertebrate β-defensin genes
To analyze the evolutionary relationships and dynamics of the vertebrate β-defensin gene family, we used MEGA 5 41 to construct a neighbor-joining tree (Fig. 2 and Supplementary Fig. 1). The tree was based on the proportion of differing aligned amino acid sites (p-distance) among the 490 vertebrate β-defensin sequences.

Figure 2. Phylogenetic tree of intact β-defensin genes from 29 vertebrates. The tree was constructed with the neighbor-joining method from the proportion of differing aligned amino acid sites (p-distance) among the β-defensin sequences. It was rooted with the zebrafish preprohepcidin1 (GenBank: AY363452.1) and preprohepcidin2 (GenBank: AY363453.1) genes. The reliability of each branch was tested using 1,000 bootstrap replications. Branch lengths are drawn to scale, in units of amino acid differences per site. Refer to Supplementary Figure 1 for the detailed tree with species and gene names.
Because relatively few aligned gap-free sites are available, the bootstrap values in the tree are generally low. Several lineages consist of genes from a single species or a group of closely related species (marked with one color, Fig. 2), whereas other lineages contain genes from distantly related species (marked with many colors, Fig. 2).
As shown in Figure 2, most laurasiatherian β-defensins form species-specific lineages, for example in cattle and dolphin. Most platypus β-defensin genes form a single platypus-specific lineage, and most lizard β-defensin genes form three separate lizard-specific lineages. Primate and bird β-defensin genes cluster together to form several lineages. These results suggest that multiple species-specific gene duplications occurred in these vertebrates during evolution, indicating that most β-defensin genes duplicated after the divergence of these species. In addition to the marked gene duplication between lizards and mammals, we also found several cases of gene loss during evolution. There is a marked contraction in the lineage leading to amphibians (frog); it is unclear whether the frog lost β-defensins after its divergence from fish. The markedly variable number of β-defensins among vertebrates suggests that changes in β-defensin gene number may give animals a mechanism for adapting to changing conditions.
Analysis of selective pressures in vertebrate β-defensin genes
Earlier results 21 showed that mean dN was significantly greater than mean dS in the mature peptide, but that mean dN and mean dS did not differ significantly in the signal peptide or prosegment. Therefore, to identify which amino acid positions may be under positive selection, we estimated dN/dS for each codon using the PAML 4.7 package. 43
In the M0 vs. M3 comparison, the M3 (discrete) model fit significantly better than the M0 (one-ratio) model, with 2ΔlnL = 4,657.482288, and showed statistical significance (
To examine the distribution of positively selected sites, the positively selected sites deduced in Table 1 were mapped onto the sequence logo of β-defensins (Fig. 3). As shown in Figure 3, positively selected sites are located primarily in the mature peptide: two of the three positively selected sites lie in the mature peptide region, both in its N-terminal part. This suggests that these sites are important for the functional diversity of β-defensins. One of them lies within a region that forms an α-helix. Because α-helical regions are often embedded within membranes, they may be involved in anchoring β-defensins to the bacterial cell wall. 47 This positively selected site within the α-helix may therefore play an important role in the immune specificity of β-defensins. We also found an exceptional site under positive selection in the first β-strand. Beta-strands form the structural core of β-defensins. Because the triple-stranded β-sheet is generally regarded as characteristic of β-defensins, sites in this region are expected to be largely unaffected by positive selection. The positively selected site identified in this region may therefore reflect alterations in the oligomerization of β-defensins. 48

Figure 3. Sequence logo of vertebrate β-defensins. Peptide sequence logo of all 490 β-defensins analyzed in this study. Sites indicated by ‘▴’ have been found to be under positive natural selection, which shows selection [significant (
Sites within the prosegment region are generally believed to be under negative selection. Unexpectedly, we also found a positively selected site in the prosegment region. This suggests that the prosegment peptide has an important function that remains uncharacterized.
Sequence characteristics of cattle intact β-defensins
In this study, we found 42 intact β-defensin genes in the cattle genome. Because cattle have the most diverse β-defensin repertoire among the vertebrates examined so far, we further dissected the characteristics of cattle β-defensins. We constructed a phylogenetic tree of the 42 intact β-defensin genes using the NJ method (Fig. 4). The tree shows four distinct gene clusters (Clusters I-IV). This is consistent with the location of all cattle β-defensin genes in four chromosomal clusters. 49 On the basis of the tree, we speculate that Cluster IV may represent the most ancient cluster.

Figure 4. Phylogenetic tree of cattle β-defensin peptides. The full-length protein sequences of the cattle β-defensins were used to construct a phylogenetic tree with the neighbor-joining method, with a chicken β-defensin sequence as the outgroup. Bootstrap values were obtained from 1,000 replicates, and values greater than 50% are shown.
To determine whether specific residues or motifs characterize each of the four phylogenetic clusters in cattle, we analyzed the peptide sequences of cattle β-defensins. The β-defensin peptides share some common characteristics: cationic properties, small size, and six conserved cysteine residues. However, β-defensins are a rapidly evolving gene family with relatively low sequence similarity between paralogs. To understand the structural features of cattle β-defensin peptides, we analyzed the amino acid sequences of the signal peptide, prosegment, and mature peptide regions. From the alignment of the peptide sequences of all intact cattle β-defensin genes (Fig. 5), we identified five potential residue or motif markers that can distinguish the four clusters of cattle β-defensin sequences. All five markers are located in the signal peptide and prosegment regions. The Cluster I sequences have relatively conserved LH, H(Y), A(X), F(L), and LSA(S) motifs (Fig. 5). Here, “X” represents any amino acid that appears through substitution of the conserved residue at a particular position. In contrast, the motifs at the same positions in Clusters II, III, and IV are relatively less conserved (Fig. 5). The identified markers are mostly cluster specific, but in certain cases the conserved residue or motif at a particular position is shared by two or three phylogenetic clusters. For example, cattle β-defensin sequences in Cluster I possess His residues at position 4 (numbering is according to the cattle sequence

Figure 5. Multiple sequence alignment of the cattle β-defensin amino acid sequences. The six conserved Cys residues are highlighted in gray, and the amino acid residues or motifs that distinguish the four phylogenetic clusters (Clusters I, II, III, and IV) are marked with boxes. The positions of the β-strands, signal peptide, prosegment, and mature peptide are indicated.
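The text does not state how the cluster-distinguishing residues or motifs were identified. One hypothetical way to screen an alignment for such markers is sketched below. At each position, the sketch finds each cluster's majority residue, provided it reaches a chosen conservation threshold (0.8 here, an assumption). It then reports positions where that residue is not also the conserved residue of any other cluster. Candidates found this way would still need inspection against the alignment, and the function names are hypothetical.

```python
from collections import Counter


def conserved_residue(column, min_fraction):
    """Return the majority residue if it reaches min_fraction of the non-gap entries."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    aa, count = Counter(residues).most_common(1)[0]
    return aa if count / len(residues) >= min_fraction else None


def cluster_markers(aligned, clusters, min_fraction=0.8):
    """List (1-based position, cluster, residue) where a cluster's conserved residue
    is not conserved in any other cluster.

    aligned: dict of sequence name -> aligned sequence (equal lengths)
    clusters: dict of sequence name -> cluster label
    """
    labels = sorted(set(clusters.values()))
    length = len(next(iter(aligned.values())))
    markers = []
    for pos in range(length):
        consensus = {}
        for label in labels:
            column = [aligned[n][pos] for n in aligned if clusters[n] == label]
            consensus[label] = conserved_residue(column, min_fraction)
        for label, aa in consensus.items():
            others = [consensus[o] for o in labels if o != label]
            if aa is not None and aa not in others:
                markers.append((pos + 1, label, aa))
    return markers
```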
In the mature peptide, the conserved Gly-X-CysIV motif is particularly notable. This motif forms a β-bulge, which is thought to be responsible for the twist in the β-sheet and to be essential for correct folding. 50 In cattle β-defensins, we found two conserved Gly-X-Cys motifs (GXCII and GXCIV). Clusters I, II, and III have both highly conserved motifs (GXCII and GXCIV), whereas Cluster IV has only one (GXCII).
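Whether a given cysteine sits in a Gly-X-Cys motif can be checked directly from the mature peptide sequence. The sketch below numbers the cysteines I to VI in sequence order. It reports those with a Gly two residues upstream. It assumes the input is a mature peptide containing exactly the six conserved cysteines; the function names and the toy peptides below are hypothetical.

```python
NUMERALS = ["I", "II", "III", "IV", "V", "VI"]


def cysteine_positions(peptide):
    """0-based indices of all Cys residues."""
    return [i for i, aa in enumerate(peptide) if aa == "C"]


def gxc_motifs(peptide):
    """Return the numerals of the cysteines that end a Gly-X-Cys motif."""
    cys = cysteine_positions(peptide)
    if len(cys) != 6:
        raise ValueError(f"expected six cysteines, found {len(cys)}")
    return [numeral for numeral, pos in zip(NUMERALS, cys)
            if pos >= 2 and peptide[pos - 2] == "G"]
```

For the toy peptide ACGKCTTCGYCPCC, gxc_motifs returns ["II", "IV"], the pattern described for Clusters I to III. Replacing the second Gly (ACGKCTTCAYCPCC) leaves only ["II"], as described for Cluster IV.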
Net charge and hydrophobicity determine the antimicrobial activity of β-defensin peptides. 51 As shown in Figure 5, positively charged residues, such as Arg and Lys, are mainly located in the C-terminal part of the mature peptide in all four cattle β-defensin clusters. Cluster IV cattle β-defensins contain more negatively charged residues (Asp/Glu), which are rare in the other three clusters. In cattle β-defensins, hydrophobic residues such as Ala, Val, Leu, and Phe are abundant and reasonably well conserved in the signal peptide and prosegment.
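Net charge and hydrophobicity can be approximated directly from sequence. The sketch below uses a simple charge count near neutral pH, (Lys + Arg) - (Asp + Glu), treating His as neutral. Hydrophobicity is the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale. The sketch also splits a peptide into halves to compare the charge toward the N- and C-terminus. These are rough descriptors chosen for illustration; the text does not specify any particular charge model or hydropathy scale, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def net_charge(peptide):
    """Approximate net charge near pH 7: (Lys + Arg) - (Asp + Glu); His counted as neutral."""
    return sum(peptide.count(aa) for aa in "KR") - sum(peptide.count(aa) for aa in "DE")


def gravy(peptide):
    """Mean Kyte-Doolittle value over residues (raises KeyError on non-standard letters)."""
    return sum(KD[aa] for aa in peptide) / len(peptide)


def charge_by_half(peptide):
    """Net charge of the N-terminal and C-terminal halves of a peptide."""
    mid = len(peptide) // 2
    return net_charge(peptide[:mid]), net_charge(peptide[mid:])
```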
Protein structural characteristics of vertebrate β-defensins
Functional studies have revealed that β-defensins not only protect against pathogens but are also involved in regulating the immune response and reproduction. 49 This functional diversity may depend on structural variation among β-defensins. To explore the protein structures of vertebrate β-defensins, we built theoretical homology models (Fig. 6).

Figure 6. Theoretical three-dimensional models and surface electrostatic potential of vertebrate β-defensins. Left and right views of each β-defensin structure are shown. The electrostatic potential (±2 kcal/mol·
In the mature β-defensin peptides, the overall fold is composed of three β-strands arranged into an antiparallel β-sheet. Different β-defensins share remarkable similarity in secondary and tertiary structure despite very low amino acid sequence similarity. This indicates that the structural framework tolerates substantial variability in amino acid sequence. These models suggest that different β-defensins display the canonical β-defensin disulfide arrangement and a similar fold.
As indicated in Figure 6, the α-helix flanking the β-sheet in human and other vertebrate β-defensin peptides is believed to be involved in anchoring β-defensins to the cell wall and to play an important role in killing pathogens. Unlike β-defensins in other vertebrates, all four cattle β-defensin peptide clusters lack the classic α-helical region (Fig. 6).
Most AMPs are cationic, a property essential for their biological activity. 52 The surface charge distribution endows β-defensin peptides with amphipathicity, which allows them to insert into the cell membranes of pathogens and act as antibacterials. 53 The most striking difference among the β-defensins is their markedly different surface charge distributions (Fig. 6). The distribution of surface electrostatic potential is likely a key determinant of functional specificity. These variations therefore suggest that the β-defensins may differ in their mechanisms of antimicrobial activity. Another notable difference among β-defensins is the variation in loop size and orientation (Fig. 6). Subtle variation in loop size and orientation can alter the fold and protein conformation, which may contribute to the specific antimicrobial activities and diverse functions of β-defensins.
Discussion
Frequent changes in the number of β-defensin genes in vertebrate evolution
In this study, we obtained and analyzed intact β-defensin gene sequences from a wide range of vertebrate taxa. Data mining of high-coverage genome sequences is considered a reliable way to detect β-defensin genes. Because of variable genome quality and search difficulties, partial sequences and pseudogenes of β-defensins were not included. The results indicate frequent changes in the number of intact β-defensins within vertebrate lineages. Previous studies showed that primate genomes encode α-, β-, and Θ-defensins, whereas the cattle genome contains only the β-defensin subfamily. 52 In this study, we found 42 β-defensins in the cattle genome, a large expansion compared with other mammals. We hypothesize that this extensive duplication and divergence of innate immune β-defensins may be due to the substantial load of microorganisms in the cattle rumen. This hypothesis parallels the so-called “niche adaptation hypothesis”. That hypothesis holds that the evolution of the rumen created a need for more sophisticated immune mechanisms to manage the interface between microbes and the animal host. 49 The large number of microorganisms increases the risk of infection at mucosal surfaces. It also strengthens positive selection for traits that enable stronger and more diversified innate immune responses at these locations. 28 The number of β-defensin genes in a species may reflect the ever-changing microbial challenges of the ecological niche it inhabits. In addition to cattle, some mammals, such as rat, mouse, and especially microbats, have extensive β-defensin gene repertoires. This is presumably because of the unique and diverse microbial environments of their habitats. Clearly, β-defensin genes have evolved rapidly in mammals through gene or genome duplication and sequence diversification.
The rapid evolution and diversity of the β-defensin gene family, considered in the context of their varied antimicrobial and immune regulatory activities, indicates myriad functions for β-defensins in mammalian host defense.
Extensive gene and genome duplications 54 have been regarded as important raw material for the evolution of acquired immunity. It has long been assumed that β-defensins evolved as AMPs to counter potentially harmful microorganisms in the environment. 55 Species-specific β-defensins may help animals cope with the species-specific microbial challenges they face; cattle would therefore be expected to have a larger β-defensin repertoire. Marine mammals have fewer β-defensin genes than most land mammals (dolphin and manatee have 8 and 17 β-defensins, respectively). This may be due to their aquatic habitat, since there are fewer prokaryotes in freshwater and saline lakes. 56
The platypus has relatively few β-defensins for a mammal, even fewer than the chicken, which has 14 β-defensins. The platypus combines mammalian and reptilian features and is a remarkable mammal, not only because it lays eggs but also because it is venomous. A previous study identified three
Selective pressures on the evolution of β-defensin genes
A previous study showed that gene duplication followed by positive selection has occurred in β-defensin gene families involved in immune responses. 21 To further understand what drove the sequence divergence of β-defensins during evolution, we tested whether sites under positive Darwinian selection occur in the vertebrate β-defensin mature peptide domain. We did so by estimating dN/dS for each codon with the PAML 4.7 package. 43 We found three sites under positive selection, two of which are located in the mature peptide. These results support the hypothesis that natural selection has acted to diversify the functionally active mature β-defensin region. 32 The selective pressure analysis suggests that both the N-terminal region of the mature peptide and the prosegment are important for vertebrate β-defensins.
Positive selection can greatly accelerate the rate of amino acid change. Divergence of these β-defensin genes often leads to either an additional layer of functional redundancy or acquisition of functional novelties, both of which conceivably help the host cope more effectively with a broader range of pathogens. Differential production of species-specific copies of β-defensins may help species occupy different ecological niches.
Coherence between the structure and activity of β-defensins
Analyzing the structural and functional characteristics of β-defensins highlights the potential to engineer these peptides and to better understand their function. The β-defensin peptides share some common characteristics: cationic properties, small size, and three disulfide linkages.
The classic α-helix of β-defensins is absent from cattle β-defensin peptides. The α-helical region may anchor β-defensins to the bacterial cell wall and play important roles in their immune specificity. We therefore infer that cattle β-defensins may rely on as yet uncharacterized mechanisms of immune specificity.
Differences in the biological activities of β-defensins result from changes at specific mutation sites. These changes alter either the geometry of the molecular surface or its physicochemical properties (such as electric charge and hydrophobicity). Positively charged amino acid residues are believed to be concentrated toward the C-terminus, where they contribute to antimicrobial function. 60 Consequently, single amino acid substitutions and N-terminal deletions do not affect the charge or substantially alter the hydrophobicity; however, such changes can alter bacterial susceptibility and the overall rate of killing. 61
Conclusions
Our investigation of β-defensin genes in vertebrates revealed extensive gene gains and losses. The number of intact β-defensin genes varies from 1 in the western clawed frog to 42 in cattle. Multiple species-specific gene gains and losses have occurred throughout vertebrate evolution. Selective pressure tests identified three amino acid sites under significant positive selection. They also highlight the importance of the prosegment and mature peptide regions for antimicrobial activity. Structural analysis suggests that structural diversity underlies the diverse functions performed by β-defensins.
Author Contributions
Conceived and designed the experiments: JT, DL, MY, QZ. Performed the experiments: JT, DL, QL, LZ, XF. Analyzed the data: JT, DL, LZ, XZ, HX, YY. Contributed materials/analysis tools: DL, MY. Wrote the paper: JT, DL, MY, UG. Agree with manuscript results and conclusions: JT, DL, QL, LZ, QZ, UG, XF, HX, YY, XZ, MY. Jointly developed the structure and arguments for the paper: JT, DL, QL, LZ, QZ, UG, XF, HX, YY, XZ, MY. All authors reviewed and approved of the final manuscript.
