Abstract
Introduction
Influenza viruses are negative-sense RNA viruses that belong to the
Different subtypes of influenza A virus infect different hosts. For example, subtypes carrying the H1, H2, and H3 hemagglutinin (HA) and the N1 and N2 neuraminidase (NA) envelope proteins generally infect humans rather than other species. 6
Although these proteins have identical functions in human and animal influenza viruses, they differ genetically. For example, avian influenza HA binds the α2,3-linked sialic acid receptor, human influenza HA binds the α2,6-linked sialic acid receptor, and swine influenza HA can bind both receptors. 7
In the H5N1 subtype, position 627 of the PB2 protein, a subunit of the RNA polymerase complex, is Glu in avian viruses and Lys in human viruses. 6
In addition, this subtype shows co-evolution patterns at the
The HA and NA genes of influenza A virus have shown intergenic interactions and postreassortment substitutions of charged amino acids in the HA proteins of different subtypes. 10 In this study, we therefore selected 10 influenza A virus subtypes of avian, human, and swine influenza viruses to perform a co-evolution analysis using the HA and NA genes. Co-evolutionary traits of the two envelope proteins were identified, and sequence substitution analyses were performed to find correlations between subtypes and hosts.
Materials and Methods
Preparation of HA and NA sequence data for influenza A virus
Nucleotide sequence data were obtained from the NCBI Influenza Virus Resource (https://www.ncbi.nlm.nih.gov/genome/viruses/variation/flu/). We examined the HA and NA gene types of 198 subtypes of influenza A virus for co-evolution analysis (n ⩾ 4). Next, we selected full-length HA and NA gene sequences from each subtype of human, avian, and swine influenza virus isolated in the same year. Identical sequences were collapsed, and the most recent representative sequence was selected. By this process, the following 10 subtypes were selected: H1N1, H1N2, H2N2, H3N2, H5N1, H7N2, H7N7, H7N9, H7N10, and H9N2. Each HA and NA nucleotide sequence was saved with its accession number and year in a header of the form “>accession_year.” The collected nucleotide data for each subtype are described in Supplemental Tables 1-10.
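The collapsing step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes FASTA input whose headers follow the “>accession_year” pattern, and the function names and accession strings are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text lines into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def year_of(header: str) -> int:
    """Extract the year from an 'accession_year' header (assumed format)."""
    return int(header.rsplit("_", 1)[1])


def collapse_identical(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep one record per distinct sequence, preferring the most recent year."""
    best: Dict[str, str] = {}  # sequence -> header of the latest record seen
    for header, seq in records:
        if seq not in best or year_of(header) > year_of(best[seq]):
            best[seq] = header
    # Return the survivors ordered by year for easier inspection.
    return sorted(((h, s) for s, h in best.items()), key=lambda r: year_of(r[0]))


# Hypothetical toy input; the accession strings are placeholders.
toy = """>ACC0001_1999
ATGAAT
>ACC0002_2005
ATGAAT
>ACC0003_2001
ATGCCT
""".splitlines()

kept = collapse_identical(parse_fasta(toy))
```

In this toy input the two identical sequences collapse to the 2005 record, while the distinct 2001 sequence is kept unchanged.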
Co-evolution analysis between HA and NA genes
First, for multiple sequence alignment (MSA), alignment output files for the HA and NA genes were generated in nexus format for each of the 10 subtypes according to host using the Clustal Omega program. 11 Based on the MSA results, yearly co-evolution analysis was performed on the genes. A co-evolution analysis generally examines the evolutionary relationship between two different species in a host–parasite relationship. In this study, however, the HA and NA genes of influenza A virus were matched by year to examine their associations over time using Jane 4, an event-based program. 12 Costs for individual events were set to: cospeciation = 0, duplication = 1, switches = 2, losses = 1, failures to diverge = 1. The ratios of HA switches and of cospeciation events to the number of NA genes (switches/NA and cospeciation/NA) were calculated from the collected data to measure the associations between the HA and NA genes for each subtype and host. Furthermore, to visualize the phylogenetic relationship between the genes, TreeMap 3.0 was used to pair data extracted during a given year for each subtype and host. 13 Tanglegrams were created by matching each of the 10 subtypes with the hosts.
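The cost scheme and the per-NA ratios described above reduce to simple arithmetic, sketched below in Python. The event costs are those stated in the text; the event counts and the number of NA genes are made-up illustrative values, not results from this study, and the helper names are hypothetical.

```python
from dataclasses import asdict, dataclass

# Event costs as stated in the text.
COSTS = {
    "cospeciation": 0,
    "duplication": 1,
    "switch": 2,
    "loss": 1,
    "failure_to_diverge": 1,
}


@dataclass
class EventCounts:
    """Number of events of each type in one reconciliation."""
    cospeciation: int
    duplication: int
    switch: int
    loss: int
    failure_to_diverge: int


def total_cost(events: EventCounts) -> int:
    """Sum of (event count x event cost) over all event types."""
    return sum(COSTS[name] * count for name, count in asdict(events).items())


def per_gene_ratio(event_count: int, n_genes: int) -> float:
    """Event count divided by the number of NA genes, e.g. cospeciation/NA."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return event_count / n_genes


# Illustrative counts only (assumption, not taken from Table 1).
example = EventCounts(cospeciation=10, duplication=2, switch=5, loss=3,
                      failure_to_diverge=0)
n_na_genes = 16

cost = total_cost(example)                                         # 0*10 + 1*2 + 2*5 + 1*3 + 1*0
cospeciation_per_na = per_gene_ratio(example.cospeciation, n_na_genes)
switch_per_na = per_gene_ratio(example.switch, n_na_genes)
```

Because cospeciation costs nothing and switches cost the most, a reconciliation with a high cospeciation/NA ratio and a low switches/NA ratio corresponds to HA and NA trees that track each other closely.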
Substitution correlations between HA and NA genes
Two types of sequence analyses were performed to measure the similarities in phylogenetic topology and the correlations according to substitutions in the nucleotide sequences. First, correlation analysis was conducted using the yearly pairwise alignment score values between the HA and NA genes for each subtype and host. MEGA6 was used to perform ClustalW alignment with a default gap opening penalty of 15, a gap extension penalty of 6.66, a DNA weight matrix of IUB, and a transition weight of 0.5. Pairwise distances were computed between pairs of taxa using the maximum composite likelihood model. Because the HA and NA gene sequences were extracted for the same years, pairwise alignment scores could be obtained under the same conditions. Based on the pairwise distances, the correlations between the HA and NA genes for each subtype and host were analyzed. SPSS version 24.0 (IBM Corp., Armonk, NY, USA) was used for Pearson’s correlation analysis. A
Results
Cophylogeny mapping of HA and NA genes by subtypes and hosts
The co-evolution analysis showed that human viruses had higher cospeciation values for the H1N1, H1N2, H2N2, H3N2, H5N1, and H9N2 subtypes than did avian viruses, and that the values for humans were the same as or similar to those for swine. In the H1N1 subtype, the cospeciation values (cospeciation events between the HA and NA genes divided by the number of NA genes) were 0.63, 0.19, and 0.27 in humans, avians, and swine, respectively. In the H1N2 subtype, the value was again highest in humans (0.75) and lower in avians (0.43) and swine (0.45). In the H3N2 subtype, the values were 0.50, 0.26, and 0.56 in humans, avians, and swine, respectively. In the H5N1 and H9N2 subtypes, humans and swine had higher cospeciation values than genetic switch values, whereas the opposite was true in avians. Because sequence data from other hosts could not be obtained for the H7N2, H7N7, H7N9, or H7N10 subtypes, the cospeciation values for these subtypes were compared only in avians. Cospeciation was highest in the H7N10 subtype (0.59) and lowest in the H7N2 subtype (0.42). The event values are detailed in Table 1.
Table 1. Results from event-based cophylogeny analysis according to subtype and host (default cost settings of 0, 1, 2, 1, 1 in Jane).
Bold font indicates cospeciation values of modest effect size (>0.5).
To visualize the phylogenetic relationship between the HA and NA genes, tanglegrams were created using TreeMap 3.0. As shown in Figure 1, phylogenetic congruence differed by subtype between human and avian viruses. In each tanglegram, nodes with bootstrap values of at least 50 were labeled in red. The yearly maximum likelihood (ML) phylogenetic trees for human viruses showed similar topologies between the HA and NA genes in the H1N1 and H3N2 subtypes. In contrast, the tanglegrams for avian viruses showed different topologies between the HA and NA genes in the yearly ML phylogeny. In particular, as shown in Figure 1C, the phylogenetic trees for the HA and NA genes of the H3N2 subtype in humans were matched year by year from 1981 to 2016, with the exception of a few years. The avian viruses, however, showed no parallel matched pairs between taxa.

Figure 1. Tanglegrams of the source phylogenies used in the reconciliation analysis of the HA gene (left) and NA gene (right) by subtype and host. (A) H1N1 subtype from human, (B) H1N1 subtype from avian, (C) H3N2 subtype from human, (D) H3N2 subtype from avian, (E) H5N1 subtype from human, (F) H5N1 subtype from avian, (G) H9N2 subtype from human, and (H) H9N2 subtype from avian.
Comparison of substitutions of influenza A virus by subtype and host
The results of Pearson’s correlation coefficient analysis were mostly similar to those of the reconciliation analysis, with differences in some subtypes. In the H1N1 subtype, there was a high positive correlation between the yearly pairwise alignment scores for the HA and NA genes in humans (r = 0.80,

Scatter plots of the correlation between the sequence alignment scores of the HA and NA genes. (A) H1N1 subtype from human, (B) H1N1 subtype from avian, (C) H1N1 subtype from swine, (D) H3N2 subtype from human, (E) H3N2 subtype from avian, and (F) H3N2 subtype from swine.
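The correlation underlying these scatter plots is Pearson's r between the yearly pairwise scores of the two genes. The study used SPSS for this step; the following is only a minimal pure-Python sketch of the same statistic, and the input values are hypothetical placeholders rather than data from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("at least two paired observations are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pairwise distances for the same year pairs (placeholders).
ha_distances = [0.010, 0.025, 0.040, 0.055]
na_distances = [0.012, 0.020, 0.041, 0.050]

r = pearson_r(ha_distances, na_distances)
```

A value of r close to 1 indicates that year pairs with divergent HA sequences also tend to have divergent NA sequences, which is the pattern interpreted here as co-evolution.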
Next, substitution rates for each subtype and host were computed using the nucleotide sequences of the HA and NA genes for the 10 influenza A virus subtypes. For the H1N1, H1N2, H3N2, and H5N1 subtypes, the substitution rate for the NA gene was higher than that for the HA gene in avians, whereas the opposite was true in humans. There were differences in substitution rates between the HA and NA genes in all subtypes. In the H1N1 subtype, the substitution rate for the HA gene was 2.03 × 10⁻³ substitutions/site/year (95% highest posterior density [HPD], 1.37 × 10⁻³ to 2.83 × 10⁻³), whereas that for the NA gene was 1.79 × 10⁻³ substitutions/site/year (95% HPD, 1.33 × 10⁻³ to 2.18 × 10⁻³). In avian H1N1, the substitution rate for the HA gene was 1.63 × 10⁻³ substitutions/site/year (95% HPD, 1.30 × 10⁻³ to 1.96 × 10⁻³), whereas that for the NA gene was 2.02 × 10⁻³ substitutions/site/year (95% HPD, 1.77 × 10⁻³ to 2.28 × 10⁻³). In human H1N1, the substitution rate for the HA gene was 1.80 × 10⁻³ substitutions/site/year (95% HPD, 1.33 × 10⁻³ to 2.28 × 10⁻³), whereas that for the NA gene was 1.46 × 10⁻³ substitutions/site/year (95% HPD, 1.31 × 10⁻³ to 1.62 × 10⁻³). For the H3N2 subtype, the substitution rates were similar for the HA and NA genes: that for the HA gene was 2.78 × 10⁻³ substitutions/site/year (95% HPD, 4.24 × 10⁻⁴ to 4.33 × 10⁻³), and that for the NA gene was 2.62 × 10⁻³ substitutions/site/year (95% HPD, 1.93 × 10⁻³ to 3.19 × 10⁻³). In avian H3N2, the substitution rate for the HA gene was 8.38 × 10⁻⁴ substitutions/site/year (95% HPD, 3.29 × 10⁻⁴ to 1.28 × 10⁻³), whereas that for the NA gene was 2.14 × 10⁻³ substitutions/site/year (95% HPD, 1.85 × 10⁻³ to 2.43 × 10⁻³). In human H3N2, the substitution rate for the HA gene was 3.49 × 10⁻³ substitutions/site/year (95% HPD, 3.25 × 10⁻³ to 3.73 × 10⁻³), whereas that for the NA gene was 2.99 × 10⁻³ substitutions/site/year (95% HPD, 2.77 × 10⁻³ to 3.23 × 10⁻³). Other results are described in Table 2.
Table 2. Mean nucleotide substitution rates of the HA and NA genes according to subtype and host.
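The host-specific comparison of HA and NA rates can be reproduced directly from the values reported in the text. The Python sketch below does so for avian and human H1N1 and H3N2. The HPD overlap check is an informal reading aid added here as an assumption; it is not a statistical test and was not part of the original analysis.

```python
from typing import Dict, NamedTuple, Tuple


class Rate(NamedTuple):
    """Mean substitution rate and 95% HPD bounds (substitutions/site/year)."""
    mean: float
    hpd_low: float
    hpd_high: float


# Values copied from the Results text for avian and human H1N1 and H3N2.
RATES: Dict[Tuple[str, str], Dict[str, Rate]] = {
    ("H1N1", "avian"): {"HA": Rate(1.63e-3, 1.30e-3, 1.96e-3),
                        "NA": Rate(2.02e-3, 1.77e-3, 2.28e-3)},
    ("H1N1", "human"): {"HA": Rate(1.80e-3, 1.33e-3, 2.28e-3),
                        "NA": Rate(1.46e-3, 1.31e-3, 1.62e-3)},
    ("H3N2", "avian"): {"HA": Rate(8.38e-4, 3.29e-4, 1.28e-3),
                        "NA": Rate(2.14e-3, 1.85e-3, 2.43e-3)},
    ("H3N2", "human"): {"HA": Rate(3.49e-3, 3.25e-3, 3.73e-3),
                        "NA": Rate(2.99e-3, 2.77e-3, 3.23e-3)},
}


def faster_gene(pair: Dict[str, Rate]) -> str:
    """Name of the gene with the higher mean substitution rate."""
    return "HA" if pair["HA"].mean > pair["NA"].mean else "NA"


def hpd_overlap(a: Rate, b: Rate) -> bool:
    """True if the two 95% HPD intervals share at least one point."""
    return a.hpd_low <= b.hpd_high and b.hpd_low <= a.hpd_high


# Summarize each subtype/host pair as (faster gene, whether HPDs overlap).
summary = {
    key: (faster_gene(pair), hpd_overlap(pair["HA"], pair["NA"]))
    for key, pair in RATES.items()
}
```

With these values, NA is the faster gene in both avian data sets and HA is the faster gene in both human data sets. The HPD intervals overlap for H1N1 in both hosts but not for H3N2, so the H1N1 differences should be read more cautiously.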
Discussion and Conclusions
In this study, we compared the co-evolution patterns and correlations between the HA and NA genes of influenza A virus according to subtype and host. The results revealed that human viruses had higher cospeciation values than avian viruses in the H1N1, H1N2, H2N2, H3N2, H5N1, and H9N2 subtypes. Reconciliation analysis showed that human viruses had higher cospeciation values than switch values for HA and NA, whereas avian viruses had higher switch values than cospeciation values in these subtypes. In particular, the cospeciation distance of the H1N1 and H1N2 subtypes was higher than that of the other subtypes. H1N1, H1N2, and H3N2 are known subtypes of swine influenza A virus and are currently circulating among humans. HA plays an important role in determining the host range of influenza viruses, and an optimal balance between the activities of HA and NA is required for efficient viral replication and transmission. 18 Thus, the functional balance between HA and NA maintained by compensatory mutations may exert selective pressure on hosts.
Comparisons between the substitution rates of the HA and NA genes in each subtype also showed that evolutionary rates differed among avians, humans, and swine. In particular, among the subtypes that showed high cospeciation values in our previous analysis, H1N1, H1N2, H3N2, and H5N1 had higher substitution rates for the NA gene than for the HA gene in avians. In humans, by contrast, the substitution rate for the HA gene was higher than that for the NA gene. The positive correlations between the two genes in the H1N1 and H3N2 subtypes were significant in humans and stronger than those in avians and swine, confirming that the HA and NA genes co-evolved in some subtypes and hosts. These results pertain to the HA and NA genes, which encode envelope proteins that play key roles in viral attachment to hosts.
Reassortment exposes influenza HA to significant changes in selective pressure through genetic interactions with NA. 19 Glycosylation of the receptor binding site is limited to HA and is often a result of antibody escape, as antibodies are targeted against the entire globular head domain of HA. It has been argued that glycosylation at the receptor binding site that reduces substrate affinity necessitates a change in NA to maintain viral fitness. 20 Moreover, the strong receptor binding affinity of HA benefits viral replication in cells, given that the balance between HA and NA plays an important role in the viral life cycle. 21 Other internal genes also influence the evolution of the HA gene. Avian and human influenza viruses typically have different sialic acid binding preferences, and only a few amino acid changes in the HA protein can cause a switch from avian to human receptor specificity. 22 In the hemadsorption assay, the presence of oligosaccharide side chains in the vicinity of the receptor binding site was shown to interrupt the efficient interaction between HA and sialic acid (SA). 21 In addition, a distance correlation analysis of the proteins in influenza A genomes using the MirrorTree method showed that HA and NA have a higher correlation than other genes. 23
In this study, we demonstrated that influenza A viruses from humans have higher cospeciation values than those from avians and swine in the H1N1, H1N2, and H3N2 subtypes. These subtypes are known swine influenza A viruses and are currently circulating among humans. The influenza glycoprotein HA plays an important role in determining the host range of influenza viruses. An optimal balance between the activities of HA and NA is required for efficient viral replication and transmission. 18 Compensatory mutations that maintain the functional balance between HA and NA may therefore exert selective pressure on hosts. We performed an event-based cophylogeny analysis with the HA and NA genes of influenza A virus to identify the differences among hosts. Correlations were analyzed using sequence alignment scores to verify co-evolution at the sequence level. However, as only published genetic sequences were examined, the small amount of sequence data introduced error in some subtypes. Therefore, we plan to analyze the co-evolution of the HA and NA genes in all subtypes and hosts by collecting novel data in a future study. Based on these findings on the phylogenetic co-evolution of the HA and NA genes in avians, humans, and swine and on the sequence variations caused by differences in substitution rates, our analyses provide a proof of principle for influenza virus vaccine design and antibody-mediated therapies based on cospeciated regions of the HA and NA genes.
Supplemental Material
Supplementary_Tables – Supplemental material for Comparative Co-Evolution Analysis Between the HA and NA Genes of Influenza A Virus
Supplemental material, Supplementary_Tables for Comparative Co-Evolution Analysis Between the HA and NA Genes of Influenza A Virus by Jinhwa Jang and Se-Eun Bae in Virology: Research and Treatment
