Abstract
Introduction
Analysis of current genetic diversity enables inferences regarding the evolution of species. However, factors affecting genetic diversity (eg, demography, migration, genetic drift, and natural selection) interact to such an extent that they are challenging to disentangle, and distinct situations may lead to similar outcomes. Genetic data consequently require specific analytical tools. In this context, computer simulation is a powerful approach to investigate the joint effects of these various processes, since it allows simulation of complex scenarios. Although more complexity does not necessarily equal more realism, 1 computer simulation has been shown to be an invaluable tool for understanding the evolutionary processes that might otherwise remain undetected.2–4
Until recently, spatially explicit simulations have been used mostly for the study of selectively neutral loci. 5 A few studies have investigated the effects of positive or negative selection in a spatially explicit context,6,7 but none of these have considered multiallelic loci under balancing selection, the main characteristics of the major histocompatibility complex (MHC). Moreover, the interaction of balancing selection and population structure has been shown to be complex.8,9 Three types of balancing selection models have been proposed to explain the high diversity of MHC lineages as follows: (1) heterozygote advantage (HA), 10 (2) rare allele advantage (RAA), 11 and (3) fluctuating selection over space and time (FS). 12 The relative role and contribution of these three nonmutually exclusive models in the evolution of MHC is intensely debated.12–15 Even though a spatial model with constant negative frequency-dependent selection (FDS) coefficients on specific human leukocyte antigen (HLA) alleles has previously been implemented, 16 this framework is not flexible enough to test the alternative models aforementioned. HA and RAA selection have also been modeled,17–20 sometimes including sexual selection through disassortative mating, 21 but without any spatial component. The island model 22 has been used to integrate space and selection in order to assess the effects of migration on genetic differentiation when selection is at play,8,9,23,24 but this is not, strictly speaking, a spatially explicit model.
The study of the human MHC, namely, HLA, has great potential to unravel human evolution and settlement,25–30 as large worldwide databases are available for several genes. 31 However, a spatially explicit computer simulation framework capable of taking HLA characteristics into account has not been previously available. In this context, we developed a new computer simulation program called SELECTOR. The originality of SELECTOR lies in the simulation of both population spatial demography and balancing selection at a multiallelic genetic locus. In particular, it merges three main evolutionary processes as follows: (1) population spatial structure and migration, (2) genetic drift, and (3) natural selection (balancing or positive). In SELECTOR, selection may either be uniform over the whole simulated geographic area or vary with latitude and/or longitude. Moreover, SELECTOR allows the user to perform model comparison and parameter estimation, as it can easily be integrated in an approximate Bayesian computation (ABC) estimation framework such as ABCtoolbox. 32 In the work presented here, we describe the main principles of SELECTOR, validate the algorithms using simple models for which theoretical expectations can be computed, and finally illustrate its use through a specific application.
Program Implementation
SELECTOR can be used to simulate complex evolutionary models by taking into account demographic variation of population sizes over space and time, as well as the effect of selection at a given genetic locus. SELECTOR simulates the genes of diploid individuals, generation after generation (ie, forward in time) within a two-dimensional stepping-stone framework. 33 The whole population is subdivided into numerous subunits called demes, where each deme has its own spatial coordinates and can exchange migrants with neighboring demes at each generation. Demographic increase in population within each deme follows a logistic growth model to account for competition for resources among the members of each deme. 34 Demographic parameters such as migration, growth rates and carrying capacity can be modified independently in each deme at different periods of time during the simulation. A wide variety of demographic scenarios, including a succession of demographic events, can thus be tested. Even though the demographic and migration algorithms in SELECTOR are derived from those implemented in the program SPLATCHE (model no. 1: even number of migrant, SPLATCHE user manual, p. 21, 35 ) SELECTOR simulates forward in time all the individuals and genes in the entire population and does not use a backward coalescent algorithm 36 to reconstruct the genetic diversity of samples. This full forward approach has the major advantage of allowing the user to consider various types of natural selection on genetic lineages.
Demographic model
Each deme has its own demographic characteristics and is regulated independently using a logistic growth model as follows:
Migration model
In every deme, the number of emigrants
Note that in SELECTOR,
Demographic processes
The order of demographic events is as follows:
Demographic regulation occurs in every deme The number of emigrants The number of immigrants in each deme The density in every deme
Note that the number of simulated generations must be either defined by the user, assuming that the generation time is known, or estimated for the organism under study, or a prior distribution may be defined in order to take into account uncertainty regarding the generation time.
Simulated genetic data
SELECTOR simulates the allele frequency trajectories in demes through the effects of genetic drift, migration, demographic variation, and selection. To achieve this, it records the two allelic variants of a given genetic locus for each simulated diploid individual. These variants are simply coded “
Transmission of gametes
Within each deme, the transmission of genes from one generation to the next is based on a Wright–Fisher's model.22,39 For each individual at generation
Effect of selection
Three different models of selection are implemented in SELECTOR. When selection applies (selection coefficient
Symmetric overdominant selection (SOS), also called HA, simulates overdominant balancing selection, where all heterozygotes have the same selective advantage (fitness) over homozygotes.10,40 If the new genotype is heterozygote, it is accepted with the probability
Frequency dependent selection (FDS), also called RAA, simulates a selection in favor of alleles that are less frequent in the population. A new allele
Dominant positive selection (DPS) simulates positive selection for one specific allele
Program Validation
In order to validate the algorithm of allele transmission implemented in SELECTOR, we performed a series of simulations for which theoretical expectations can be analytically computed.
Evolution of allele frequencies within a single deme
In a single deme of size
We varied all three parameters (
Comparison between simulated (
When comparing the simulated heterozygosity to the one expected theoretically under identical conditions (number of generations, deme size, and initial heterozygosity), we found that the average over 1,000 simulations was very close to the expected heterozygosity, differing by ≤0.6% in all cases. The small differences can be explained by the stochasticity of allele transmission in SELECTOR (see standard deviation), while the expected value is deterministic.
Evolution of allele frequencies within a series of interconnected demes
We assessed the validity of the migration algorithm by modeling four interconnected demes in a 2 × 2 stepping-stone area, in the absence of selection. At the beginning of the simulation, each deme had one specific allele with 100% frequency (
Figure 1A shows that the evolution of allele frequencies through time within a deme converges toward an equilibrium frequency of 25% for all four alleles, as expected theoretically. The differences between the theoretical and the simulated curves are due to the differences between the two models: the theoretical curves have been obtained for a deterministic island model with an infinite size, while the simulated curves have been generated by a stochastic stepping-stone model with a finite (but large) size. Figure 1A shows that the frequency of allele

Comparison between simulated frequencies (sim, solid lines) and frequencies expected theoretically (exp, dotted lines). (A) Evolution of allele frequencies in the absence of selection within four interconnected demes. (B) Evolution of two alleles within a single deme under the effect of DPS on allele
Evolution of allele frequencies under DPS
In a single deme of size
Figure 1B shows that the frequency of the allele
Evolution of allele frequencies under overdominant selection
We further simulated the evolution of allele frequencies in a biallelic system (
Figure 1C shows that the frequencies of the two alleles converge to an equilibrium frequency of 0.5 if the size of the population is large enough to neglect drift, for both balancing selection models (SOS or FDS), as expected from theory.
Application
The interaction of balancing selection and population structure is challenging to study. 9 It has, for example, been shown that genes under balancing selection are less sensitive to population subdivision than neutral ones and are thus expected to show limited differentiation among demes compared with neutral genes under the same demographic conditions.8,23,44 However, this conflicts with most observations,15,45 with some exceptions. 46 In human populations, discordant results have been obtained; some studies have not found any reduction of interpopulation differentiation,44,47 while others have found some low interpopulation or intercontinental differentiation for almost all HLA loci, except DPB1.48–50 Because Schierup et al. 23 theoretically showed that the reduction of genetic differentiation under balancing selection, compared to neutrality, is more pronounced when gene flow between demes is low, with almost no difference when gene flow is large, we believe that spatial gene flow heterogeneity between demes may explain these contrasting HLA results.
We tested this hypothesis by assessing the effects of heterogeneous gene flow between populations on measures of genetic differentiation under balancing selection. We used SELECTOR to simulate the evolution of allele frequencies in interconnected demes, when SOS is at play. Thus, we designed a virtual square area made up of 256 demes (16 × 16 demes, Fig. 1D). This area was divided into four identical groups of demes (square area of 64 demes each, Fig. 1D) connected by a migration rate
We measured genetic differentiation at the end of each simulation by computing the indices
Our results show that, as theoretically expected,8,23,44 genetic differentiation between demes is globally reduced under balancing selection compared with neutrality (Figs. 2 and 3). They also support previous results, suggesting that this reduction is generally less pronounced when gene flow is high (increasing

Pairwise


Pairwise
When balancing selection occurs, genetic distances between populations that are not spatially isolated are only slightly reduced compared with neutral expectations, whereas genetic distances between spatially isolated populations are strongly reduced (ie, they are inversely proportional to the gene flow across the geographical barrier). This effect translates into a distortion of the graphical representation of the genetic relationships between populations. For instance, Figure 5 shows 12 examples (randomly taken) of MDS performed on the simulated population samples, with

Examples of MDS obtained in three series of independent simulations. The four MDS on the left pane have been generated from simulations without selection (
Here, we used SELECTOR to investigate patterns of genetic differentiation between spatially distributed populations, accounting for isolation by distance,
55
where gene flow is not uniform (accounting for partial barriers to migration, such as mountains, rivers, or seas). These two condition properties constitute one of the novelties of SELECTOR compared with previous approaches. In a virtual square area subdivided into four groups of demes separated by a partial
Discussion
The rapid development of computational techniques has led to studies of new aspects of genetic evolution. In particular, simulation approaches permit exploration of realistic evolutionary scenarios that are too complex to be studied analytically, and these approaches have the power to integrate models affected by various processes (including genetic, environmental, cultural, and demographic processes). In this context, we present SELECTOR, a program that simulates genetic lineages under selection in a spatially explicit population framework. SELECTOR was primarily adapted to HLA loci because of the large worldwide datasets available for these genes and the need of a simulation program able to take into consideration their specific characteristics (multiple alleles and overdominant balancing selection) in a population genetic framework. We have validated the algorithms of SELECTOR by showing that its outputs are those expected theoretically in simple situations, but the power of SELECTOR resides in the simulation of more complex scenarios for which expectations cannot be computed analytically.30,56,57 While SELECTOR has primarily been designed to investigate the joint effects of selection and demography on HLA markers, as exemplified in the study by Di et al. 30 , it has also been applied to assess the effects of restricted gene flow on selected and neutral loci. 57 SELECTOR can also be used to test comparable evolutionary hypotheses on other markers, such as lactase persistence. 56
Here, we used SELECTOR to investigate patterns of genetic differentiation that could explain contrasting outcomes obtained on HLA data, for which some studies show a reduced interpopulation differentiation compared with neutral loci,48–50 while others show no difference.44,47 Our results suggest that, if gene flow is high between populations, the effect of balancing selection is negligible compared with that of demography.30,47 In contrast, when gene flow is reduced among populations, such as where there is a partial or strong geographic barrier, the effect of selection is visible, as is the case for the Strait of Gibraltar.
57
The same logic may also explain why in the study by Sanchez-Mazas,
50
the variance of components among continents is significantly reduced (
Hierarchical analysis of genetic variance showing the variance components (%) taken from the study by Sanchez-Mazas. 50
Here, we mainly used a model of overdominant selection as a first approximation to the mode of balancing selection acting on HLA loci, which explains the maintenance of numerous alleles. 11 Interestingly, these results do not differ significantly when either a model of HA (SOS) or a model of RAA (FDS, Fig. 4) is applied. However, the patterns of selective pressure may have been more complex, resulting from a combination of the following: (1) overdominant and positive selection at some alleles in response to the presence of specific pathogens 58 ; and/or (2) variability through space due to various pathogen environments 59 ; and/or (3) variable through time due to climatic variation. New selection models, such as divergent allele advantage 20 or selection varying in time and space, 15 and more detailed relationships between selection, demography, and environment could be the next improvements of SELECTOR. Indeed, one of the interests of the simulation approach is that models can be improved and carefully investigated to assess specific combinations of parameter values and processes that may bring deeper understanding of observed patterns of genetic diversity. Another improvement of our approach would be the incorporation of molecular or multilocus data to analyze genomic information.
SELECTOR has been inspired by the program SPLATCHE, 35 and both programs have similar basic demographic processes. The major difference between the two processes is that the forward-in-time process implemented in SELECTOR, while being computationally more demanding than SPLATCHE's backward-in-time coalescent approach, allows incorporating natural selection processes, while SPLATCHE simulates neutral loci only. However, the similarity between the two programs renders their respective outputs easily comparable, and they can consequently be used to associate genetic patterns of selected genes, such as HLA, to neutral molecular markers. 57 The evolution of allele frequencies in all demes can be obtained in a simple tabulated text format that is easily readable. In addition, final allele frequencies in a series of samples specified by the user are output in ARLEQUIN format (currently 3.5). 38 and can thus be easily analyzed to compute various intra- or interpopulation genetic statistics. SELECTOR has been developed in C++, which makes it computationally efficient and particularly suitable for research purposes. Thanks to its Linux version, SELECTOR may be incorporated through a bash pipeline into the ABC approach, 60 such as with ABCtoolbox, 32 in order to estimate parameters and formally compare models. This powerful method permits assessment of the relative probabilities of contrasting evolutionary scenarios, as well as estimation of the best parameter values.30,56,57 Input parameters of SELECTOR can be directly drawn from prior distributions defined by the user (see SELECTOR user manual) 37 and summary statistics computed from the output using ARLEQUIN. Since it is an approximation technique, ABC avoids the calculation of a likelihood function, and thus allows evaluation of complex evolutionary scenarios that would otherwise be intractable by full likelihood approaches. Even though ABC requires a large number of simulations, which depend on the number of parameters to be estimated and on the prior ranges used,61–63 it has been shown to outperform approximate maximum-likelihood approaches. 64 We believe that SELECTOR provides a versatile and computationally efficient framework to investigate such scenarios.
When compared with other forward-in-time simulation approaches, SELECTOR allows simulating (1) various natural selection mechanisms (not available in SPLATCHE), (2) spatially explicit scenarios, in comparison with “simuPOP,” 65 and (3) more than three populations with sophisticated patterns of gene flow between populations and subpopulations, in comparison with “dadi”. 66 SELECTOR is also more research oriented, being faster than dadi or simuPOP since it is written in C++, which is compiled into a binary executable file, instead of interpreted Python.
conclusion
Here, we have presented the simulation program SELECTOR and demonstrated that it is a powerful and robust tool for investigating the combined effects of selection and demography on the genetic variability of MHC loci.30,57 Moreover, its versatility makes it invaluable for tackling other evolutionary questions and gaining insights into the genetic evolution of human beings and other organisms. SELECTOR is freely available for research and teaching purposes at http://ua.unige.ch/en/agp/tools/selector/.
Availability and Requirements
SELECTOR is written in C++, runs on MS windows or Linux, and is freely available for academic purposes at http://ua.unige.ch/en/agp/tools/selector/.
Author contributions
Conceived and designed the experiments: MC, AS-M. Analyzed the data: MC. Wrote the first draft of the manuscript: MC. Contributed to the writing of the manuscript: MC, PG, DD, JMN, AS-M. Agree with manuscript results and conclusions: MC, PG, DD, JMN, AS-M. Jointly developed the structure and arguments for the paper: MC, PG, DD, JMN, AS-M. Made critical revisions and approved final version: MC, PG, DD, JMN, AS-M. All authors reviewed and approved of the final manuscript.
