Abstract
Keywords
Introduction
The immune response is recognized as a robust system able to counteract invasion by diverse pathogens [1, 2]. However, as a complex biological process, the dynamical behavior of its cellular components exhibits a high degree of variability affecting their differentiation, proliferation or death processes. Indeed, the abundance of antigen-specific T cells and their location relative to pathogen invasion will affect the dynamics of the response [2–4]. Similarly, the pathogen load and virulence as well as the host innate response will affect the T cell response [5]. Finally, at the cellular level, between-cell variations in protein content can also generate heterogeneous responses [6]. Genetic variability of the numerous genes controlling the immune response will also be a source of variability among individuals [1]. Even among genetically identical individuals, the response to the same infection can result in highly heterogeneous dynamics [7–9].
Cytotoxic CD8 T cells play an essential role in the fight against pathogens or tumors as they are able to recognize and eliminate infected or transformed cells. Indeed, following encounter of antigen-presenting cells loaded with pathogen or tumor derived antigens, in lymphoid organs, quiescent naive CD8 T cells will be activated. This leads to their proliferation and differentiation in effector cells that have acquired the capacity to kill their targets, and to their ultimate differentiation in memory cells [10, 11]. The CD8 T cell immune response is yet a highly variable process, as illustrated by experimental measurements of cell counts: dynamics of the responses (timing, cell counts) may differ from one individual to another [4, 13], but also depending on the immunogen [3, 9].
The role of genome variability in explaining inter-individual variations of T cell responses has been recently investigated [14, 15] but provided limited understanding of the observed heterogeneity. Li et al. [15] have focused on correlations between gene expression and cytokine production in humans, and have identified a locus associated with the production of IL-6 in different pathogenic contexts (bacteria and fungi). Ferraro et al. [14] have investigated inter-individual variations based on genotypic analyses of human donors (in healthy and diabetic conditions) and have identified genes that correlate with regulatory T cell responses.
To our knowledge, inter-individual variability characterized by heterogeneous cell counts has been mostly ignored in immunology, put aside by focusing on average behaviors of populations of genetically similar individuals. The use of such methodology, however, assumes that variability is negligible among genetically similar individuals, which is not true [7, 16], see Figure 1. Experimental measurements of

CD8 T cell counts after vaccinia virus (VV) immunization in mice. Naive CD44-Mki67-Bcl2+ cell counts (upper left), Early Effector CD44+Mki67+Bcl2- cell counts (upper right), Late Effector CD44+Mki67-Bcl2- cell counts (lower left), and Memory CD44+Mki67-Bcl2+ cell counts (lower right) are measured in 59 individuals over 47 days. Measurements of three different mice are highlighted in blue, green and red, whenever they are available. All other measurements are in grey.
In this work, we propose to model and characterize inter-individual variability based on CD8 T cell counts using nonlinear mixed effects models [17–19]. In these models, instead of considering a unique set of parameter values as characteristic of the studied data set, a so-called
Nonlinear mixed effects models have been used to analyze data in various fields [20], especially in pharmakokinetic studies, and more recently to model cell to cell variability [21, 22] or to study tumor growth [23, 24]. In immunology, Keersmaekers et al. [25] have recently studied the differences between two vaccines with nonlinear mixed effects models and ordinary differential equation (ODE) models for T and B cells. Jarne et al. [26] and Villain et al. [27] have used the same approach to investigate the effect of IL7 injections on HIV+ patients to stimulate the CD4 T cell response. None of these works aimed at identifying immunological heterogeneous processes or characterizing the between-individual variability in CD8 T cell responses, rather nonlinear mixed effects models have been used to characterize the average behavior of the cell populations and explain the data.
In order to characterize inter-individual variability based on experimental cell counts, other methods could be considered. First, one could try to estimate individual parameter values by fitting a mathematical model to individual experimental data. The nature of the data we consider here, illustrated in Figure 1, makes this option unrealistic (not enough data per individual). Another approach could be to use individual-based models, yet such models also require to estimate a lot of parameters and individual data in our case do not provide enough information
A number of models of the CD8 T cell response based on ODEs have been proposed over the years. Of particular relevance here is the work of De Boer et al. [28], where the model accounts for activated and memory cell dynamics but the influence of the immunogen is imposed. Antia et al. [29] proposed a model based on partial differential equations, that includes immunogen effects and dynamics of naive, effector and memory cells. These works describe different subpopulations of CD8 T cells, however most of the time only total CD8 T cell counts are available to validate the models. In Crauste et al. [10], the authors generated cell counts for four subpopulations of CD8 T cells in mice that they used to identify the most likely differentiation pathway of CD8 T cells after immunogen presentation. This approach has led to a model of the average CD8 T cell dynamics in mice after immunization and its representation as a set of nonlinear ODEs. The model consists in a system of ODEs describing the dynamics of naive, early effector, late effector, and memory CD8 T cell subsets and the immunogen.
The goal of this article is to propose, analyze and validate a mathematical methodology for describing individual behaviors and the inter-individual variability observed in CD8 T cell responses, in different immunization contexts. We will rely on a dynamical description of CD8 T cell dynamics, based on a nonlinear model, where parameter values are drawn from probability distributions (nonlinear mixed effects model). Starting from the model introduced and validated in [10], we first select a model of the CD8 T cell immune response dynamics accounting for variability in cell counts by using synthetic then experimental data, generated in different immunization contexts. Second we characterize the main biological processes contributing to heterogeneous individual CD8 T cell responses. Third, we establish that the immunogen-dependent heterogeneity influences the early phase of the response (priming, activation of naive cells, cellular expansion). Finally, we show that besides its ability to reproduce CD8 T cell response dynamics our model is able to predict individual dynamics of responses to similar immunizations, hence providing an efficient tool for investigating CD8 T cell dynamics and inter-individual variability.
CD8 T cell counts have been previously experimentally measured and characterized in 4 subpopulations in Crauste et al. [10]: naive (N), early effector (E), late effector (L) and memory (M) cells (see Section 4.2). Additionally, a mathematical model has been proposed for the description of these data and validated in [10]. Therein, System (3) has been shown to be able to describe average dynamics of CD8 T cell immune responses, when CD8 T cells go through the 4 above-mentioned differentiation stages. In order to describe individual CD8 T cell counts we couple System (3) to a nonlinear mixed effects model that accounts for both population and individual dynamics. This results in a model of CD8 T cell response dynamics that includes more parameters, consequently when fitting this model to new data sets it is mandatory to determine whether all model parameters can be correctly estimated or if some parameters have to be removed and the initial model modified. We first perform this analysis on ideal data, called “synthetic data”, to determine the minimal number of parameters required to fit the model to the data. Synthetic data are generated from simulations of the model, so true parameter values are known and the estimation procedure is performed in a controlled framework. Then, in a second time, we perform again the analysis on real, experimental data, in order to adapt our procedure to realistic data. Finally, we characterize the biological processes depending on the immunization, and we identify model parameters and their corresponding biological processes that vary the most between individuals. Details of the methodology are presented in Sections 4.3 to 4.7.
Model selection on synthetic data
In [10], System (3) introduced in Section 4.3 has been shown to be able to describe average dynamics of CD8 T cell immune responses, when CD8 T cells go through 4 differentiation stages: naive, early- then late- effector cells, and memory cells (see Section 4.2). Here System (3) is coupled to a nonlinear mixed effects model, leading model parameters to be drawn from a probability distribution. Initially we assume that any parameter can carry inter-individual variability, then the number of parameters is reduced to ensure correct estimations on ideal data. Ideal data are generated by simulating System (3), so parameter values are known (probability distributions). These data sets enable to evaluate the potential of the model without data availability-related limitations: we call them “synthetic data”. Here, synthetic data account for 100 individuals (mice) and measurements are available every day from D4 to D10, then every 2 days from D10 to D20, and finally on D25 and D30 pi, see Table 1 and Section 4.6 for details.
Data sets (details in Sections 4.2, 4.3 and 4.6)
Data sets (details in Sections 4.2, 4.3 and 4.6)
We use Synth data sets 1 to 4 (Table 1) to validate the parameter estimation procedure. True parameter values are known and given in Section A1. Parameter estimation is performed with the SAEM algorithm implemented in Monolix software [30]. Using a model selection procedure, based in particular on the use of the relative standard error (RSE) defined in (4) that informs on the confidence in the estimation, we iteratively remove parameters:
Details of the procedure are explained in Sections 4.5, 4.6, A2 and Table A2.
All removed parameters are related to death rates, of late effector cells (
This leads to a reduction of the initial 12-parameters System (3) to the 9-parameters System (1),

Schematic CD8 T cell differentiation diagram following immunization.
Biological data from VV data set 1 (see Section 4.2 and Table 1) are confronted to System (1). Parameter estimation is performed using the SAEM algorithm [30] and, following the procedure described in Section 4.7, leads to further reduction of the model. Using
The first step in the model reduction procedure leads to an estimated value of parameter
Steps in estimating parameter values using VV data set 1 and System (1). The procedure is detailed in Section 4.7. At Step 1 , the procedure leads to removing parameter μ
L
. At Step 2 , the random effect of δ
LM
is removed. At Step 3 , the random effect of δ
EL
is removed. At Step 4 , the random effect of ρ
E
is removed. At Step 5 , the random effect of μ
N
is removed. At Step 6 , no other action is required. Values used to take a decision are highlighted in bold at each step. In the first column, ‘m.v.’ stands for mean value, RSE is defined in (4), ‘r.e.’ stands for random effect, and the shrinkage is defined in (5). Note that values (mean values and random effects) of parameters μ
E
, μ
L
, and μ
I
have to be multiplied by 10-6 (for μ
E
and μ
L
) and 10-5 (for μ
I
). Units are omitted for the sake of clarity
Steps in estimating parameter values using VV data set 1 and System (1). The procedure is detailed in Section 4.7. At
In the second step of the procedure however, several random effects have large RSE and high shrinkages (Table 2,
The resulting model, System (2) (see Figure 2B), comprises 8 parameters, 4 of them vary within the population (
Bars highlight fixed parameters within the population. This system enables to describe VV data set 1 and its inter-individual variability (see Figure 3). The inter-individual variability is entirely contained in the activation rate of naive cells (

Experimental and simulated individual cell counts for VV data set 1 (logarithmic scale).
Figure 3A shows the good agreement between model predictions and individual measurements for each CD8 T cell subpopulation. Model predictions are obtained from numerical simulations of System (2) performed with estimated individual parameter values. Despite over- or under-estimation of some individual observations, the 90th percentile of the difference between observed and predicted values (dashed line) shows that most experimental cell counts are efficiently predicted (estimated errors are relatively small for all subpopulations:
Estimated parameter values for VV and Tumor data sets 1 (median of log-normal distribution for parameters with random effects, RSE (%) in parentheses), obtained with System (2), and estimated parameter values from [10] (VV immunization). Estimations have been performed independently
Figure 4 shows the estimated dynamics of early- and late-effector and memory cells of two individuals. One individual (Figure 4A) was monitored on days 7, 15 and 47pi leading to three measurements points for late effector cells and two for early effector and memory cells. Despite missing measurements (memory cell counts on D7pi, and early effector cell counts on D47pi), estimated dynamics are in agreement with what is expected, especially regarding the time and height of the peak of the response and the following contraction phase. The other individual (Figure 4B) had cell count measurements only on day 8pi, yet the estimated dynamics correspond to an expected behavior. This could not have been obtained by fitting this individual alone. Hence we are able to simulate likely dynamics even with a small amount of data points and missing cell count measurements at some time points, thanks to the use of nonlinear mixed effects models and the parameter estimation procedure. By focusing first on the population dynamics (based on a collection of individual dynamics), the method enables to recover the whole individual dynamics. This is a huge advantage when data sampling frequency is low.

The dynamics of three subpopulations (early effector - red, late effector - green, memory - purple) are simulated with System (2) for two individuals. Experimental measurements are represented by dots, simulations of the model by straight lines.
Similar good results are obtained for Tumor data set 1 (see Figure 5 and parameter values in Table 3). Therefore System (2) enables to describe inter-individual variability in different immunization contexts, here VV and Tumor immunizations, and with only few data points per individual.

Experimental and simulated individual cell counts for Tumor data set 1 (logarithmic scale).
Estimated parameter values obtained with System (2) for VV or Tumor data sets are in the same range as in the estimation previously performed on average cell counts on a similar experimental set (VV immunization [10]), see Table 3. Some differences are observed for estimated values of differentiation rates, yet for the 3 estimations (VV data set 1, Tumor data set 1, [10]) parameter values remain in the same order of magnitude, indicating consistency between the two studies. Estimated values of parameter
From the independent estimations on VV and Tumor data sets (Table 3), we compute differences between estimated values of fixed effects. Differences are large for parameters –in decreasing order –
Consequently, combining both data sets (VV and Tumor) as observations may highlight which parameters have to be significantly different to describe both data sets.
A covariate is added to the fixed effects of the five parameters that showed the larger differences in the initial estimation:
One may note that adding a covariate increases the number of parameters to estimate. However, the number of parameters is not doubled, since we assumed that parameters without covariates are shared by both immunization groups. In addition, the data set is larger, since it combines VV and Tumor measurements. Hence the number of parameters with respect to the amount of data remains reasonable.
From this new estimation, we conclude that among the five selected parameters the covariates of only four of them are significantly different from zero:
Estimated parameter values using combined VV and Tumor data sets 1. Parameters that do not vary within the population are shown in the upper part of the table, whereas individual-dependent parameters are shown in the central part (mean and standard deviation values). RSE (%) are indicated in parentheses. Parameters whose values depend on the immunogen (VV, Tumor) are highlighted in grey, and the p -value characterizing the covariate non-zero value is shown in the last column
Estimated parameter values using combined VV and Tumor data sets 1. Parameters that do not vary within the population are shown in the upper part of the table, whereas individual-dependent parameters are shown in the central part (mean and standard deviation values). RSE (%) are indicated in parentheses. Parameters whose values depend on the immunogen (VV, Tumor) are highlighted in grey, and the
Figure 6 shows the estimated distribution for parameter

Probability distribution of parameter
Table 4 gives the estimated values of all parameters in both groups. Regarding parameters that do not vary within the population, it is required for parameters
In summary, we identified parameters whose values are significantly different according to the immunogen used to activate CD8 T cells. These parameters correspond to the dynamics of naive cells (
To challenge System (2) and the estimated parameters (Table 4), we compare simulated outputs to an additional data set, not used for data fitting, of both VV and Tumor immunizations, VV data set 2 and Tumor data set 2 (Table 1 and Section 4.3Valida).
We already know the probability distribution of parameters (Table 4), so we only estimate individual parameters in order to fit individual dynamics. Results are shown in Figure 7, for both VV and Tumor data sets 2. Individual fits are available in Section A3. It is clear that estimated individual dynamics are consistent with previous individual dynamics estimations. Hence, we validate System (2) and values estimated in both VV and Tumor immunization contexts by showing that estimated parameter values allow to characterize individual CD8 T cell counts obtained in similar contexts (Figure 7 and Section A3).

Observed
When following an
Starting from the model described in Crauste et al. [10] that could efficiently describe CD8 T cell dynamics, at the level of average population cell-counts in peripheral blood, we built and validated this nonlinear mixed effects model in a step-wise fashion. The system was first modified to ensure correct parameter estimation when confronted to ideal, highly informative data. In a second step, the model was again modified and parameter values estimated by using experimental measurements generated through a VV immunization. We next identified parameters –hence biological processes –that vary between individuals and explain the between-individual variability, and other parameters that can be fixed within the population to explain biological data (measured in VV and Tumor immunization contexts). Finally, by including a categorical covariate we additionally identified immunization-dependent parameters.
In order to determine the contribution of each parameter to inter-individual variability, one could argue that performing a sensitivity analysis would shed light on the more sensitive parameters. Even though model’s output sensitivity to parameters and contribution of each parameter to individual variability may be partly related, they are nonetheless different concepts. Sensitivity analysis would highlight the influence of a parameter on the population dynamics, whereas our objective is to reproduce several individual outputs (individual dynamics) which exhibit more or less variability than the average population behavior. Therefore the use of classical tools like Sobol indices [31] or generalized sensitivity functions [32] is not adapted to handle the current question. Hence, we proposed a procedure, based on estimated errors and the shrinkage, to identify a minimal set of parameters (fixed and random effects) required to describe the data sets. It can be noticed that the shrinkage, expressed as a ratio of variances (see (5)), provides an information similar to the one given by Sobol indices.
Noteworthy, from a biological point of view, the removal of one parameter during model reduction (for example, the death rate of late effector cells) must not be understood as if the corresponding process is not biologically meaningful. Rather, based on the available data, our methodology found that some processes are non-necessary in comparison with the ones described by the system’s equations.
Similarly, parameters characterizing immunogen dynamics vary within the population whereas model reduction led to remove the variability of equivalent processes (proliferation for instance) in CD8 T cell dynamics. It is likely that this is due to a lack of experimental measures on immunogen dynamics (whether virus load evolution or tumor growth), and one cannot conclude that inter-individual variability mostly comes from immunogen dynamics. Information on immunogen dynamics, when available, could significantly improve parameter estimation and help refining the information on inter-individual variability during CD8 T cell responses.
In our biological data, inter-individual variability is explained only by variability in the activation rate of naive cells, the mortality rate of effector cells, and dynamics (proliferation and death) of the immunogen. The former is actually in good agreement with the demonstration that in diverse infection conditions the magnitude of antigen-specific CD8 T cell responses is primarily controlled by clonal expansion [33].
Two of the three differentiation rates (early effector cell differentiation in late effector cells, and late effector cell differentiation in memory cells) do not need to vary to describe our data sets. This robustness of the differentiation rates is in good agreement with the auto-pilot model that shows that once naive CD8 T cells are activated their differentiation in memory cells is a steady process [34, 35].
Eventually, using nonlinear mixed effects models and an appropriate parameter estimation procedure, we were able to quantitatively reproduce inter-individual variability in two different immunization contexts (VV and Tumor) and provide predictive population dynamics when confronted to another data set (for both immunogens). This demonstrates the robustness of the model.
The addition of a categorical covariate allowed us to identify parameters that are immunization-dependent. Interestingly they control the activation of the response (priming, differentiation of naive cells, expansion of effector cells) as well as the generation of memory cells. This is again in good agreement with the biological differences that characterize the two immunogens used in this study. Indeed, pathogen-associated molecular patterns (PAMP) associated with vaccinia virus will activate a strong innate immune response that will provide costimulatory signals that in turn will increase the efficiency of naive CD8 T cell activation [5]. In contrast, when primed by tumor cells CD8 T cells will have access to limited amounts of costimulation derived from damage associated molecular patterns [36]. The amount of costimulation will also control the generation of memory cells [37]. Focusing on average CD8 T cell behaviors (not shown) highlights stronger responses following VV immunization, characterized by a faster differentiation of naive cells and a higher peak of the response (at approximately 3 × 105 cells compared to 105 cells for the Tumor induced response). Also, in average, more memory cells are produced following VV immunization. Hence the addition of covariates to the model parameters has allowed to identify biologically relevant, immunogen-dependent parameters.
Using covariates has additional advantages. First, they allow to consider a larger data set (in our case, the combination of two data sets) without adding too many parameters to estimate (4 covariates in our case). This is particularly adapted to situations where only some parameters are expected to differ depending on the data set (here, the immunogen). Second, and as a consequence, model fits may be improved compared to the situation where data sets generated with different immunogens are independently used to estimate parameters. Figure 8 illustrates this aspect: dynamics of two individuals are displayed, with and without covariate. In both cases using the covariate (and thus a larger data set) improved the quality of individual fits, and in the case of Individual 1 generated more relevant dynamics with a peak of the response occurring earlier, before day 10pi. No individual fit has been deteriorated by the use of a covariate (not shown).

Positive side-effect of using covariates. For two illustrative individuals, accounting for covariates allows to better estimate early effector cell dynamics: red plain curve with covariate, blue dashed curve without covariate.
Finally, CD8 T cell response dynamics to both VV and Tumor immunogens were well captured for data sets that had not been used to perform parameter estimation (Section 2.4). The behavior of each individual was estimated with the prior knowledge acquired on the population (i.e. fixed parameter values and variable parameter distributions) and proved consistent with previous estimated individual behaviors. The correct prediction of individual behaviors by the model, in a simple mice experiment, paves the way to personalized medicine based on numerical simulations. Indeed, once the population parameters are defined, numerical simulation of individuals can be performed from a few measurements per individual and consequently would allow to adapt personalized therapies.
Ethics statement
CECCAPP (Lyon, France) approved this research accredited by French Research Ministry under project #00565.01.
Mice were anesthetized either briefly by placement in a 3% isoflurane containing respiratory chamber or deeply by intraperitoneal injection of a mix of Ketamin (70 mg/kg) and Xylazin (9 mg/kg). All animals were culled by physical cervical disruption.
Data
All data used in this manuscript are available at https://osf.io/unkpt/?view_only=ff91bd89bc32421dbcbb356c3509ca55.
Age- and sex-matched litter mates or provider’s delivery groups, which were naive of any experimental manipulation, were randomly assigned to 4 experimental groups (of 5 mice each) and co-housed at least for one week prior to experimentation. Animals were maintained in ventilated enriched cages at constant temperature and hygrometry with 13hr/11hr light/dark cycles and constant access to 21 kGy-irradiated food and acid (pH = 3 ± 0.5) water.
Models of CD8 T cell dynamics
The variables
The immunogen load dynamics are normalized with respect to the initial amount [10, 40], so
Parameters
Death parameters are denoted by
Early and late effector cells are cytotoxic, GrzB+ cells [10], so due to competition for limited resources (such as cytokines) and fratricidal death [41, 42] we assumed fratricide killing by CD8 T cells. Consequently the model accounts for effector-cell regulated death rates of both effector cells and the immunogen [10, 40]. Natural mortality rates are considered for naive and memory cells (
Proliferation parameters of early effector cells and the immunogen are respectively denoted by
System (3) has been introduced and validated on a similar VV data set in [10]. To account for individual behavior, parameters will be complexified assuming they are drawn from probability distributions and in the same time this system will be simplified through a model selection procedure to ensure the correct estimation of parameter values with available data sets (see Sections 4.6 and 4.7).
Nonlinear mixed effects models allow a description of inter-individual heterogeneity within a population of individuals (here, mice). The main idea of the method is to consider that since all individuals belong to the same population they share common characteristics. These common characteristics are called “fixed effects” and characterize an average behavior of the population. However, each individual is unique and thus differs from the average behavior by a specific value called “random effect”.
This section briefly describes our main hypotheses. Details on the method can be found in [17–19, 43].
Each data set {
The function
Individual parameters
The residual errors, combining model approximations and measurement noise, are denoted by
We will write that a parameter is
Parameter estimation
Parameter values are estimated with the Stochastic Approximation Expectation-Maximization (SAEM) algorithm. This algorithm is adapted to nonlinear mixed effects models [18] and has been shown to quickly converge under general conditions [17]. Moreover, an implementation of the SAEM algorithm is available in Monolix [30], a freely available software that provides different indicators to quantify the quality of estimations and fit. We used the SAEM algorithm and Monolix in this work.
Once these parameters have been estimated, each individual vector of parameters
Both estimations are performed with Monolix software [30]. Files to run the algorithm (including all algorithm parameters) are available at https://plmlab.math.cnrs.fr/audebert/cd8-responses.
In order to study whether differences observed in parameter values between VV and Tumor data sets (Table 1) are only related to random sampling or if they can be explained by the immunogen, we use categorical covariates (Section 2.3).
To tackle this question, we first pool together VV and Tumor data sets 1. Second, using this full data set, we estimate parameter values by assuming that fixed effects of some Tumor-associated parameters are different from those of the corresponding VV-associated parameters.
To introduce categorical covariates in our mixed effects model, we assume that if an individual is either in Tumor or VV data set then the probability distribution of its individual parameter vector
Model selection on synthetic data
Model selection relies on criteria that allow to evaluate to which end a model appropriately satisfies
Here, we do not use quality of fit criteria to select a model because all models correctly fit data due to a priori over-informed models that have too many parameters compared to available data (see Paragraph Model selection below). Instead, we focus on the capacity of the parameter estimation procedure to correctly estimate model parameters and to the a posteriori validation of statistical assumptions. To do so, we first use synthetic data (see Paragraph Generation of synthetic data below). We take advantage of the fact that we know the exact parameter values used to generate synthetic data, so in order to evaluate the correctness of estimated parameter values we rely on: the relative difference between the estimated parameter value and the true value, the relative standard error (RSE), defined as the ratio between the standard error (square root of the diagonal elements of the variance-covariance matrix) and the estimated value of the parameter [19],
the
We decided not to consider the mathematical notion of identifiability here. Indeed, studying identifiability in nonlinear mixed effect models is a complicated, open question that has been discussed for instance in [44]. Approaches based on the Fisher Information Matrix (RSE) have been proposed and are often used for evaluating identifiability of population parameters, while analysis of the shrinkage allows to investigate individual parameters identifiability, and we used such methods in this work.
These data consist of time points and measurements for the 4 subpopulations of CD8 T cell counts (in log10 scale) and the immunogen load. These are called
We generate synthetic data for 100 individuals, cell counts are sampled at days 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30pi (cf. Figures A1 to A4). In agreement with real biological data, we assume that all cell counts below 100 cells are not measured, and remove the data. For the immunogen load, values lower than 0.1 are also not considered.
parameters whose estimated value is different from their true value, and the RSE is larger than 5% random effects of parameters with shrinkage larger than 30%.
In Step 1, model (3) is used, with all parameters varying within the population. This makes 29 parameters to estimate: 12 mean values, 12 random effects, 5 error parameters.
In Step 3, based on the estimations performed in Step 2, we iteratively remove parameters that are not correctly estimated. To do so, we first focus on parameters that are not estimated to their true value (which is known) and whose RSE is larger than 5% (this threshold corresponds to a 5% error on the estimated value, see (4)). We consider that the estimated value is different from the true value if
One must note that every time a parameter is removed from the model (mean value or random effect) then new synthetic data are generated using the same protocol as described above, and Step 2 is performed again.
Errors are known when using synthetic data: since a normal noise, proportional to the observation, modifies each observation then there is a constant error on observations of cell counts in log10 scale, and a proportional error on the immunogen load. As mentioned in Section 4.4, we assume a constant error for all cell populations and a proportional error for the immunogen load. Diagnostic tools in [30] show that error models are correct (not shown here).
Quality of fit criteria do not provide relevant information in our case: the Bayesian Information Criterion (BIC) reaches very low values, even for the initial model (3), whereas observations
Biological data are the ones introduced in Section 4.2. Compared to synthetic data, we do not know the parameter values that would characterize them and they provide less observations, hence it may not be possible to correctly estimate as many parameters as in the synthetic data case.
Model selection on biological data is also performed in 4 steps:
parameters whose RSE is larger than 100% random effects of parameters with shrinkage larger than 75%
In Step 1, model (1) is used, with all parameters varying within the population. This makes 23 parameters to estimate: 9 mean values, 9 random effects, 5 error parameters. This model is the one selected on synthetic data (see Section 2.1).
In Step 3, we iteratively remove parameters that are not correctly estimated. We first focus on parameters that are not estimated with a high confidence, that is RSE> 100 %. Once all parameters are correctly estimated, we remove random effects of parameters with shrinkage larger than 75%. Noticeably, we cannot use the same threshold values for the RSE and the shrinkage when using either synthetic or real data, because measurement errors are different: controlled and known for synthetic data, uncontrolled and a priori unknown for real data, with measurement uncertainties.
The error model is not known, so we use the same error model as for synthetic data: a constant error for all cell populations (note that no data on immunogen is available, so the error parameter for the immunogen is not estimated). Diagnostic tools in [30] show that assuming constant error models is acceptable (not shown here).
A posteriori model validation on biological data
In Section 2.4, the model selected on biological data is compared to data that were not used for parameter estimation. These data are presented hereafter.
In order to assess the model ability to characterize and predict immune response dynamics we compare our results to additional experiments, VV data set 2 and Tumor data set 2 (see Table 1 and Section 4.2), similar to the ones used to estimate parameters (VV and Tumor data sets 1). CD8 T cell counts of naive, early and late effector, and memory cells have been measured following VV and Tumor immunizations, on days 4, 6, 7, 8, 11, 13, 15, 21, 28, 42pi.
The probability distribution of parameters (mean values, random effects) are known since we have estimated them on VV and Tumor data sets 1 (Section 4.7). These parameters are not estimated on the validation data. We use them to estimate the individual parameter values that fit individual behaviors of these new data sets (see Section 4.5).
