Abstract
Introduction
Studying rare events, such as mutation or gene expression, in complex biological systems is challenging because large samples and costly analysis are often required to detect a sufficient number of events. Computational models can help address this challenge by simulating these biological systems. Such computational models can then be applied to more quickly and cost-effectively explore large experimental design spaces. Nonetheless, detailed computational models often suffer from similar barriers as experimental systems; ie, large numbers of simulations are often required to observe these rare events, requiring significant computational time and resources to acquire sufficient data. 1 In this work, we describe a means to circumvent some of these computational challenges, to facilitate efficient experimental design and data analysis. We develop and illustrate a multifidelity analysis approach, incorporating medium-fidelity (MF) analysis (Monte Carlo simulations) and low-fidelity (LF) analysis (Markov chain models) to analyze output from computationally intensive stochastic simulations (high-fidelity [HF] models). We focus on stochastic HF models due to their utility in capturing discrete events, a key feature when studying rare events.
High-fidelity, high-computational-cost models simulate systems with high levels of mechanistic detail. Lower-fidelity models (or surrogate models) of the same system can be used to approximate system dynamics, while eliminating some of the mechanistic details, thereby lowering the computational cost. 2 Various approaches have been employed to accelerate the simulation of rare events in HF computational models, including weighted ensemble simulations, partitioning, and tau-leaping.1,3 Multifidelity models or surrogate models can also be used to simulate and approximate complex system dynamics.4,5 Tunable resolution uses data from fine-grained, mechanistically detailed models to strategically coarse-grain subsections of the fine-grained models as appropriate for various analyses.6,7 These methods typically consist of HF models and LF models running independently, either for different areas of parameter space or for different regions of interest within a single simulation.2,4,7,8
Instead, our approach complements these established methods using lower-fidelity models to analyze output from individual simulations of the HF model. Each HF model simulation produces a set of transitions (timing, type, and number of transitions). This set of transitions maps out a finite state space in the rare event coordinate. Our multifidelity analysis then approximates the state probabilities for this entire rare event state space over time based on the set of transitions. In other words, we assume that system transitions in all coordinates except the rare event coordinate are determined by the HF model simulation. This approach expands on the rare event parameter ranges and types of analyses that can feasibly be explored using the HF model alone. For example, in stochastic models, it could be of interest to discern the influence of stochasticity, which we cannot control, from the influence of parameters that we might be able to control through treatment or other interventions. We will first describe our multifidelity approach in light of a simple illustrative example and then show the application of multifidelity analysis to an HF model of drug-resistance emergence in tuberculosis (TB) disease.
Multifidelity Analysis
To illustrate our approach, we consider the example of rare genetic mutations in a population of bacteria that are dividing and dying over time. An HF model of this system could incorporate any number of mechanistic details driving bacterial division and death, eg, available nutrients, influence of surrounding bacterial species, antibacterial chemicals, and host immune responses (Figure 1A).9–11 Each simulation of such an HF model would produce as output a population trajectory (in this example, the number of bacteria over time). Such a population trajectory is the combined result of 2 series of discrete events: number of bacterial divisions per time unit and number of bacterial deaths per time unit. Because we consider stochastic models, repeated simulations of the same model will result in different population trajectories.

(A) Hypothetical HF model describing bacterial growth and mutation. Model inputs include mechanistic details driving the population growth (eg, available nutrients). Model outputs include population trajectories (number of bacteria over time) and rare event trajectories (number of mutant bacteria over time). Hypothetical outcomes are shown for 2 replications of the stochastic HF model (labeled 1 and 2). Replication 1 had mutant bacteria for a short period and then those mutants died. The time between mutants appearing and disappearing is defined as the mutant period. Rare event trajectories from multiple simulations can be combined to obtain rare event probability distributions for a group of HF model simulations. (B) Flow of information from HF model through MF and LF analyses and the types of outputs possible from each analysis. Multifidelity analysis uses individual HF model population trajectories as inputs. For each population, trajectory MF and LF analyses estimate state probabilities for the entire rare event state space defined by the population trajectory. (C) Table summarizing the capabilities and computational costs associated with each method. *Although MF analysis is capable of simulations at lower frequencies than the HF model, there remain computational limitations on the feasible number replications. **Rare event probability distributions can be obtained from HF model simulations but only for groups of simulations and only at relatively high frequencies. Image components in Figures 1 to 4 were created by New7ducks—Freepik.com.
When the HF model is used to study rare events, in this case genetic mutations, each simulation will also produce a rare event trajectory (in this example, number of mutated bacteria over time, Figure 1A). Rare event trajectories are the combined result of population trajectories and probability of rare events (in this example, the probability of mutation per bacterial division). However, given the low probability of the rare event, a large number of HF model simulations may be required to obtain a sufficient number of nonzero rare event trajectories. In some cases, a weighted ensemble strategy can distribute computational resources to difficult-to-sample areas of the state space. 1 However, the binary nature of systems such as the bacterial mutation example implies that there are no “adjacent” states in the rare part of the state space, which limits the utility of a weighted ensemble approach. Alternatively, Markovian or non-Markovian models can be built to approximate the complex system behavior.12,13 However, these methods assume constant transition probabilities as well as regular transitions in each time interval. We aim to analyze a system where transition probabilities as well as the frequency and timing of transitions are determined by the HF model. The goal of our multifidelity analysis is therefore to lower the computational cost of predicting rare event trajectories and probability distributions corresponding to each HF model population trajectory.
As mentioned above, the HF model provides population and rare event trajectories for individual simulations, as well as distributions of rare event trajectories over groups of simulations. However, these predictions are limited to frequencies that can be reliably observed within a feasible number of HF model simulations. To circumvent this challenge, population trajectories produced by the HF model is used as input into MF and LF analyses (Figure 1B). The MF analysis allows predictions at lower probabilities compared with the HF model as each replication in the MF analysis is typically faster to simulate than one replication of the HF model. However, MF analysis is also limited in the number of replications that can be performed with available computational resources (Figure 1C). The MF analysis can predict individual rare event trajectories as well as the rare event trajectory probability distributions for individual HF model simulations. In contrast, LF analysis can make predictions at any probability values, but is only able to predict the rare event trajectory probability distributions, and not individual resistance trajectories. The computational cost of the LF analysis is lower than MF analysis because no replications are required. The choice of using HF model simulations alone or including multifidelity analysis therefore depends on the rare event frequency, available computational resources, and the types of outputs required to answer the biological question at hand. For example, if the rare event frequencies allow sufficient numbers of MF replications and the dynamics of individual rare event trajectories are of interest, MF analysis is appropriate. If rare event frequencies are low enough to prohibit sufficient MF replications, and rare event probability distributions can provide the necessary insight into the biological question, LF analysis will suffice.
Multifidelity analysis of stochastic simulations can provide insight into expected rare event dynamics in computationally intensive simulations or experiments. The rare event trajectories and their probability distributions produced by MF and LF analyses could inform HF model simulation settings or experimental decisions such as the number of replicates (HF model simulations or experiments) required to obtain a desired number of rare events, the amount of time to allow to observe desired dynamics, and parameter ranges or experimental conditions that can produce these results with available time and resources. Furthermore, in situations where events are rare enough to prohibit experimental or HF model study, multifidelity analysis can be used to extrapolate predictions from computationally or experimentally feasible probabilities to biologically relevant probabilities.
MF Analysis
Our MF analysis uses Monte Carlo simulations to sample the possible outcomes for a single population trajectory produced by the HF model (in our illustrative example, a series of bacterial divisions and deaths). In other words, the MF analysis asks, “given the population trajectory of this particular HF model simulation, what are the possible rare event trajectories and their probabilities?”
To illustrate the MF analysis approach, we will continue our illustrative example of a growing population of bacteria with rare mutation events. Consider HF model simulations where bacteria can be either wild type (WT, ie, unmutated) or a mutant. For each bacterial division and death event, there are a fixed number of possible scenarios (Figure 2). If a mutant bacterium divides, it produces a new mutant bacterium (increasing the mutant population by one). If a WT bacterium divides, one of 2 things can happen: first, no mutation occurs and the division produces a new WT bacterium, increasing the WT population by one; or second, a mutation occurs and a new mutant bacterium is produced, increasing the mutant population by one. Similarly, for each bacterial death event produced during simulation of the HF model, it could be either a WT or mutant bacterium that dies, reducing the respective populations by one.

Possible outcomes for bacterial division and death events produced by the HF model. Division can occur in mutant bacteria (blue) with probability
For each event in the series of bacterial divisions and deaths produced by the HF model, an MF simulation calculates the probability of each of these scenarios. We define the probability of a mutant bacterium division (

Illustration of MF analysis: multiple MF replications for a single HF model population trajectory. The HF model population trajectory is a result of a series of bacterial divisions (+) and deaths (−). This series of events is used to generate multiple possible rare event trajectories. These trajectories are combined to construct probability distributions over time. HF indicates high fidelity; MF, medium fidelity.
LF Analysis
The LF analysis aims to determine rare event trajectory probability distributions for each HF model simulation over time. As in MF analysis, these distributions are based on the HF model–generated population trajectory (eg, a series of bacterial divisions and deaths). Unlike MF analysis, the LF analysis calculates the probability distributions without simulating individual rare event trajectories.
Our LF analysis considers each HF model simulation a Markov chain, where the possible states of the system are the number of rare events. Continuing our bacterial mutation example, the possible states of the system are the number of mutant bacteria in the population. Therefore, if the HF model population trajectory had N total bacteria, then the Markov chain has N + 1 possible states, it could have 0, 1, 2 . . . N mutant bacteria.
We use the same probabilities defined above for MF analysis:
We define the probability of the Markov chain being in state
When event
This probability definition reflects all possible outcomes as illustrated in Figure 2. The first term in equation (1) represents the case where there were

Low-fidelity analysis: Markov chain representing progression of mutant bacterial population. Each circle represents a state of the Markov chain and arrows indicate transition probabilities between states. These transition probabilities are used to construct equations (1) and (2). Note that
When event
The first term represents the case where there were
These MF and LF approaches are easily adapted to analyze other stochastic HF models and rare events. MATLAB code implementing multifidelity analysis is available as supplementary material and described in the supplementary text. Next, we describe the application of multifidelity analysis in a test case.
Test Case Definition: Rare Antibiotic Resistance–Conferring Mutations in Mycobacterium tuberculosis
As a test case to validate multifidelity analysis and illustrate some of its applications, we consider the rare emergence of antibiotic resistance in the context of TB disease. Tuberculosis is caused by
An HF stochastic computational model,
GranSim
Population trajectories obtained from
MF analysis of GranSim simulations
For MF analysis, we use the series of bacterial divisions and deaths from each simulation of
We define
If the division was not a resistant bacterium, we define the probability of acquiring a new resistance-conferring mutation (
For each population trajectory, we simulate 50 000 MF analysis replications. This allows the prediction of rare events at much lower frequencies than the HF model, but there remains a limit on how many MF analysis replications are feasible.
LF analysis of GranSim simulations
For LF analysis of
Multifidelity analysis assumptions
While some mechanistic details of the host-pathogen interactions within granulomas are captured in multifidelity analysis parameters (eg, the relative fitness of resistant bacteria), others cannot be included. Multifidelity analysis implicitly makes the following simplifying assumptions.
The MF and LF analyses cannot track individual bacteria and their varying generation times based on their location within granulomas (eg, intracellular vs extracellular). The per-generation mutation probability (
The current multifidelity analysis methods do not account for the delay between divisions of the same bacterium. That is, based on the probability definitions in equations (3) and (4), the same bacterium can divide twice in consecutive model time steps, which is not biologically accurate. Below, we compare HF model results with multifidelity analysis predictions to validate the method and to illustrate that these assumptions do not significantly affect our predictions in the TB test case.
Validation in Test Case: Multifidelity Analysis Recapitulates HF Model Rare Event Dynamics
To validate multifidelity analysis, we compare their rare event predictions with observed values in the HF model (

Multifidelity analysis validation against HF model results. Each panel shows observed values from HF model simulation at high mutation frequencies (2 × 10−8 per 10-minute HF model time step), along with MF and LF predictions at these frequencies. (A) The average population trajectory produced by the HF model shows early increase in bacterial numbers, followed by a steep decline induced by adaptive immunity, and subsequent stable bacterial numbers. Solid lines show means, dotted lines show ±standard deviations. Note that the lower standard deviation dotted line cannot be displayed on log-scale for the entire simulation period. (B) Probability of resistance per granuloma. Note that curves for MF and LF analyses overlap. HF model curve shows the fraction of granulomas with at least one resistant bacterium. (C) Distribution of number of resistant bacteria per granuloma. In panels B and C, HF model results have a single value because the fraction is calculated out of the 348 granulomas studied. MF and LF results show mean and standard deviations because these analyses predict a resistance probability for each individual granuloma, which we average to compare with HF model results. (D) Number of resistant bacteria per granuloma over time for 10 illustrative examples of HF model and MF analysis (this information is not available from the LF analysis). (E) Distributions for the number of days that resistant bacteria remain within individual granulomas (resistance period) for all HF model granuloma simulations and for 10 illustrative replications of MF analysis for each granuloma. HF indicates high fidelity; LF, low fidelity; MF, medium fidelity.
Table 1 shows computational times recorded for the HF model, MF, and LF analyses. While implementation of multifidelity analysis in other models would have different computational times depending on the number of events being simulated, this test case illustrates the potential computational savings in studying rare events with multifidelity analysis. We next apply multifidelity analysis to illustrate how it can be leveraged to analyze rare events with reduced computational cost.
Simulation time for HF model, MF analysis, and LF analysis.
Abbreviations: HF, high fidelity; LF, low fidelity; MF, medium fidelity.
Predictions in Test Case
MF and LF analyses enable sensitivity analysis at biologically relevant mutation frequencies
One challenge in using HF models is to quantify model sensitivity to parameters when the model outputs of interest are rare events, eg, the emergence of resistance in TB granulomas. Here, we illustrate how multifidelity analysis can be used to perform sensitivity analysis (SA) in biologically relevant mutation frequency ranges. For the TB test case, some model outputs of interest are the probability of resistance per granuloma, the expected number of resistant bacteria per granuloma, and the average resistance period. The model parameters we vary in this SA are mutation frequency (
Both the probability of resistance per granuloma (Figure 6A) and the expected number of resistant bacteria (Figure 6B) appear to correlate more strongly with mutation frequency than relative fitness, where relative fitness has a stronger effect on the resistance period (Figure 6C). Indeed, partial correlations between parameters and each of the model outputs indicate that mutation frequency is the main driver of the expected number of resistant bacteria (Table 2). Although both mutation frequency and relative fitness significantly contribute to the probability of resistance and the resistance period, relative fitness has a stronger correlation with the resistance period than with the probability of resistance. This indicates that a higher relative fitness of a resistant strain allows that strain to persist in the granuloma for longer, whereas a higher mutation frequency will allow more new mutations to occur, both of which result in higher probability of resistance per granuloma. Multifidelity analysis predicts that similar parameter influences are at play at high and low mutation frequencies.

MF analysis–predicted effect of relative fitness (
Sensitivity analysis results.
Partial correlation coefficients (
These parameter influences can be contextualized for TB in terms of different
Multifidelity analysis can discern influence of stochastic events and parameter fluctuations
If stochasticity in the system is strong enough to mask parameter effects, it could affect our ability to predict rare events in the system. In the TB test case, we know the mutation frequency, but the timing of mutations is entirely random. Being able to predict the most dangerous times for mutations to occur could inform public health policy related to diagnosis and case-finding strategies. We therefore apply multifidelity analysis to ask, “what is the influence of the timing of mutations (a stochastic event) vs the influence of relative fitness or mutation frequency (model parameters)?”
We incorporate the timing of mutations into MF and LF analyses by manipulating the initial distribution of resistant bacteria and varying the starting time of the analyses. In other words, we assume that there is exactly 1 resistant bacterium at various times of each HF model simulation (5, 15, 30, 50, and 100 days after infection) and then predict the rare event trajectories and probability distributions over the rest of the infection (up to 200 days). Therefore, we ask whether there is one mutation at days 5, 15, 30, 50, or 100; what is the probability of resistance after 200 days of infection; and how does the timing of the initial mutation affect this probability.
For low relative fitness values, the timing of the first mutation has little influence because the presence of resistant bacteria after 200 days will almost entirely depend on new mutations (Figure 7A and C). In contrast, at high relative fitness, a nonmonotonic relationship emerges between the timing of the first mutation and the probability of resistance after 200 days of infection (Figure 7B and D). Although the nonmonotonic relationship is evident for both high and low mutation frequencies, the relative influence of timing is larger at lower mutation frequencies. At low mutation frequencies, the probability of resistance after 200 days decreases by 99% (from .13 to 6 × 10−4) if the first mutation occurs at 5 days compared with 30 days after infection, respectively. At high mutation frequencies, the probability of resistance decreases by 33% (from .3 to .2). The influence of timing in this case relates to the population trajectories of the HF model (Figure 5A). A mutation at 30 days after infection would occur immediately prior to a large contraction in the bacterial population, making it unlikely that this mutant bacterium will survive. Taken together, these results illustrate how MF and LF analyses can be applied to extrapolate HF model predictions to lower frequencies.

Effect of relative fitness, mutation frequency, and timing (in days) of the first mutation on the probability of resistance after 200 days of infection. Mutation frequencies of 2 × 10−8 (A, B) and 2 × 10−12 (C, D) per 10-minute HF model time step are considered along with relative fitness values of 0.25 (A, C) and 1 (B, D). Results are shown for MF analysis predictions, but LF analysis predictions are nearly identical (Supplement Figure S3). Bars represent mean probabilities more than 348 simulations and error bars represent standard error of the mean. Stars indicate statistically significant differences (
Discussion
Rare events can be challenging to study in complex biological systems, either experimentally or computationally. We outline a multifidelity analysis approach that can be applied to compute probability distributions of rare events over time. Our analysis method takes as input a series of discrete population-level events generated by an HF computational model of the biological system. This series of events provides input for MF and LF analyses to predict the probability and dynamics of rare events for each series of events (ie, each simulation of the HF model). We show that MF and LF analyses can accurately predict HF model results at high frequencies and illustrate how they can be applied to analyze results obtained from HF model simulations.
Multifidelity analysis could be applied as an intermediate step toward more sophisticated approaches such as accelerating HF model simulation or construction of multifidelity models.2,4,8 For example, MF and LF predictions could be used to identify specific time frames to focus on during weighted ensemble simulations or to isolate certain model mechanisms that are important to maintain when constructing multifidelity models.
The multifidelity analysis presented here was performed separately from the HF model, using data from previous simulations of the HF model. However, it should be noted that both MF and LF analyses can be integrated with the HF model so that the rare event trajectories and probability distributions are updated in real time following each discrete event as the HF model simulation is running. Integrating MF analysis with the HF model simulation would be costlier than LF analysis as multiple replications would be performed for each update. Such an integrated approach would be valuable if large amounts of high time resolution data need to be stored in order for multifidelity analysis to be performed separately and if multifidelity analysis parameter values do not need to be varied.
Multifidelity analysis could also be applied to experimental data. As the tools for single-cell analysis become more sophisticated and accessible, experimentally obtained single-cell data could be used to generate the series of discrete events required for multifidelity analysis. For example, bacterial growth in microfluidic devices has been used to carefully monitor bacterial division and death at the single-cell level.37,38 These data could be used as input for multifidelity analysis, thereby predicting possible rare event trajectories for experimental data. Transition probabilities could be based on other measurements in the system at the same time (eg, gene expression data 39 ) or from other experimental systems (eg, mutation frequency in animal models 28 ). In this way, multifidelity analysis could inform iterative experimental design (eg, time points, number of replications) without the construction of an HF model or could bridge single-cell and batch culture results (eg, how does a certain mutation frequency drives resistance at the single-cell level over time?).
In summary, as the sophistication of computational and experimental models advances, strategic and stepwise analysis of the vast amounts of data they produce will become important to maximize the amount of knowledge extracted from these data. We believe that multifidelity analysis could provide one such approach.
Supplemental Material
2018-06-11-Pienaar-SupplementaryText – Supplemental material for Multifidelity Analysis for Predicting Rare Events in Stochastic Computational Models of Complex Biological Systems
Supplemental material, 2018-06-11-Pienaar-SupplementaryText for Multifidelity Analysis for Predicting Rare Events in Stochastic Computational Models of Complex Biological Systems by Elsje Pienaar in Biomedical Engineering and Computational Biology
Footnotes
Funding:
Declaration of conflicting interests:
Author Contributions
Supplemental Material
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
