Abstract
Introduction
Model validation is defined as the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model.1 In other words, it asks how accurately the predictions of a model match observations from physical experiments. With the increasing use of computational modeling in engineering design, performance estimation, and safety assessment, a quantitative measure, called a validation metric, is needed to quantify the mismatch between predictions and experimental observations. Because both aleatory and epistemic uncertainties can be present in model validation practice, characterizing the effect of these mixed uncertainties on the validation metric is a challenge.
The topic of validation metrics has received considerable attention in the last decade. A validation metric is a formal measure of the mismatch between predictions and data that have not previously been used to develop the model. A useful validation metric has several desirable properties. First, it is an objective measure of distance. Second, it should reflect differences over the full distributions of the predictions and the data. Third, it should express its result in physical units rather than esoteric statistical units.1 The area metric2–4 is one of the most popular validation metrics because (1) it is expressed in the same physical units as the predictions and (2) it is not overly sensitive to the long tails of distributions. Ferson et al.1 used the area metric to characterize model form uncertainty by comparing the distributions of predictions and experimental data. Li et al.4 extended the concepts of the area metric and the u-pooling method to multiple correlated responses, although the resulting validation metrics are still expressed in statistical units, which is inconvenient for prediction. Roy and Oberkampf2 presented a validation framework based on probability boxes (p-boxes); in their study, the area metric was still used to compare a p-box with another p-box or a distribution. However, their method still yields a single value based on the definition of the minimal area. Such a result emphasizes only the evidence of mismatch and may underestimate the risk of model form uncertainty.
In practice, a model is often used to make several different predictions, and multiple values of the validation metric are, therefore, needed to assess the accuracy of the model. The u-pooling method4,5 combines all the usable values by pooling the individual comparisons onto a universal scale via the probability transformation.6 This strategy allows us to pool fundamentally incomparable data in terms of the relevance of each datum as evidence of the mismatch between the model and the experimental data. To date, however, the u-pooling method applies only to prediction distributions; it is inapplicable when the predictions and/or the experimental data are represented as p-boxes.
The thermal challenge problem is one of the three problems posed at the Sandia Validation Challenge Workshop. The mathematical model and the solution provided by Dowding et al.7 are based on one-dimensional, linear heat conduction in a solid slab with heat-flux boundary conditions. Multiple approaches have been used to evaluate the validity of the provided mathematical solution for a specified application with a defined regulatory criterion and to predict regulatory compliance. Ferson et al.1 discussed validation questions involving multiple predictions of different outputs, with predictions and/or experiments described by probability distributions, and presented the area metric. Rutherford8 discussed a methodology for compensating for computational/experimental discrepancies identified in the validation analysis. Hills and Dowding9 presented a multivariate validation metric accounting for model parameter uncertainty and the correlation between multiple measurement/prediction differences. However, none of these studies validated the problem within a mixed-uncertainties framework. In 2018, Wang et al.10 proposed a model validation approach based on evidence theory to account for epistemic uncertainty, but evidence theory discards the probability information and the separation between aleatory and epistemic uncertainties.
To overcome these issues, a validation and uncertainty quantification (UQ) framework based on p-boxes is proposed in this article. By introducing an interval-valued area validation metric and an interval-valued u-pooling method, model form uncertainty can be quantified in physical units. The thermal challenge problem is used to examine the performance of the proposed method.
The organization of this article is as follows: first, the thermal challenge problem is briefly introduced. Second, UQ based on p-boxes is proposed to quantify the uncertainty associated with the data from the challenge problem. The meaning of the interval-valued area metric, and how it differs from the original single-valued area metric, is demonstrated in detail, and a new u-pooling method based on p-boxes is also presented. Finally, the proposed method is applied to characterize the predictive capability of the model in the thermal challenge problem.
Summary of the thermal challenge problem
The thermal challenge problem consists of a set of mathematical models, three sets of experimental data of different sizes ("low," "medium," and "high"), and a regulatory requirement. The mathematical model of the temperature under heating is formulated as

$$T(x,t) = T_i + \frac{qL}{k}\left[\frac{\alpha t}{L^2} + \frac{1}{3} - \frac{x}{L} + \frac{1}{2}\left(\frac{x}{L}\right)^2 - \frac{2}{\pi^2}\sum_{n=1}^{\infty}\frac{1}{n^2}\exp\left(-n^2\pi^2\frac{\alpha t}{L^2}\right)\cos\left(n\pi\frac{x}{L}\right)\right] \quad (1)$$

where $\alpha = k/(\rho C_p)$ is the thermal diffusivity, $q$ is the applied heat flux, $L$ is the slab thickness, and $T_i$ is the initial temperature, with given material properties of the device associated with a particular manufacturing process. The regulatory requirement states that the probability of the surface temperature exceeding 900°C at $t = 1000$ s must be less than 0.01:

$$P\{T(x=0,\, t=1000\,\mathrm{s}) > 900\,^{\circ}\mathrm{C}\} < 0.01 \quad (2)$$

The challenge is to use the available empirical data to assess whether the regulatory requirement in equation (2) can be satisfied. The empirical data concern the two material properties, the thermal conductivity $k$ and the volumetric heat capacity $\rho C_p$.
The data provided for the thermal challenge problem include (1) material characterization data for the thermal conductivity $k$ and the volumetric heat capacity $\rho C_p$, (2) ensemble validation data from configurations similar to the intended application, and (3) accreditation data obtained at conditions closest to the application.
For the sake of simplicity, the effects of the different experimental data sets are not considered in this article. The material characterization data from the "medium" data set are used to account for the uncertainty associated with the samples. P-boxes are used to characterize both the aleatory and epistemic uncertainties associated with the input and output variables. The focus of this work is on UQ and propagation, the validation metric, and extrapolation with p-boxes.
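As a minimal runnable sketch, the one-dimensional slab conduction model can be written as follows; the series form and the parameter values (q = 3500 W/m², L = 0.019 m, Ti = 25°C) are assumptions adopted here for illustration rather than values quoted from the text above.

```python
import math

# Illustrative parameter values for the slab problem (assumed here):
# heat flux Q [W/m^2], thickness L [m], initial temperature TI [deg C].
Q, L, TI = 3500.0, 0.019, 25.0

def temperature(x, t, k, rho_cp, n_terms=500):
    """Series solution for 1-D linear heat conduction in a slab with a
    constant heat-flux boundary; k is the thermal conductivity and
    rho_cp the volumetric heat capacity."""
    alpha = k / rho_cp                  # thermal diffusivity
    tau = alpha * t / L**2              # dimensionless time
    xi = x / L                          # dimensionless position
    series = sum(math.exp(-n**2 * math.pi**2 * tau) *
                 math.cos(n * math.pi * xi) / n**2
                 for n in range(1, n_terms + 1))
    return TI + (Q * L / k) * (tau + 1.0 / 3.0 - xi + xi**2 / 2.0
                               - (2.0 / math.pi**2) * series)
```

With nominal material properties, the predicted surface temperature rises monotonically in time, which is the behavior exploited in the regulatory assessment below.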
UQ based on p-boxes
Probability boxes
P-boxes can characterize both epistemic and aleatory uncertainties in a way that does not confound the two. Probability bounds analysis, on which p-boxes are based, takes advantage of probability theory and set theory while keeping aleatory and epistemic uncertainties separate. As shown in Figure 1, the horizontal breadth of the p-box quantifies the amount of epistemic uncertainty in the system response quantity (SRQ), and the slope of the p-box reflects the frequency distribution of the aleatory uncertainty. See Ferson11 for a detailed discussion of p-boxes.

Representation of p-box for both epistemic and aleatory uncertainty.
Constructing p-boxes for a small amount of data
There are many methods to construct p-boxes. For instance, with a small amount of data, the family of distributions can often be specified while the distribution parameters remain unknown. In this situation, it is straightforward to construct a p-box that encompasses all the possible cumulative distribution functions (CDFs): the upper and lower confidence bounds of the CDFs can be identified by parameter estimation and the associated confidence intervals. In a more general case, without any distributional hypothesis, one can obtain a CDF and its confidence bounds via the Kaplan–Meier estimate and the Greenwood formula. More recently, kernel density estimation and the bootstrap have been used to estimate the bounds of p-boxes.
For the thermal challenge problem, 20 observations are available for the thermal conductivity $k$ in the "medium" data set. Assuming a normal distribution family, the interval estimate of the mean and the point estimate of the standard deviation define the corresponding p-box.

p-box and the distribution family for the thermal conductivity.
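A minimal sketch of this construction for the normal family, using the Student-t confidence interval of the mean as the interval-valued parameter (the hard-coded critical value for n − 1 = 19 degrees of freedom is an assumption of this sketch):

```python
import statistics

def normal_pbox_from_data(samples, t_crit=2.093):
    """Build a normal p-box from a small sample: the confidence
    interval of the mean becomes an interval-valued parameter, while
    the standard deviation is kept as a point estimate.
    t_crit ~ 97.5% Student-t quantile for 19 degrees of freedom."""
    n = len(samples)
    m = statistics.mean(samples)
    s = statistics.stdev(samples)
    half = t_crit * s / n**0.5
    return (m - half, m + half), s   # interval mean, point sigma
```

The resulting p-box is the envelope of all normal CDFs whose mean lies in the returned interval, matching the interval-mean/point-sigma characterization used later for the property data.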
Validation based on p-boxes
Validation metric for p-boxes
A validation metric is a formal measure of the mismatch between predictions and observations that have not previously been used to develop the model. We are interested in a validation metric that can be applied when the predictions are p-boxes. In this section, by extending the definition of the single-valued area metric to an interval-valued area metric, a new validation metric is developed to compare quantities characterized by p-boxes.
In the context of p-boxes, the area metric between two p-boxes $[\underline{F}, \overline{F}]$ and $[\underline{G}, \overline{G}]$ is defined through their inverse CDFs.12 At each probability level $u$, the two p-boxes define the intervals $[\overline{F}^{-1}(u), \underline{F}^{-1}(u)]$ and $[\overline{G}^{-1}(u), \underline{G}^{-1}(u)]$, and the lower bound of the metric is

$$d_{\min} = \int_0^1 D\big([\overline{F}^{-1}(u), \underline{F}^{-1}(u)],\, [\overline{G}^{-1}(u), \underline{G}^{-1}(u)]\big)\, du$$

where

$$D([a, b], [c, d]) = \max(0,\, c - b,\, a - d)$$

which is the shortest distance between two intervals. The upper bound $d_{\max}$ is obtained analogously by integrating the longest distance between the two intervals, giving the interval-valued metric $[d_{\min}, d_{\max}]$. When one of the two p-boxes degenerates to a precise distribution, the definition reduces to the area metric between a p-box and a distribution.

Area metric between p-box and distribution/p-box.
When
When one of
where the range of
As shown in Figure 4, assume that sufficient experimental data are available and that the experimental distribution is normal(5, 1). The predicted p-boxes are normal([4, 6], 1) and normal([6, 8], 1), respectively. Figure 5 illustrates the area measure of the mismatch between the predicted p-boxes and the experimental distribution.

Example of mismatch between an experimental distribution and different prediction p-boxes.

Example of mismatch between an experimental p-box and different prediction p-boxes.
It is now assumed that the experimental p-box is normal([4, 6], 1), while the predicted p-boxes are normal([5, 7], 1) and normal([7, 9], 1), respectively. Figure 6 shows the area measure of the mismatch between the predicted p-boxes and the experimental p-box.

Translation of observations (spikes) through prediction distributions (gray) to a universal probability scale. 1
Figures 4 and 5 both show how the interval-valued area metric differs from a single-valued area metric. As these examples illustrate, the original definition of the area metric corresponds to the lower bound of the interval-valued metric, which may be much smaller than the upper bound; the potential risk could therefore be underestimated.
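For the normal p-box examples above, the interval-valued metric can be sketched numerically as follows; the function name and the midpoint discretization are illustrative choices, not the paper's implementation:

```python
from statistics import NormalDist

def interval_area_metric(mu_lo, mu_hi, sigma, exp_mu, exp_sigma, n=2000):
    """Interval-valued area metric between a normal p-box
    normal([mu_lo, mu_hi], sigma) and a precise normal experimental
    distribution, integrating interval distances over the probability axis."""
    d_min = d_max = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        # inverse-CDF interval of the p-box at probability level u
        a = NormalDist(mu_lo, sigma).inv_cdf(u)
        b = NormalDist(mu_hi, sigma).inv_cdf(u)
        x = NormalDist(exp_mu, exp_sigma).inv_cdf(u)
        d_min += max(0.0, a - x, x - b) / n       # shortest point-interval distance
        d_max += max(abs(x - a), abs(x - b)) / n  # longest point-interval distance
    return d_min, d_max

print(interval_area_metric(4, 6, 1, 5, 1))  # ≈ (0.0, 1.0)
print(interval_area_metric(6, 8, 1, 5, 1))  # ≈ (1.0, 3.0)
```

The first case (overlapping p-box) gives a lower bound of zero but an upper bound of one physical unit; the second (shifted p-box) gives the interval [1, 3], illustrating how the single minimal-area value can understate the mismatch.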
u-pooling for p-boxes
When several observations are collected for a single prediction distribution (or p-box), the empirical distribution (or p-box) of those observations can be pooled into a single object for comparison with the prediction. However, such pooling is not possible when the data are to be compared with different distributions (or p-boxes). One could compute the areas separately for each pair of prediction distribution (or p-box) and corresponding observations, but all the areas then need to be merged into an aggregate measure of the overall discrepancy.
By integrating the evidence from all the relevant data over the entire validation domain into a single measure, the overall mismatch can be assessed by the u-pooling approach, which relies on the probability integral transform theorem of statistics. The process of u-pooling can be described briefly as follows:1
1. Transforming to get u-values. Each observation is transformed through its corresponding prediction distribution into a u-value on the universal probability scale.
2. Pooling and back-transformation. All these u-values are pooled on [0, 1] and back-transformed through the prediction distribution at the intended condition.
3. Comparing to get the area. According to the real experimental data at the different conditions, the pooled back-transformed values construct a dummy experimental distribution for the intended condition. The area between the dummy experimental distribution and the predicted distribution at the intended condition is calculated as the validation metric, as shown in Figure 7 (right).
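The three steps can be sketched for precise (non-p-box) prediction distributions as follows, using normal distributions as an illustration:

```python
from statistics import NormalDist

def u_pooling(observations, pred_dists, target_dist):
    """Classical u-pooling: transform each observation through its own
    prediction CDF, pool the u-values, back-transform them through the
    target (archetypal) prediction distribution, and return the area
    between the pooled empirical CDF and the target CDF."""
    # Step 1: universal probability scale
    us = sorted(d.cdf(y) for y, d in zip(observations, pred_dists))
    # Step 2: back-transform to the archetypal (target) scale
    ys = [target_dist.inv_cdf(u) for u in us]
    # Step 3: area between the empirical CDF of ys and the target CDF,
    # integrated numerically along the probability axis
    n = len(ys)
    area, m = 0.0, 4000
    for i in range(m):
        u = (i + 0.5) / m
        emp = ys[min(int(u * n), n - 1)]   # empirical inverse CDF (step function)
        area += abs(emp - target_dist.inv_cdf(u)) / m
    return area
```

For example, `u_pooling([0, 1, 2], [NormalDist(0, 1), NormalDist(1, 1), NormalDist(2, 1)], NormalDist(0, 1))` pools three perfectly central observations and returns the mean absolute deviation of the target normal from its median.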

Back-transformation from the u-scale to an archetypal scale determined by a predicted distribution
Note that the procedure differs slightly when the corresponding prediction distribution in step 1 and the regular prediction distribution in step 2 are p-boxes. Finally, two p-boxes are compared to obtain the area metric in step 3. A set of interval-valued u-values, rather than a single u, is then obtained for each observation.

Translation to a universal probability scale and back-transformation to archetypal scale by p-boxes.
Following the flowchart in Figure 9, the extended u-pooling for p-boxes is calculated with the following procedure:
1. Transforming to get interval-valued u-values. Each observation is transformed through the bounding CDFs of its corresponding prediction p-box, yielding an interval of u-values on the universal probability scale.
2. Back-transformation and pooling. One needs the back-transformation for all the interval-valued u-values through the prediction p-box at the intended condition; the pooled back-transformed intervals construct a dummy experimental p-box.
3. Obtaining the area. The areas between the dummy experimental and predicted p-boxes are calculated to give an interval-valued area metric.

Flowchart of validation based on the extended u-pooling method.
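Under the assumption of normal p-boxes with interval-valued means, steps 1 and 2 of the extended procedure can be sketched with illustrative helpers:

```python
from statistics import NormalDist

def interval_u(y, mu_lo, mu_hi, sigma):
    """Interval of u-values for one observation transformed through a
    normal p-box with interval mean [mu_lo, mu_hi]: the normal CDF at y
    decreases as the mean grows, so the extreme means give the bounds."""
    return NormalDist(mu_hi, sigma).cdf(y), NormalDist(mu_lo, sigma).cdf(y)

def back_transform(u_lo, u_hi, mu_lo, mu_hi, sigma):
    """Back-transform an interval of u-values through a target normal
    p-box; the result is an interval on the physical (temperature) scale."""
    return (NormalDist(mu_lo, sigma).inv_cdf(u_lo),
            NormalDist(mu_hi, sigma).inv_cdf(u_hi))
```

Pooling the back-transformed intervals over all observations yields the dummy experimental p-box, which is then compared with the predicted p-box via the interval-valued area metric.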
Validation and prediction for the challenge problem
The normal distribution hypothesis is accepted by Lilliefors hypothesis tests for the "medium" (20 samples) data set of the thermal problem. The interval estimates of the means and the point estimates of the standard deviations for the pooled property data are as follows
The p-boxes can be constructed as given in equation (7), and nested iterations are performed; that is, each sample drawn from the epistemic variables in the outer loop results in a sampling over the aleatory variables in the inner loop. The resulting predicted p-box of the surface temperature after 1000 s, based on equation (1), is shown in Figure 10 as the solid and dotted distributions; the dashed line is the distribution without epistemic uncertainty, which lies inside the p-box. Both the minimum and maximum estimated probabilities are much larger than 0.01.

The predicted p-box against the regular requirement and the exceeded probability.
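The nested (double-loop) sampling described above can be sketched generically as follows; the model and the interval bounds are placeholders, not the challenge-problem values:

```python
import random

def double_loop_pbox(model, n_outer=50, n_inner=200, seed=1):
    """Nested-sampling sketch for uncertainty propagation: the outer
    loop samples the epistemic (interval) parameters, the inner loop
    samples the aleatory variability, and the envelope of the inner
    quantile curves approximates the output p-box. The interval and
    spread values below are purely illustrative."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_outer):
        mu = rng.uniform(0.055, 0.065)   # epistemic: mean drawn from its interval
        samples = sorted(model(rng.gauss(mu, 0.005)) for _ in range(n_inner))
        curves.append(samples)            # one empirical quantile curve
    # pointwise envelope of the quantile curves bounds the output p-box
    lower = [min(c[i] for c in curves) for i in range(n_inner)]
    upper = [max(c[i] for c in curves) for i in range(n_inner)]
    return lower, upper
```

The two returned quantile curves bound all inner-loop empirical CDFs, so the exceedance probability of any threshold can be read off as an interval, mirroring the minimum/maximum probabilities quoted above.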
Every temperature observation in the validation domain is thereby paired with a prediction p-box of temperature. There are 140 of these pairs in the “medium” data set. The pairs define the

p-box of u-values compared to theoretical uniform distribution.

p-box of back-transformed u-values compared to predicted temperature p-box.
When no direct experimental observations are available under the given conditions, in order to extrapolate the model form uncertainty to arbitrary conditions with given
At the configuration for the regulatory requirement, this equals 112 +
How to extrapolate the area metric is still an open question; the parallel push of both sides of the prediction p-box4 is adopted here, as shown in Figure 13. With the epistemic uncertainties quantified, the risk of exceeding 900°C ranges from 0.01 to 0.91, a range that may seem too wide to a decision-maker. However, the range can be reduced by increasing the number of experimental samples, improving the heat transfer model, and accounting for the relation between the input parameters and the temperature.

Parallel push for estimated area metric value as predictive capability.
Summary/conclusion
This article has proposed a new model validation metric based on p-boxes. The UQ processes in the context of p-boxes have been developed, including input UQ, propagation, comparison, and prediction UQ. The extended interval-valued u-values are used to build the back-transformed predicted p-box in the validation domain. The extended interval-valued area metric is then used to obtain the model form uncertainty for given parameters, and the relations between the area metric and the parameters are captured via regression. The extended interval-valued area metric may, however, expand the uncertainty range so much that it becomes unacceptably wide, especially in extrapolation, which can be overly conservative. Although the interval-valued area metric can characterize more complicated epistemic uncertainty, more reasonable definitions of the area metric and extrapolation methods should be studied in future work.
