
Abstract

The Schwarz or Bayesian information criterion (BIC) is one of the most widely used tools for model comparison in social science research. The BIC, however, is not suitable for evaluating models with order constraints on the parameters of interest. This article explores two extensions of the BIC for evaluating order-constrained models, one where a truncated unit information prior is used under the order-constrained model and the other where a truncated local unit information prior is used. The first prior is centered on the maximum likelihood estimate, and the latter prior is centered on a null value. Several analyses show that the order-constrained BIC based on the local unit information prior works better as an Occam’s razor for evaluating order-constrained models and results in lower error probabilities. The methodology based on the local unit information prior is implemented in the R package “BICpack” which allows researchers to easily apply the method for order-constrained model selection. The usefulness of the methodology is illustrated using data from the European Values Study.

Keywords

Bayesian information criterion, order constraints, truncated priors, European Values Study, model selection

Introduction

The Bayesian information criterion (BIC) is one of the most commonly used model evaluation criteria in social research, for example, for categorical data (Raftery 1986), event history analysis (Vermunt 1997), or structural equation modeling (Lee and Song 2007; Raftery 1993). The BIC, originally proposed by Schwarz (1978), can be viewed as a large sample approximation of the marginal likelihood (Jeffreys 1961) based on the so-called unit information prior. This unit information prior contains the same amount of information as would a typical single observation (Raftery 1995).

The BIC has several useful properties. First, it can be used as a default quantification of the relative evidence in the data between two statistical models. Second, it can straightforwardly be used for evaluating multiple statistical models simultaneously. Third, it is consistent for most well-behaved problems in the sense that the evidence for the true model goes to infinity as the sample size grows (Kass and Wasserman 1995). Fourth, it behaves as an Occam’s razor by balancing model fit (quantified by the log-likelihood function at the maximum likelihood estimate [MLE]) and model complexity (quantified by the number of free parameters). Fifth, it is easy to compute using standard statistical software: Only the MLEs, the maximized log-likelihood, the sample size, and the number of model parameters are needed to compute it. All these useful properties have contributed to the popularity and usefulness of the BIC in social research.
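As a minimal sketch of the fifth property, the ordinary BIC can be computed from the maximized log-likelihood, the number of free parameters, and the sample size alone. The function names and the numerical inputs below are hypothetical and chosen only for illustration. The conversion of a BIC difference into an approximate Bayes factor uses the standard relation $B_{ab} \approx \exp \{ - (BIC_{a} - BIC_{b}) / 2 \}$.

```python
import math


def bic(max_loglik: float, n_params: int, n_obs: int) -> float:
    """Ordinary BIC: -2 * maximized log-likelihood + d * log(n)."""
    return -2.0 * max_loglik + n_params * math.log(n_obs)


def approx_bayes_factor(bic_a: float, bic_b: float) -> float:
    """Approximate Bayes factor of model A against model B from two BICs."""
    return math.exp(-0.5 * (bic_a - bic_b))


# Hypothetical fits on n = 200 observations (numbers invented for illustration).
bic_small = bic(max_loglik=-310.2, n_params=3, n_obs=200)
bic_large = bic(max_loglik=-308.9, n_params=5, n_obs=200)

# A value above 1 means the data favor the smaller model.
bf_small_vs_large = approx_bayes_factor(bic_small, bic_large)
```

In this invented example, the larger model fits slightly better, but the penalty for its two extra parameters outweighs the gain in fit.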

Despite the general applicability of the BIC, it is not suitable for evaluating statistical models with order constraints on certain parameters. In a regression model, for instance, it may be expected that the first predictor has a larger effect on the outcome variable than the second predictor, and the second predictor is expected to have a larger effect than the third predictor. This can be translated to the following order-constrained model, $M_{1} : β_{1} > β_{2} > β_{3}$ , where $β_{k}$ denotes the effect of the kth predictor on the outcome variable. This model can then be tested against conflicting models, such as a model with competing order constraints, for example, $M_{2} : β_{3} > β_{1} > β_{2}$ , a model where the effects are expected to be equal, $M_{3} : β_{1} = β_{2} = β_{3}$ , or the complement of these models, denoted by M₄. Under the complement model M₄, the true values for the $β$ s do not satisfy any of the constraints under models M₁, M₂, or M₃.

The reason that the BIC is not suitable for testing models with order constraints is that the number of free parameters does not properly capture the complexity of a model. In the above model M₁, all three $β$ parameters are free parameters, but saying that M₁ is equally complex as a model with no constraints, that is, $(β_{1}, β_{2}, β_{3}) \in ℝ^{3}$ , seems incorrect. Furthermore, the BIC is based on the Laplace approximation of the marginal likelihood. It is as yet unclear how well the approximation performs in the case of models with order constraints. The complicating factor is that the approximation assumes that the maximum value of the integrand is an interior point of the integrated region. This assumption is violated if the maximum likelihood (or posterior mode) does not lie in the integrated region.

Testing order constraints is particularly useful because effect sizes can only be interpreted relative to each other in the study and relative to the field of research (J. Cohen 1988). An effect size of, say, .3, of educational level on attitude toward immigrants might seem substantial for a sociologist, while .3 might not be interesting when it quantifies the effect of a medical treatment on the amount of pain of a patient. Thus, instead of interpreting the magnitudes of effects by their estimated values, it may be more informative to interpret them relative to each other, as is done using order-constrained model selection. This would allow us to assess which effects dominate other effects in the study. Further, order-constrained model selection is useful when testing scientific expectations that can be formulated using order constraints. Examples will be presented in the second section in sociological applications, but there is also an increasing body of literature on this topic (e.g., Böing-Messing et al. 2017; Braeken, Mulder, and Wood 2015; de Jong, Rigotti, and Mulder 2017; Hoijtink 2011; Klugkist, Laudy, and Hoijtink 2005; Kluytmans et al. 2012; Mulder and Fox 2019; Mulder and Pericchi 2018; van de Schoot et al. 2012). By testing order-constrained models, we can quantify the evidence in the data for one scientific theory against others. Order-constrained models are also naturally specified when one is interested in the effect of an ordinal categorical variable on an outcome variable of interest.

It is also important to note that the inclusion of order constraints results in more statistical power. This can be explained by the smaller subspace for the parameters under an order-constrained model compared to an equivalent model without the order constraints. The order constraints make the model “less complex,” resulting in a smaller penalty for model complexity, and thus in more evidence for an order-constrained model that is supported by the data.

In this article, we explore how the BIC can be extended to enable order-constrained model selection. First, a unit information prior is considered that is truncated in the order-constrained subspace. This results in a BIC that may not properly incorporate the relative complexity of an order-constrained model. For this reason, an alternative local unit information prior is considered which is centered on a null value. This prior results in a BIC that properly incorporates the relative fit and complexity of order-constrained models. The R package “BICpack” has been developed for order-constrained model selection in popular models such as generalized linear models, survival models, and ordinal regression models.

To our knowledge, there have been two other proposals for the BIC for evaluating models with order (or inequality) constraints by Romeijn, van de Schoot, and Hoijtink (2012) and Morey and Wagenmakers (2014), and we will compare our proposal to theirs. The article is organized as follows. We motivate the evaluation of statistical models with order constraints on the parameters of interest in the context of the European Values Study in the second section. In the third section, we discuss BIC approximations of the marginal likelihood under an order-constrained model. The fourth section provides a numerical evaluation of the methods, while the fifth section describes software to implement the methods. The sixth section explains how to apply the new method for testing social theories in the European Values Study, and the seventh section discusses the results.

Order-constrained Model Selection in Social Research

In this section, we present two situations where order-constrained model selection is useful. First, theories often make an assumption about the relative importance of certain predictors on an outcome variable. This can be formalized by specifying order constraints on the effects of these predictor variables. We will show this in Application 1 using Ethnic Competition Theory (Scheepers, Gijsberts, and Coenders 2002). Second, a researcher may have an expectation about the direction of an effect of a predictor variable with an ordinal measurement level. When modeling this ordinal predictor variable using dummy variables, the expected directional effect can be translated to a set of order constraints on the effects of these dummy variables. This will be shown in Application 2 by considering Inglehart’s Generational Replacement Theory.

Application 1: Assessing the Importance of Different Dimensions of Socioeconomic Status

In most European countries, the majority of immigrants are located in the lower strata of society. For this reason, lower-strata members of the European majority population who hold similar social positions as the ethnic minorities, having a relatively low social class, low educational level, or low-income level will on average compete more with ethnic minorities than will other citizens in the labour market. Therefore, Ethnic Competition Theory (Scheepers et al. 2002) would predict that higher social class, educational level, or income level would result in a more positive attitude toward immigrants. Furthermore, it is likely that social class (which reflects the type of job a person has) has the largest impact because one’s social class is directly related to the labour market. The effects of education are less direct, and therefore, it is expected that one’s educational level has a lower impact on attitude toward immigrants than social class. Finally, it would be expected that the effect of income would be the lowest but still positive. This expectation will be formalized in model M₁ which is provided below.

Alternatively, due to the importance of education in shaping one’s identity (A. K. Cohen et al. 2013; van der Waal, de Koster, and ten Kate 2015), it might be expected that education is the most important factor explaining one’s attitude toward immigrants, followed by social class and income for which no specific ordering is expected (formalized in M₂). A third hypothesis is that all three dimensions have an equal and positive effect on attitudes toward immigrants (M₃). Finally, it may be that none of these three hypotheses is true (M₄).

To evaluate these expectations, we first write down the linear regression model where the attitude toward immigrants is the outcome variable, and social class, educational level, and income are the predictor variables while controlling for gender. The ith observation is modeled as follows:

\begin{matrix} attitude (i) = θ_{0} + class (i) \times θ_{class} + education (i) \times θ_{education} \\ + income (i) \times θ_{income} + gender (i) \times θ_{gender} + error (i), \end{matrix}

for $i = 1, \dots, n$ . The predictor variables are all standardized. In equation (1), $θ_{class}$ , $θ_{education}$ , and $θ_{income}$ are the standardized coefficients for social class, educational level, and income, respectively, $θ_{gender}$ is the standardized coefficient for gender, and the errors are assumed to be independent and normally distributed with unknown variance.

The four expectations given above can be formalized using competing statistical models with different order constraints on the standardized effects, namely,

\begin{array}{l} M_{1} : θ_{class} > θ_{education} > θ_{income} > 0, \\ M_{2} : θ_{education} > (θ_{class}, θ_{income}) > 0, \\ M_{3} : θ_{class} = θ_{education} = θ_{income} > 0, \\ M_{4} : \text{neither } M_{1}, M_{2}, \text{ nor } M_{3}, \end{array}

where $θ_{class}$, $θ_{education}$, and $θ_{income}$ denote the effects of social class, educational level, and income on attitude toward immigrants, respectively. Consequently, the goal is to quantify the evidence in the data for these four models to determine which model receives the most support.

Note that nuisance parameters (e.g., effects of control variables) are omitted in the above formulation of the models of interest to simplify the notation. Further note that additional competing constrained models could be formulated in this context as well. For the current application, however, we restrict ourselves to these models.

Application 2: The Importance of Postmaterialism for Young, Middle, and Old Generations

Experiences in preadult years are known to have a crucial impact on the development of basic values in later life. Due to the increase in welfare in recent decades, Generational Replacement Theory predicts that the values of younger generations are different from those of older generations. In particular, postmaterialistic values, such as the desire for freedom, self-expression, and quality of life, are expected to have increased for younger generations as a result of improved economic standards in Western countries (Inglehart and Abramson 1999; Welzel and Inglehart 2005).

In the European Values Study, generation was operationalized using an ordinal variable with three categories corresponding to a young, middle, or old generation. Similarly, postmaterialism has been measured on an ordinal scale as well, having three categories. When setting the younger generation as the reference group and using dummy variables for the middle and older generations, Generational Replacement Theory can be translated to an order-constrained model (M₁). We contrast it with a model that assumes no generation effect on postmaterialism (M₀) and with a complementary model that assumes neither an increased effect nor a zero effect (M₂).

The models of interest can be summarized as follows:

\begin{array}{l} M_{0} : θ_{old} = θ_{middle} = 0, \\ M_{1} : θ_{old} < θ_{middle} < 0, \\ M_{2} : \text{neither } M_{0} \text{ nor } M_{1} . \end{array}

Furthermore, we hypothesize that the inclusion of order constraints on the generational effects of interest results in an increase of statistical power in comparison to testing the classical alternative, say, $M_{3} : (θ_{old}, θ_{middle}) \neq (0, 0)$, versus the null model $M_{0} : θ_{old} = θ_{middle} = 0$. In terms of the BIC, this implies we would obtain more evidence against M₀ when testing it against the order-constrained model M₁ (if the constraints are supported by the data) than when testing M₀ against the unconstrained alternative M₃.

BIC Approximations of the Marginal Likelihood

In this section, extensions of the BIC are derived for a model with order (or inequality) constraints on certain model parameters. Consider an order-constrained model M₁ with d unknown model parameters, denoted by $θ$, which are restricted by $r_{1}$ order constraints, that is, $M_{1} : R_{1} θ > r_{1}$, where $[R_{1} | r_{1}]$ is an augmented $r_{1} \times (d + 1)$ matrix containing the coefficients of the order constraints under M₁. Note that the symbol $r_{1}$ denotes both the number of constraints and the vector of constants on the right-hand side of the constraints.

For example, the order-constrained model $M_{1} : θ_{class} > θ_{education} > θ_{income} > 0$ in equation (2) can be translated to

R_{1} θ > r_{1} \Leftrightarrow \begin{bmatrix} 0 & 1 & - 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & - 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} θ_{0} \\ θ_{class} \\ θ_{education} \\ θ_{income} \\ θ_{gender} \\ σ^{2} \end{bmatrix} > \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},

where the first element of $θ$ denotes the intercept, the fifth element the gender effect, and the sixth element denotes the error variance, which are nuisance parameters. The order-constrained model is nested in an unconstrained model that will be denoted by M_u.
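The matrix formulation can be checked mechanically. The sketch below encodes the constraint matrix and vector for $M_{1} : θ_{class} > θ_{education} > θ_{income} > 0$ and tests whether a parameter vector satisfies every constraint. The helper name `satisfies_constraints` and the two parameter vectors are hypothetical and serve only as an illustration.

```python
def satisfies_constraints(R, r, theta):
    """Return True if every row of R times theta exceeds the matching entry of r."""
    return all(
        sum(coef * value for coef, value in zip(row, theta)) > bound
        for row, bound in zip(R, r)
    )


# Parameter order: intercept, class, education, income, gender, error variance.
R1 = [
    [0, 1, -1, 0, 0, 0],  # theta_class - theta_education > 0
    [0, 0, 1, -1, 0, 0],  # theta_education - theta_income > 0
    [0, 0, 0, 1, 0, 0],   # theta_income > 0
]
r1 = [0, 0, 0]

# Two invented parameter vectors: the first agrees with M1, the second does not.
theta_in = [2.0, 0.30, 0.20, 0.10, -0.05, 1.0]
theta_out = [2.0, 0.20, 0.30, 0.10, -0.05, 1.0]

in_m1 = satisfies_constraints(R1, r1, theta_in)
out_m1 = satisfies_constraints(R1, r1, theta_out)
```

The nuisance parameters (intercept, gender effect, and error variance) have zero coefficients in every row, so they play no role in the check.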

The likelihood function under M₁ is a truncation of the likelihood under an unconstrained model, that is, $p_{1} (D | θ) = p (D | θ) \times I_{Θ_{1}} (θ)$, where $p (D | θ)$ denotes the likelihood function of the data $D$ under the unconstrained parameter space $Θ$, $Θ_{1}$ denotes the order-constrained subspace under M₁, and $I_{Θ_{1}} (θ)$ denotes the indicator function that equals 1 if $θ \in Θ_{1}$ and 0 otherwise. The prior for $θ$ under M₁ will be denoted by $p_{1} (θ)$. Two different types of priors will be considered for approximating the marginal likelihoods under M₁ and M_u.

Truncated Unit Information Prior

First, we assume that the unconstrained posterior mode, denoted by ${\tilde{θ}}_{u}$, falls in the inequality-constrained space of model M₁, that is, $R_{1} {\tilde{θ}}_{u} > r_{1}$. The BIC approximation of the marginal likelihood under the inequality-constrained model is then obtained using a second-order Taylor expansion of the logarithm of the integrand around the posterior mode. This approximation introduces an error that is $O (n^{- 1})$.¹ Let us define $g (θ) = \log p_{1} (D | θ) + \log p_{1} (θ)$. Then, the marginal likelihood can be derived by

\begin{aligned} \log p_{1} (D) &= \log \int_{R_{1} θ > r_{1}} p_{1} (D | θ) p_{1} (θ) \, d θ \\ &= \log \int_{R_{1} θ > r_{1}} \exp \{ g (θ) \} \, d θ \\ &= \log \int_{R_{1} θ > r_{1}} \exp \{ g ({\tilde{θ}}_{u}) + \frac{1}{2} (θ - {\tilde{θ}}_{u})^{'} H ({\tilde{θ}}_{u}) (θ - {\tilde{θ}}_{u}) \} \, d θ + O (n^{- 1}) \\ &= \log p_{1} (D | {\tilde{θ}}_{u}) + \log p_{1} ({\tilde{θ}}_{u}) + \log \int_{R_{1} θ > r_{1}} \exp \{ \frac{1}{2} (θ - {\tilde{θ}}_{u})^{'} H ({\tilde{θ}}_{u}) (θ - {\tilde{θ}}_{u}) \} \, d θ + O (n^{- 1}) \\ &= \log p_{1} (D | {\tilde{θ}}_{u}) + \log p_{1} ({\tilde{θ}}_{u}) + \frac{d}{2} \log (2 π) - \frac{1}{2} \log | - H ({\tilde{θ}}_{u}) | + \log \mathrm{Pr} (R_{1} θ > r_{1} | D, M_{u}) + O (n^{- 1}), \end{aligned}

where $H ({\tilde{θ}}_{u})$ denotes the Hessian matrix of second-order partial derivatives of $g (θ)$ evaluated at ${\tilde{θ}}_{u}$ .

Hence, the only difference with the original derivation is that the resulting approximation also includes the posterior probability that the order constraints of M₁ hold under the larger unconstrained model M_u. From large sample theory, the unconstrained posterior mode can be approximated with the unconstrained MLE, that is, ${\tilde{θ}}_{u} \approx {\hat{θ}}_{u}$ , and $- H ({\tilde{θ}}_{u}) \approx n I_{E} ({\hat{θ}}_{u})$ , where $I_{E} ({\hat{θ}}_{u})$ is the expected Fisher information matrix of one observation (which can be obtained using standard statistical software). This introduces an additional approximation error of $O (n^{- 1 / 2})$ . Subsequently, the approximated logarithm of the marginal likelihood is given by

\begin{aligned} \log p_{1} (D) = {} & \log p_{1} (D | {\hat{θ}}_{u}) + \log p_{1} ({\hat{θ}}_{u}) + \frac{d}{2} \log (2 π) - \frac{d}{2} \log (n) \\ & - \frac{1}{2} \log | I_{E} ({\hat{θ}}_{u}) | + \log \mathrm{Pr} (R_{1} θ > r_{1} | D, M_{u}) + O (n^{- \frac{1}{2}}) . \end{aligned}

As was pointed out by Raftery (1995), certain terms cancel out when plugging in the so-called unit information prior (see also Kass and Wasserman 1995). The unit information prior has a multivariate normal distribution with mean equal to the MLE and variance equal to the inverse of the expected Fisher information matrix of one observation, that is, $p^{UI} (θ) = N ({\hat{θ}}_{u}, I_{E} {({\hat{θ}}_{u})}^{- 1})$ . Under the constrained model M₁, we propose using a truncated unit information prior, that is,

p_{1}^{UI} (θ) = p^{UI} (θ) \times I (R_{1} θ > r_{1}) \times {Pr}^{UI} {(R_{1} θ > r_{1} | M_{u})}^{- 1},

where the prior probability serves as a normalization constant so that the truncated unit information prior integrates to one, that is,

{Pr}^{UI} (R_{1} θ > r_{1} | M_{u}) = \int_{R_{1} θ > r_{1}} p^{UI} (θ) d θ .

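Evaluating the logarithm of the unconstrained unit information prior at the unconstrained MLE yields $\log p^{UI} ({\hat{θ}}_{u}) = - \frac{d}{2} \log (2 π) + \frac{1}{2} \log | I_{E} ({\hat{θ}}_{u}) |$, and the truncation adds the normalizing term, so that $\log p_{1}^{UI} ({\hat{θ}}_{u}) = \log p^{UI} ({\hat{θ}}_{u}) - \log {Pr}^{UI} (R_{1} θ > r_{1} | M_{u})$. Therefore, equation (5) becomes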

\begin{aligned} \log p_{1} (D) = {} & \log p_{1} (D | {\hat{θ}}_{u}) - \frac{d}{2} \log (n) + \log \mathrm{Pr} (R_{1} θ > r_{1} | D, M_{u}) \\ & - \log \mathrm{Pr}^{UI} (R_{1} θ > r_{1} | M_{u}) + O (n^{- 1 / 2}) . \end{aligned}

The corresponding order-constrained BIC is then obtained by multiplying the logarithm of the approximated marginal likelihood by $- 2$ and ignoring the error term. This yields

\begin{aligned} \text{OC-BIC} (M_{1}) = {} & - 2 \log p_{1} (D | {\hat{θ}}_{u}) + d \log (n) - 2 \log \mathrm{Pr} (R_{1} θ > r_{1} | D, M_{u}) \\ & + 2 \log \mathrm{Pr}^{UI} (R_{1} θ > r_{1} | M_{u}), \end{aligned}

where the first two terms form the ordinary BIC of the unconstrained model M_u (i.e., model M₁ without the order constraints), and the additional third and fourth terms are used for the evaluation of the order constraints of M₁ within M_u.
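To make the four terms concrete, the following sketch evaluates the order-constrained BIC based on the truncated unit information prior for the simplest case of a single one-sided constraint, $M_{1} : θ > 0$. It assumes, following the derivation above, that the approximate unconstrained posterior of $θ$ is normal with mean ${\hat{θ}}_{u}$ and variance $1 / (n I_{E})$, and that the unit information prior is normal with the same mean and variance $1 / I_{E}$. The function name and all numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def oc_bic_ui_one_sided(max_loglik, n_obs, n_params, theta_hat, unit_info):
    """OC-BIC for M1: theta > 0 under the truncated unit information prior.

    theta_hat -- unconstrained MLE of the constrained parameter
    unit_info -- expected Fisher information of one observation for theta
    """
    std_normal = NormalDist()
    # Approximate posterior under M_u: N(theta_hat, 1 / (n * unit_info)).
    post_prob = std_normal.cdf(theta_hat * math.sqrt(n_obs * unit_info))
    # Unit information prior under M_u: N(theta_hat, 1 / unit_info).
    prior_prob = std_normal.cdf(theta_hat * math.sqrt(unit_info))
    ordinary_bic = -2.0 * max_loglik + n_params * math.log(n_obs)
    return ordinary_bic - 2.0 * math.log(post_prob) + 2.0 * math.log(prior_prob)


# Invented example: n = 100, two free parameters, MLE of 0.4 for theta.
ordinary = -2.0 * -150.0 + 2 * math.log(100)
supported = oc_bic_ui_one_sided(-150.0, 100, 2, theta_hat=0.4, unit_info=1.0)
```

With these inputs the posterior probability is close to 1 while the prior probability is about .66, so the order-constrained BIC is somewhat smaller than the ordinary BIC of M_u.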

Next, we consider the case where the unconstrained posterior mode does not lie in the inequality-constrained subspace of M₁, that is, $R_{1} \tilde{θ} ≱ r_{1}$ . In this case, the second-order Taylor expansion of $g (θ)$ around the unconstrained posterior mode (or MLE) may not be a good approximation. The rationale is that the mode under M₁, which will have a nonzero gradient, will lie on the boundary space where $R_{1} θ = r_{1}$ .²

Because of the exponential tails of the normal distribution, a first-order Taylor expansion of $g (θ)$ at the posterior mode under M₁, denoted by ${\tilde{θ}}_{1}$ , seems more appropriate (Avramidi 2000). For example, let us consider a simple inequality-constrained model, $M_{1} : θ \geq 0$ , and let the unconstrained mode be smaller than 0, that is, $\tilde{θ} < 0$ , so that the posterior mode under M₁ is located on the boundary, that is, ${\tilde{θ}}_{1} = 0$ , which has a negative gradient, $g^{'} (0) < 0$ . The function $g (θ)$ for such a situation is plotted in Figure 1 (black line; solid line under $θ \geq 0$ , dotted line under $θ ≱ 0$ ). The second-order Taylor approximation at the unconstrained mode is also plotted (red line; solid line under $θ \geq 0$ , dotted line under $θ ≱ 0$ ). A first-order Taylor expansion at ${\tilde{θ}}_{1} = 0$ can be used to approximate the function in the region $θ \geq 0$ according to

g (θ) = g (0) + g^{'} (0) θ + O (θ^{2}) .

Figure 1.

Plot of the log of prior times likelihood, $g (θ)$ (black line), first-order Taylor approximation at $θ = 0$ (green line), and second-order Taylor approximation around the unconstrained posterior mode of $\tilde{θ} \approx - .5$ (red line), for an inequality-constrained model $M_{1} : θ \geq 0$. The left panel displays the functions on the log scale and the right panel on the regular scale. The inequality-constrained region under $M_{1} : θ \geq 0$ has solid lines, and the complement region has dotted lines.

The marginal likelihood can then be approximated as follows:

\begin{aligned} \log p_{1} (D) &= \log \int_{θ > 0} p_{1} (D | θ) p_{1} (θ) \, d θ = \log \int_{θ > 0} \exp \{ g (θ) \} \, d θ \\ &\approx \log \int_{θ > 0} \exp \{ g (0) + g^{'} (0) θ \} \, d θ = g (0) - \log (- g^{'} (0)) . \end{aligned}

Hence, instead of the normal distribution which is used to compute the integral in the case of a second-order Taylor expansion, an exponential distribution is used to compute the integral using this first-order Taylor expansion. The approximated line is also plotted in Figure 1 (green line).
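The two approximations can be compared numerically on a toy example. The sketch below uses an assumed function $g (θ) = 6 θ - 10 e^{θ}$ (a Poisson-type log-likelihood kernel with a flat prior), which is not the function plotted in Figure 1 but shares its key features: the unconstrained mode, $\log (.6) \approx - .51$, lies outside the region $θ > 0$, and the gradient at the boundary is negative. The reference value is obtained with a simple trapezoidal rule.

```python
import math
from statistics import NormalDist


def g(theta):
    """Toy log of prior times likelihood (assumed for illustration only)."""
    return 6.0 * theta - 10.0 * math.exp(theta)


def g_prime(theta):
    """First derivative of the toy function g."""
    return 6.0 - 10.0 * math.exp(theta)


def exact_log_integral(upper=4.0, steps=40_000):
    """Trapezoidal rule for the log of the integral of exp{g} over theta > 0."""
    h = upper / steps
    total = 0.5 * (math.exp(g(0.0)) + math.exp(g(upper)))
    total += sum(math.exp(g(i * h)) for i in range(1, steps))
    return math.log(total * h)


def first_order_log_integral():
    """g(0) - log(-g'(0)): exponential approximation at the boundary point."""
    return g(0.0) - math.log(-g_prime(0.0))


def second_order_log_integral():
    """Laplace approximation at the unconstrained mode, truncated to theta > 0."""
    mode = math.log(0.6)                 # solves g'(theta) = 0
    neg_hessian = 10.0 * math.exp(mode)  # -g''(mode) = 6
    tail_prob = 1.0 - NormalDist(mode, 1.0 / math.sqrt(neg_hessian)).cdf(0.0)
    return (g(mode) + 0.5 * math.log(2.0 * math.pi)
            - 0.5 * math.log(neg_hessian) + math.log(tail_prob))


exact = exact_log_integral()
err_first = abs(first_order_log_integral() - exact)
err_second = abs(second_order_log_integral() - exact)
```

For this particular toy function, the first-order approximation at the boundary has the smaller error on the log scale, although both approximations overestimate the integral.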

The figure suggests that the second-order Taylor approximation at the unconstrained posterior mode is less accurate than the first-order Taylor approximation at the boundary point. This suggests that the approximated marginal likelihood under the inequality-constrained model will generally be better using a first-order approximation at the boundary point in the case that the inequality constraints are not supported by the data. In the remainder of this article, however, we shall use the second-order Taylor approximation at the unconstrained posterior mode both when the posterior mode does and does not lie in the subspace of the inequality-constrained model under investigation.

When the order constraints are not supported by the data, the crudeness of the approximation is less important because the order-constrained model will not be selected because of the bad fit. Instead, another, better fitting model will be selected for which the marginal likelihood can be accurately approximated. Another reason for working with the second-order Taylor approximation is that it can be computed more easily using equation (7), also for more complex systems of inequality constraints on multiple parameters, for example, $θ_{1} > θ_{2} > θ_{3} > 0$, than a first-order Taylor approximation at the boundary point of the inequality-constrained subspace where the mode is located. The numerical experiments presented later illustrate that the approximation error is acceptable when the posterior mode does not lie in the constrained subspace.

It has been argued that data-based priors, such as the unit information prior, may result in Bayes factors that do not function as an Occam’s razor when evaluating inequality-constrained models (Mulder 2014a, 2014b). To see that this is also the case for the unit information prior, the approximated Bayes factor of an inequality-constrained model M₁ against an unconstrained model M_u (where the inequality constraints are omitted) is given by

B_{1 u}^{UI} \approx \frac{Pr (R_{1} θ > r_{1} | D, M_{u})}{{Pr}^{UI} (R_{1} θ > r_{1} | M_{u})} .

This follows automatically from equation (7).

Now in the case of overwhelming evidence for M₁, that is, $R_{1} {\hat{θ}}_{u} ≫ r_{1}$, both the posterior probability and the prior probability based on the unit information prior will be approximately 1, resulting in equal evidence for M₁ and M_u. This is a consequence of the fact that the unit information prior is concentrated around the MLE. Both models then fit the data equally well, while the inequality-constrained model can be viewed as the less complex model (because a “smaller” subspace is spanned), so an Occam’s razor should favor M₁. The equal evidence therefore suggests that the approximated Bayes factor does not properly function as an Occam’s razor.

Truncated Local Unit Information Prior

Due to the behavior of the unit information prior when evaluating order-constrained models, we consider a “local” unit information prior with a mean that is located on the boundary of the inequality-constrained space (we borrow the term “local” from Johnson and Rossell 2010). Note that the boundary space is equal to the parameter space under the null model $M_{0} : R_{1} θ = r_{1}$ . The rationale for centering the prior around the null space dates back at least to Jeffreys (1961) who argued that when the null model is false, the effects are expected to be close to the null; otherwise, there is no point in testing the null. This implies that the prior under the alternative model should be located around the null value. Furthermore, there have been reports in the literature where the use of such local priors results in desirable selection behavior when evaluating order-constrained models (e.g., Mulder 2014a; Mulder, Hoijtink, and Klugkist 2010). Here, we explore this class of priors for the BIC.

We set the mean of the local unit information prior equal to the MLE under the null model, denoted by ${\hat{θ}}_{0}$ . Furthermore, the covariance matrix will be equal to the covariance matrix of the unit information prior. Thus, the unconstrained local unit information prior can be written as $p^{LUI} (θ) = N ({\hat{θ}}_{0}, I_{E} {({\hat{θ}}_{u})}^{- 1})$ . The truncated prior under $M_{1} : R_{1} θ > r_{1}$ is then equal to

p_{1}^{LUI} (θ) = p^{LUI} (θ) \times I (R_{1} θ > r_{1}) \times \frac{1}{{Pr}^{LUI} (R_{1} θ > r_{1} | M_{u})} .

By applying equation (14) in Kass and Raftery (1995), changing the unit information prior to the local unit information prior results in an approximated logarithm of the marginal likelihood of

\begin{aligned} \log {\hat{p}}_{1}^{LUI} (D) = {} & \log p (D | {\hat{θ}}_{u}) - \frac{d}{2} \log (n) - \frac{1}{2} ({\hat{θ}}_{u} - {\hat{θ}}_{0})^{'} I_{E} ({\hat{θ}}_{u}) ({\hat{θ}}_{u} - {\hat{θ}}_{0}) \\ & + \log \mathrm{Pr} (R_{1} θ > r_{1} | D, M_{u}) - \log \mathrm{Pr}^{LUI} (R_{1} θ > r_{1} | M_{u}) . \end{aligned}

Consequently, the approximated Bayes factor based on the local unit information prior of an inequality-constrained model against an unconstrained model is given by

B_{1 u}^{LUI} \approx \frac{Pr (R_{1} θ > r_{1} | D, M_{u})}{{Pr}^{LUI} (R_{1} θ > r_{1} | M_{u})} .

Now in the case of overwhelming evidence for M₁, in the sense that $R_{1} {\hat{θ}}_{u} ≫ r_{1}$, the Bayes factor will be equal to the reciprocal of the prior probability that the inequality constraints hold under the unconstrained local unit information prior, that is, $B_{1 u}^{LUI} \approx {({Pr}^{LUI} (R_{1} θ > r_{1} | M_{u}))}^{- 1}$, which is strictly larger than 1 because the prior mean is located on the boundary of the constrained space where $R_{1} θ = r_{1}$. Note that this prior probability can be viewed as a quantification of the relative size of the inequality-constrained subspace. For example, in the case of a diagonal covariance matrix, the prior probability of k one-sided constraints, $θ > 0$, is equal to $2^{- k}$, and the prior probability of an ordering of k parameters, $θ_{1} < \dots < θ_{k}$, is equal to ${(k!)}^{- 1}$, similar to the Bayes factors proposed by Mulder, Hoijtink, and Klugkist (2010) and Morey and Wagenmakers (2014).
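The contrast between the two priors can be illustrated for a single one-sided constraint, $M_{1} : θ > 0$. The sketch below assumes a normal approximate posterior with mean ${\hat{θ}}_{u}$ and variance $1 / (n I_{E})$, a unit information prior centered on ${\hat{θ}}_{u}$, and a local unit information prior centered on the null value 0, so that the local prior probability equals $\frac{1}{2}$. The function name and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def bayes_factors_one_sided(theta_hat, n_obs, unit_info=1.0):
    """Approximate Bayes factors of M1: theta > 0 against M_u.

    Returns the pair (B based on the unit information prior,
                      B based on the local unit information prior).
    """
    post_prob = PHI(theta_hat * math.sqrt(n_obs * unit_info))
    prior_prob_ui = PHI(theta_hat * math.sqrt(unit_info))  # centered on the MLE
    prior_prob_lui = 0.5                                   # centered on theta = 0
    return post_prob / prior_prob_ui, post_prob / prior_prob_lui


# Overwhelming support for the constraint: large MLE and large sample.
bf_ui, bf_lui = bayes_factors_one_sided(theta_hat=3.0, n_obs=100)
```

With overwhelming support for the constraint, the Bayes factor based on the unit information prior stays near 1, whereas the Bayes factor based on the local prior approaches 2, the reciprocal of the prior probability.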

Instead of working with equation (9), we consider a slightly cruder approximation where the third term, which quantifies prior fit, is omitted. This yields

\begin{aligned} \log {\hat{p}}_{1}^{LUI *} (D) = {} & \log p_{u} (D | {\hat{θ}}_{u}) - \frac{d}{2} \log (n) + \log \mathrm{Pr} (R_{1} θ > r_{1} | D, M_{u}) \\ & - \log \mathrm{Pr}^{LUI} (R_{1} θ > r_{1} | M_{u}) . \end{aligned}

The rationale for omitting this term is that we are not interested in quantifying prior misfit. Another reason is that equation (11) can be combined with the ordinary BIC approximation for an unconstrained model (i.e., “ $log p (D | {\hat{θ}}_{u}) - \frac{d}{2} log (n)$ ”) to obtain the approximated Bayes factor in equation (10).

The terms on the right-hand side of equation (11) have the following intuitive interpretations. The first and second terms can be interpreted as measures of model fit and model complexity of the unconstrained model where the inequality constraints are excluded (similar to the ordinary BIC approximation based on the unit information prior). The third term, which is the approximated posterior probability that the inequality constraints hold under the unconstrained model, can be interpreted as a measure of the fit of the order-constrained model M₁ relative to the unconstrained model M_u. Finally, the fourth term, which is the local prior probability that the order constraints hold under the unconstrained model, can be interpreted as a measure of the complexity of the order-constrained model M₁ relative to the unconstrained model M_u.

Thus, equation (11) will behave as an Occam’s razor when evaluating order-constrained models by balancing the fit and complexity of the order-constrained model. The corresponding order-constrained BIC based on the local unit information prior then yields

\begin{aligned} \text{OC-BIC} (M_{1}) = {} & - 2 \log p_{u} (D | {\hat{θ}}_{u}) + d \log (n) - 2 \log \mathrm{Pr} (R_{1} θ > r_{1} | D, M_{u}) \\ & + 2 \log \mathrm{Pr}^{LUI} (R_{1} θ > r_{1} | M_{u}) . \end{aligned}
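For constraints on several parameters, the two probabilities generally have no simple closed form, but they can be estimated by simulation. The following is a minimal Monte Carlo sketch and not the implementation used in “BICpack”. It assumes a normal approximate posterior with mean ${\hat{θ}}_{u}$ and covariance $I_{E} {({\hat{θ}}_{u})}^{- 1} / n$ and a local unit information prior with mean ${\hat{θ}}_{0}$ and covariance $I_{E} {({\hat{θ}}_{u})}^{- 1}$, both restricted to the parameters that appear in the constraints. All function names and numerical inputs are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    d = len(cov)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def constraint_probability(mean, cov, R, r, draws=50_000, seed=1):
    """Monte Carlo estimate of Pr(R theta > r) for theta ~ N(mean, cov)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    d = len(mean)
    hits = 0
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        theta = [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                 for i in range(d)]
        if all(sum(c * t for c, t in zip(row, theta)) > b
               for row, b in zip(R, r)):
            hits += 1
    return hits / draws


def oc_bic_lui(max_loglik, n_obs, n_params, theta_u, theta_0, unit_cov, R, r):
    """OC-BIC based on the local unit information prior.

    n_params counts all free parameters of the unconstrained model;
    theta_u, theta_0, unit_cov refer only to the constrained parameters,
    where unit_cov is the matching block of the inverse unit information.
    """
    post_cov = [[v / n_obs for v in row] for row in unit_cov]
    post_prob = constraint_probability(theta_u, post_cov, R, r, seed=1)
    prior_prob = constraint_probability(theta_0, unit_cov, R, r, seed=2)
    if post_prob == 0.0 or prior_prob == 0.0:
        raise ValueError("Monte Carlo estimate is zero; increase the draws")
    ordinary_bic = -2.0 * max_loglik + n_params * math.log(n_obs)
    oc_bic = ordinary_bic - 2.0 * math.log(post_prob) + 2.0 * math.log(prior_prob)
    return oc_bic, ordinary_bic, post_prob, prior_prob


# Invented inputs for M1: theta_class > theta_education > theta_income > 0.
R = [[1, -1, 0], [0, 1, -1], [0, 0, 1]]
r = [0, 0, 0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
oc_bic, ordinary_bic, post_prob, prior_prob = oc_bic_lui(
    max_loglik=-520.0, n_obs=400, n_params=6,
    theta_u=[0.6, 0.4, 0.2], theta_0=[0.0, 0.0, 0.0],
    unit_cov=identity, R=R, r=r,
)
```

In this invented example, the constraints are well supported, so the posterior probability is close to 1, while the local prior probability is close to $1 / 48$. The order-constrained BIC is therefore clearly smaller than the ordinary BIC of M_u, which reflects the reward for the smaller constrained subspace.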

Comparison With Other BIC Extensions

The order-constrained BIC in equation (12) shows some similarities with the BIC extensions proposed by Romeijn et al. (2012) and Morey and Wagenmakers (2014). In the proposal of Romeijn et al., the prior can be chosen by users allowing a subjective quantification of the relative size of the constrained space. Although this may be useful in certain situations, the BIC is typically used in an automatic fashion, and thus, it may be preferable to also let the prior probability be based on a default prior. The advantage of using the local unit information prior for this purpose is that it results in a reasonable default measure for the relative size of an order-constrained parameter space because the prior is centered on the boundary of the constrained space (unlike the [nonlocal] unit information prior). For example, when considering a univariate one-sided constraint, $θ < 0$ , the prior probability based on the local unit information prior will be $\frac{1}{2}$ , which seems reasonable because half of the unconstrained space of $θ$ is covered by the one-sided constraint.

Furthermore, Romeijn et al. set the posterior probability that the order constraints hold to 1 in the case the MLE is in agreement with the constraints and 0 elsewhere. This additional approximation step follows directly from large sample theory: When the sample size goes to infinity, the posterior probability converges to 1 if the true parameter value is an interior point of the order-constrained subspace and 0 if it is an interior point of the complement of this subspace. Thus, for extremely large samples, the prior-adapted BIC of Romeijn et al. may perform similarly to the order-constrained BIC in equation (12).

For small to moderately sized samples, however, setting the posterior probability to either 1 or 0 may result in crude approximations of the posterior probability. For example, as will be shown in the subsection “Application 1: Assessing the Importance of Different Dimensions of Socioeconomic Status,” the posterior probabilities that two competing sets of order constraints hold under an unconstrained model are equal to .50 and .18. Setting these probabilities to 1 and 0, respectively, would result in an unnecessarily crude estimate of the marginal likelihood. Instead, we recommend using the actual posterior probability that the order constraints hold based on the unconstrained approximated posterior (the third term in equation [12]).
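To make the size of this difference concrete, the short R sketch below compares the relative fit term of equation (12), that is, minus two times the log posterior probability of the constraints, under the reported probabilities (.50 and .18) and under the 0/1 rule. The variable names are illustrative only.

```r
# Posterior probabilities of the two sets of constraints reported in the text.
post_prob <- c(set_1 = 0.50, set_2 = 0.18)

# Relative fit term of equation (12) using the actual probabilities.
rel_fit_actual <- -2 * log(post_prob)

# The same term when the probabilities are set to 1 and 0, respectively.
rel_fit_rounded <- -2 * log(c(set_1 = 1, set_2 = 0))

rbind(actual = rel_fit_actual, rounded = rel_fit_rounded)
```

Under the 0/1 rule, the second set of constraints receives an infinite penalty and is ruled out completely, whereas the actual probabilities add roughly 1.4 and 3.4 to the respective criteria.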

In the proposal of Morey and Wagenmakers (2014), the prior probability that a specific ordering of d parameters holds, for example, $θ_{1} < \dots < θ_{d}$ , is set to $1 / d!$ . This probability is thus based on the assumption that each ordering is equally likely a priori, similar to the priors proposed by Mulder et al. (2010) and Klugkist, Laudy, and Hoijtink (2005) when using Bayes factors. This probability, however, holds only for specific covariance structures such as a diagonal covariance structure. The prior probability may not be invariant to reparameterizations of the model (see also Mulder 2014a). For example, if we would define $ξ_{d^{'}} = θ_{d^{'}} - θ_{d^{'} - 1}$ , for $d^{'} = 2, \dots, d$ , and $ξ_{1} = θ_{1}$ , the above order constraints would be equivalent to the $d - 1$ one-sided constraints $(ξ_{2}, \dots, ξ_{d}) > 0$ . If one would use a prior diagonal covariance structure for $ξ$ and zero means, the prior probability would be equal to $(1 / 2)^{d - 1}$ . This may be very different from $1 / d!$ , resulting in a serious violation of invariance to reparameterizations. The prior probability based on the local unit information prior (the fourth term on the right-hand side of equation [12]), on the other hand, would be invariant to such reparameterizations as the prior covariance structure is automatically transformed along with the reparameterization.
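The lack of invariance can be checked with a small Monte Carlo sketch in base R for $d = 3$. The standard normal priors, the number of draws, and all object names are assumptions made for illustration only.

```r
set.seed(1)
n_draws <- 100000
d <- 3

# TRUE for every row whose entries are strictly increasing.
is_increasing <- function(m) {
  rowSums(m[, -1, drop = FALSE] > m[, -ncol(m), drop = FALSE]) == ncol(m) - 1
}

# Prior A: independent standard normal thetas, so all orderings are equally likely.
theta_a <- matrix(rnorm(n_draws * d), ncol = d)

# Prior B: independent standard normal xis; theta_k is the cumulative sum of the xis.
xi <- matrix(rnorm(n_draws * d), ncol = d)
theta_b <- t(apply(xi, 1, cumsum))

c(prior_a = mean(is_increasing(theta_a)), one_over_d_factorial = 1 / factorial(d),
  prior_b = mean(is_increasing(theta_b)), half_power = (1 / 2)^(d - 1))
```

The first estimate should be close to $1/6$ and the second close to $1/4$, although both priors describe the same ordering $θ_{1} < θ_{2} < θ_{3}$.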

Numerical Analyses

The behavior of approximated Bayes factors based on the unit information prior and the local unit information prior will be investigated in a numerical example of the linear regression model, $y_{i} = θ_{0} + θ_{1} x_{i 1} + θ_{2} x_{i 2} + ∊_{i}$ , with $∊_{i} \sim N (0, σ^{2})$ , for $i = 1, \dots, n$ . Here, $θ_{0}$ is the intercept, and $θ_{1}$ and $θ_{2}$ are the effects of the first and second predictor. We consider a model selection problem between an order-constrained model $M_{1} : θ_{2} > θ_{1} > 0$ , a null model $M_{0} : θ_{2} = θ_{1} = 0$ , and the complement model, $M_{2} : θ_{2} ≱ θ_{1} ≱ 0$ . To gain more insight into the behavior of the criterion as an Occam’s razor, we also test the order-constrained model M₁ against the unconstrained model, $M_{u} : (θ_{1}, θ_{2}) \in ℝ^{2}$ .

Statistical Evidence for Order-constrained Models

To better understand how the approximated Bayes factors quantify statistical evidence for order-constrained models, we computed the approximated Bayes factors for data with $({\hat{θ}}_{1}, {\hat{θ}}_{2}) = (a, 2a)$ , for $a \in (-1.5, 1.5)$ , while fixing ${\hat{θ}}_{0} = 0$ , ${\hat{σ}}^{2} = 1$ , $n = 20$ , and $X'X = \begin{bmatrix} n & 0 & 0 \\ 0 & n & n/2 \\ 0 & n/2 & n \end{bmatrix}$ (the exact choice of these fixed values did not qualitatively affect the results). Thus, there is evidence for M₁, M₀, and M₂ when $a > 0$ , $a = 0$ , and $a < 0$ , respectively.

The logarithm of the approximated Bayes factors can be found in Figure 2. Based on the approximated Bayes factors of M₁ versus M_u (left panel), we can see that the evidence based on the unit information prior (dotted line) for M₁ against M_u starts to decrease for larger effects (for approximately $a = .3$ and larger), which seems counterintuitive. Eventually, the weight of evidence (i.e., the log Bayes factor) converges to 0. Thus, in the case of overwhelming evidence for an order-constrained model, we obtain equal evidence for an order-constrained model M₁ that is fully supported by the data and the “larger” unconstrained model M_u when using the unit information prior, even though M₁ is a simpler model. This suggests that the approximated Bayes factor based on the unit information prior does not work as an Occam’s razor when evaluating order-constrained models.

Figure 2.

The logarithm of the approximated Bayes factors based on the unit information prior $log ({\hat{B}}^{UI})$ (dotted line), and the local unit information prior $log ({\hat{B}}^{L})$ (solid line) of $M_{1} : θ_{2} > θ_{1} > 0$ versus $M_{u} : (θ_{1}, θ_{2}) \in ℝ^{2}$ (left panel), of M₁ versus the complement model M₂ (middle panel), and of M₁ versus $M_{0} : θ_{2} = θ_{1} = 0$ (right panel). The criteria are plotted for $n = 20$ as a function of a, where $({\hat{θ}}_{1}, {\hat{θ}}_{2}) = (a,2 a)$ .

The evidence for M₁ against M_u based on the local unit information prior (solid line), on the other hand, increases as a function of a. Eventually, the weight of evidence converges to the logarithm of the reciprocal of the prior probability of $θ_{2} > θ_{1} > 0$ under the unconstrained model M_u, which is strictly larger than 0. Furthermore, the local unit information prior results in more evidence for a model that is supported by the data in comparison to the unit information prior when comparing model M₁ versus model M₂ (Figure 2, middle panel) and model M₁ versus M₀ (Figure 2, right panel). Based on these considerations, we conclude that the order-constrained BIC based on the local unit information prior better balances fit and complexity when evaluating order-constrained models than the order-constrained BIC based on the unit information prior.
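The limiting behavior of the solid line can be sketched with a small Monte Carlo computation in base R. The sketch assumes that the approximated posterior of $(θ_{1}, θ_{2})$ is normal with mean $(a, 2a)$ and covariance ${\hat{σ}}^{2} (X'X)^{-1}$, and that the local unit information prior is normal with mean 0 and n times that covariance; these forms, the helper names, and the number of draws are assumptions for illustration rather than the authors' code. By equation (12), the log Bayes factor of M₁ against M_u is the log posterior probability minus the log prior probability of the constraints.

```r
set.seed(123)
n <- 20
sigma2 <- 1
xtx <- matrix(c(n, 0,     0,
                0, n,     n / 2,
                0, n / 2, n), nrow = 3, byrow = TRUE)

# Assumed covariance of the estimates of (theta1, theta2).
cov_mle <- (sigma2 * solve(xtx))[2:3, 2:3]

# Draw from a multivariate normal using the Cholesky factor of the covariance.
draw_mvn <- function(n_draws, mu, cov) {
  z <- matrix(rnorm(n_draws * length(mu)), ncol = length(mu))
  sweep(z %*% chol(cov), 2, mu, "+")
}

# Proportion of draws satisfying theta2 > theta1 > 0.
prob_m1 <- function(draws) mean(draws[, 2] > draws[, 1] & draws[, 1] > 0)

n_draws <- 200000
prior_prob <- prob_m1(draw_mvn(n_draws, c(0, 0), n * cov_mle))

log_bf_1u <- function(a) {
  post_prob <- prob_m1(draw_mvn(n_draws, c(a, 2 * a), cov_mle))
  log(post_prob) - log(prior_prob)
}

c(prior_prob = prior_prob, a_0.3 = log_bf_1u(0.3), a_1.5 = log_bf_1u(1.5))
```

Under these assumptions, the prior probability of the constraints is about 1/12 in this design, so the log Bayes factor levels off near $\log(12)$ for large a instead of returning to 0.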

Error Probabilities

Next, we investigate the probabilities of selecting the true data generating model when including order constraints in the alternative model or not. First, we consider testing the null model, $M_{0} : θ_{1} = θ_{2} = 0$ , against an unconstrained alternative, $M_{u} : θ \in ℝ^{2}$ , using the ordinary BIC. Second, we consider testing the null model $M_{0} : θ_{1} = θ_{2} = 0$ against two order-constrained alternatives, namely, $M_{1} : θ_{2} > θ_{1} > 0$ and $M_{2} : θ_{2} ≱ θ_{1} ≱ 0$ , using the two order-constrained BICs. Note that the BIC for M₀, with no inequality constraints, is the same in both tests. Further note that because the second test contains three models instead of two, the error probabilities in the second selection problem will be slightly larger when M₀ is true, as a result of the design. The true effects will be set to $(θ_{1}, θ_{2}) = (a, 2a)$ , for $a = 0$ , so that M₀ is true, and for $a = .1$, $.2$, and $.4$ , so that M_u (M₁) is true in the first (second) test.
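The first of these two tests can be mimicked with a short simulation in base R. This is only a sketch of the idea: the way the predictors are generated (standard normal with correlation .5), the error variance of 1, the number of replications, and the function name `simulate_error` are assumptions, not the design used for Figure 3.

```r
set.seed(2019)

# Proportion of replications in which the ordinary BIC selects the wrong model
# when comparing M0 (no predictor effects) with the unconstrained model Mu.
simulate_error <- function(a, n, n_rep = 200) {
  wrong <- logical(n_rep)
  for (r in seq_len(n_rep)) {
    x1 <- rnorm(n)
    x2 <- 0.5 * x1 + sqrt(0.75) * rnorm(n)  # correlation .5 with x1 (assumed)
    y  <- a * x1 + 2 * a * x2 + rnorm(n)    # true effects (a, 2a)
    selected_u <- BIC(lm(y ~ x1 + x2)) < BIC(lm(y ~ 1))
    # M0 is true when a = 0; otherwise Mu is true.
    wrong[r] <- if (a == 0) selected_u else !selected_u
  }
  mean(wrong)
}

sapply(c(20, 100, 500), function(n) simulate_error(a = 0.1, n = n))
```

The estimated error probabilities for $a = .1$ should shrink as n grows, in line with the consistency of the BIC.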

Figure 3 displays the error probabilities as a function of the sample size (on a log scale). All the criteria show consistent behavior in the sense that the error probabilities go to 0 as the sample size grows. Furthermore, we see that when M₀ is true (upper left panel), the error probabilities are very similar and the ordinary BIC in test 1 results in the smallest errors. In the case of a true effect in the direction of the order constraints of M₁, we see that the order-constrained BIC based on the local unit-information prior results in considerably smaller errors than the other criteria.

Figure 3.

Probability of selecting the wrong model when using the ordinary Bayesian information criterion (BIC) for testing $M_{0} : θ_{1} = θ_{2} = 0$ against $M_{u} : θ \in ℝ^{2}$ (dashed line), and the two order-constrained BICs when testing $M_{0} : θ_{1} = θ_{2} = 0$ , $M_{1} : θ_{2} > θ_{1} > 0$ , and $M_{2} : θ_{2} ≱ θ_{1} ≱ 0$ (dotted and solid line for the nonlocal and local unit-information prior, respectively) for true effects of $(θ_{1}, θ_{2}) = (a, 2a)$ , for $a = 0$, $.1$, $.2$, and $.4$ . The sample size on the x-axis is on a logarithmic scale.

The error probabilities of the order-constrained BIC based on the local unit-information prior were only slightly larger than those of the ordinary BIC when M₀ is true (i.e., in the case of a zero effect). This is partly a consequence of the design of the test having three instead of two models under investigation. We conclude that overall, the order-constrained BIC based on the local unit-information prior performs best in terms of error probabilities.

Approximation Errors of the Order-constrained BICs

Finally, we investigated the relative approximation errors of the order-constrained BICs by comparing them to nonapproximated counterparts, for example, $\frac{\log B_{12} - \log {\hat{B}}_{12}}{\log B_{12}}$ for model M₁ against M₂. The approximation errors were investigated when the order-constrained model is supported by the data, namely, when testing $M_{1} : θ_{2} > θ_{1} > 0$ versus $M_{0} : θ_{1} = θ_{2} = 0$ , with $({\hat{θ}}_{1}, {\hat{θ}}_{2}) = (.5, 1)$ , and when the order-constrained model is not supported by the data, namely, when testing $M_{1} : θ_{2} > θ_{1} > 0$ against its complement $M_{2} : θ_{2} ≱ θ_{1} ≱ 0$ , with $({\hat{θ}}_{1}, {\hat{θ}}_{2}) = (-.5, -1)$ , while increasing the sample size.

The results can be found in Figure 4. As can be seen from the left panel, the relative error quickly goes to 0 when the effects are in agreement with the order constraints of model M₁. When the effects are not in agreement with the constraints (right panel), we see that the relative error does not go to zero. This is a consequence of the somewhat crude approximation we already observed in Figure 1 (red line). The approximation error, however, is not large enough to be a serious practical problem. Other settings resulted in qualitatively similar results.

Figure 4.

Left panel: Relative approximation error of the order-constrained Bayesian information criterion (BIC) of $M_{1} : θ_{2} > θ_{1} > 0$ versus $M_{0} : θ_{1} = θ_{2} = 0$ when $({\hat{θ}}_{1}, {\hat{θ}}_{2}) = (.5, 1)$ . Right panel: Relative approximation error of the order-constrained BIC of $M_{2} : θ_{2} ≱ θ_{1} ≱ 0$ versus $M_{1} : θ_{2} > θ_{1} > 0$ when $({\hat{θ}}_{1}, {\hat{θ}}_{2}) = (-.5, -1)$ .

Software

The R package “BICpack” was developed for evaluating order-constrained models using the order-constrained BIC based on the local unit-information prior. The R functions can be downloaded from www.github.com/jomulder/BICpack ³ and are planned to become available from CRAN in the near future. The order-constrained BIC based on the truncated unit information prior was not considered because of the poorer performance that we observed in the numerical simulations. The package makes use of the mvtnorm package (Genz et al. 2016) for computing the probabilities in equation (11). The key function is “bic_oc,” which can be used for computing the order-constrained BIC for various statistical models including generalized linear models and survival models. As input, the function needs a fitted model object (e.g., a fitted glm object or coxph object), a character string denoting the order constraints on certain parameters, and a Boolean argument denoting whether the order-constrained subspace or its complement is considered (the default is the order-constrained subspace).

For example, consider a regression model with three predictors, say, X1, X2, and X3, on an outcome variable y, where X1 is expected to have the largest effect on the outcome variable, followed by X2, X3 is expected to have the smallest effect, and all effects are expected to be positive. The order-constrained BIC can then be computed by executing the following lines:

fit1 <- glm(y ~ X1 + X2 + X3, data = data)
bic_oc(fit1, "X1 > X2 > X3 > 0")

The use of the function will be illustrated in two empirical applications in the next section.

Empirical Applications Revisited

The models from the applications in the second section are evaluated using the order-constrained BIC based on the local unit information prior using the R package BICpack. For the empirical analyses presented in this section, only the European Values Study data of Germany are considered.

Application 1: Assessing the Importance of Different Dimensions of Socioeconomic Status

Model 1 can be fitted in R using the lm function:

lm1 <- lm(atti_immi ~ class + education + income + gender, data = EVS_Germany)

The estimated coefficients of interest were $(β_{class}, β_{education}, β_{income}) = (.312, .250, .041)$ , with standard errors .067, .075, and .072, respectively.

The order-constrained BIC for model $M_{1} : θ_{class} > θ_{education} > θ_{income} > 0$ in equation (2) can then be computed using the new bic_oc-function:

bic_oc(lm1, "class > education > income > 0")

This resulted in a BIC of 3,918.46. The function also provides the posterior probability that the constraints hold under the unconstrained model, which was equal to .50. Next, the BIC for model $M_{2} : θ_{education} > (θ_{class}, θ_{income}) > 0$ is computed using the command

bic_oc(lm1, "education > (class, income) > 0")

The resulting order-constrained BIC was 3,921.98. For this set of constraints, the posterior probability under the unconstrained model equaled .18.

Next, the BIC for model $M_{3} : θ_{class} = θ_{education} = θ_{income} > 0$ is computed. First, the model is fitted with the equality constraints on the effects but without the inequality constraint. Because the effects of social class, education, and income are equal under M₃, the regression model in equation (1) becomes

$atti\_immi = θ_{0} + (class + education + income) \times θ_{class.educ.income} + error,$

where $θ_{class.educ.income}$ denotes the equal effect of social class, educational level, and income on attitude toward immigrants. Thus, this model can be fitted by including the sum of class, education, and income as a single predictor:

EVS_Germany$class.educ.income <- EVS_Germany$class + EVS_Germany$education + EVS_Germany$income
lm2 <- lm(atti_immi ~ class.educ.income + gender, data = EVS_Germany)

The order-constrained BIC can then be computed based on the resulting fitted model:

bic_oc(lm2, "class.educ.income > 0")

This resulted in a BIC of 3,917.84.

Finally, to compute the BIC of the complement model $M_{4}$: “neither $M_{1}$, $M_{2}$, nor $M_{3}$,” first note that the marginal likelihood of the union of M₁, M₂, and M₃ would be the same as the marginal likelihood of the union of only M₁ and M₂ because M₃ has zero probability due to its equality constraints. Thus, we need to compute the marginal likelihood of the complement of the union of models M₁ and M₂. First, we combine the two sets of order constraints in one vector and then compute the order-constrained BIC using the new function:

constraints_M4 <- c("class > education > income > 0", "education > (class, income) > 0")
bic_oc(lm1, constraints_M4, complement = TRUE)

This resulted in a BIC of 3,926.13. The BIC values are summarized in Table 1. From these values, we can conclude that model M₃ receives most support, but the difference in evidence with the order-constrained model M₁ is negligible, given the BIC difference of .62.⁴ The evidence for M₂ and M₄ is considerably lower than for M₁ and M₃.

Table 1.

Order-constrained BICs and Posterior Model Probabilities for the Competing Models in Application 1.

Models	OC-BIC*	$P (M_{t} | D)$
$M_{1} : θ_{class} > θ_{education} > θ_{income} > 0$	3,918.46	.391
$M_{2} : θ_{education} > (θ_{class}, θ_{income}) > 0$	3,921.98	.067
$M_{3} : θ_{class} = θ_{education} = θ_{income} > 0$	3,917.84	.533
$M_{4} :$ “neither M₁, M₂, nor M₃”	3,926.13	.008

Note: BIC = Bayesian information criterion.

For interpretation purposes, it can be useful to translate the BICs to posterior model probabilities. A posterior model probability quantifies the probability that a given model generated the data, after observing the data and given certain prior model probabilities. This probability is conditional on the data having been generated by one of the models considered.

In this application, we assume equal prior probabilities for the models. The posterior model probabilities can be computed from the BIC values using the “postprob” function in BICpack. The posterior probabilities together with the BICs can be found in Table 1. The posterior probability for model M₃, which assumes equal and positive effects of social class, education, and income on attitude toward immigrants, is largest with 53.3 percent. The posterior probability of M₁, which assumes ordered positive effects of social class, education, and income based on the Ethnic Competition Theory, is only slightly smaller with 39.1 percent. There is not much evidence for either M₂ or M₄, given their posterior probabilities of 6.7 percent and 0.8 percent, respectively. In sum, we can conclude that there is considerable model uncertainty, and more data would be needed to choose a single best model.
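The translation from BICs to posterior model probabilities can be sketched in a few lines of base R, using the standard approximation that the marginal likelihood of a model is proportional to $\exp(-BIC/2)$. The sketch does not call the “postprob” function, whose arguments are not shown here; the object names are illustrative.

```r
# Order-constrained BICs reported for the four models in Application 1.
oc_bic <- c(M1 = 3918.46, M2 = 3921.98, M3 = 3917.84, M4 = 3926.13)

# Equal prior model probabilities, as assumed in the text.
prior_prob <- rep(1 / length(oc_bic), length(oc_bic))

# exp(-BIC / 2) approximates the marginal likelihood; subtracting the minimum
# first avoids numerical underflow and leaves the probabilities unchanged.
weights <- exp(-(oc_bic - min(oc_bic)) / 2) * prior_prob
post_model_prob <- weights / sum(weights)

round(post_model_prob, 3)
```

The rounded values agree with the posterior model probabilities of .391, .067, .533, and .008 reported for M₁ to M₄.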

Application 2: The Importance of Postmaterialism for Young, Middle, and Old Generations

Because the outcome variable “postmaterialism” has an ordinal measurement level with three categories (“low,” “medium,” and “high”), an ordinal regression model can be fitted using the polr function of the MASS package. Thus, the ordinal variable “postmaterialism” is regressed on the ordinal predictor “generation” with categories “young,” “middle,” and “old” while controlling for “gender,” “income,” and “education”:

library(MASS)
fit3 <- polr(postmaterial ~ generation + gender + income + education, data = EVS_Germany, Hess = TRUE)

In the fitted model, the “young” generation is the reference group, and dummy variables are created for the “middle” and “old” generation. These variables are called “generationmiddle” and “generationold” in the fitted polr-object. The estimated effects under this model were equal to $({\hat{θ}}_{generationmiddle}, {\hat{θ}}_{generationold}) = (- .444, - .848)$ , having standard errors of .154 and .150, respectively.

Thus, the order-constrained BIC of model $M_{1} : θ_{generationold} < θ_{generationmiddle} < 0$ , representing the Generational Replacement Theory, can be computed by the command

bic_oc(fit3, "generationold < generationmiddle < 0")

The resulting BIC equaled 3,154.82.

Next, the BIC of the null model $M_{0} : θ_{generationold} = θ_{generationmiddle} = 0$ is computed with no generation effect. Because this model does not contain any order constraints, we can simply compute an ordinary BIC. This can also be done using the bic_oc-function by omitting any order constraints:

fit4 <- polr(postmaterial ~ 1 + gender + income + education, data = EVS_Germany, Hess = TRUE)
bic_oc(fit4)

The resulting BIC was equal to 3,177.69.

Finally, the BIC of the complement model was computed. Similarly to the previous example, this can be done as follows:

bic_oc(fit3, "generationold < generationmiddle < 0", complement = TRUE)

This resulted in a BIC of 3,170.15. The BICs and respective posterior model probabilities can be found in Table 2. Clearly, there is overwhelming evidence for M₁ which implies that postmaterialism has increased for younger generations.

Table 2.

Order-constrained BICs and Posterior Model Probabilities for the Competing Models in Application 2.

Models	OC-BIC*	$P (M_{t} | D)$
$M_{0} : θ_{old} = θ_{middle} = 0$	3,177.69	0.00
$M_{1} : θ_{old} < θ_{middle} < 0$	3,154.82	1.00
$M_{2} :$ “neither M₀, nor M₁”	3,170.15	0.00

Note: BIC = Bayesian information criterion.

Finally, we show that the inclusion of order constraints in the alternative model results in more evidence against a null model if the order constraints are supported by the data. First, note that the BIC difference between the order-constrained model, M₁, and the null model, M₀, equals $BIC (M_{1}, M_{0}) = BIC (M_{1}) - BIC (M_{0}) = 3{,}154.82 - 3{,}177.69 = -22.87$ . The BIC difference between an unconstrained alternative model, $M_{3} : θ_{generationold} \neq θ_{generationmiddle} \neq 0$ , and the null model equals $BIC (M_{3}, M_{0}) = BIC (M_{3}) - BIC (M_{0}) = 3{,}158.18 - 3{,}177.69 = -19.51$ .⁵ Hence, the inclusion of order constraints results in a substantial increase of the evidence against the null model in the case where the order constraints are supported by the data. In addition, when there is evidence against the null model, we get a more informative answer about how the effects are related to each other than when testing the null against an unconstrained alternative.
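To express these BIC differences on the Bayes factor scale, the standard approximation $B \approx \exp(-ΔBIC / 2)$ can be applied. The following base R sketch uses the three BIC values reported above; the object names are illustrative.

```r
# BICs reported for the null (M0), order-constrained (M1), and
# unconstrained (M3) models in Application 2.
bic <- c(M0 = 3177.69, M1 = 3154.82, M3 = 3158.18)

# Approximate Bayes factors of each alternative against the null model.
bf_10 <- unname(exp((bic["M0"] - bic["M1"]) / 2))
bf_30 <- unname(exp((bic["M0"] - bic["M3"]) / 2))

# How much more evidence against the null the order constraints provide.
bf_10 / bf_30
```

The ratio equals $\exp(1.68)$, so the order-constrained alternative yields roughly five times as much evidence against the null model as the unconstrained alternative.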

Discussion

We have presented two extensions of the BIC for evaluating models with order constraints on certain parameters of interest. In the first extension, a truncated unit information prior was considered under the order-constrained model, and in the second extension, a truncated local unit information prior was considered. Theoretical considerations and numerical analyses revealed that the local unit information prior resulted in better model selection behavior than the nonlocal unit information prior for order-constrained model selection.

The new order-constrained BIC based on the local unit-information prior can easily be computed using the new R package “BICpack.” This will allow researchers to test multiple social theories that can be translated into conflicting sets of equality and order constraints on the parameters of interest. The methodology can also be used for testing directed effects of ordinal predictors, as these expectations can be translated into order-constrained models in a natural manner.

Footnotes

Acknowledgments

The authors thank Anton Olsson Collentine for helping with the R code for reading order constraints and Tim Reeskens and John Gelissen for useful insights on the empirical applications from the European Values Study.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Mulder’s research was supported by a NWO Vidi grant (452-17-006). Raftery’s research was supported by NIH grants R01 HD054511 and R01 HD 070936 and by the Center for Advanced Study in the Behavioral Sciences at Stanford University.

ORCID iD

J. Mulder

Notes

References

Avramidi. 2000. Lecture Notes Methods of Mathematical Physics MATH 536. New Mexico Institute of Mining and Technology. Retrieved November 15, 2019 (https://www.researchgate.net/publication/266229957_Lecture_Notes_Methods_of_Mathematical_Physics_MATH_536).

Böing-Messing, van Assen, Hofman, Hoijtink, and Mulder. 2017. “Bayesian Evaluation of Constrained Hypotheses on Variances of Multiple Independent Groups.” Psychological Methods 22:262–87.

Braeken, Mulder, and Wood. 2015. “Relative Effects at Work: Bayes Factors for Order Hypotheses.” Journal of Management 41:544–73.

Cohen. 1988. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum.

Cohen, A. K., Rai, D. H. Rehkopf, and Abramsn. 2013. “Educational Attainment and Obesity: A Systematic Review.” Obesity Review 14:989–1005.

de Jong, Rigotti, and Mulder. 2017. “One after the Other: Effects of Sequence Patterns of Breached and Overfulfilled Obligations.” European Journal of Work and Organizational Psychology 26:337–55.

Genz, Bretz, Miwa, Leisch, Scheipl, … Hothorn. 2016. R-Package ‘mvtnorm’ [Computer Software Manual] (R package version 1.14.4; for new features, see the ‘Changelog’ file in the package source).

Hoijtink. 2011. Informative Hypotheses: Theory and Practice for Behavioral and Social Scientists. New York: Chapman & Hall/CRC.

Inglehart and P. R. Abramson. 1999. “Measuring Postmaterialism.” American Political Science Review 93:665–77.

Jeffreys. 1961. Theory of Probability. 3rd ed. New York: Oxford University Press.

Johnson, V. E. and Rossell. 2010. “On the Use of Non-local Prior Densities in Bayesian Hypothesis Tests.” Journal of the Royal Statistical Society Series B 72:143–70.

Kass, R. E. and A. E. Raftery. 1995. “Bayes Factors.” Journal of the American Statistical Association 90:773–95.

Kass, R. E. and Wasserman. 1995. “A Reference Bayesian Test for Nested Hypotheses and Its Relationship to the Schwarz Criterion.” Journal of the American Statistical Association 90:928–34.

Klugkist, Laudy, and Hoijtink. 2005. “Inequality Constrained Analysis of Variance: A Bayesian Approach.” Psychological Methods 10:477–93.

Kluytmans, R. V. D. Schoot, Mulder, and Hoijtink. 2012. “Illustrating Bayesian Evaluation of Informative Hypotheses for Regression Models.” Frontiers in Psychology 3:Article ID 2.

Lee, S.-Y. and X.-Y. Song. 2007. “A Unified Maximum Likelihood Approach for Analyzing Structural Equation Models with Missing Nonstandard Data.” Sociological Methods & Research 35:352–81.

Morey, R. D. and E.-J. Wagenmakers. 2014. “Simple Relation between Bayesian Order-restricted and Point-null Hypothesis Tests.” Statistics and Probability Letters 92:121–24.

Mulder. 2014a. “Bayes Factors for Testing Inequality Constrained Hypotheses: Issues with Prior Specification.” British Journal of Statistical and Mathematical Psychology 67:153–71.

Mulder. 2014b. “Prior Adjusted Default Bayes Factors for Testing (In)Equality Constrained Hypotheses.” Computational Statistics and Data Analysis 71:448–63.

Mulder and J.-P. Fox. 2019. “Bayes Factor Testing of Multiple Intraclass Correlations.” Bayesian Analysis 14:521–52.

Mulder, Hoijtink, and Klugkist. 2010. “Equality and Inequality Constrained Multivariate Linear Models: Objective Model Selection Using Constrained Posterior Priors.” Journal of Statistical Planning and Inference 140:887–906.

Mulder and L. R. Pericchi. 2018. “The Matrix-F Prior for Estimating and Testing Covariance Matrices.” Bayesian Analysis 13:1189–210.

Raftery, A. E. 1986. “Choosing Models for Cross-classifications.” American Sociological Review 51:145–46.

Raftery, A. E. 1993. “Bayesian Model Selection in Structural Equation Models.” Pp. 163–80 in Testing Structural Equation Models, edited by K. A. Bollen and J. S. Long. Beverly Hills, CA: Sage.

Raftery, A. E. 1995. “Bayesian Model Selection in Social Research.” Sociological Methodology 25:111–63.

Romeijn, J.-W., van de Schoot, and Hoijtink. 2012. “One Size Does Not Fit All: Proposal for a Prior-adapted BIC.” Pp. 87–105 in Probabilities, Laws, and Structures, edited by Dieks, Gonzalez, Hartmann, Stöltzner, and Weber. Dordrecht, the Netherlands: Springer.

Scheepers, Gijsberts, and Coenders. 2002. “Ethnic Exclusion in European Countries: Public Opposition to Civil Rights for Legal Migrants as a Response to Perceived Ethnic Threat.” European Sociological Review 18:17–34.

Schwarz, G. E. 1978. “Estimating the Dimension of a Model.” Annals of Statistics 6:461–64.

van de Schoot, Mulder, Hoijtink, Van Aken, Semon Dubas, Orobio de Castro, … J.-W. Romeijn. 2012. “An Introduction to Bayesian Model Selection for Evaluating Informative Hypotheses.” European Journal of Developmental Psychology 8:713–29.

van der Waal, de Koster, and ten Kate. 2015. “Educational Attainment and Obesity in the Netherlands: Class or Status?” Paper presented at ‘Dag van de Sociologie,’ Amsterdam.

Vermunt, J. K. 1997. Log-linear Models for Event Histories. Thousand Oaks, CA: Sage.

Welzel and Inglehart. 2005. “Liberalism, Postmaterialism, and the Growth of Freedom.” International Review of Sociology 15:81–108.

BIC Extensions for Order-constrained Model Selection
