Abstract
Keywords
Introduction
Item response theory (IRT) is a class of models for categorical observed variables where an underlying latent variable is assumed to generate the observed variables. Through different formulations of the probability, conditional on the latent variable, for the response in each category of each observed variable, different IRT models are obtained. The primary areas of application of IRT models are in education and psychology, where the models are used to estimate the individual latent variable, to assess the properties of scales and tests, and to infer population characteristics.
Maximum likelihood estimation of IRT models involves the calculation of integrals which, for most IRT models, have no tractable analytical solution and must be approximated. With three latent dimensions or fewer, computationally efficient implementations using fixed numerical quadrature rules are the most commonly used approaches (Bock & Aitkin, 1981). However, the computational expense of fixed quadrature grows exponentially with the dimension, which makes these methods excessively demanding in computation and memory when the dimension exceeds three. A second approach replaces the fixed quadrature rules with adaptive quadrature, meaning that each unique integral is approximated with a unique set of quadrature points (Cagnone & Monari, 2013; Rabe-Hesketh et al., 2002; Schilling & Bock, 2005). Adaptive quadrature reduces the required number of quadrature points per dimension, but the computational expense still increases exponentially with the dimension, making the approach prohibitively demanding in very high dimensions. A third approach uses stochastic approximations such as the Metropolis–Hastings Robbins–Monro (MH-RM) method (Cai, 2010a, 2010b; Chalmers, 2015). The stochastic methods are slower than the alternatives with few dimensions, but their computational expense grows only linearly with the number of dimensions. A fourth option is the Laplace approximation (Huber et al., 2004; Joe, 2008; Pinheiro & Bates, 1995; Shun, 1997), which uses asymptotic expansions to approximate the required integral. The first-order Laplace approximation is formally equivalent to adaptive quadrature with only one quadrature point per dimension.
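To make the cost comparison concrete, the following illustrative sketch (not the article's implementation) approximates the marginal likelihood of one response pattern under a unidimensional 2-PL model with a standard-normal latent variable, using a fixed Gauss–Hermite rule. All item parameters and responses below are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss  # probabilists' Hermite rule

# Hypothetical 2-PL item parameters and one observed response pattern
a = np.array([1.2, 0.8, 1.5])   # discriminations
b = np.array([-0.5, 0.0, 0.7])  # difficulties
y = np.array([1, 0, 1])         # responses (1 = correct)

nodes, weights = hermegauss(20)          # 20 nodes for the N(0, 1) weight
weights = weights / np.sqrt(2 * np.pi)   # normalize so the weights sum to 1

# P(y_j = 1 | theta) evaluated at every quadrature node (20 x 3 array)
p = 1.0 / (1.0 + np.exp(-a * (nodes[:, None] - b)))
lik = np.prod(np.where(y == 1, p, 1.0 - p), axis=1)  # conditional likelihoods
marginal = np.sum(weights * lik)                     # quadrature approximation

# With d latent dimensions, a full product rule needs 20**d points:
# 20 (d=1), 8,000 (d=3), 64 million (d=6) -- the exponential growth noted above.
```

A full product rule over several dimensions multiplies these 20 nodes per dimension, which is the exponential blow-up that motivates the adaptive, stochastic, and Laplace alternatives.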
Hence, the computational demand of the first-order Laplace approximation grows only linearly with increasing dimensionality and as a result, in high-dimensional models, the first-order Laplace approximation is by far the most computationally efficient method out of the four referenced. The downside is however the inaccuracy of the approximation, especially with few observed variables per dimension and with dichotomous observed variables (Joe, 2008). To improve the computational accuracy, higher order Laplace approximations can be pursued, which has been done with generalized linear models (Raudenbush et al., 2000), generalized linear latent variable models with ordinal data (Bianconcini & Cagnone, 2012), and confirmatory factor analysis with ordinal data and a probit link (Jin et al., 2017). However, a higher order Laplace approximation requires a substantial amount of higher order derivatives and greatly increases the computational expense, especially for high-dimensional models (Bianconcini, 2014).
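The first-order Laplace approximation can be sketched for the same kind of one-dimensional marginal likelihood integral: find the mode of the integrand's logarithm by Newton iterations, then approximate the integral from the value and curvature at the mode. The parameters below are hypothetical and the code is an illustration of the general technique, not the authors' method.

```python
import numpy as np

# Hypothetical 2-PL item parameters and one response pattern
a = np.array([1.2, 0.8, 1.5])
b = np.array([-0.5, 0.0, 0.7])
y = np.array([1, 0, 1])

def g(t):
    """Log integrand: log N(t; 0, 1) + log conditional likelihood."""
    p = 1.0 / (1.0 + np.exp(-a * (t - b)))
    return (-0.5 * t**2 - 0.5 * np.log(2 * np.pi)
            + np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def g1(t):
    """First derivative of g."""
    p = 1.0 / (1.0 + np.exp(-a * (t - b)))
    return -t + np.sum(a * (y - p))

def g2(t):
    """Second derivative of g (always negative here, so g is concave)."""
    p = 1.0 / (1.0 + np.exp(-a * (t - b)))
    return -1.0 - np.sum(a**2 * p * (1 - p))

t = 0.0
for _ in range(25):              # Newton iterations to the mode
    t -= g1(t) / g2(t)

# First-order Laplace: log integral ~ g(mode) + (1/2)log(2*pi) - (1/2)log(-g''(mode))
log_marginal = g(t) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(-g2(t))
```

Evaluating the integrand only at the single mode per dimension is what makes the cost linear in the dimension, mirroring the equivalence with one-point adaptive quadrature noted above.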
An extension to the regular IRT model that incorporates a latent regression component is used in many areas, for example, in large-scale educational assessment programs such as the Programme for International Student Assessment, the National Assessment of Educational Progress, and the National Assessment of Basic Education Quality (NAEQ; Jiang et al., 2019). With a latent regression IRT model, the usual assumption of a fixed mean vector is replaced with an assumption that each individual has a mean vector defined by observed covariates and regression parameters. These models have also been called explanatory IRT models (De Boeck & Wilson, 2013) in the literature and can, for many IRT models, be formulated as nonlinear mixed models (Rijmen et al., 2003). The estimation of latent regression models again requires the approximation of integrals. In current operational procedures in large-scale educational assessment programs, estimation is done in two steps (von Davier & Sinharay, 2013). First, the item parameters are estimated using a unidimensional model without considering the covariates. Second, assuming fixed item parameters, the latent regression and covariance parameters are estimated. The estimation of the latent regression and covariance parameters can be done with an expectation-maximization (EM) algorithm using a second-order Laplace approximation (Thomas, 1993) or using stochastic approximation (von Davier & Sinharay, 2010). It is also possible to utilize adaptive quadrature, stochastic approximations, or a Laplace approximation and estimate the item and regression parameters simultaneously (Chalmers, 2015; Harrell, 2015; Raudenbush et al., 2000), but such methods have typically not been applied to large-scale assessment programs, partly because of the large computational requirements and partly because a second-order Laplace approximation has not been available for many IRT models.
The purpose of this article is to introduce an estimation method for latent regression IRT models using second-order Laplace approximations of the integrals in the likelihood function. As a special case of the approach, a second-order Laplace estimation method for regular IRT models without covariates is obtained. The method is introduced for multidimensional simple structure IRT models (also called between-item IRT models) where each item can be individually modeled with either of the nominal response, graded response, generalized partial credit, and three-parameter logistic (3-PL) models. Previous implementations of the higher order Laplace approximation with item response models have focused on estimation of only the latent regression and covariance parameters (Thomas, 1993), Rasch type models (Raudenbush et al., 2000), or particular models for ordinal data without a latent regression (Jin et al., 2017). The approach in this article extends previous research using the second-order Laplace approximation by providing a general method that applies to any item response model and by applying the method to several common models in IRT, for which the second-order Laplace method was not previously available.
Latent Regression IRT Models
Let
and the marginal log-likelihood for all
IRT Models
The IRT model is defined by the likelihood
where
where
with
For dichotomous items, with
With dichotomous data, we will also consider the 3-PL model, with probabilities defined by (Birnbaum, 1968)
Note that, for numerical stability in estimation, the parametrization
will be used when estimating the unknown parameters with the 3-PL model.
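A common device for numerical stability with the 3-PL model is to express the guessing parameter on the logit scale so that it is unconstrained during optimization; the sketch below assumes that reading and uses the standard 3-PL response probability, which may differ in detail from the article's exact parametrization.

```python
import numpy as np

def p_3pl(theta, a, b, gamma):
    """Standard 3-PL response probability with the guessing parameter on the
    logit scale: c = 1 / (1 + exp(-gamma)) keeps c in (0, 1) while gamma is
    optimized without constraints. Illustrative, not the article's exact form."""
    c = 1.0 / (1.0 + np.exp(-gamma))              # guessing parameter in (0, 1)
    p2 = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # 2-PL component
    return c + (1.0 - c) * p2

# With gamma = logit(0.2), c = 0.2 is recovered; at theta = b the 2-PL
# component is 0.5, so the 3-PL probability is 0.2 + 0.8 * 0.5 = 0.6.
prob = p_3pl(0.0, 1.0, 0.0, np.log(0.2 / 0.8))
```

The benefit of the logit parametrization is that a gradient-based optimizer can move freely in the gamma coordinate without stepping outside the admissible range for c.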
Depending on the exact model formulation, IRT models may be part of other modeling frameworks. For example, the Rasch model can be formulated as a generalized linear mixed model (Clayton, 1996), the graded response model with a probit link as a confirmatory factor analysis model with categorical observed variables (Bartholomew, 1980), and the generalized partial credit model and nominal response model as generalized linear latent variable models (Bartholomew et al., 2011) or as nonlinear mixed models (Rijmen et al., 2003). Meanwhile, the 3-PL model can be viewed as a type of latent class model (Vermunt, 2001).
Parameter Estimation Using a Second-Order Laplace Approximation
The Laplace approximation has been proposed to approximate integrals of the form
with
with
where
and
Consider now the latent regression model defined by
and hence the Laplace log-likelihood is
Define the
since all third-order and higher cross-derivatives are 0. We then maximize the expression
with respect to the unknown parameters to obtain the second-order Laplace approximation maximum likelihood estimator of
where, for
and
Maximization of the approximated marginal likelihood is done by employing a modified Newton–Raphson method. In iteration
where the
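A generic modified Newton–Raphson maximizer can be sketched as follows; the ridge modification of the Hessian and the step halving shown here are one common variant, not necessarily the authors' exact scheme, and the stopping rule mirrors the absolute-difference criterion used later in the simulations.

```python
import numpy as np

def modified_newton(f, grad, hess, x0, tol=1e-4, max_iter=200):
    """Maximize f by Newton-Raphson with two common safeguards: a ridge term
    is added when the Hessian is not negative definite, and the step is
    halved until the objective does not decrease. Illustrative sketch only."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        ridge = 0.0
        while True:
            try:  # -(H - ridge*I) must be positive definite for an ascent step
                np.linalg.cholesky(-(H - ridge * np.eye(n)))
                break
            except np.linalg.LinAlgError:
                ridge = 2.0 * ridge if ridge > 0 else 1e-3
        step = np.linalg.solve(H - ridge * np.eye(n), -g)
        t = 1.0
        while f(x + t * step) < f(x) and t > 1e-8:  # step halving
            t *= 0.5
        x_new = x + t * step
        if np.max(np.abs(x_new - x)) < tol:  # absolute-difference stopping rule
            return x_new
        x = x_new
    return x

# Usage: maximize the concave quadratic f(x) = -(x1 - 1)^2 - 2*(x2 + 0.5)^2,
# whose maximizer is (1, -0.5)
f = lambda x: -(x[0] - 1) ** 2 - 2 * (x[1] + 0.5) ** 2
grad = lambda x: np.array([-2 * (x[0] - 1), -4 * (x[1] + 0.5)])
hess = lambda x: np.array([[-2.0, 0.0], [0.0, -4.0]])
xhat = modified_newton(f, grad, hess, [0.0, 0.0])
```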
To obtain an estimator of the asymptotic covariance matrix of the parameter estimator, it is proposed to use one of three different estimators. The first is equal to the
the second is equal to the
and the third is the
In practice, the calculation of the observed information matrix is done by numerically differentiating the observed gradient from Equation 18 with respect to the unknown parameters. This calculation can be time-consuming since the gradient evaluation involves the minimization of
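The three estimators described above correspond, under one common reading of this setup, to the inverse observed information matrix, the inverse cross-product (outer product of per-individual gradients), and the sandwich estimator combining the two; the sketch below assumes that reading and uses hypothetical inputs.

```python
import numpy as np

def covariance_estimators(G, I_obs):
    """Three asymptotic covariance estimators from per-individual gradient
    contributions G (n x p) and an observed information matrix I_obs (p x p).
    Assumed correspondence to the text's three estimators; illustrative only."""
    I_inv = np.linalg.inv(I_obs)
    B = G.T @ G                               # cross-product of gradients
    return {
        "observed":      I_inv,               # inverse observed information
        "cross_product": np.linalg.inv(B),    # inverse cross-product
        "sandwich":      I_inv @ B @ I_inv,   # sandwich combining the two
    }

# Hypothetical inputs: 500 individuals, 4 parameters
rng = np.random.default_rng(1)
G = rng.standard_normal((500, 4)) * 0.1
I_obs = G.T @ G + 0.01 * np.eye(4)            # hypothetical information matrix
ests = covariance_estimators(G, I_obs)
```

The cross-product form only needs the per-individual gradients, which is why it is the cheap option when, as noted below for the latent regression study, the observed information matrix is too expensive to compute.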
Numerical Results
To explore the properties of the proposed estimation method, three simulation studies were performed. In the simulations, the R package
Covariance Matrix Used in the 12-Dimensional Simulation
The estimation methods EM, first-order Laplace (Lap1), second-order Laplace (Lap2), MH-RM1, and MH-RM2 were considered in the study, but the EM method was only used for models with a dimension of three since estimation in reasonable time was not possible for the higher dimensional models. All estimation methods used the absolute difference in the parameter estimates between successive iterations as the stopping criterion with a tolerance of 0.0001. The EM algorithm with fixed quadrature used 20 quadrature points, the MH-RM1 method used five draws per iteration, the gain function
Simulation Study 1: Multidimensional 2-PL Models
To evaluate the properties of the first- and second-order Laplace approximation estimation methods with dichotomous items, three- and six-dimensional 2-PL models were used where 3 and 6 dichotomous items per dimension were considered. The covariance matrices for the models were set to the corresponding upper submatrices of the matrix in Table 1. Sample sizes 500, 1,000, and 2,000 were used for each condition. The nonconvergence rates for each simulation setting are given in Tables 2 and 3. Due to high nonconvergence rates, the results for Lap1 were excluded for all models except the six-dimensional model with 6 items per dimension. For the six-dimensional model with only 3 items per dimension, MH-RM1 also had excessively high nonconvergence rates, whereas Lap2 had moderate nonconvergence rates (Table 3). MH-RM2 had no issues with nonconvergence. Note that only the replications for which all compared estimators converged are included in the presented results. The full results can be obtained from the first author upon request.
Nonconvergence Rates (%) for the Three-Dimensional 2-PL Model With the EM, First-Order Laplace (Lap1), Second-Order Laplace (Lap2), MH-RM1, and MH-RM2 Estimation Methods
Nonconvergence Rates (%) for the Six-Dimensional 2-PL Model With the First-Order Laplace (Lap1), Second-Order Laplace (Lap2), MH-RM1, and MH-RM2 Estimation Methods
The results for the multidimensional 2-PL models can be found in Tables 4 through 7. Concerning the parameter estimates, the three-dimensional model with 3 items per dimension (Table 4) has higher bias overall with Lap2 compared to EM, MH-RM1, and MH-RM2, while the Lap2 method has lower RMSE, excepting the covariance parameters. The bias is not much reduced for Lap2 with an increased sample size. With 6 items per dimension (Table 5), the differences between EM, MH-RM1, MH-RM2, and Lap2 decrease, and there are no substantial differences between the estimation methods. With six dimensions and 3 items per dimension (Table 6), Lap2 is uniformly better than MH-RM2 with regard to bias and RMSE. Furthermore, the bias is reduced with a larger sample size. For the case of six dimensions and 6 items per dimension (Table 7), the performance of Lap2 is overall somewhat better compared to MH-RM1 and MH-RM2. Meanwhile, both the bias and RMSE are substantially improved with Lap2 compared to Lap1 for all parameters with sample size 2,000. The bias and RMSE of the standard errors are the lowest for Lap2 across all sample sizes.
Average Absolute Bias and RMSE of the Parameter Estimates and Standard Errors for EM, Second-Order Laplace (Lap2), MH-RM1, and MH-RM2 With the Three-Dimensional 3-Item 2-PL Model
Average Absolute Bias and RMSE of the Parameter Estimates and Standard Errors for EM, Second-Order Laplace (Lap2), MH-RM1, and MH-RM2 With the Three-Dimensional 6-Item 2-PL Model
Average Absolute Bias and RMSE of the Parameter Estimates and Standard Errors for Second-Order Laplace (Lap2) and MH-RM2 With the Six-Dimensional 3-Item 2-PL Model
Average Absolute Bias and RMSE of the Parameter Estimates and Standard Errors for First-Order Laplace (Lap1), Second-Order Laplace (Lap2), MH-RM1, and MH-RM2 With the Six-Dimensional 6-Item 2-PL Model
Simulation Study 2: High-Dimensional Graded Response Model
To evaluate the proposed estimation method with a combination of dichotomous and polytomous observed variables, the three-dimensional simple-structure graded response model in Cai (2010b) was used as the basis for a simulation study where, in addition to the three-dimensional case, six- and 12-dimensional models were also considered. As in Simulation Study 1, the covariance matrices for the three- and six-dimensional models were set to the corresponding upper submatrices of the matrix in Table 1. The 12-dimensional model used the full covariance matrix. For the three-dimensional model with a sample size of 500, the second-order Laplace estimation took an average of approximately 3 seconds using a 3.3GHz CPU (Core i5-4590) with 16GB RAM, which can be compared to the time with MH-RM reported in Cai (2010b), which was 20 seconds with a 2.0GHz CPU and 2GB RAM. The estimation methods EM, Lap1, Lap2, MH-RM1, and MH-RM2 were considered in the study, but EM was only used with the three-dimensional models since estimation in reasonable time was not possible for the higher dimensional models. In addition, it was not possible to use MH-RM1 in the simulation with the 12-dimensional model due to the excessively long time required in estimation and for calculation of the observed information matrix (with sample size 500 one replication took more than 150 minutes to finish). For all conditions considered, the nonconvergence rates were below 1%.
The results for the three-dimensional model can be seen in Table 8. Overall, Lap1 has higher bias than the other estimation methods. However, Lap1 has lower RMSE for the item parameters compared to the other estimation methods with the smaller sample sizes. For the parameter estimates, EM, Lap2, MH-RM1, and MH-RM2 are virtually identical across all settings and evaluation criteria, with the exception of higher bias and RMSE for the covariance parameters with MH-RM2. The six-dimensional results can be found in Table 9. The results mirror those for the three-dimensional case, with Lap1 having higher bias than the alternatives but maintaining a lower RMSE for the small sample case. Lap2 has the overall lowest bias across all estimation methods for each setting although the differences are small with the largest sample size. The standard errors with the three- and six-dimensional models are very close regarding the accuracy and precision between the different estimation methods, with the exception that MH-RM1 has slightly higher RMSE overall and that MH-RM2 has higher bias and RMSE overall. With the 12-dimensional model, the Laplace and MH-RM2 estimation methods were the only ones possible to use and the results can be seen in Table 10. The results are similar to the lower dimensional cases in that Lap1 has higher bias but lower variance, resulting in a lower RMSE for the small sample sizes, and that MH-RM2 has higher bias and RMSE with respect to the covariance parameters. Lap2 has bias which is slightly lower on average than with lower dimensional models and has the lowest overall bias among the considered estimation methods. The standard errors are highly similar with respect to the bias and RMSE between Lap1 and Lap2, while the bias and RMSE are slightly higher with MH-RM2.
Average Absolute Bias and RMSE of the Parameter Estimates and Standard Errors for EM, First-Order Laplace (Lap1), Second-Order Laplace (Lap2), MH-RM1 and MH-RM2 With the Three-Dimensional Graded Response Model
Average Absolute Bias and RMSE of the Parameter Estimates and Standard Errors for First-Order Laplace (Lap1), Second-Order Laplace (Lap2), MH-RM1, and MH-RM2 With the Six-Dimensional Graded Response Model
Average Absolute Bias and RMSE of the Parameter Estimates and Standard Errors for First-Order Laplace (Lap1), Second-Order Laplace (Lap2), and MH-RM2 With the 12-Dimensional Graded Response Model
Simulation Study 3: Latent Regression in Large-Scale Assessment
A five-dimensional model based on the NAEQ 2015 mathematics assessment was also used, where the correlations between the latent variables were all set to .8. The model had 60 generalized partial credit model items (12 in each dimension, with 6 dichotomous and 6 polytomous three-category items in each dimension) and 100 uncorrelated covariates, yielding a total of 690 parameters. A sample size of 60,000 was used, similar to the real Grade 8 NAEQ assessment, where each individual answered on average 20 items in total, meaning that on average 40 item responses per individual were missing due to the matrix sampling design. To obtain reasonable starting values for the item and regression parameters, an initial analysis of the data using a one-dimensional latent regression model was used with a random sample of 10,000 from the simulated data in each replication. The five-dimensional model estimation then used starting values that were equal to the estimated item and regression parameters from the one-dimensional model. Only the first- and second-order Laplace approximation estimation methods were possible to use due to the large computational and memory requirements. With the first- and second-order Laplace approximation methods, each replication took approximately 30 minutes to finish, and for this reason, only one simulation condition was considered and the observed information matrix was not calculated.
The simulation results are given in Table 11. The bias is substantially lower with Lap2 for the item and covariance parameters, while the improvement is smaller for the regression parameters. The same applies with respect to the RMSE, except that the differences between Lap1 and Lap2 are somewhat smaller. The standard errors were estimated with the cross-product approximation due to the large computational resources required to calculate the observed information matrix. The standard errors are approximately equally accurate and precise for either of the two estimation methods.
Average Absolute Bias and RMSE of the Parameter Estimates and Average RMSE of the Standard Errors for First-Order Laplace (Lap1) and Second-Order Laplace (Lap2) With the Five-Dimensional Latent Regression Generalized Partial Credit Model
Conclusions
In this article, a second-order Laplace approximation was introduced for the estimation of multidimensional simple structure item response models with a latent regression component. Through numerical illustrations using realistic settings in education and psychology, it was shown that the estimation method gave a substantial improvement over the first-order Laplace approximation for the estimation of latent regression models and multidimensional IRT models with both binary and ordinal data. Furthermore, the estimation method was typically equally or more precise in estimating multidimensional IRT models when compared to alternatives such as the EM algorithm with fixed quadrature and two implementations of the MH-RM, while considerably reducing the time required for convergence with high-dimensional models. The second-order Laplace approximation provides a fast yet accurate method for estimation of high-dimensional simple structure IRT models and enables the direct estimation of high-dimensional simple structure latent regression IRT models in situations with a large number of items, covariates, and individuals, such as in large-scale educational assessment programs.
In previous research, the first-order Laplace approximation has been shown to function poorly with a low number of observed dichotomous variables (Joe, 2008). The results of this study echo this but suggest that the second-order Laplace approximation can substantially improve over the first-order approximation with dichotomous observed variables. Note that with as few as three observed variables per dimension, the second-order Laplace approximation estimation method can still exhibit bias that is larger than the alternatives. However, when the number of observed dichotomous variables per dimension was increased to six, the second-order Laplace approximation was equally good or better than any of the alternatives considered with respect to the RMSE. The practical performance of the Laplace approximation-based estimation methods is related to the theoretical error rates of the approximations used (Shun, 1997). With an increased number of items, the error of the approximations decreases, and with a higher order approximation, the error decreases at a faster rate than with a lower order approximation.
Besides the reduced time needed in estimation, the Laplace approximation has several other advantages compared to the stochastic approximation methods. The convergence diagnostics are more straightforward, since the observed-likelihood gradient provides an inexpensive convergence check to supplement the stopping criterion. Although the observed information matrix can be expensive to obtain, it is also required with the stochastic approximation methods, where it has to be approximated via simulations, which often requires a substantial number of replicates for the accuracy to be sufficient. Another advantage is the fast calculation of the observed log-likelihood, used for model comparisons and hypothesis tests. With stochastic approximation methods, obtaining an accurate approximation of the log-likelihood requires heavy computations, which for large sample sizes can take even longer to finish than the actual estimation process. A further advantage of the Laplace estimation method is that it does not depend heavily on the estimation settings used. With the MH-RM method, choices have to be made regarding how many samples to draw at each iteration, how to select the gain constants, and how accurate the approximations to the observed information matrix and observed log-likelihood should be. The optimal choices for these settings can vary greatly across problems, which makes the application of MH-RM somewhat difficult from the user perspective. It can be noted that the
In the current study, we have only investigated the computational efficiency of the different methods using a single CPU core. With the Laplace approximation methods, multiple cores are easily utilized by computing each individual's contribution to the log-likelihood and gradient in parallel. Estimation methods using MH-RM can likewise utilize multiple cores, which will improve their computational efficiency when many cores are available. In addition, using one draw instead of the five draws used here will increase the computational efficiency of MH-RM. Comparing the computational efficiency of estimation methods that incorporate parallel computing is a useful area for future research.
A deficiency of estimation with a higher order Laplace approximation is that the computational requirement for models with a more complicated structure increases greatly in higher dimensions. For example, with 12 dimensions and items that load on all latent variables, the summations in Equation 12 contain up to
The proposed method is general, but for each new item response model, derivatives up to the fifth order with respect to the latent variables need to be derived; in addition, for the latent-variable derivatives up to the fourth order, the derivatives with respect to the item parameters are required. In this article, support for the nominal response, graded response, generalized partial credit, and 3-PL models has been provided, and any combination of these models can be used.
Supplemental Material
Supplemental material for "Estimation of Latent Regression Item Response Theory Models Using a Second-Order Laplace Approximation" by Björn Andersson and Tao Xin, Journal of Educational and Behavioral Statistics.
Footnotes
Declaration of Conflicting Interests
Funding
References