With the rapid proliferation of cognitive diagnosis models (CDMs), an increasing number of researchers choose CDMs as the psychometric tool to analyze assessment data because they can provide more informative feedback to support students’ learning. CDMs have been used to examine students’ strengths and weaknesses in many educational subject domains, such as proportional reasoning (Ma et al., 2020), spatial skills (Culpepper, 2015), and digital literacy (Liang et al., 2021). In addition to educational assessments, some researchers have used CDMs in clinical assessments for diagnostic purposes (e.g., de la Torre et al., 2018; Tan et al., 2022). A family of CDMs can be found in the literature, including specific and general CDMs. Specific CDMs can be classified as conjunctive, disjunctive, or additive based on different cognitive assumptions or theories about how attributes function in examinees’ response behaviors. The deterministic inputs, noisy “and” gate (DINA; Junker & Sijtsma, 2001) model, the deterministic inputs, noisy “or” gate (DINO; Templin & Henson, 2006) model, and the additive CDM (A-CDM; de la Torre, 2011) are examples of conjunctive, disjunctive, and additive models, respectively. General CDMs, which have more flexible formulations that subsume some commonly used specific CDMs, have also been developed. Examples of general CDMs include the generalized DINA (G-DINA; de la Torre, 2011) model, the general diagnostic model (von Davier, 2005), and the log-linear CDM (LCDM; Henson et al., 2009).
Although a large number of existing CDMs are currently available, most of these models and their applications focus on the classification of attributes at a single time point. However, to measure examinees’ changes across multiple time points or before and after a specific event (e.g., the outbreak of COVID-19), longitudinal or pre-/post-test designs should be used. Several latent transition CDMs that can be used with longitudinal assessment data have been developed. Kaya and Leite (2017) and Li et al. (2016) combined latent transition analysis (LTA; Collins & Lanza, 2010) with restricted CDMs (i.e., DINA and DINO models) to assess examinees’ changes in attribute mastery in repeated measurements. Chen et al. (2018) developed a Bayesian procedure of a first-order hidden Markov model in conjunction with the DINA model to track learning trajectories. To relax item parameter constraints, Madison and Bradshaw (2018a, 2018b) proposed the transition diagnostic classification model that combines LTA with a general CDM, namely, the LCDM, to assess growth in attribute mastery status across time points. Using another general CDM, namely, the G-DINA model, Yigit and Douglas (2021) employed a first-order Markov model to track students’ learning trajectories.
In addition to longitudinal CDMs, integrating covariates with CDMs needs research attention because education researchers and practitioners may also be interested in the associations between covariates and attribute mastery statuses. Several studies on this topic for single time-point designs, employing a one-step or three-step approach, can be found in the literature. The one-step approach estimates the CDM (measurement model) and the regression model (structural model) simultaneously. For example, logistic regression is used in the DINA and higher order DINA (de la Torre & Douglas, 2004) models to estimate how covariates affect the probabilities of mastering each attribute (Ayers et al., 2013; Park & Lee, 2014). Although the one-step approach provides more accurate estimates, it lacks model flexibility because any modification to either component requires refitting the entire model. Moreover, such an approach is not applicable to secondary research because model selection, item parameter estimation, and examinee classification have already been determined. In contrast, the three-step approach offers greater flexibility in modeling the relationship between covariates and latent class membership or attribute mastery, where CDM estimation, latent class membership assignment, and subsequent regression are implemented in separate steps. Iaconangelo and de la Torre (2016) proposed a corrected three-step latent logistic regression approach (Vermunt, 2010) to explore the associations between covariates and attribute mastery statuses or mastery profiles, where corrections for classification errors are made.
Although equally important, incorporating covariates into longitudinal CDMs has not yet been extensively studied. To date, limited work has been done on modeling the association between covariates and attribute mastery transitions. Wang et al. (2018) proposed a higher order hidden Markov model with covariates to assess changes in attribute mastery status and investigate how covariates affect attribute transitions simultaneously using a one-step approach. As mentioned above, this approach lacks model flexibility. Moreover, this particular model does not account for the effects of covariates on the initial state or for profile-level relationships. To provide a more flexible approach to incorporating covariates into longitudinal CDMs, this research proposes a three-step estimation approach for latent transition CDM with covariates using the G-DINA model framework, which allows for the investigation of how covariates may affect both the initial state and transition probabilities. Latent logistic or multinomial regression is employed to investigate attribute-level or profile-level associations between covariates and initial state and transition probabilities. Because the stepwise approach underestimates the effects of covariates (Bolck et al., 2004; Di Mari et al., 2016; Vermunt, 2010), correction of classification error probabilities (CEPs) is also taken into consideration in the present study. Therefore, the corrected three-step approach for latent transition CDM with covariates involves three steps: (1) fitting a CDM (measurement model) to the response data at each time point, (2) assigning examinees to latent states at each time point and computing the CEP, and (3) estimating the model with the CEP and computing the regression coefficients.
The rest of this article is laid out as follows. The next section gives a review of the G-DINA model framework and LTA. The third and fourth sections elaborate on the proposed latent transition CDM with covariates and the three-step estimation, respectively. A simulation study for performance evaluation and a real data example for illustration purposes are given in the fifth and sixth sections, respectively. Finally, this article concludes with a discussion of the findings and future directions.
Overview and Background
G-DINA Model
Let
where
Latent Transition Analysis
LTA, also referred to as a latent or hidden Markov model (Baum & Petrie, 1966), is a longitudinal analog of the latent class model (LCM). It was developed to model not only the initial latent class membership but also the transitions of latent class membership over time (Collins & Lanza, 2010). To differentiate LTA and LCM, we use the term latent state instead of latent class to represent examinees’ temporal states at each time point. Specifically, in LTA, the measurement model is an LCM modeling item response probability at each time point, and the structural model characterizes the latent state prevalence and the changes between latent states across time points. Examples that use LTA can be found in Collins and Lanza (2010) and Lanza and Collins (2002). By combining CDMs with LTA, researchers can study how examinees transition between latent states (attribute profiles or attribute mastery statuses) across time points (e.g., Kaya & Leite, 2017; Li et al., 2016; Madison & Bradshaw, 2018a, 2018b).
Furthermore, incorporating covariates into LTA allows researchers to predict initial latent state membership and latent state transitions. For example, Lanza et al. (2010) used LTA with covariates to model transitions in substance use behavior profiles and found that students’ background and academic characteristics were significant predictors of substance use behavior profiles and of transitions in behavior profiles. Additional work using covariates in LTA can be found in Chung et al. (2007) and Wang et al. (2018). In the context of educational assessments, using latent transition CDMs with covariates can help identify the characteristics that are related to the classification of students’ latent states and the transitions between latent states over time, which can provide useful information for remediation and classroom instruction at different time points. As mentioned earlier, a more general model that allows covariates to affect both initial state and transition probabilities should be studied. Additionally, although the estimation is more accurate, estimating the measurement and structural models simultaneously in one step lacks model flexibility. In contrast, stepwise approaches offer greater flexibility by estimating the measurement and structural models separately. Regular CDM analyses (e.g., Q-matrix validation, model selection) can be implemented in stepwise approaches, and covariates can easily be added or dropped as well. Because directly treating latent class or state assignments as observed variables produces biased estimates of the regression coefficients, researchers have developed the corrected three-step approach, where classification errors are accounted for (Di Mari et al., 2016; Iaconangelo & de la Torre, 2016; Vermunt, 2010).
A Latent Transition CDM With Covariates
Let
This latent transition model consists of two components: a measurement component,

Latent transition cognitive diagnosis model with covariates.
Logistic regression models are used to parameterize the initial state probabilities and transition probabilities, given the covariates values at each time point. When profile-level associations are of interest, taking the first latent state, namely, the attribute profile of all zeros, as the reference category, the initial state probability is given by
where
At each time point, the transition probability is an
where
When the transition between attribute mastery and nonmastery is of interest, the initial state probability of being classified into the mastery latent state of
and
In the educational assessment context, attribute mastery is typically assumed to be an absorbing state, that is, an attribute does not revert back to its nonmastery state once it has been mastered (Yigit & Douglas, 2021), which is also assumed in this study. However, the model in Equation 2 can still be estimated without the monotonicity assumption. In the current study, for profile-level transition, the transition probability of examinees being classified into a lower latent state at time
Example of Transition Probability Matrix of
also taking the nonmastery state at time
With regard to the objective function of Equation 2, assuming that the tests have been taken by
When covariates are involved in LTA, the regression coefficients (
The Three-Step Approach
The three-step approach for latent transition CDM with covariates involves the following steps: (1) fitting a CDM to the response data without covariates at each time point separately, (2) assigning examinees to latent states at each time point and computing the associated CEPs, and (3) estimating the latent transition CDM with the known CEPs (Di Mari et al., 2016; Vermunt, 2010) and computing the regression coefficients in Equations 3 and 4 or Equations 5 and 6. In the first step, the repeated response data from the same group of examinees are treated as independent datasets at each time point and estimated using the G-DINA model separately. Q-matrix validation and modification, model selection, and item parameter estimation can also be carried out in this step. Item parameters are constrained to be equal across time points to ensure longitudinal measurement invariance, which can avoid classification problems and allow for interpretable results over time.
In the second step, examinees are classified into discrete latent states, given their responses at each time point, using the expected a posteriori (EAP; Huebner & Wang, 2011) method. In this study, we focus on mean assignment (i.e., EAP) only; other assignment rules (e.g., modal and proportional assignment) can be found in Goodman (2007). Because treating the assigned states as observed in a stepwise approach yields biased estimates of the covariate effects on the initial state and transition probabilities, classification errors are accounted for, and correction weights are introduced in latent state membership assignment. To address this problem, Bolck et al. (2004), Di Mari et al. (2016), and Vermunt (2010) proposed the three-step approach with correction weights for LCM and LTA. Iaconangelo and de la Torre (2016) also introduced correction weights in the three-step approach in analyzing CDMs with covariates to obtain more reliable parameter estimates.
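As a rough illustration of the second step for a single attribute, the sketch below assigns mastery from posterior mastery probabilities and then estimates a 2 × 2 matrix of classification error probabilities in the manner of Vermunt (2010). It is a minimal sketch, not the authors' implementation: the function names `eap_assign` and `cep_matrix`, the 0.5 cutoff, and the toy posterior probabilities are assumptions made for illustration. In practice the posteriors would come from the CDM fitted in the first step.

```python
import numpy as np

def eap_assign(post_mastery, cutoff=0.5):
    """Assign mastery (1) when the posterior mastery probability exceeds
    the cutoff; the 0.5 cutoff is an assumption of this sketch."""
    return (np.asarray(post_mastery) > cutoff).astype(int)

def cep_matrix(post_mastery, assigned):
    """Return a 2 x 2 matrix whose entry [x, w] estimates
    P(assigned state = w | true state = x).

    Each examinee contributes to row x in proportion to the posterior
    probability of being in true state x (hypothetical helper)."""
    p1 = np.asarray(post_mastery, dtype=float)
    post = np.column_stack([1.0 - p1, p1])       # N x 2 posterior probabilities
    onehot = np.eye(2)[np.asarray(assigned)]     # N x 2 assignment indicators
    joint = post.T @ onehot                      # expected counts, true x by assigned w
    return joint / joint.sum(axis=1, keepdims=True)

# Toy posterior mastery probabilities for six examinees (made-up values).
post = np.array([0.9, 0.8, 0.6, 0.4, 0.2, 0.1])
w = eap_assign(post)
cep = cep_matrix(post, w)
print(w)
print(cep.round(3))
```

Each row of the resulting matrix sums to one, and the off-diagonal entries give the estimated misclassification rates that the correction weights are built from.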
To obtain correction weights, the CEP matrix needs to be computed. The CEP matrix estimates the amount of misclassification in the measurement model conditional on the true latent state/class memberships. Denote the predicted or assigned latent state by
where
When attribute-level mastery status is the latent state of interest, the CEP reduces to a
where
According to Iaconangelo and de la Torre (2016) and Vermunt (2010), the sample-level correction weights for examinee
which is the element of the profile-level CEP matrix at the s
which is the element of the attribute-level CEP matrix at the
Finally, the third step estimates the relationships between the covariates and latent state memberships and transition probabilities. Although of interest is the relationship between the true latent state
where
For the uncorrected three-step approach, the estimated latent state membership
where
In contrast, in the third step of the corrected three-step approach, the correction weights

Three-step latent transition cognitive diagnosis model with covariates using correction weights (Step 3).
where
If researchers expect to update the classification results by combining the classification information obtained from CDM estimation (
where
Simulation Study
Design
A simulation study was conducted to evaluate the performance of the proposed three-step estimation approach for latent transition CDM with covariates. The performance of the bias-corrected three-step approach was compared to that of the uncorrected three-step approach. This simulation study was also designed to demonstrate that directly using the estimated latent states as observed variables in the regression model produces biased parameter estimates and to investigate the ability of the correction weights to improve the regression parameter estimates. Because the one-step estimation procedure for latent transition CDM with covariates, which is not currently available in the literature, is computationally complicated and difficult to optimize, we did not compare the proposed three-step approach to the one-step approach in the current simulation study. Furthermore, we considered both time-constant and time-varying covariates. The number of time points was fixed at two, which can be viewed as a pre/posttest design. To ensure measurement invariance over time, item parameters at the posttest were constrained to be equal to those estimated at the pretest. Lastly, the transition probability of an attribute was assumed to be independent of other attributes.
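The structural part of such a two-time-point design can be sketched for one attribute as follows. This is a hedged illustration only: the function name `simulate_attribute_states`, the regression coefficients, the sample size, and the use of independent standard-normal covariates are assumptions of this sketch, not the generating values used in the study. Mastery is treated as absorbing, as assumed earlier in the article.

```python
import numpy as np

def simulate_attribute_states(z, beta_init, beta_trans, rng):
    """Simulate mastery of one attribute at pretest and posttest.

    z          : N x P matrix of covariates
    beta_init  : length P + 1 coefficients (intercept first) for initial mastery
    beta_trans : length P + 1 coefficients for moving from nonmastery to mastery
    All coefficient values passed in are hypothetical."""
    design = np.column_stack([np.ones(len(z)), z])
    p_init = 1.0 / (1.0 + np.exp(-(design @ beta_init)))
    alpha1 = rng.binomial(1, p_init)                 # pretest mastery status
    p_learn = 1.0 / (1.0 + np.exp(-(design @ beta_trans)))
    learned = rng.binomial(1, p_learn)               # used only for nonmasters
    alpha2 = np.where(alpha1 == 1, 1, learned)       # mastery is absorbing
    return alpha1, alpha2

rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 3))   # three covariates; independence is assumed here
a1, a2 = simulate_attribute_states(
    z,
    beta_init=np.array([0.0, 0.5, -0.5, 0.3]),
    beta_trans=np.array([-0.5, 0.4, 0.2, -0.3]),
    rng=rng,
)
print(a1.mean(), a2.mean())
```

Because each attribute is simulated from its own pair of logistic regressions, the sketch also reflects the design choice that an attribute's transition does not depend on the other attributes.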
Because this study is a longitudinal extension of the CDM with covariates (Iaconangelo & de la Torre, 2016), we chose the factors based on their study. Five factors were manipulated in the simulation study. The sample sizes used in the study were
Q-Matrix for
Q-Matrix for
Finally, two scenarios of covariates were considered. The first scenario (Scenario I) considered three time-constant covariates generated from the multivariate standard normal distribution,

Two scenarios of latent transition cognitive diagnosis model with covariates at two time points.
True Parameters Used in the Simulation Study
We assumed that the posttest items were the same as the pretest items. Item responses were generated based on the complete attribute profiles using the G-DINA model. In sum, the current study consists of 3 (sample sizes) × 2 (numbers of attributes) × 2 (test lengths) × 3 (item qualities) × 2 (scenarios) = 72 conditions. For each condition, we generated 100 data sets and estimated the model using both the corrected and uncorrected three-step approaches. Attribute mastery statuses were used as the latent states in this simulation study; hence, the covariate effects on the initial state and transition probabilities were estimated for each attribute. Due to limited space and the complexity of the design needed to obtain generalizable and interpretable results, the present study did not include a simulation study of the profile-level transition. Data generation and Steps 1 and 2 were carried out using the GDINA package (Ma & de la Torre, 2020) in the R statistical computing software, whereas Step 3 was implemented by directly optimizing the objective functions (i.e., Equations 13 and 14) via the Adam optimizer (Kingma & Ba, 2015) in Python.
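To give a sense of what a Step 3 optimization of this kind might look like, the sketch below fits a one-attribute, two-time-point model with known CEPs by minimizing a negative log-likelihood with a hand-written Adam loop. It is an assumption-laden illustration rather than the authors' code: the likelihood is written in the general latent Markov form with fixed classification errors and may differ in detail from Equations 13 and 14, gradients are taken by central finite differences instead of automatic differentiation, and every name and numeric value (`neg_loglik`, `adam`, the CEP matrix, the true coefficients) is hypothetical.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def neg_loglik(theta, design, w1, w2, cep1, cep2):
    """Average negative log-likelihood of the assigned states (w1, w2).

    theta stacks the initial-state and transition coefficients.
    cep[x, w] = P(assigned = w | true = x) is treated as known.
    Mastery is absorbing, so a master at time 1 stays a master."""
    k = design.shape[1]
    pi = sigmoid(design @ theta[:k])       # P(master at time 1)
    tau = sigmoid(design @ theta[k:])      # P(nonmaster -> master)
    from_non = (1 - tau) * cep2[0, w2] + tau * cep2[1, w2]
    lik = (1 - pi) * cep1[0, w1] * from_non + pi * cep1[1, w1] * cep2[1, w2]
    return -np.mean(np.log(lik + 1e-12))

def num_grad(f, theta, eps=1e-5):
    """Central finite-difference gradient (stand-in for autodiff)."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        g[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g

def adam(f, theta0, lr=0.05, n_iter=500, b1=0.9, b2=0.999, eps=1e-8):
    """Plain Adam updates with bias-corrected moment estimates."""
    theta = theta0.astype(float).copy()
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, n_iter + 1):
        g = num_grad(f, theta)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        theta -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return theta

# Toy data: one covariate, made-up true coefficients and CEPs.
rng = np.random.default_rng(1)
n = 2000
design = np.column_stack([np.ones(n), rng.standard_normal(n)])
true_theta = np.array([0.0, 1.0, -0.5, 0.8])
x1 = rng.binomial(1, sigmoid(design @ true_theta[:2]))
x2 = np.where(x1 == 1, 1, rng.binomial(1, sigmoid(design @ true_theta[2:])))
cep = np.array([[0.85, 0.15], [0.10, 0.90]])
w1 = (rng.random(n) < cep[x1, 1]).astype(int)   # noisy assignment at time 1
w2 = (rng.random(n) < cep[x2, 1]).astype(int)   # noisy assignment at time 2

objective = lambda th: neg_loglik(th, design, w1, w2, cep, cep)
theta_hat = adam(objective, np.zeros(4))
print(theta_hat.round(2))
```

Replacing both CEP matrices with the identity matrix would treat the assigned states as error-free, which corresponds to the uncorrected three-step approach in this simplified setting.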
The performance of the estimated parameters was assessed by computing the average absolute bias (ABIAS) and the average root-mean-square error (ARMSE), across the attributes, covariates, and replications, which are calculated as
Results
One of the goals of the simulation study was to investigate whether the proposed corrected three-step approach can correctly estimate the parameters of the initial state and transition probabilities using the G-DINA model. The ABIAS and ARMSE of the initial state and transition parameters using the corrected and uncorrected three-step approach under Scenario I are given in Tables 5 and 6, respectively.
ABIAS and ARMSE for Initial State Parameters in Scenario I
ABIAS and ARMSE for Transition Parameters in Scenario I
As shown in Table 5, under various conditions in Scenario I, the ABIAS of the estimated initial state parameters ranged from 0.08 to 0.79 using the corrected approach and from 0.11 to 1.13 using the uncorrected approach, whereas the ARMSE ranged from 0.10 to 0.87 using the corrected approach and from 0.14 to 1.21 using the uncorrected approach. As expected, for all conditions, the corrected three-step approach led to lower ABIAS and ARMSE compared to the uncorrected approach. Smaller ABIAS and ARMSE were observed with larger sample sizes, longer tests, and higher item qualities using either the corrected or uncorrected approach. The corrected approach performed only slightly better than the uncorrected approach when longer test lengths and higher item qualities were involved. For example, when
Table 6 shows the performance of the estimated transition parameters under various conditions in Scenario I. Again, the corrected approach produced smaller ABIAS and ARMSE than the uncorrected approach. The ABIAS of the transition parameters ranged from 0.12 to 1.30 using the corrected approach and from 0.15 to 1.39 using the uncorrected approach, whereas the ARMSE ranged from 0.15 to 1.69 using the corrected approach and from 0.19 to 1.85 using the uncorrected approach. In comparing Tables 5 and 6, the patterns of results for the estimated transition and initial state parameters were similar; however, the ABIAS and ARMSE of the estimated transition parameters were larger than those of the estimated initial state parameters.
For the sake of brevity, the details of the results of Scenario II are not presented. However, we found that the ABIAS and ARMSE of the initial state and transition parameters in Scenario II (see Online Appendix A) had the same patterns as those in Scenario I, which indicates that our proposed method is also applicable to time-varying covariates and that covariates of interest can be added to any time point.
Table 7 presents PAR and AAR averaged over attributes, parameters, and replications for Scenario I when
PAR and AAR for Scenario
Real Data Example
To illustrate the use of the proposed corrected three-step approach for latent transition CDM with covariates, we present in this section an analysis of longitudinal digital literacy assessment (DLA; Jin et al., 2020) data collected over two time points. The DLA was developed to measure students’ performance on five digital skills, namely, information and data literacy (A1), communication and collaboration (A2), digital content creation (A3), safety (A4), and problem solving (A5). The sample consists of 209 students (57.42% girls) from Hong Kong primary schools, who were tested in the 2018/2019 (Primary 3) and 2020/2021 (Primary 5) academic years. For the present analysis, 28 common items that were examined at both time points were used. The Q-matrix of the 28-item test (Table 8) was derived from Liang et al. (2021).
Q-Matrix for 28-Item Test in the Real Data Example
The covariates consisted of two time-constant variables, namely, gender and socioeconomic status (SES). SES involved three indicators: father’s and mother’s highest level of education, and home literacy resource (the number of books at home), and was computed by averaging the
Model-Data Fit of the Real Data Example
Table 10 displays the parameter estimates of the latent logistic regression models. At the initial state, gender was a significant predictor of A2 (communication and collaboration) and SES was a significant predictor of A1 (information and data literacy). Specifically, given the same SES scores, girls were more likely to master A2 than boys, and the odds of a girl being classified as master of A2 were 2.16 (=
Estimates of Logistic Regression Parameters in the Real Data Example
Discussion
This study extends the existing LTA methodologies to a general CDM (i.e., G-DINA model) framework in conjunction with covariates. Specifically, a corrected three-step approach for latent transition CDM with time-constant and time-varying covariates, which estimates the latent state classifications and subsequently investigates the relationships between the covariates and latent state memberships in separate steps, while at the same time taking into account the potential classification errors, was developed. This approach also allows modeling not only attribute-level covariate effects but also profile-level covariate effects on both the initial state and transition probabilities, which has not been reported in the literature.
The results of the simulation study indicate, as expected, that the ABIAS, ARMSE, and agreement rates of the proposed method were well behaved under the test conditions considered in this work. Although the uncorrected three-step approach performed similarly to the corrected three-step approach when items were of high quality and the test was longer, such conditions may not always be satisfied in practice. Thus, when lower-quality items and small sample sizes are involved, the corrected three-step approach can be expected to provide more reliable estimates and to perform better, in some situations much better, than the uncorrected approach. These findings indicate that secondary researchers interested in modeling longitudinal data can use the corrected three-step approach to arrive at valid interpretations of the relationships between the latent state membership classification and the covariates of interest.
Both the simulation study and real data example involved only two time points. However, the proposed method can be readily used with data with more time points because the corrected three-step approach is a single-indicator (i.e., the correction weights) latent transition model, where computational complexity is linearly related to the number of time points and the number of covariates (Di Mari et al., 2016). In addition, because the measurement and structural components are estimated separately, the numbers of items and attributes only affect the computational complexity of the measurement model. Such computational complexity of the measurement model can be easily handled in the GDINA package (Ma & de la Torre, 2020). Moreover, the proposed method has sufficient generality to handle time-constant and time-varying covariates, which broadens the types of situations in which the proposed approach can be used.
Although this study contributes to the CDM literature by developing a corrected three-step estimation approach for latent transition CDMs with different covariate types, there are a number of limitations that point to future research directions. First, in this study, item parameters were constrained to be equal across time points to ensure longitudinal measurement invariance—this can degrade assessment flexibility. The test forms used across time points need not be the same. For test forms that share common items, multiple-group CDMs, such as the multiple-group G-DINA model (MG-GDINA model; Ma et al., 2021), can be used to detect whether there are items that function differentially across different time points. Subsequently, such items can be treated as different items and can have different item parameters. Future research should investigate the performance of different test forms with some common items using the MG-GDINA model in conjunction with the proposed method. Additionally, situations with different measurement models at different time points (Asparouhov & Muthén, 2014) should also be further explored to allow researchers to select the best CDMs for each time point. Di Mari et al. (2016) pointed out that the corrected three-step approach can be adapted to allow for time-specific measurement models by conducting the analysis of Steps 1 and 2 separately and calculating the time-specific CEP at each time point. Second, although it did not limit the applicability of the proposed method, only correlated attribute structure was considered in this study, whereas various attribute structures (independent and correlated) were examined in Iaconangelo and de la Torre (2016). Future research should account for the potential effects that different kinds of attribute structures with varying correlation strengths may cause. 
Third, comparing the estimates of the initial state and transition parameters, our findings indicate that the covariate effects on the initial state and transition parameters were different, and the estimated transition parameters were more biased regardless of the approach used. This could be due to the larger classification errors introduced in the transitions compared to those introduced in the initial state. The classification errors in the initial state come from the CDM classification at the first time point, whereas the errors in the transition(s) come from the CDM classifications at both the first and the later time points. Although we adopted a correction for CEP in this study, it was not always sufficient to correct for all the biases caused by the classification errors, which tend to increase over time. Hence, additional work is needed to address this problem in the future. Fourth, the proposed method in its current form was formulated to handle only a single group, and researchers may be interested in using the corrected three-step approach for multiple-group designs. For example, researchers may want to analyze data from a longitudinal experiment with a control group or to compare the performance of examinees from different groups (e.g., genders, ages, and countries; Madison & Bradshaw, 2018b). Therefore, the proposed method should be extended to model multiple groups in future research to estimate group-specific growth in attribute mastery and group-specific covariate effects on initial state and transition probabilities. Fifth, classifying the examinees separately at each time point is a limitation of the proposed procedure. In some situations (e.g., when the attributes cannot be measured with sufficient accuracy at each time point), this can produce suboptimal results. Future studies should explore the development of a new method that can simultaneously classify the examinees, as well as simultaneously calibrate the data across multiple time points.
Such a method should have the flexibility to impose parameter invariance and can fit into the three-step estimation framework. Lastly, the present study uses covariates as ancillary information to help predict initial latent state membership and transition probabilities. It would also be worthwhile to predict an observed distal outcome from latent state membership (Asparouhov & Muthén, 2014; Lanza et al., 2013). Thus, to expand the scope of CDM applications, future studies may add distal outcome variables to the proposed method to draw a more complete framework of relationships between ancillary variables and latent state membership.
Supplemental Material
Supplemental Material, sj-docx-1-jeb-10.3102_10769986231163320 for Latent Transition Cognitive Diagnosis Model With Covariates: A Three-Step Approach by Qianru Liang, Jimmy de la Torre and Nancy Law in Journal of Educational and Behavioral Statistics
