Abstract
Keywords
Introduction
The use of psychological questionnaires or tests in research usually involves the assumption of a latent variable measured by the questionnaire items. Latent variable modeling provides a versatile toolkit for measuring such latent traits. There are two main areas where latent variables, and particularly latent variable scores, are used: scaling individuals on a single construct, and estimating latent variable effects in structural equation models (SEMs).
The first purpose of psychometric latent variable modeling, individual assessment of psychological traits, is a critical component of the cognitive and behavioral sciences (American Psychological Association [APA], 2014). Individual latent variable scores based on observed responses to items of psychological tests are used for psychopathological diagnoses as well as assessment of abilities and personality in occupations and education. However, a major problem is the validity of psychological tests, especially with respect to social minorities (Reynolds et al., 2021). Generally, validity means that a variable measures what it is supposed to measure. Evidence against test validity usually relies on the hypothesis of construct underrepresentation or construct-irrelevant variance, meaning that a variable measures more or less than it should (APA, 2014, p. 12).
Providing evidence for validity usually includes taking into account deviating response behavior in subgroups. Systematic deviations may indicate that the functioning of scale items differs with regard to certain construct-irrelevant variables. This phenomenon is referred to as measurement noninvariance (Van De Schoot et al., 2015) or differential item functioning (DIF, Bulut & Suh, 2017), and it is present if item parameters differ between subgroups. An item identified as exhibiting DIF is considered biased if the source of variability is irrelevant to the trait being assessed by the test (i.e., construct-irrelevant). However, because any individual characteristic could be defined as construct irrelevant, controlling for item bias may cause real group differences on these variables to be interpreted as bias (see Davies, 2010).
Latent variable scores can be estimated based on item response theory (IRT) (Hartig & Höhler, 2009; Immekus et al., 2019) or confirmatory factor analysis (CFA) (Li, 2016) models (Bhaktha & Lechner, 2021). Practically, construct underrepresentation can be tested for through model fit tests of CFA or IRT models (APA, 2014). Because parameter heterogeneity leads to parameter instability, the assumption of measurement invariance may be investigated via parameter instability tests (Zeileis & Hornik, 2007). However, such a parameter test usually requires a hypothesis about the covariates that negatively affect the parameter stability of a model. In other words, it requires a priori specification of the subgroups for which DIF is suspected.
In recent years, tree-based machine learning methods have been proposed to algorithmically control for DIF in unidimensional IRT models (Komboz et al., 2018; Strobl et al., 2015) through recursive partitioning (Zeileis et al., 2008). Machine learning methods have also been developed to deal with effect heterogeneity in experimental and observational studies (Athey et al., 2019; Athey & Imbens, 2016; Wager & Athey, 2018). As these methods touch on (distinct) aspects of construct validity, they form the ingredients of our approach that focuses on the estimation of unbiased latent variable scores.
We propose LV Forest, an algorithmic approach to latent variable score estimation that accounts for parameter heterogeneity through an ensemble of decision trees.
LV Forest comes with a number of favorable properties that allow complex heterogeneities in the context of latent variable modeling to be taken into account. First, LV Forest uses a data-driven approach for detecting groups that are subject to parameter heterogeneity. The researcher only needs to specify a set of construct-irrelevant partitioning variables for which she suspects differences in model parameters. The partitioning variables are then used to algorithmically search for subgroups with conditionally stable parameters in a decision tree-like fashion. This approach is particularly valuable in situations in which a priori specification of all relevant subgroups based on theoretical assumptions may not be feasible and/or is likely to be insufficient. Second, LV Forest computes multiple decision trees to robustly detect relevant subgroups, accounting for the sensitivity of single trees to small changes in the data. This approach is inspired by random forests and includes random split selection and bagging to increase tree diversity (Breiman, 2001a). Third, decision trees in LV Forest are heavily pruned. This means that subgroups are only selected if, within them, the model fits the data and the model parameters are stable with respect to a prespecified vector of covariates.
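As an illustration of the bagging and random split selection steps, consider the following Python sketch. It is not the LV Forest implementation; all names are hypothetical.

```python
import random

def draw_tree_inputs(n_obs, partitioning_vars, n_select, seed):
    """Draw the inputs for one tree of the ensemble: a bootstrap sample
    of observation indices (bagging) and a random subset of candidate
    partitioning variables (random split selection)."""
    rng = random.Random(seed)
    # Bagging: sample observation indices with replacement
    bootstrap_idx = [rng.randrange(n_obs) for _ in range(n_obs)]
    # Random split selection: a random subset of candidate covariates
    candidates = rng.sample(partitioning_vars, n_select)
    return bootstrap_idx, candidates

idx, cands = draw_tree_inputs(500, ["age", "gender", "income", "region"], 2, seed=1)
```

Both randomization steps serve the same purpose as in random forests: increasing the structural diversity of the trees in the ensemble.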
When applying LV Forest in practice, the algorithm iteratively learns which subgroups in the sample are relevant for estimation and uses these subgroups to repeatedly estimate latent variable scores. Thus, LV Forest can be used for latent variable score estimation especially if the assumed latent variable model does not fit the (full) data and/or includes parameter estimates that are unstable with respect to construct-irrelevant covariates. We show that LV Forest estimates accurate scores in complex settings and outperforms naive and single tree approaches in simulations.
In section “Combining Factor Analytic Modeling and Item Response Theory,” we describe the methodological background of this paper and how the ideas of IRT and Confirmatory Factor Analysis (CFA) can be merged. In section “Parameter Heterogeneity,” the issues of parameter heterogeneity are described and the M-fluctuation test is introduced. In section “Tree-based Machine Learning,” we briefly introduce tree-based machine learning methods and how the algorithmic modeling perspective can be used to account for heterogeneity. Subsequently, our LV Forest approach is described (section “LV Forest”). In sections “Simulation” and “Real Data Application,” simulations as well as an empirical application of LV Forest with survey data are presented. The advantages and limitations of the proposed method are discussed in section “Discussion.”
Latent Variable Modeling and Score Estimation
Stochastic models which specify the relationship between individual responses to items with a limited number of response categories and an underlying continuous latent variable are consolidated under the term IRT. Note that IRT was originally developed to examine the response process of individuals.
Combining Factor Analytic Modeling and Item Response Theory
Usually, in IRT models, a latent variable represents the ability of the respondent. This ability is assumed to underlie the response behavior (Steyer & Eid, 2013). In the following, we refer to this latent variable as
The link function
It is possible to efficiently estimate MIRT parameters via CFA modeling. This means that assumptions of an MIRT model can be translated into a special CFA model and parameters can then be estimated in a computationally efficient manner that is common in the CFA framework (limited information approach, see Li, 2016). For this, a continuous, normally distributed latent response variable
Note that in this model, the
Using the factor analytic approach makes it possible to estimate MIRT parameters through
where
For simplicity, we refer to CFA models with continuous and/or categorical variables as well as multidimensional GRMs as
The latent variable score estimates in
The indeterminacy of latent variable scores varies widely across different models, applications, and methods for latent variable score estimation. It may depend, for example, on the degree of commonality between latent variables and response variables (Grice, 2001). Grice (2001) suggests examining the correlational relationship between
Parameter Heterogeneity
In MIRT models, DIF occurs when an item- or category-specific parameter depends on covariates of the manifest variables (i.e., response variables). Such covariates may take the form of characteristics of the individuals responding to the items. For example, the difficulty of an item may depend on ethnicity, education, or gender. Conditioning on such covariates is equivalent to separately analyzing certain subgroups defined by different values on these covariates. Similarly, in CFA models the structural parameters determining the relation between latent variables and endogenous variables may differ between subgroups. We refer to between-subgroup differences of parameters in both MIRT and CFA models as
Let
Controlling for parameter heterogeneity for ordinal dependent variables in latent variable models can be formalized by assuming
Accordingly, for a numeric response variable
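The original equations are not reproduced here; as a sketch in standard graded response notation, subgroup-specific item parameters can be written as follows (the symbols and the group index are assumptions for illustration, not taken from the original):

```latex
% Probability that person i responds to item j in category c or higher,
% given the latent trait \theta_i and subgroup membership G_i = g:
P(Y_{ij} \ge c \mid \theta_i, G_i = g)
  = F\!\left( \alpha_j^{(g)} \theta_i - \beta_{jc}^{(g)} \right)
% F: logistic or probit link function. Measurement invariance holds when
% the discriminations \alpha_j^{(g)} and thresholds \beta_{jc}^{(g)} do
% not depend on the subgroup g.
```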
If the latent variables are properly defined, the latent variable vector
In practice, parameter heterogeneity can be very problematic because the number of relevant covariates may be very large. Also, there is an even greater amount of possible values or value ranges of these covariates for which model parameters may differ. In addition, complex interactions within the covariate vector
Systematic parameter instability with regard to a covariate
Summing this function across the sample and maximizing the result yields parameter estimates that are asymptotically equivalent to those of limited information maximum likelihood estimation in CFA models for metric variables (maximum likelihood estimation, see Lee & Shi, 2021). Thus, in the estimation process, the score function
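A simplified sketch of the empirical fluctuation process underlying the generalized M-fluctuation test is given below; it omits the full covariance decorrelation and critical values of Zeileis and Hornik (2007), and all names are illustrative.

```python
import numpy as np

def double_max_statistic(score_contributions, covariate):
    """Order the casewise score contributions by a covariate, scale their
    cumulative sums, and take the maximum absolute fluctuation. Under
    parameter stability the process fluctuates around zero; systematic
    parameter change along the covariate inflates the statistic."""
    s = np.asarray(score_contributions)[np.argsort(covariate)]
    s = s - s.mean(axis=0)                  # scores sum to ~0 at the estimate
    n = s.shape[0]
    scale = np.sqrt(n) * s.std(axis=0)
    process = np.cumsum(s, axis=0) / scale  # scaled empirical fluctuation process
    return np.abs(process).max()            # double-maximum functional
```

In the actual test, this statistic is compared against critical values derived from the limiting process (a Brownian bridge) to obtain a p-value.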
Tree-based Machine Learning
In section “Parameter Heterogeneity,” we introduced the problem of parameter heterogeneity in latent variable models. We assume that reducing parameter instability by conditioning on a set of covariates
Machine learning models are considered parts of the
In contrast, algorithmic models serve the purpose of predicting new or future observations through flexible modeling with minimal assumptions. Algorithmic models need to be flexible enough to approximate the data generating function while also being robust toward changes in the data used to fit the model. This compromise is referred to as the bias-variance tradeoff.
Decision trees represent a popular set of nonparametric machine learning methods that are usually used for prediction of an outcome variable. A predictive model (referred to as a
For the purpose of iteratively reducing parameter heterogeneity, it is important not to overfit a decision tree. At first, a minimum sample size within the terminal nodes (leaves) of the tree must be established so that parameters for latent variable models can be properly estimated for the subsamples in the terminal nodes. Then, only splits that significantly reduce parameter heterogeneity (according to the generalized M-fluctuation test) should be performed, otherwise spurious parameter heterogeneities may be induced for the models in the terminal nodes.
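The growing procedure with these two stopping rules can be sketched as follows. This is an illustrative Python sketch, not the SEMTree implementation; `instability_test` stands in for a user-supplied function returning a p-value and a split point for a covariate.

```python
def grow_tree(data, covariates, instability_test, min_node_size=200, alpha=0.05):
    """Minimal model-based recursive partitioning sketch: test parameter
    instability w.r.t. each covariate, split on the most significant one,
    and stop when no test is significant or a child node would fall below
    the minimum size needed to fit the latent variable model."""
    tests = {c: instability_test(data, c) for c in covariates}
    best = min(tests, key=lambda c: tests[c][0])   # smallest p-value
    p_value, split_point = tests[best]
    left = [row for row in data if row[best] <= split_point]
    right = [row for row in data if row[best] > split_point]
    if p_value >= alpha or len(left) < min_node_size or len(right) < min_node_size:
        return {"leaf": data}                      # terminal node
    return {"var": best, "split": split_point,
            "left": grow_tree(left, covariates, instability_test, min_node_size, alpha),
            "right": grow_tree(right, covariates, instability_test, min_node_size, alpha)}
```

The significance threshold prevents splits that are not supported by the fluctuation test, and the minimum node size ensures that the subsample models remain estimable.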
A popular extension to single decision trees is random forests. They are purely predictive methods where the true functional form of the relationship between input and response variables is assumed to be unknown before the procedure is applied and the function approximated by random forest is not directly interpretable. The predictions of a random forest, however, are likely to be more accurate than the predictions of most data models (Fife & D’Onofrio, 2021; Shmueli, 2010). If we acknowledge that nature produces data in complex and inconceivable ways, the approximation through a nonstochastic but accurate function by random forest might be preferable compared with data models.
Random forest methodology can be tailored to serve other purposes. For example,
LV Forest
We develop a tree-based algorithm for latent variable score estimation: LV Forest. The proposed algorithm is outlined in Figure 1. We begin our considerations with the assumption that the parameters of the proposed latent variable model are not equal for all participants in the population. Parameter heterogeneity in the latent variable model may imply unintended influence of construct-irrelevant variables on the relations within the model. Furthermore, we presume that the proposed latent variable model does not fit the data equally well for all subgroups of the population. With the proposed algorithm, we aim to detect subgroups relevant to bias in estimated latent variable scores, and only latent variable models that fulfill conditional independence from construct-irrelevant variables and achieve adequate model fit are chosen for latent variable score estimation. This way, we ensure both that latent variable score estimation is unbiased with respect to construct-irrelevant variables and that latent variable scores are not estimated with an underrepresented model. We combine the limited-information approach for parameter estimation (section “Combining Factor Analytic Modeling and Item Response Theory”) and the SEMTree algorithm (section “Tree-based Machine Learning”) to efficiently compute an ensemble of decision trees, in which each tree reduces parameter instability. We then prune the resulting trees to detect subgroups in which the model fits the data and the parameter estimates are stable. Note that we do assume that the proposed latent variable model fulfills the criteria for latent variable score determinacy (Grice, 2001, section “Latent Variable Modeling and Score Estimation”).

Univariate GRM Model.
First, an SEMTree (section “Tree-based Machine Learning”) is grown. Note that for the computation of
We might say that the decision trees in the ensemble are heavily “pruned,” leaving only those leaves that are most likely to contain models that are adequate for latent variable score estimation. Specifically, this means that we exclude terminal nodes for which (a) the proposed model does not fit the data, and (b) the model’s parameters are unstable w.r.t. the covariates. For (a), an RMSEA cutoff value is defined (Hu & Bentler, 1999; Schermelleh-Engel et al., 2003) and all models that exceed this cutoff are excluded. For (b), the generalized M-fluctuation test for parameter instability (Zeileis & Hornik, 2007) is performed. Classe and Kern (2024) show that the performance of the generalized M-fluctuation test for ordinal data is as good as for metric data and thus can be used for ML-based models.
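These two conditions amount to a simple filter over terminal nodes, sketched here in Python with illustrative names and default thresholds (not package values):

```python
def keep_terminal_node(rmsea, instability_p_values, rmsea_cutoff=0.05, alpha=0.05):
    """Retain a terminal node for score estimation only if (a) the model
    fits the node's subsample (RMSEA at or below the cutoff) and (b) no
    M-fluctuation test signals parameter instability w.r.t. any covariate."""
    fits = rmsea <= rmsea_cutoff
    stable = all(p >= alpha for p in instability_p_values)
    return fits and stable
```

Only nodes passing both checks contribute latent variable scores; all other nodes are discarded.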
We learn from the machine learning literature that a single decision tree may be vulnerable to small changes in the training data and the set of partitioning variables (Breiman, 2001a). For the most part, this is a consequence of the hierarchical nature of the decision tree (A. Brandmaier et al., 2016; Kern et al., 2019). In addition, if an SEMTree is grown with ordinal data this can lead to inaccuracies in the partitioning process because the ML estimator is used for the computation of the fitted model scores (section “Parameter Heterogeneity”) at the beginning of the tree growing process. For parameter estimation via maximum likelihood, the dependent variables are assumed to be normally distributed. This assumption rarely holds for ordinal data (Li, 2016). We account for the problem of unstable and potentially inaccurate trees by computing several structurally different decision trees and evaluating the compiled results of this ensemble of trees. We use
After computing all trees in the ensemble, the estimated latent variable scores are accumulated for each individual over all relevant subgroups in the tree ensemble. This means that across all relevant subgroups found by the algorithm that contain individual
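The accumulation step can be sketched as averaging each individual's score over all retained terminal nodes that contain that individual; the Python below uses a hypothetical data structure for illustration.

```python
from collections import defaultdict

def accumulate_scores(node_scores):
    """Average each individual's estimated latent variable score over all
    retained terminal nodes containing that individual. `node_scores`
    maps a node id to a {person_id: score} dict. Individuals appearing
    in no retained node receive no score (nonconvergence)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for scores in node_scores.values():
        for person, score in scores.items():
            totals[person] += score
            counts[person] += 1
    return {person: totals[person] / counts[person] for person in totals}
```

Individuals not covered by any retained node are simply absent from the result, which corresponds to the nonconvergence cases discussed in the simulations.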
For the application of LV Forest, the R function
Simulation
Setup
We test the performance of LV Forest with simulated data. We carried out three simulations. For Simulation 1, the data are simulated based on a simple univariate latent GRM model, that is
In this model, the variance and mean of the latent variable are estimated. The discrimination parameter pertaining to item 1 (i.e.,
The data set used in Simulation 1 consists of 10 model-compliant subsamples
In addition, for each of the simulated subsamples, we created one numerical covariate
This means that the values on
All input model parameters that were used to simulate the data differ between all subgroups
Input Threshold Parameters for Simulation.
Model Fit Indicators and Input Discrimination Parameters of Simulated Data Sets.
We apply LV Forest to the simulated data set and compute a forest of 10,000 decision trees. All covariates
Latent variable score estimation accuracy is evaluated by comparing the true simulated latent variable scores
For Simulation 2, we simulated 100 data sets in a simplified form of the procedure described above. We simulated data based on a univariate IRT model with eight items with five categories each (instead of five items with seven categories as in Simulation 1). Furthermore, each of the simulated data sets consists of three model-compliant subsamples for each of which one ordinal
Thus, the model-compliant subsamples are recoverable in the terminal nodes of several decision trees, but not in the terminal nodes of a single decision tree. In Simulation 2, we reduce the number of partitioning variables per simulated data set to six (instead of 30 in Simulation 1).
We apply LV Forest to each of the simulated data sets using the same hyperparameters as in Simulation 1, except that we compute 20 trees per ensemble (instead of 10,000 in Simulation 1). Furthermore, we apply LV Forest to each of the simulated data sets and randomly select 5 out of the 6 relevant partitioning variables to be generally available for the computation of the ensemble. This way, we want to find out how the absence of relevant partitioning variables affects latent variable score estimation with LV Forest. Note that this is not random split selection, but it is a simulation scenario in which not all relevant partitioning variables can be used by the algorithm. We also apply a single SEMTree to each of the simulated data sets, fit a separate model for each of the terminal nodes and estimate latent variable scores using these fitted models.
The accuracy of the latent variable score estimations is evaluated by comparing the true simulated latent variable scores
For Simulation 3, we simulated one data set in a similar way as in Simulation 1, but now the full data set is simulated using a single set of parameters. We simulated the data to fit a univariate model with five response variables with seven response categories each. We simulate three covariates
Results
The application of LV Forest with the simulated data resulted in a tree ensemble in which 425 out of 10,000 decision trees included at least one terminal node in which the assumed model fits well and the model parameters are stable w.r.t. the partitioning variables. Overall, there are 439 terminal nodes in which these two conditions apply. These terminal nodes remained for the estimation of latent variable scores for the whole sample. On a 20-core, 170GB RAM server, LV Forest took 5.89 hours (353.5 minutes) of computation time.
The estimation of the single SEMTree with the simulated data of Simulation 1 took 5.01 minutes on a 20-core, 170GB RAM server. The tree structure is shown in Figure 2. It is obvious that the single SEMTree did not reproduce the simulated subgroup structure. The RMSEA values of the models in the 16 terminal nodes range from 0.02 to 0.18, but only two of the models have an RMSEA lower than 0.05.

Single SEMTree on Simulated Data
To estimate the naive latent variable scores
The correlation matrix of the four sets of latent variable score estimates and the true simulated latent variable scores is shown in Table 3. We used Spearman’s rank correlation coefficient because the latent variable score estimations may not follow a normal distribution. The accuracy of latent variable score estimations, that is, the correlations with the simulated latent variable scores
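Spearman's coefficient is the Pearson correlation of rank-transformed scores; a minimal sketch (without the tie corrections a full implementation such as `scipy.stats.spearmanr` applies):

```python
def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Minimal version; assumes no tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it operates on ranks, the coefficient captures any monotone association and does not require normally distributed scores.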
Correlations of Estimated Latent Variable Scores From Simulation 1.
The results of Simulation 2 in terms of accuracy are shown in Figure 3a and the results in terms of nonconvergence are shown in Figure 3b. The application of the different latent variable score estimation methods on 100 simulated data sets shows that the accuracy of latent variable score estimation based on a naive model

Results of Simulation 2.
Over all 100 samples, the mean computation time of a single SEMTree was 15.17 seconds. The mean computation time of LV Forest was 37.86 seconds. Note that the computations were executed on a 20-core, 170GB RAM server and the trees were computed in parallel.
The results of Simulation 3 show that no splits were performed in any of the 10 LV Forest trees. Thus, in the absence of parameter heterogeneity, the scores estimated by LV Forest are equal to the scores of the naive model.
Real Data Application
We demonstrate the application of LV Forest using data obtained from the LISS (Longitudinal Internet studies for the Social Sciences) panel administered by Centerdata (Tilburg University, The Netherlands). LISS is a comprehensive longitudinal survey conducted annually, encompassing a wide range of topics such as employment, education, income, housing and personality traits (Scherpenzeel, 2018). For this application, we analyze the data from the first survey wave in 2008. In this wave, 8,722 household members were contacted and 6,808 individuals responded. We focus on five items from the Satisfaction With Life Scale (Diener et al., 1985) measuring life satisfaction. We excluded all cases that did not respond to all of the five items, which leads to a final sample of
Life Satisfaction Scale Items as Asked in the LISS Panel.
We analyze the data using the same univariate GRM model structure that Simulation 1 is based on (see Equation 8 and Figure 1). First, we fit such a model to the whole data set and refer to it as the naive model. We then apply LV Forest.
For the application of LV Forest, we choose 11 background variables representing the construct-irrelevant variables for our latent variable model. These variables describe the general characteristics of households and household members that participate in the LISS panel. They encode characteristics on the individual level (such as gender, age or civil status) as well as on the household level (such as household income, domestic situation or type of dwelling). The variables are shown in Table 5. We apply LV Forest to the data set using these background variables as partitioning variables and compute an ensemble of 1,000 trees. To reduce computation time and to ensure that LV Forest outputs a manageable number of relevant subgroups w.r.t. post hoc analysis, we set the cutoff RMSEA value to .03. Minimum terminal node size is set to 200 and random split selection to 2.
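In configuration form, these choices correspond to the following settings (hypothetical names, since the actual function interface is not shown here):

```python
# Hypothetical hyperparameter configuration mirroring the settings above
lv_forest_config = {
    "n_trees": 1000,               # ensemble size
    "rmsea_cutoff": 0.03,          # model fit threshold for terminal nodes
    "min_node_size": 200,          # minimum cases per terminal node
    "random_split_selection": 2,   # candidate partitioning variables per split
}
```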
Partitioning Variables Used in LV Forest Application With LISS Panel Data.
As a sensitivity check, we additionally apply LV Forest with the same data but different partitioning variables. We apply an ensemble with the same hyperparameters as described above while using only the first six variables in Table 5 (
To illustrate the conditional independence of the estimated latent variable scores, we perform post hoc tests for independence between the estimated latent variable scores and the construct-irrelevant variables within the subgroups found by LV Forest. For this, we apply a test based on the d-variable Hilbert Schmidt independence criterion (Pfister et al., 2018). With this kernel-based nonparametric test, we test for stochastic independence (instead of e.g., linear independence).
As the estimated latent variable scores are accumulated for each individual over all relevant subgroups, the resulting latent variable scores are not expected to be independent of construct-irrelevant partitioning variables for the full sample. Within the relevant subgroups, however, the latent variable scores are expected to be independent of construct-irrelevant variables. Thus, any overall effects of background variables on latent variable scores imply real differences between the relevant subgroups. To analyze such effects on the latent variable scores, we apply regression models using the 11 background variables as individual predictors. We do this for three different outcome variables: the LV Forest scores using all partitioning variables, the LV Forest scores using only a subset of partitioning variables and the latent variable scores estimated with the naive model.
We fit the naive model using the WLS estimator (see Section “Combining Factor Analytic Modeling and Item Response Theory”). The model does not fit the data well
In the LV Forest ensemble, 15 trees (1.5% of the ensemble) each generated one terminal node that contained a subsample for which the univariate GRM model fits the data and all parameter estimates are stable w.r.t. all 11 background variables. The model fit indices for all subgroups are shown in Table 6. For these relevant subgroups, latent variable scores were estimated, such that score estimates were available for
Relevant Subgroups Found in LV Forest LISS Data Application.
Results of Kernel-Based Independence Test: Dependence of Latent Variable (Life Satisfaction) on Construct-Irrelevant Variables Within Relevant Subgroups.
We analyzed the effect of the background variables on the different latent variable score estimations (naive model vs. LV Forest vs. LV Forest with subset of partitioning variables). The results are shown in Table 8. The regression coefficients for the scores of the LV Forest with all partitioning variables indicate a linear effect of two variables (partnership status and domestic situation). For these same variables, the regression coefficients for the scores of the reduced LV Forest show a significant effect. Also, the Spearman’s correlation of the scores of the LV Forest with all partitioning variables and the scores of the reduced LV Forest is 0.99. In contrast, the coefficients for the naive scores additionally show significant effects of four other variables (civil status, age, gender, and urban character of dwelling). This indicates that the effect of partnership status and domestic situation on life satisfaction may not be due to bias. The effects of civil status, age, gender, and urban character of dwelling, however, may be due to bias w.r.t. the background variables.
Regression Coefficients of Covariates on Latent Variable Scores in the Real Data Application.
The LV Forest with a subset of partitioning variables uses the partitioning variables
Discussion
In this study, we proposed LV Forest, an algorithmic approach to latent variable score estimation. We focused on a setting in which a naive latent variable model is subject to parameter heterogeneity. In this case, fitting a latent variable model and estimating latent variable scores on the basis of this model can lead to false conclusions. The proposed latent variable model may, however, not violate measurement invariance within subgroups that can be defined by covariates. Since tree-based methods have successfully been applied to account for DIF (Komboz et al., 2018; Strobl et al., 2015), we utilized the algorithmic machine learning perspective for handling complex subgroup structures in the context of latent variable score estimation. Assuming that the latent variable scores of a proposed model are determinate (section “Latent Variable Modeling and Score Estimation”), we argue that scores should only be estimated if the latent variable in the proposed model is not underrepresented and is independent of construct-irrelevant variables. Construct-irrelevant variables may have an effect on latent variable scores estimated using LV Forest. However, this effect may not be due to bias but due to real differences w.r.t. the latent variable scores between relevant subgroups. We build on the growing body of research that utilizes techniques from the field of machine learning to make stochastic models more flexible when they are confronted with complex covariate structures.
In psychological assessment, bias refers to systematically under- or overestimating personality traits or abilities. Cultural bias in particular has been a polarizing issue for many years. The controversy lies in the explanations given for the measured systematic differences in traits and abilities between specific subgroups: are they based on an interaction of genes and environment (i.e., genuinely different ability levels in different groups) or on different cognitive structures requiring different test characteristics, that is, test bias (see Reynolds et al., 2021)? According to Bollen (1989), causality, and therefore validity, is only possible if there are no systematic differences in a latent ability or trait with respect to variables outside of the latent variable model. Thus, if systematic differences between groups are not part of the assumed model, they are attributable to test bias. Under this notion, real differences in the latent variable scores w.r.t. construct-irrelevant variables are never interpretable as such. As virtually all individual characteristics can be such construct-irrelevant variables, this notion is problematic (see, e.g., Davies, 2010). We address this problem by proposing a method to estimate latent variable scores whose subgroup differences w.r.t. construct-irrelevant variables are estimable and interpretable.
Latent variable scores estimated using LV Forest are also very useful when it comes to complex SEMs that include measurement paths between latent variables. In these models,
We applied LV Forest to simulated data to test whether the method is suitable for finding simulated subgroups based on fitting IRT models with stable parameters. The results show that the method works well for a univariate GRM model. We also show that latent variable score accuracy depends, to some degree, on model fit and parameter stability of a latent variable model. Furthermore, we show that latent variable score estimation via a single SEMTree does not perform as well as LV Forest if the subgroup structure behind the sample cannot be recovered by a single tree. Another advantage of LV Forest is that a 0% convergence rate is very unlikely. However, nonconvergence rates are likely to be larger for LV Forest compared with a naive model. Still, if there are not many partitioning variables in the data and/or if the data set is not very large, one might prefer using a single SEMTree over LV Forest to estimate latent variable scores.
Furthermore, we applied LV Forest to real data from a large-scale survey. We analyzed five items measuring satisfaction with life and used background variables to recursively partition the sample. As a result, latent variable scores were estimated for 40% of the sample. When the number of partitioning variables was reduced, scores were only estimated for 20% of the sample. This shows that LV Forest may be limited when it comes to exhaustively estimating latent variable scores for the entire sample. In reality, there may always be individuals for which the proposed latent variable model does not apply and relevant partitioning variables are not measured. Our simulations, however, suggest that the accuracy of LV Forest scores is still high, even given considerable nonconvergence. When this is the case, the researcher may increase the RMSEA-cutoff to reduce the nonconvergence rate, but potentially compromise on latent variable score accuracy.
The fact that the estimated latent variable scores were predominantly
Comparison to Related Methods
Another tree-based machine learning approach to identify and account for parameter heterogeneity, which is also applicable to different types of latent variable models, is called
Limitations
In the LV Forest framework, we focus on latent variable models that may be subject to parameter heterogeneity. Simultaneously, we claim that we only use models with
Practical limitations stem from the fact that it is impossible in many cases to measure all construct-irrelevant variables that may confound the measurement paths of a presumed model. The scores estimated by LV Forest should be interpreted with regard to the fact that there may still be potential construct-irrelevant variables that were not collected in the study. The simulation showed that the absence of relevant partitioning variables may lead to nonconvergence in score estimation for individuals in the sample. Thus, if not all relevant partitioning variables are measured, it may not be possible to estimate unbiased scores for every individual in the sample. We additionally note that large sample sizes are needed for LV Forest to be efficient. The sample needs to be large enough that sample sizes in terminal nodes of complex trees are sufficient to estimate model parameters, as well as to accurately perform M-fluctuation tests. The simulation also showed that if the subgroup structure of a sample is complex, many trees and therefore long computation times are needed. In practice, if an assumed model does not fit the data and/or has unstable parameters, it may be viable for the researcher to adjust the model assumptions before turning to LV Forest. We also acknowledge that LV Forest does not return an inherently interpretable model function. Like random forests, LV Forest allows modeling of highly complex subgroup structures. However, a direct interpretation of the composition of these subgroups would lead to results that are unlikely to be generally applicable. Our proposed method therefore explicitly focuses on the estimation of latent variable scores.
