Abstract
Human development research often relies on aggregated variables, that is, composites, to operationalize theoretical concepts of interest (e.g., Blau, 1998; Davis et al., 2004), and “[n]umerous efforts to develop composite indices are underway at all geographic levels” (Ben-Arieh, 2010, p. 18). As early as 1983, Rushton et al. (1983) recognized the aggregation principle’s relevance in the context of human development research. For instance, composite indices such as the United Nations Development Programme’s Human Development Index (HDI; Hopkins, 1991; United Nations Development Programme, 1990) or the Centre for Global Development and Foreign Policy’s Commitment to Development Index (Lee et al., 2020) are frequently applied in human development research (e.g., Chowdhury & Squire, 2006; Harttgen & Klasen, 2011; Noorbakhsh, 1998). Other examples are helicopter parenting (Willoughby et al., 2015), child well-being (O’Hare & Gutierrez, 2012), social class (Osborn & Morris, 1979), maternal psychological distress (DiPietro et al., 2006), quality of parent–child relationships (García-Moya et al., 2013), and screen-based media use (Hutton et al., 2020). In all these instances, the theoretical concept of interest has been represented by a composite, that is, a linear combination of more elementary variables.
To assess composites, human development research mostly relies on confirmatory factor analysis (CFA; Jöreskog, 1979), a special case of structural equation modeling (SEM; Bollen, 1989). For instance, in the existing human development literature, CFA was used to assess work and job withdrawal (Blau, 1998), which were both modeled as composites. Although CFA is a quasi-standard tool in human development research and is also frequently applied in other research fields such as psychology (DiStefano & Hess, 2005; MacCallum & Austin, 2000), business management (Mak & Sockel, 2001), and criminology (Williams et al., 2007), it is of limited use for studying composites. In CFA, theoretical concepts are modeled as common factors, that is, latent variables that are measured by a set of observed variables. Consequently, the theoretical concept is regarded as the common cause shared by the observed variables. In contrast to common factors, composites are formed rather than measured, and thus using CFA to study composites disregards their nature.
Considering the situation outlined above, it would be illogical for researchers to employ CFA as a statistical tool for construct validation if they want to study and model theoretical concepts that are assumed to function according to a composite. To avoid the misuse of CFA in cases where a theoretical concept is modeled as a composite, researchers are faced with the question of how to assess composites with the same degree of rigor as they are accustomed to when studying common factors with CFA.
Against this background, this article presents confirmatory composite analysis (CCA; Schuberth et al., 2018), a novel technique that is devoted to the analysis of composites. A recently proposed specification of composites allows the models studied in CCA to be specified as a special case of structural equation models (Henseler & Schuberth, 2020a; Schuberth, 2021b). Overall, CCA offers the same benefits for assessing theoretical concepts modeled as composites as CFA offers for theoretical concepts modeled as common factors. Hence, CCA is a suitable approach for assessing composites as it overcomes the drawback CFA has in assessing composites.
The remainder of this article is structured as follows: The following section emphasizes the need for a proper method to assess composites in the context of human development research by distinguishing common factors and composites and highlighting the important role of composites in this discipline. Subsequently, we present CCA and describe its steps, that is, model specification, identification, estimation, and assessment. Following this description, we provide an illustrative example in the context of human development research. Finally, the article closes with concluding remarks.
The Need for a Proper Method to Assess Composites in Human Development
Research in human development studies theoretical concepts. To assess these concepts, researchers frequently apply CFA and thus use the common factor model for concept operationalization (Behrendt et al., 2019; Blau, 1998). In this respect, human development research is no exception. As pointed out in the literature, other disciplines such as psychology (Rhemtulla et al., 2020) or marketing (Sajtos & Magyar, 2016) also apply the common factor model by default to operationalize theoretical concepts. However, in choosing a statistical model for concept operationalization, it should be ensured that the model matches the nature of the studied concept. Otherwise, questionable conclusions are likely to be drawn from the estimated model (see, for example, Sarstedt et al., 2016).
The model underlying CFA is called the common factor model, which is also often referred to as the reflective measurement model. In this model, the theoretical concept is modeled as an unobserved common factor, that is, a latent variable. In addition, the theoretical concept is assumed to be the common cause underlying the set of observed variables, that is, variation in the concept leads to variation in its measures (Bollen & Bauldry, 2011). Consequently, the observed variables are regarded as manifestations of the theoretical concept that are prone to random measurement error. Typically, the random measurement errors, which capture the variation in the observed variables that cannot be explained by the common factor, are assumed to be uncorrelated. Therefore, the common factor is the only explanation for the correlations among the observed variables, and thus the observed variables would be uncorrelated when controlled for the common factor (Kline, 2015; Lazarsfeld, 1959). Examples of theoretical concepts that have been modeled as common factors are moral emotions, that is, guilt, shame, and pride (da Silva et al., 2022).
In contrast, various fields of human development research study theoretical concepts that are formed, that is, the concept is not assumed to be the common cause underlying a set of observed variables but is an aggregation of more elementary parts. To model such concepts, the use of the composite model has been proposed (e.g., Cole et al., 1993; Edwards, 2001; Henseler, 2015, 2017; Henseler & Schuberth, 2021; Schuberth et al., 2018; Yu et al., 2021). In the composite model, the theoretical concept is represented by a composite, that is, a weighted linear combination of variables. Moreover, the role of observed variables differs between the composite model and the common factor model. While in the common factor model the observed variables are assumed to be measures of the concept, in the composite model the observed variables serve as ingredients making up the concept. For a more detailed distinction between the common factor model and the composite model, the reader is referred to Henseler (2021) and Yu et al. (2021).
Various fields of human development research study theoretical concepts that are formed and thus they frequently use composites. For instance, fear, anger, and joy were modeled as composites to study their effects on children’s emotional development (Kochanska, 2001). Similarly, Coan (2010) suggested that fear “is constituted of high ANS [autonomic nervous system] arousal, hypervigilance, escape or avoidance behavior, and subjective fear experiences” (p. 279). Another example is core self-evaluation, which was proposed to be modeled as a composite comprising self-esteem, generalized self-efficacy, locus of control, avoidance motivation, and approach motivation (Johnson et al., 2008). Furthermore, Jennings and DiPrete (2010) proposed that math drill is composed of “the frequency with which students do math worksheets, use math textbooks, and do math on the chalkboard” (p. 142). A further example is socioeconomic status, which is “composed of items relating to parental educational attainment, occupational prestige, and family income” (Wright et al., 2017, p. 86 S). Similarly, work withdrawal and job withdrawal were modeled as composites composed of unfavorable job behaviors, lateness, and absence, and of turnover intent, desire to retire, and intended retirement age, respectively (Blau, 1998).
Beyond these examples, composites often appear as indices, so-called composite indices. Table 1 provides some exemplary composite indices studied in the field of human development research. Arguably, the most prominent composite index in the context of human development research is the HDI, which describes the development status of a country (United Nations Development Programme, 1990). Due to several criticisms of the HDI, the Modified HDI was introduced (Noorbakhsh, 1998). In addition, alternative indices, such as the Composite Global Well-Being Index (Chaaban et al., 2015), have been proposed. Besides the HDI, the Gender Development Index, the Human Poverty Index (United Nations Development Programme, 1990), the Gender Inequality Index, the Multi-dimensional Poverty Index (United Nations Development Programme, 2010), the Sustainable Child Development Index (Chang et al., 2018), the Combined Quality of Life Index (Diener, 1995), the Gender Gap Index (Sharma et al., 2021), and the Physical Quality of Life Index (Morris, 1978) are popular composite indices to evaluate and compare countries in human development research. Alongside composite indices used to evaluate the development status of countries, composite indices are also used in other contexts, such as for assessing the quality of universities (Asif & Searcy, 2014; Murias et al., 2008). Furthermore, composite indices are applied to evaluate children’s development status. Such indices include the Mental Development Index (Bayley, 1969), which focuses on the status of cognitive and language development (Lowe et al., 2011), and the Early Development Index (Janus & Offord, 2000), which evaluates a child’s development status in deciding on school readiness. Moreover, the Parenting Stress Index (Abidin, 1997) evaluates the magnitude of stress in the relationship between parents and children, and the Psychomotor Development Index is used to evaluate the motor abilities of children (Carter et al., 2004).
Table 1. Examples of Indices in Human Development Research.
In all of these instances, composites are used to represent the theoretical concepts of interest, that is, variables are aggregated to represent a theoretical concept. Consequently, CFA is of limited use for studying these concepts because the model underlying CFA does not match their nature; these concepts are not assumed to be the common cause underlying their sets of observed variables. Moreover, combining variables into a single variable, that is, creating a composite, entails a loss of information (Zhou et al., 2010). However, researchers of human development currently do not assess whether the benefits of studying a single variable instead of multiple variables individually sufficiently compensate for the disadvantage of losing information. Similarly, researchers lack statistical methods to assess whether an aggregation of variables acts as a variable in its own right. Both of these issues can be addressed by means of CCA, which we present in the following.
CCA and Its Step-by-Step Application
CCA was first sketched by Jörg Henseler and Theo K. Dijkstra (Henseler et al., 2014) and subsequently fully elaborated by Schuberth et al. (2018). A recently introduced specification allows for expressing the composite model, that is, the model underlying CCA, as a special type of structural equation model (Henseler & Schuberth, 2020a; Schuberth, 2021b). As a consequence, CCA can be understood as a special case of SEM, and estimators implemented in SEM software can be used to obtain the parameter estimates of composite models. Although CCA has been introduced in various fields, such as business research (Henseler & Schuberth, 2020b), managerial science (Schuberth, 2021a), tourism and hospitality research (Yuqing et al., 2022), and information systems research (Hubona et al., 2021), it has not yet been presented to the field of human development research. In the following, we explain its steps.
Model Specification in CCA
In a first step of CCA, a composite model has to be specified (Cho et al., 2022; Dijkstra, 2013, 2017). Considering
where
To overcome this issue, we rely on a specification that was introduced recently, which allows us to express composite models as a special type of structural equation model (Schuberth, 2021b). In this specification, the relations between a composite and its observed variables are expressed in terms of composite loadings instead of weights. In addition, not only one composite, but as many composites as observed variables are extracted per block. Together, these composites span the same space as their observed variables. Consequently, equation (1) can be rewritten as
We follow Henseler (2021) in denoting the composite of interest
Equation (2) makes it apparent that the relationship between composites and their observed variables can be expressed in terms of composite loadings
Note that the transposed weight matrix
where the diagonal matrix
In addition to extracting emergent and excrescent variables from the blocks of observed variables, their covariances need to be specified. While the emergent variables are typically allowed to covary freely, the excrescent variables do not covary with any other variables in the model than their corresponding observed variables; see also section “Model Identification in CCA.” Consequently, the inter-block covariance matrix
where the matrix
Equation (5) reveals that all correlations between the observed variables of different blocks are accounted for by the corresponding emergent variables and thus all the information between two blocks of observed variables is conveyed by the emergent variables. This is similar to CFA where the common factors account for the correlations between the observed variables of two different blocks. Consequently, the composite model constrains the inter-block covariance matrices to be of rank 1. The complete observed variables’ variance–covariance matrix
It is noteworthy that the role of composites in the composite model presented above can differ from their role in other SEM specifications. For instance, Rose et al. (2019) proposed the pseudo indicator model, which allows composites to be specified in a structural model in such a way that all the constraints imposed by the composite model are removed. Similarly, Grace and Bollen (2008) proposed to allow for correlations between observed variables of two different blocks. In that case, and in contrast to the composite model, not all correlations between two blocks are accounted for by the composites. For that reason, composites studied in CCA are also labeled as emergent variables to emphasize that they convey all the information between their observed variables and other variables in the model (Schuberth et al., in press). Moreover, composites are often formed outside the model, for example, using unit weights (Rhemtulla et al., 2020). Consequently, the weights are not model parameters. This is in contrast to the composite model, where the weights and the composite loadings, respectively, are freely estimated.
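The rank constraint on the inter-block covariance matrices described in this section can be checked numerically. The following minimal Python sketch uses made-up values: `lam1` and `lam2` stand for the composite loadings of two emergent variables and `phi` for the covariance between them. These names and numbers are illustrative assumptions, not quantities taken from this article. The sketch builds the implied inter-block covariance matrix and verifies that every 2 × 2 minor vanishes, which is what rank 1 means.

```python
from itertools import combinations


def inter_block_cov(lam1, lam2, phi):
    """Implied covariances between two blocks: lam1 * phi * lam2'."""
    return [[a * phi * b for b in lam2] for a in lam1]


def max_abs_minor(matrix):
    """Largest absolute 2x2 minor; it is zero for any matrix of rank <= 1."""
    rows, cols = range(len(matrix)), range(len(matrix[0]))
    return max(
        abs(matrix[i][k] * matrix[j][l] - matrix[i][l] * matrix[j][k])
        for i, j in combinations(rows, 2)
        for k, l in combinations(cols, 2)
    )


# Hypothetical loadings of the emergent variables on their observed variables.
lam1 = [1.0, 0.8, 0.6]
lam2 = [1.0, 0.7]
phi = 0.5  # hypothetical covariance between the two emergent variables

sigma12 = inter_block_cov(lam1, lam2, phi)
rank_one = max_abs_minor(sigma12) < 1e-12
```

Adding a direct covariance between two observed variables of different blocks, as in specifications that relax the composite model, would generally make at least one minor nonzero.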
Model Identification in CCA
To achieve model identification in CCA, several additional constraints need to be imposed. In the following exposition, we provide concise guidelines; for a more technical explanation of the identification of composite models, see Schuberth (2021b). First, the variances of the emergent and excrescent variables need to be determined. Hence, we recommend that one composite loading for each emergent and excrescent variable be constrained to 1. In doing so, one needs to ensure that no observed variable serves as a scaling variable more than once. Second, further composite loadings of the excrescent variables need to be fixed to avoid over-parameterization of the model. For this reason, we recommend that excrescent variables’ composite loadings be fixed at 0 in the following way: For the first excrescent variable, no additional constraints are imposed; for the second excrescent variable, we fix one of the composite loadings at 0; for the third excrescent variable, we fix two composite loadings at 0; for the fourth excrescent variable, we fix three composite loadings at 0; and so forth. Consequently, for the last excrescent variable of each block, one composite loading will remain unconstrained. In addition to fixing the composite loadings, the correlations among excrescent and emergent variables need to be constrained. While emergent variables are usually allowed to freely correlate, the excrescent variables are not allowed to be related to any other variable in the model except their respective observed variables. Therefore, the degrees of freedom are obtained as follows
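The counting behind these identification rules can be sketched in a few lines of Python. This is an illustrative count under stated assumptions, not a formula quoted from this article: every block is a composite formed of observed variables, each block yields as many composites as observed variables, the variances of all composites are free, all emergent variables covary freely, and excrescent variables covary with nothing. The function names are hypothetical.

```python
def fixed_zero_loadings(block_size):
    """Zeros imposed on excrescent loadings: 0 + 1 + ... + (block_size - 2)."""
    n_excrescent = block_size - 1
    return sum(range(n_excrescent))


def free_parameters(block_sizes):
    """Free parameters when all emergent variables covary freely (assumption)."""
    total = 0
    for k in block_sizes:
        loadings = k * k                    # k composites times k observed variables
        loadings -= k                       # one loading per composite fixed to 1
        loadings -= fixed_zero_loadings(k)  # zero pattern of the excrescent variables
        variances = k                       # one free variance per composite
        total += loadings + variances
    n_blocks = len(block_sizes)
    total += n_blocks * (n_blocks - 1) // 2  # covariances among emergent variables
    return total


def degrees_of_freedom(block_sizes):
    """Non-redundant variances and covariances minus free parameters."""
    n_observed = sum(block_sizes)
    moments = n_observed * (n_observed + 1) // 2
    return moments - free_parameters(block_sizes)


df_two_blocks = degrees_of_freedom([3, 3])
```

Under these assumptions, the count for two blocks with K1 and K2 observed variables reduces to (K1 − 1)(K2 − 1), which is the number of restrictions a rank-one inter-block covariance matrix imposes.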
To illustrate the composite model specification and the identification rules presented, we consider a situation in which a researcher wants to study two correlated composites

Figure 1. Specification of a Composite Model.
In Figure 1, observed variables are depicted as rectangles. The composites, that is, emergent and excrescent variables are displayed as hexagons to distinguish them from common factors, which are typically expressed as ovals (Grace & Bollen, 2008). This is in contrast to other SEM models studying composites such as latent difference score models in which composites are usually displayed as ovals (McArdle, 2009). Furthermore, the relations between the variables are depicted by different types of arrows. While single-headed arrows display linear regression coefficients, double-headed arrows illustrate covariances. To ensure that the parameters of our example model are identified,
Model Estimation in CCA
After ensuring identification of the model parameters, in the next step the free model parameters including the composite loadings and the correlations among the emergent variables need to be estimated. For this purpose, a variety of estimators implemented in common SEM software can be used, such as the maximum likelihood (ML) estimator (Jöreskog, 1969; Schuberth, 2021b) and the generalized least squares estimator (Browne, 1974). While composite loadings are directly estimated, the relationship between the composite loadings and the weights as described in equation (3) can be exploited to obtain the weight estimates. As a consequence, the weight estimates are obtained as follows
Note that by default SEM software does not provide standard error estimates for weights because the weights are not directly estimated. To address this issue, most SEM software packages allow users to specify new parameters. This feature can be used to produce the weight estimates, including their standard errors.
In addition to SEM estimators, estimators that emerged outside the realm of SEM can also be applied. For instance, partial least squares path modeling (Wold, 1975) and approaches to generalized canonical correlation analysis (Kettenring, 1971) can be used to obtain the model parameter estimates (Dijkstra, 2017; Henseler & Schuberth, 2020b; Jörg & Florian, 2022; Schuberth et al., 2018). However, it is emphasized that these estimators require a different model specification, namely, in terms of weights instead of composite loadings (see Dijkstra, 2017). Implementations can be found in the R package cSEM (Rademaker & Schuberth, 2020) and the commercial software ADANCO (Henseler & Dijkstra, 2017).
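The step from composite loadings to weights can be illustrated with a small numerical sketch. It assumes, in line with the description of the weights as the inverse of the composite loading matrix, that a block's observed variables equal the loading matrix times the block's composites, so that the weight matrix is obtained by matrix inversion. The loading values below are invented for illustration and follow the identification pattern described earlier (one loading per composite fixed to 1, one zero for the second excrescent variable). The sketch yields point estimates only and does not produce standard errors.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [
        [float(v) for v in row] + [float(i == j) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("loading matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate the column from all other rows.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical loading matrix of one block.
# Rows: observed variables x1..x3; columns: emergent, excrescent 1, excrescent 2.
loadings = [
    [1.0, -0.5, 0.0],
    [0.8, 1.0, -0.7],
    [0.6, 0.3, 1.0],
]

weights = invert(loadings)     # row r holds the weights forming composite r
emergent_weights = weights[0]  # weights of the emergent variable
```

In SEM software, the same inversion would be written out as user-defined parameters so that standard errors are available; the closed-form expressions grow quickly with block size, which is why a generic inversion routine is used here.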
Model Assessment in CCA
In the last step of CCA, the model is assessed. This involves assessing the overall model fit and the individual parameter estimates. In SEM, overall model fit refers to the comparison of the observed variables’ sample variance–covariance matrix
To assess the overall model fit in CCA, researchers can rely on the same tests for overall model fit and fit indices that have been proposed for SEM. The most prominent test to assess the overall model fit is a likelihood ratio test, also known as the chi-square test (Jöreskog, 1967), which assesses the null hypothesis of exact fit, that is, the model-implied variance–covariance matrix equals the population variance–covariance matrix of the observed variables:
In addition to overall model fit assessment, the parameter estimates need to be evaluated. Here, the composite loading and weight estimates are of particular interest. The composite loading estimates are the correlations between an observed variable and the composites and thus provide information about the orientation of the composite. Specifically, the scaling indicator, that is, the indicator whose loading was constrained to 1, determines the orientation of the composite. If it turns out that other observed variables forming that composite show negative composite loadings although they are expected to correlate positively with that composite, researchers should either reconsider the scaling variable or fix the loading of the scaling variable to −1 instead of 1 to ensure the right orientation of the composite. In addition, the magnitude and the significance of the composite loading estimates can be assessed. Moreover, the weight estimates should be considered. This is particularly important for researchers who are interested in the composition of the composite, that is, the contribution of each observed variable to the composite, or who want to calculate composite scores. Note that weight estimates are subject to multicollinearity, which can lead to differences in the signs of the composite loading and weight estimates.
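Written in symbols, the null hypothesis of exact fit described in words above takes the following form. The notation is assumed here for illustration: Σ denotes the population variance–covariance matrix of the observed variables, and Σ(θ) the model-implied variance–covariance matrix as a function of the model parameters θ.

```latex
H_0\colon \boldsymbol{\Sigma} = \boldsymbol{\Sigma}(\boldsymbol{\theta})
```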
Illustrative Example
To illustrate the use of CCA in human development research, we focus on one of the five major personality dimensions, that is, Neuroticism (
The data contain observations from 249 managers on 10 variables. Similar to Edwards (2001), who modeled Extraversion as a composite, we model
Table 2. Parameter Estimates Including Their 95% Confidence Intervals.
CI: confidence interval.
Results are based on 249 observations and rounded to the second decimal.
Equation (9) shows the variance–covariance matrix of the facet and the preference scores:
As explained in the section “Model Specification in CCA,” we model

Figure 2. A Confirmatory Composite Analysis of Neuroticism.
As Figure 2 illustrates,
To obtain the model results, we used the ML estimator as implemented in the R package lavaan (Rosseel, 2012, Version 0.6.11.1683). Moreover, lavaan allows researchers to manually specify additional model parameters as functions of other model parameters. As shown in equation (8), the weights are a function of the composite loadings. Consequently, this lavaan feature can be exploited to calculate the composite weight estimates directly. Similarly, the standardized weights can be obtained by multiplying the original weight estimate by the ratio of two standard deviations, that is, the standard deviation of the corresponding ingredient divided by the standard deviation of the emergent variable. The complete R syntax is provided in the online supplementary material.
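The rescaling of weights described above can be illustrated with a short Python sketch. The weights and the covariance matrix of the ingredients are hypothetical values chosen for illustration, not estimates from this article, and the function names are assumptions. The sketch computes the composite's variance from the unstandardized weights and then multiplies each weight by the ratio of the ingredient's standard deviation to the composite's standard deviation.

```python
from math import sqrt


def composite_variance(weights, cov):
    """Variance of the weighted sum: w' * Cov * w."""
    n = len(weights)
    return sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )


def standardized_weights(weights, cov):
    """Rescale each weight by sd(ingredient) / sd(composite)."""
    sd_composite = sqrt(composite_variance(weights, cov))
    return [w * sqrt(cov[k][k]) / sd_composite for k, w in enumerate(weights)]


# Hypothetical unstandardized weights and ingredient covariance matrix.
weights = [0.5, 0.3, 0.4]
cov = [
    [4.0, 1.2, 0.8],
    [1.2, 9.0, 1.5],
    [0.8, 1.5, 1.0],
]

std_weights = standardized_weights(weights, cov)
```

A useful check on this rescaling is that the standardized weights, applied to standardized ingredients, yield a composite with unit variance.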
The estimation with lavaan converged normally and estimates for the composite loadings, the composite weights, and the covariances between NEUR and the preferences for the four conflict strategies are provided. Considering the overall model fit assessment, the hypothesis about perfect model fit is rejected (
Although the fit indices provide no clear picture of the model quality, we continue here and report in Table 2 the estimated composite loadings, weights, and standardized weights of NEUR, including their 95% confidence intervals.
Consequently, the standardized composite
Concluding Remarks
Researchers in human development research often inappropriately assess composites using CFA. To address this issue, we present a recently developed approach to SEM, namely CCA, which allows for assessing composites with the same rigor that researchers apply when assessing common factors with CFA. In doing so, composites are embedded in a model which imposes constraints on the inter-block covariance matrix, that is, the covariances between the variables forming a composite and other variables in the model. Specifically, the composite model assumes that covariances across blocks of observed variables are accounted for by the composites of main interest, that is, emergent variables. This view on composites can differ from other SEM specifications of composites that relax the constraints of the composite model (e.g., Rose et al., 2019) or ignore the formation of the composite in the model (Rhemtulla et al., 2020). Consequently, such specifications cannot be used in CCA because they prevent researchers from assessing the overall model fit, which exploits the constraints imposed by the composite model. Similarly, in the composite model the weights and the composite loadings, respectively, are usually free model parameters that need to be estimated. It is emphasized that the weights, and thus the composites, are context-specific, that is, the weights of a composite may differ when different variables are related to the composite.
In our article, we explain how to conduct a CCA. Specifically, we show how to specify composite models by means of emergent and excrescent variables. Moreover, we explain how composite models can be identified and how parameter estimates can be obtained using common SEM software. Finally, we elaborate on the assessment of composite models, which helps researchers to evaluate whether the observed variables of a block form a whole or act as a mere pile of parts and thus should be studied individually. To evaluate the overall fit of composite models, we suggest employing statistical tests and fit indices. Specifically, in our illustrative example we refer to fit indices, including their cut-off values, that have been proposed for structural equation models with latent variables (Schermelleh-Engel & Moosbrugger, 2003) and CFA (e.g., Hu & Bentler, 1999). Although existing studies indicated that these fit indices are able to detect misspecified composite models (Schuberth et al., 2018, 2022), it is up to future research to reassess their cut-off values in the CCA context. Moreover, it is noteworthy that there are differing views in the SEM literature on the value of tests and fit indices for overall model fit assessment. For a discussion, the interested reader is referred to the special issue on overall model fit assessment in the journal
Besides explaining the steps of CCA, we demonstrate its use by means of an illustrative example using the R package lavaan. We deliberately chose lavaan to specify and estimate the model for the following reasons: First, lavaan is a widely used SEM software. Second, lavaan is an open-source software package and thus freely available. Third, the most recent version of lavaan shows a relatively good convergence behavior, whereas other SEM software such as AMOS (Arbuckle, 2014) faces bigger difficulties. Fourth, lavaan allows users to specify new parameters as functions of other model parameters. Thus, lavaan provides the opportunity to directly calculate the weight estimates including their corresponding standard errors. However, our guidelines are not limited to lavaan, and other software such as Mplus can be used as well. For further software tutorials on CCA, the reader is referred to the following website: www.confirmatorycompositeanalysis.com. Moreover, we showed in our illustrative example that CCA is not limited to composites formed of observed variables but can also be used to assess composites formed of latent variables. In this way, random measurement error contained in the composite’s ingredients can be taken into account. Although we limit our focus in the illustrative example to single-indicator latent variables, this is by no means necessary, and multiple-indicator latent variables can also be incorporated. In this case, one could speak of confirmatory composite and factor analysis (CCFA). Moreover, and due to our limited access to the original dataset, in our illustrative example we report the confidence intervals for the parameter estimates that lavaan provides by default, that is, confidence intervals based on the standard normal distribution and standard errors obtained from the inverse of the expected information matrix (e.g., Lai & Kelley, 2011). However, it has been highlighted in the mediation analysis literature that for products of parameters, such as an indirect effect, bootstrap confidence intervals are preferred for statistical inference and hypothesis testing, particularly for smaller sample sizes (e.g., Briggs, 2006; Preacher & Hayes, 2004, 2008; Zhao et al., 2010). Since the weight estimates are also a multiplicative and additive transformation of the composite loadings, that is, they are obtained as the inverse of the composite loading matrix, future research is advised to investigate the benefits of bootstrap confidence intervals over the classical ones.
Finally, although researchers can use CCA to assess composites that are composed in a linear way, human development research also studies concepts and indices that are composed in a nonlinear fashion. For instance, the multiplicative HDI (United Nations Development Programme, 2010) is formed in a nonlinear way. It is still an open question how such concepts and indices can be assessed using CCA. A potential avenue might be a transformation of the original index. For example, an index that is composed in a multiplicative way can be linearized using the logarithm. Future research should investigate this topic in more detail to make CCA accessible to a broader range of concepts and indices.
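As a brief illustration of the logarithmic linearization, consider an index formed as a weighted geometric mean of K positive ingredients, which is the general form of the multiplicative HDI with its three equally weighted dimension indices. The symbols I, x_k, and w_k are notation assumed here for illustration. Taking the logarithm turns the product into a weighted sum, that is, a linear composite of the log-transformed ingredients:

```latex
I = \prod_{k=1}^{K} x_k^{w_k}
\quad\Longrightarrow\quad
\log I = \sum_{k=1}^{K} w_k \log x_k ,
\qquad x_k > 0 .
```

This shows only the algebra of the transformation; how a CCA of the log-transformed ingredients behaves remains the open question raised above.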
Supplemental Material
sj-pdf-1-jbd-10.1177_01650254221117506 – Supplemental material for Confirmatory composite analysis in human development research
Supplemental material, sj-pdf-1-jbd-10.1177_01650254221117506 for Confirmatory composite analysis in human development research by Tamara Schamberger, Florian Schuberth and Jörg Henseler in International Journal of Behavioral Development
