Multiple-mediator analyses with clustered data are common in educational and behavioral sciences, but limited methods exist to assess the causal mediation effects via each of multiple mediators. In this study, we extend the multiply robust method to make inferences on the causal mediation effects for two mediators with clustered data. The developed method takes into account unmeasured cluster-level confounders and can incorporate machine learning methods to nonparametrically estimate nuisance models while allowing uncertainty quantification via asymptotic standard errors and confidence intervals. We conduct simulations to evaluate the developed method for inference of both the individual-average and cluster-average causal mediation effects with clustered data. We illustrate our method using data from the Education Longitudinal Study.
Mediation analysis is an important tool for assessing causal mechanisms by which treatment affects outcomes (Baron & Kenny, 1986; VanderWeele, 2015). The causal mechanisms often involve multiple mediators, and quantifying the mediating role of each mediator is frequently of interest (Vansteelandt & Daniel, 2017).
For multiple-mediator analysis, conventional methods were based on the parametric modeling approach (Baron & Kenny, 1986; MacKinnon, 2008). Despite its popularity, the parametric modeling approach to mediation has limitations in the causal interpretation of the effect estimates. For causal interpretation in mediation analysis, the causal inference approach is increasingly popular (VanderWeele, 2015), which uses a counterfactual framework to define and identify causal estimands for mediation analysis, such as the natural indirect effects (Pearl, 2001; Robins & Greenland, 1992) and the interventional indirect effects (IIEs; VanderWeele et al., 2014; Vansteelandt & Daniel, 2017). Recently, the causal inference approach has been extended to multiple mediators (Hong, 2015; VanderWeele & Vansteelandt, 2014). For example, Qin et al. (2021) assessed the natural indirect effects for mediators that are assumed to be independent given pretreatment covariates. Daniel et al. (2015) and Imai and Yamamoto (2013) considered causal mediation for sequential mediators (i.e., one mediator affects the other), assuming no unmeasured common causes for the mediators.
In this study, we focus on a different, yet common scenario of multiple mediators, where (a) there is no substantive knowledge of the causal ordering between mediators and (b) the mediators may be dependent. To illustrate, in Figure 1, under each value of treatment , the mediators may be dependent even after conditioning pretreatment covariates . The mediators’ dependence may be due to a causal relationship (such as in [a] and [b]), sharing an unmeasured common cause (such as in [c]), or both (such as in [d]).
Causal diagrams with dependent mediators. The data are unclustered. (a) M1affects M2. (b) M2affects M1. (c) M1and M2share an unmeasured common cause V. (d) M1and M2have a casual relationship and share an unmeasured common cause V.
For this multiple-mediator scenario (Figure 1), a practical approach to causal mediation analysis has been the interventional effects approach (Vansteelandt & Daniel, 2017). While the natural indirect effects are often the estimands with a single mediator (Imai et al., 2010), with multiple mediators the identification of natural indirect effects for each mediator requires strong assumptions (Avin et al., 2005; Loh et al., 2022). For example, under Figure 1(c) (mediators correlated due to an unmeasured common cause), the natural indirect effect for each mediator is generally unidentifiable (Vansteelandt & Daniel, 2017). To address this problem, Vansteelandt and Daniel (2017) proposed the interventional effects approach for defining causal mediation effects with multiple mediators, which requires weaker assumptions for identification; for example, the IIEs remain identifiable under Figure 1(c). As we describe in the next section, the interventional effects approach considers hypothetical interventions on mediators, which helps guide intervention development (Moreno-Betancur & Carlin, 2018) and is increasingly popular in causal mediation analyses in applied research (X. Liu & Wang, 2024; X. Liu et al., 2025; Loh & Ren, 2022; Rubinstein, Haviland, & Breslau, 2023).
The purpose of this study is to examine methods for estimating the interventional effects with clustered data (e.g., students clustered within schools), a data structure frequently occurring in multiple-mediator analyses in behavioral sciences and can add complexities to causal inference. For example, the Education Longitudinal Study (ELS) has served as a major data source in multiple-mediator analyses for the effect of extracurricular participation () on adolescent academic outcome () via the adolescent’s educational expectations and beliefs ( and ), in which adolescents are clustered within schools (Ingels et al., 2007; Morris, 2016). The variables (, , , ) are all measured at the adolescent level, and could be affected by pretreatment covariates at not only the adolescent level (e.g., gender and race) but also at the school level (e.g., school climate).
From previous literature, clustering adds complexities to effect estimation, not only due to correlations within clusters but more importantly, due to the possibility of unmeasured cluster-level confounders (Chang & Stuart, 2022; Li et al., 2013). A key insight from the literature is that, methods adequately utilizing the clustered structure (e.g., by conditioning on clusters or cluster-mean centering) can protect against bias due to unmeasured cluster-level confounders (Suk & Kang, 2023; Zetterqvist et al., 2016). However, to our knowledge, the literature has not addressed handling clustering in the estimation of causal mediation effects in the multiple-mediator analysis (Figure 1), despite that multiple-mediator analyses with clustered data are frequently involved in the research process of educational and behavioral sciences.
Therefore, in this study, we develop methods to assess the interventional (in)direct effects of the multiple-mediator analysis (Figure 1) with clustered data. Specifically, we focus on a binary treatment and two discrete (binary) mediators. We consider that each individual’s cluster membership is an observed pretreatment characteristic. In the rest of this article, we first define the causal estimands and present the identification results. We then describe the estimation methods. Subsequently, we conduct simulations to evaluate the methods. We illustrate the application of the developed methods using data from the ELS.
Definition and Identification of Causal Mediation Effects
We start with describing the sampling process that generates the clustered sample. In the first stage, a simple random sample of clusters (e.g., schools) is selected from some study population; the selected clusters are indexed by . Each cluster serves a source population of individuals eligible for the study (e.g., adolescents). In the second stage, given that cluster () is selected in the first stage, a simple random sample of individuals is drawn from all eligible individuals within the cluster . The cluster size is set to represent the cluster’s relative population size, that is, is either the cluster’s population size or is set proportional to the cluster’s population size (Raudenbush & Bloom, 2015). With the sampling process, we have a clustered sample of total size . Let indicate each individual’s cluster membership, which is for individuals in cluster . Note that we have assumed simple random sampling in both stages: that is, we have equal sampling probabilities of clusters (the first stage), and given that a cluster is sampled, we have equal sampling probabilities of individuals within the cluster (the second stage); in the Supplemental Materials (Appendix A in the online version of the journal), we consider scenarios with unequal sampling probabilities in the first or second stages, and describe how our method may be modified by incorporating user-specified sampling weights in such scenarios.
After the sampling, baseline covariates are measured for each sampled cluster. The covariates have two types: (1) Measured cluster-level covariates on cluster (e.g., school type) that are shared by all individuals within the cluster, and (2) measured individual-level covariates that vary within the cluster (e.g., an individual’s demographics). Then, each individual selects or is assigned to either the treatment condition or the control condition . The treatment assignment may be affected by the cluster size (if varies), measured cluster-level covariates, unmeasured cluster-level confounder (e.g., school environment), and the individual’s covariates; that is, depends on . In a follow-up, the two mediators are measured on each individual; the mediators may be affected by the individual’s own treatment and pretreatment covariates, the cluster size, and measured/unmeasured cluster-level pretreatment covariates; that is, depends on . Finally, the outcome is measured on each individual; may depend on the individual’s own treatment, mediators, and pretreatment covariates, as well as the cluster size and measured/unmeasured cluster-level pretreatment covariates; that is, depends on .
To summarize, for each sampled cluster , we observed the cluster size, measured cluster-level covariates, and measured individual-level covariates, treatment, mediators, and outcome of individuals sampled within the cluster, written as with being the set of measured individual-level variables for each individual. We assume that the data are independently and identically distributed observations of . Furthermore, we assume that the cluster size is finite, following some distribution with finite support on positive integers (Wang et al., 2024).
Note that in our setting, the treatment variable is an individual-level variable varying within each cluster; this is to be distinguished from clustered designs with a cluster-level treatment variable (e.g., cluster randomized trials), in which all individuals in the same cluster are assigned to the same treatment value. Our setting resembles a blocked experiment that uses higher-level sampling units (e.g., schools) as blocks (often referred to as random blocks, e.g., Dong & Maynard, 2013 or structural blocks, e.g., Pashley & Miratrix, 2022) or multisite experiment (Raudenbush & Bloom, 2015), where the experimenter allocates individuals to treatment and control conditions within each cluster (within each block or within each site).
Figure 2 shows the relationships of the variables, where the mediators are dependent. The dependence structure of the mediators is unknown, analogous to Figure 1 in the unclustered setting.
Causal diagrams with dependent mediators. The data are clustered. (a)affects. (b) affects . (c)and share an unmeasured common cause (d) and have a casual relationship and share an unmeasured common cause
Causal Estimands
To define the effects, consider the counterfactual or potential outcomes (Rubin, 1974, 1978, 1980) for an individual. Let denote the potential outcome that an individual would have if the treatment is set to value . Furthermore, incorporating the mediators, let denote the potential outcome if the vector of treatment and mediators is set to value . For the two mediators and , let and be respectively the potential mediators if the treatment is set to value . We make the stable unit treatment value assumption (SUTVA; Rubin, 1980), which assumes that there are no different versions of each treatment value and each mediator value and that there is no interference among individuals. With clustered data, the no-interference assumption requires considerations given specific substantive contexts (Hong & Raudenbush, 2006). In this study, we assume the SUTVA throughout, and relaxing this assumption (e.g., allowing for interference within clusters) is an important future research topic. The SUTVA allows us to connect the potential variables to the observed data as follows: The observed mediators and equal the potential mediators and under the observed treatment; the observed outcome equals the potential outcome under the observed treatment and mediators.
In what follows, we first define the interventional (in)direct effects (Vansteelandt & Daniel, 2017) for a specific cluster; then, aggregating these cluster-specific effects, we define the cluster-average and individual-average versions of the effects.
Cluster-Specific Causal Mediation Effects
For a specific cluster (e.g., school ), the total effect of treatment on the outcome is
which is the average difference between each individual’s potential outcomes under treatment versus control conditions, averaged among the individuals in cluster . We can equivalently write this total effect by incorporating the mediators (VanderWeele & Vansteelandt, 2009): . How much do the mediators play a role in mediating the total effect? We address this next.
We first define the IIE (Vansteelandt & Daniel, 2017) via both mediators jointly, and the interventional direct effect (IDE) not through the mediators. To ease introducing the definition, consider a hypothetical intervention in cluster , which would assign the mediators for each individual to a random draw [denoted as ] from the mediators’ potential joint probability distribution under treatment value , given the individual’s pretreatment characteristics and cluster membership . Then, we define the and joint indirect effect () for cluster as follows:
The direct effect is the cluster-specific average effect of assigning each individual in the cluster to treatment versus control, while holding the mediators’ joint probability at what it would be under control (given the individual’s covariates and cluster membership). The joint indirect effect is the cluster-specific average effect of changing the joint probability of both mediators for each individual in the cluster, from what it would be under control to under treatment (given the individual’s covariates and cluster membership) while assigning the individual to the treatment . These two effects decompose the conditional expectation of the total effect as follows: Given and , the expected total effect for an individual in cluster is ,where the expectation is over the mediators’ distribution .
Next, we define the IIEs for each mediator. To do so, consider a hypothetical intervention in cluster , which would assign the two mediators and for an individual as two independent random draws, denoted by and , respectively. is a random draw from , the potential probability distribution of mediator under treatment value given the covariates and cluster membership . Analogously, is a random draw from . Then, we define the IIE of treatment on outcome via for cluster as
This effect via (4) is the cluster-specific average effect of changing the probability of mediator for each individual in the cluster, from what it would be under control to what it would be under treatment (given the individual’s covariates and cluster membership) while assigning the individual to the treatment and holding the probability of the mediator at what it would be under control. This effect (4) captures the part of the treatment effect mediated by mediator alone, but not by the mediator’s causal descendants (i.e., variables affected by the mediator). To illustrate, consider the diagrams in Figure 2. The IIE via captures the pathway in Figure 2(a), (c), and (d); while in Figure 2(b), the IIE via captures the combination of the pathways and .
Corresponding to , we define the IIE of treatment on outcome via mediator for cluster as
This effect via (5) is the effect of changing the probability of for each individual in cluster from what it would be under control to treatment while assigning the individual to the treatment and fixing the probability of at the distribution under treatment (given the individual’s covariates and cluster membership). In Figure 2, the IIE via captures the pathway in the diagrams of (b) and (c), and captures the combination of the pathways and in the diagrams of (a) and (d).
Finally, we define the IIE due to the mediators’ mutual dependence (Vansteelandt & Daniel, 2017), as the difference between the joint indirect effect (3) and the two separate indirect effects through each mediator (4) and (5):
This (6) expresses the effect of changing the strength of dependence between the mediators from under control to treatment, given the covariates and cluster membership . would be zero in expectation, when the potential mediators are independent given and cluster membership . Nonetheless, when the potential mediators are dependent, can be nonzero in expectation and capture part of the mediation pathways. In what follows, we collectively refer to the IDE (2), indirect effects via each mediator (4) and (5), and indirect effect due to mediators’ mutual dependence (6), as “causal mediation effects.”
Individual- and Cluster-Average Effects
With the clustered data, an average effect can be defined for two types of average units (Miratrix et al., 2021; Raudenbush & Bloom, 2015): (i) The average effect for an average cluster (i.e., weighting clusters equally), or the cluster-average effect and (ii) the average effect for an average individual (i.e., weighting individuals equally), or the individual-average effect. Specifically, let be one of the cluster-specific causal mediation effects, . Then, we can express the cluster-average effect, , and the individual-average effect, , as the following (weighted) averages of the cluster-specific effects:
where is the mean cluster size. The definition (7) is a sample version of the average effect definition, as the average is across the clusters in the sample.
Alternatively, we have the superpopulation version of the average effect definition (Miratrix et al., 2021), which will be the focus of our study. This definition considers the characteristics of the sampled clusters, including the cluster size and cluster-specific effect , are representative of some much broader population of clusters (i.e., the superpopulation with infinite clusters). Following the literature (Wang et al., 2024), we define the superpopulation version of the cluster-average effect, , and individual-average effect, , using the following expectations of the cluster-specific effect:
where the expectation “” is over the distribution of across repeated simple random sampling of a cluster, and then simple random sampling of individuals in the sampled cluster (with cluster size equal to or proportional to the cluster’s population size).
Identification
To estimate the causal mediation effects (defined above), the first step is to express these counterfactual-based estimands as functions only of the observed data distribution (i.e., identification). Extending Vansteelandt and Daniel (2017) to the clustered setting, we make the following assumptions for identification:
A1. The effect of treatment on outcome is conditionally unconfounded, ;
A2. The effect of both mediators on outcome is conditionally unconfounded,
A3. The effect of treatment on both mediators is conditionally unconfounded, .
Furthermore, we make a within-cluster positivity assumption as follows. Within a given cluster , there is a nonzero probability for each individual to have any value of the treatment and any value of the mediator vector so that and for all , , and values.
The above identification assumptions allow for unmeasured cluster-level confounder (Figure 2), by conditioning on the cluster membership and correspondingly making a stronger, within-cluster positivity assumption. Under A1 to A3 and the positivity assumption, we identify the cluster- and individual-average causal mediation effects as follows. For each cluster , we use to denote the conditional mean outcome , use to denote the joint probability distribution of both mediators , and use and to respectively denote the probability distribution of each mediator and .
Then, for the direct effect (IDE in Equation 2), the cluster-average and individual-average potential outcomes under are identified by
In the indirect effects (IIEs) via each mediator, the cluster-average and individual-average potential outcomes for are identified by
We discuss several components of the identification results (9) and (10). First, conditioning on the cluster membership implies conditioning on all cluster-level variables, including the cluster size and measured/unmeasured cluster-level covariates . Second, the expectation is over the distribution of the cluster size, cluster-level covariates, and individual-level covariates, , across the repeated sampling of clusters and then individuals within the clusters. Third, in the effect definition and identification, we have considered the cluster sizes to represent the relative population sizes of the clusters . By considering as equal to the cluster’s population size, this assumes that the individuals in the data at hand constitute the cluster’s population (e.g., all eligible children in a school). Alternatively, by considering as proportional to the cluster’s population size, this assumes that the individuals in the data at hand are a random sample representative of the cluster’s population (e.g., a random sample of all eligible children in a school). Fourth, the identification result (9) and (10) extends the previous causal mediation literature (Vansteelandt & Daniel, 2017) in several ways: (a) it accounts for unmeasured cluster-level confounders, by utilizing the clustered data structure; (b) it distinguishes between the cluster-average and individual-average effect types, both of which can be of substantive interest in the clustered setting; and (c) it explicitly considers varying cluster sizes , and via conditioning on , it allows for the influence of on the treatment, mediators, outcome, and their relationships.
With (9) and (10) together, we express the cluster-average and individual-average causal mediation effects (IIEs and IDE) as contrasts of the average potential outcomes following the effect definitions. For example, for the IIEs, the cluster-average version is identified by and ; for the IDE, the individual-average version is identified by .
Estimation
In the previous section, we have defined and identified the causal mediation effects. We now examine how to construct an effect estimator and make inferences. One approach could be a regression-based approach, which requires consistently estimating models for the mediators and outcome in the identification result (Loh et al., 2022; Vansteelandt & Daniel, 2017); another approach could be extending the weighting-based approach, which requires consistently estimating models for the mediators and treatment (Hong, 2015). These approaches often estimate the models parametrically, which could induce bias due to overly simplified modeling assumptions; furthermore, if the model estimation is data-adaptive (e.g., via model selection or machine learning methods), the sampling distribution of the resulting effect estimate is generally unknown, creating difficulties for uncertainty quantification and statistical inference. This motivates us to take a different estimation approach that helps overcome this issue.
Specifically, we develop an estimation method based on the multiply robust estimation approach (Benkeser & Ran, 2021). Multiply robust approaches (Kennedy, 2023; Tchetgen Tchetgen & Shpitser, 2012) often yield estimators with desirable properties (e.g., multiple robustness so that the estimator remains consistent even when one of the nuisance models is misspecified); furthermore, they allow the model estimation to be data-adaptive (e.g., via machine learning algorithms), while providing uncertainty quantification via asymptotic variance and Wald-type confidence intervals. For the multiple-mediator analysis, the multiply robust method with unclustered data has been developed (Benkeser & Ran, 2021; Rubinstein, Haviland, & Breslau, 2023). Nonetheless, it has not been extended to the clustered setting where unmeasured cluster-level confounders may exist (Figure 2). Therefore, in what follows, we first describe the multiply robust estimator. Then, we develop methods for (a) estimating the nuisance models allowing for unmeasured cluster-level confounders and (b) making inferences (variance estimation and confidence intervals) considering within-cluster correlations.
The Multiply Robust Estimator
The multiply robust estimator combines estimates of nuisance models for the outcome , mediators , and treatment , in a way that allows for errors in estimating any one model. To ease describing the estimator, we introduce some additional notation for the nuisance models. We denote the “fully marginalized mean outcomes” in the identification results as
These mean outcomes are “fully marginalized,” in the sense that they marginalize over both mediators: marginalizes the distributions of each mediator, and it will enter the estimator for the indirect effects via each mediator (IIEs); marginalizes the joint distribution of both mediators, and it will be part of the estimator for the direct effect (IDE). We also have the following “partially marginalized mean outcomes”:
which marginalize only one of the mediators, while still being a function of the other mediator.
Furthermore, we introduce the following mediator weights
In these mediator weights, the numerators are the mediator distributions in the identification results (10) and (9), and the denominator is the distribution of the mediators given the observed treatment. Intuitively, these weights adjust the observational distribution of the mediators to be the desired one (i.e., that in the identification result). Finally, we use to denote , the conditional probability of treatment, or the “propensity score” (Rosenbaum & Rubin, 1983).
The multiply robust estimator (Benkeser & Ran, 2021) combines estimates of the above nuisance models with the observed data as follows. For the identification result in (10), it has the following cluster-specific estimating function, which is a function of the observed data in cluster :
For the identification result in (9), the cluster-specific estimating function is
Further, we perform weight normalization for the above cluster-specific estimating functions. Weight normalization helps alleviate the potential adverse influence of unstable weights, which has been suggested in previous literature on multiply robust estimation (Farbmacher et al., 2022). Specifically, within each cluster, we divide each weighting term by the cluster-mean weight. That is, for , we divide the first line (13) by the cluster-mean weight , divide the second line (14) by , and divide the third line (15) by ; for , we divide the first line (16) by , and divide the second line (17) by .
After the weight normalization for Equations 13 to 17, we contrast the cluster-specific estimating functions to obtain the causal mediation effects:
Then, let be each of these cluster-specific estimating functions. We obtain multiply robust estimators for the cluster-average and individual-average effects as the following (weighted) averages:
where is the average cluster size (as denoted earlier). Analogous to the effect definitions (8), the effect estimator for the cluster-average effect weights clusters equally, and the effect estimator for the individual-average effect weights individuals equally (i.e., by weighting clusters proportionally to cluster sizes).
The multiply robust estimator (19) provides multiple robustness for consistent estimation of the causal mediation effects. Specifically, it would remain consistent under consistent estimation for at least one of the following sets of nuisance models (Benkeser & Ran, 2021): (a) the conditional mean outcome and the joint mediator distribution are consistently estimated; or (b) the propensity score , the conditional mean outcome, and either mediator distribution or , are consistently estimated; or (c) the propensity score and the joint mediator distribution are consistently estimated.
Estimating the Nuisance Models Under Unmeasured Cluster-Level Confounders
To control for the unmeasured cluster-level confounder , all the nuisance models are conditioned on the cluster membership . Conditioning on cluster membership implies conditioning on (a) the means of individual-level variables within the cluster and (b) a full set of dummy indicators of the cluster membership. Therefore, we implement the conditioning on the cluster membership in the nuisance model estimation via the following two approaches.
First, we control for cluster means and perform cluster-mean centering for individual-level variables in all the nuisance models. Specifically, cluster-mean centered variables are orthogonal to any cluster-level variable, including unmeasured cluster-level confounders; thus, cluster-mean centering helps remove the influence of unmeasured cluster-level confounders (Raudenbush & Bryk, 2002; Suk & Kang, 2023; Zetterqvist et al., 2016). In addition, while conditioning on cluster means of individual-level variables does not necessarily imply conditioning on the cluster membership (), variation in cluster means can result from the influence of unmeasured cluster-level variables; therefore, conditioning on the cluster means can also help control for unmeasured cluster-level confounders.
Second, we also control for dummy indicators of an individual’s cluster membership in all the nuisance models. Conditioning on the full set of cluster dummies necessarily implies conditioning on the cluster membership (), and has been a common approach to handling unmeasured cluster-level confounders in treatment effect estimation (Chang & Stuart, 2022; Li et al., 2013). A potential issue is that with many clusters, incorporating a large number of cluster dummies into the nuisance models may result in unstable model estimates, which in turn may reduce the precision of effect estimation (Li et al., 2013). Therefore, in the next section, we will explore omitting the cluster dummies (as an alternative to including the cluster dummies) in the nuisance model estimation through simulations.
Specifically, we implement the above two approaches to controlling for the unmeasured cluster-level confounder as follows. We first calculate the cluster means of individual-level covariates, treatment, mediators, and outcome: , , , , and . We also create a full set of cluster dummies, , where for individuals with cluster membership (i.e., ) and otherwise. Then, we incorporate these cluster means and cluster dummies into estimating the nuisance models for the treatment, mediators, and outcome.
For the treatment, we estimate its conditional probability (propensity score) using a regression model for , regressing it on the cluster-mean centered covariates , its cluster mean , measured cluster-level covariates , the cluster size , and the set of cluster dummies . The cluster-mean treatment serves as a cluster-level summary statistic to account for the influence of unmeasured cluster-level confounder (as described above), which was also used in propensity score estimation with unmeasured cluster-level confounders (Zetterqvist et al., 2016). Using the fitted model, we obtain the estimate (i.e., predicted value) of the conditional probability of treatment given an individual’s covariates and cluster membership: .
For the mediators, we estimate the joint probability distribution by factorizing it as the distribution of one mediator and the conditional distribution of the other mediator, . Once the joint distribution is estimated, we obtain the distribution of the other mediator as .
Specifically, to estimate the factorized joint mediator distribution, we fit a regression model for , regressing it on , , , , , and (i.e., the cluster-mean centered treatment and covariates, its cluster mean, measured cluster-level covariates, cluster size, cluster dummies). Using the fitted mediator model, we obtain the estimate for the probability of mediator given the treatment value , conditional on an individual’s covariates and cluster membership: . Then, to estimate the conditional mediator distribution, we fit a regression model for the other mediator , regressing it on , , , , , , , and . In this conditional mediator model, the interaction term is included for the treatment and the other mediator , as they are variables of interest; the interaction term is also centered at its cluster mean . From this fitted model, we obtain the estimate for the probability of mediator for given values of the treatment and the other mediator , conditional on an individual’s covariates and cluster membership: .
For the outcome, we estimate the conditional mean using a regression model for , regressing it on the cluster-mean centered covariates , the cluster-mean centered treatment and mediators and their interactions , , , , , , , cluster-mean outcome , measured cluster-level covariates , cluster size , and cluster dummies . In the outcome model, all interactions among the treatment and mediators are considered; these interaction terms are cluster-mean centered, where the cluster mean is the mean of the interaction term within the cluster, for example, . Using the fitted outcome model, we obtain the estimate of the conditional mean outcome for given values of the treatment and mediators , conditional on an individual’s covariates and cluster membership: .
Using Machine Learning to Estimate the Nuisance Models
To fit the above regressions and obtain nuisance model estimates, parametric modeling (e.g., generalized linear modeling) may be applicable when we have prior knowledge of correct model specifications; however, this knowledge is generally unavailable in practice. Therefore, we consider the use of machine learning methods to obtain the nuisance model estimates.
Based on the literature (Kennedy, 2023; Rubinstein, Branson, & Kennedy, 2023), although machine learning methods often yield nuisance model estimates with small errors (e.g., regularization errors), multiply robust estimators (including those presented above) are robust to small errors in the nuisance model estimates. Specifically, unlike regression-based estimation (i.e., directly plug-in nuisance model estimates into the identification results), with which the asymptotic bias contains first-order terms of errors of the nuisance model estimates, the multiply robust estimator has no first-order bias asymptotically. The asymptotic bias of the multiply robust estimator contains only products of errors of two estimated nuisance models, which is of second order (see Rubinstein, Branson, & Kennedy, 2023 for detailed expressions). Thus, the bias of the multiply robust estimator will be asymptotically negligible, when the product of errors in estimating any two nuisance models goes to zero at a rate faster than , which is satisfied when the nuisance models are estimated by correct parametric models or by machine learning methods with sufficiently fast convergence rate (achievable by commonly available machine learning methods such as lasso, random forest, and neural network; Díaz, 2020). Therefore, even when machine learning methods are used, the multiply robust estimator (Equation 19) minus the true effect would asymptotically behave as a sample average of independent, mean-zero random variables, which converges to a normal distribution at rate; this asymptotic normality permits the use of the variance estimator and Wald-type confidence intervals (described in the next subsection) for uncertainty quantification and inference.
With machine learning methods to estimate nuisance models, cross-fitting is necessary to obtain estimates (i.e., predicted values) of the nuisance models for calculating the multiply robust estimator (Chernozhukov et al., 2018; Zheng & van der Laan, 2011). Cross-fitting helps avoid bias due to overfitting issues that may occur with data-adaptive machine learning regressions (Chernozhukov et al., 2018). To perform cross-fitting while considering the presence of the unmeasured cluster-level confounder, we randomly split the dataset within clusters to obtain folds (e.g., five folds) of approximately equal number of individuals (so that each fold contains individuals from all clusters). For each fold, we let it be the validation dataset and use the remaining folds as the training dataset to fit the nuisance models. Then, using the fitted models (fitted with the training dataset), we obtain estimates (i.e., predicted values) for nuisance models of individuals in the validation dataset. After obtaining the nuisance model estimates for all folds, we use these estimates to calculate the multiply robust estimators (19).
Inference
As mentioned in the previous subsection, we obtain an asymptotic variance estimator and the Wald-type confidence interval for inference with the multiply robust estimator (19). We estimate the asymptotic variance as the empirical variance of the cluster-specific estimating function divided by the number of clusters , where for the cluster-average effect and for the individual-average effect; that is, the variance estimator is
where is taking the empirical variance. This variance estimator considers clusters as independent units while accounting for correlations of observations within clusters.
Then, using the variance estimator, we construct a Wald-type confidence interval using the -distribution. The use of -distribution (instead of the normal distribution) is to improve the confidence interval coverage when there are few clusters (Wang et al., 2024). Specifically, for the effects contrasting two potential outcomes, namely the indirect effects via each mediator (, ) and the direct effect (), we use the -distribution with degrees of freedom, and correspondingly, we multiply the variance estimator by . That is, the 95% interval for is: ; the 95% intervals for , and are obtained in the same way. For the indirect effect due to mutual dependence , which contrasts four potential outcomes, we use the degrees of freedom , and correspondingly multiply its variance estimator by ; thus, the 95% interval for is .
We conduct a simulation study to evaluate the proposed method. The R code for the simulations is available at https://osf.io/skcba/?view_only=None. We examined two types of sample size conditions: (1) a smaller number of larger clusters and (2) a larger number of smaller clusters. For (1), the number of clusters was J = 10, 20, or 40, with the cluster size uniformly distributed between 50 and 100, . For (2), the number of clusters was J = 40, 70, or 100, with uniformly distributed between 5 and 20, . Larger numbers of clusters often occur when large-scale surveys are data sources for analyses (e.g., 70–100 schools); smaller numbers of clusters can occur in smaller-scale multisite studies (e.g., 10 sites).
Data Generation
For each cluster , an unmeasured cluster-level confounder was generated from the standard normal distribution. For each individual in the cluster, three individual-level covariates were generated . Each was , a summation of its between-cluster component and within-cluster component ; and were each generated by normal distributions with means of zero and variances of 0.2 and 0.8, respectively. The unmeasured and measured , as well as cluster size , affected the treatment, mediators, and outcome. Specifically, let be the cluster size scaled to range [0, 1]. The data-generation models for , , , and were as follows (with being the indicator function):
where the treatment effects can be heterogeneous across clusters due to cluster size . Coefficients in the data-generation models are described in the Supplemental Materials (Appendix B in the online version of the journal).
Simulation Conditions
In addition to the sample size conditions ( and ), we varied the following data-generation conditions: (1) Null versus non-null effects and (2) linear versus nonlinear terms of covariates. Specifically, for (1), we simulated the null condition to evaluate the Type I error rates. In the null condition, all terms involving the treatment and mediators were set to zero in the data-generation model for . Thus, all the individual-average and cluster-average causal mediation effects were zero in the null condition. For (2), we manipulated the functional forms of the covariates to evaluate whether machine learning estimation for nuisance models can yield reduced bias compared to parametric estimation when the data generation has nonlinearities. In the “linear” scenario, we used the data-generation models as shown above (Equation 21), in which the covariates only had linear terms. In the “nonlinear” scenario, we replaced each covariate () with its quadratic term in all data-generation models (Equation 21) for , , , and .
Analysis Methods
The unmeasured cluster-level confounder was unavailable (i.e., omitted) in the observed data for estimation. Then, to control for the unmeasured , we examined the following options for including clusters in the nuisance model estimation. Option (A), labeled as “cluster means + cluster dummies,” is the method described above (in section “Estimation”). Option (B), labeled “cluster means,” omits the cluster dummies from Option (A). In both Option (A) and Option (B), the cluster size was controlled for and (which denotes a set of measured cluster-level covariates) was an empty set in all the nuisance models.
We also examined the following factors for estimating nuisance models. First, we compared (a) parametric estimation, labeled as “parametric,” versus (b) machine learning estimation, labeled as “nonparametric”. For the parametric estimation, we used generalized linear regressions via R function glm(). For the nonparametric estimation, we used fivefold cross-fitting to obtain the nuisance model estimates; each nuisance model was estimated by the super learner ensemble procedure (Polley & van der Laan, 2017; van der Laan et al., 2007) with generalized additive model and neural network via R package “SuperLearner” (i.e., “SL.gam” and “SL.nnet” were specified in R function “SuperLearner”). Second, we examined different orders of estimating the mediator distributions: (a) “M1 before M2” (i.e., the order described in section “Estimation”), where the joint mediator distribution is factorized as and we first estimated the distribution for and then for ; (b) “M2 before M1”, where the joint mediator distribution is factorized as and we first estimated the distribution for and then for . This simulation factor allows us to check that using the different factorization of the joint mediator distribution would not appreciably influence the simulation results.
With the nuisance model estimates yielded (i.e., from a combination of the above implementation factors), we obtained the estimate and 95% Wald-type confidence interval for each of the cluster-average and individual-average causal mediation effects, namely and with .
Evaluation Criteria
Under each condition, we run 1,000 simulation replications in the R software. We evaluated the performance for each of the cluster-average and individual-average causal mediation effects (i.e., and with ). For each effect, we calculated the true value by using the data-generation models to generate the potential outcomes for a large number of clusters ( clusters) and combining them to calculate the effect definition. Table B1 in the Supplemental Materials (Appendix B in the online version of the journal) lists the true effect values in the non-null condition; in the null condition, the true effect values were all zero.
To evaluate the performance, we used the following criteria: (1) bias, (2) mean squared error (MSE), (3) coverage rate of 95% confidence intervals, and (4) Type I error rate with 5% nominal significance level or statistical power. Specifically, for each effect, Bias = , where denotes the true value and denotes the estimate from the -th simulation replication. The MSE evaluates the overall accuracy, which was MSE = . The coverage rate was the proportion of replications where the confidence interval covered the true value. The type I error rate (under the null condition) or power (under the non-null condition) was the proportion of replications where the 95% confidence interval excluded zero.
Simulation Results
Figures B1 to B8 in the Supplemental Materials (Appendix B in the online version of the journal) display the simulation results for the individual-average causal mediation effects (IDE and IIEs) when they were non-null or null. The results for the corresponding cluster-average effects (Figures B9–B16 in the online version of the journal) show similar patterns. We observe several patterns in the simulation results.
Under the “linear” scenario, the multiply robust method had near zero bias, confidence interval coverage rates close to the nominal 95% (ranged from 91% to 99%), and Type I error rate close to the nominal 5% (ranged from 1% to 9%), across all combinations of implementation factors. The power of testing each effect increased with increasing the number of clusters , holding constant the cluster size condition. The MSE decreased with increasing . This is so for all the causal mediation effects, including both the cluster-average and individual-average versions. Comparing the nonparametric estimators with different methods to incorporate clusters, we observe that the method “cluster means” (incorporating the cluster means while omitting the cluster dummies) led to better power and MSE than the method “cluster means + cluster dummies” (incorporating both the cluster means and cluster dummies). Nonetheless, in Figures B8 and B16 (available in the online version of this article) under the null condition, with a relatively large number of clusters (), the other estimators (the nonparametric estimator with “cluster means” and the parametric estimators) led to more anti-conservative Type I error rates (e.g., 9%), while the nonparametric estimator with “cluster means + cluster dummies” yielded more satisfactory Type I error rates (4%–6%).
Under the “nonlinear” scenario, the bias was lower for the multiply robust estimators with nonparametric nuisance estimation than with parametric nuisance estimation. Comparing the nonparametric estimators, the MSE and power were better for the nonparametric estimator incorporating the cluster means while omitting the cluster dummies (“cluster means”), than for the nonparametric estimator incorporating both the cluster means and cluster dummies (“cluster means + cluster dummies”). Nonetheless, in the presence of the unmeasured cluster-level confounder , the most satisfactory confidence interval coverage and Type I error rates were yielded by the nonparametric estimator incorporating both the cluster means and cluster dummies (“cluster means + cluster dummies”). In addition, we observe that in the “nonlinear” scenario, the coverage rate and bias with the nonparametric estimator could become less satisfactory as the number of clusters increased, which is unlike the pattern in the “linear” scenario. This implies that in the “nonlinear” scenario, the errors of the estimated nuisance models did not go to zero at a sufficiently fast rate as the number of independent units (i.e., clusters) increased (Rudolph et al., 2024). Despite this, in most cases, the coverage rates were above 90% with the best-performing multiply robust estimator (i.e., the nonparametric estimator with “cluster means + cluster dummies”). The order of estimating mediator distributions (i.e., estimating before , or estimating before ) did not appreciably affect the performance in general. In the Supplemental Materials, we ran additional simulations (Appendix C in the online version of the journal) where we manipulated the nuisance model estimation in the “nonlinear” scenario to satisfy certain convergence rates, following the simulation approach in Rubinstein, Branson, and Kennedy (2023); results from these simulations (Figures C1–C6 in the online version of the journal) confirmed that when the convergence rates of the nuisance model estimation are sufficiently fast to satisfy the theoretical requirement for the multiply robust estimator (i.e., the product of errors in estimates of any two nuisance models goes to zero at rate ), the bias goes down to zero as increases and the confidence interval coverage maintains around 95%.
Empirical Example
To illustrate the developed method for multiple-mediator analysis with clustered data, we assess mediators that were hypothesized to play a role in mediating the impact of adolescent extracurricular activity participation (treatment ) on math outcome () using data from the ELS of 2002 (Ingels et al., 2007). Based on substantive research (Haghighat & Knifsend, 2019), we consider two mediators related to the adolescent’s educational expectation () and perceived importance of good grades (). (The dataset and R code used in this illustration are available at https://osf.io/skcba/?view_only=None.)
The analysis sample was 1,079 adolescents in public non-vocational high schools in rural areas. The eligible schools () were schools in which there were 10 or more sampled adolescents (i.e., cluster size 10) and the sampled adolescents had different treatment status (described below) on the extracurricular activity participation (i.e., the treatment proportion was above 0 and below 1). Across the schools, the number of sampled adolescents ranged from 10 to 20 with a median of 14. The schools in the analysis sample were considered to be a simple random sample from some underlying population of schools. Within each school, the sampled adolescents were considered as a simple random sample from some underlying population of adolescents served by the school, with the cluster size representing the relative population size of the school.
The treatment () was an adolescent’s reported participation in school-sponsored extracurricular activities (e.g., school band and school academic clubs) in the 2001 to 2002 school year (1 = yes, 0 = no). The outcome () was the math standardized test score in the first follow-up (i.e., 2004). The mediators ( and ) were based on the adolescent’s reports in the base-year assessment in spring of 2002. The mediator of educational expectation () was whether the adolescent expected to obtain a master’s degree or above (1 = yes, 0 = no). The mediator of the perceived importance of good grades () was whether the adolescent perceived good grades as very important (1 = yes, 0 = no). In the analysis, the controlled individual-level characteristics () were adolescent pretreatment characteristics, which were: The adolescent’s sex, race (Hispanic, Black, White, or other), English fluency, generational status (i.e., whether the student and mother born in the United States), Head Start program attendance, family socioeconomic status, family composition (mother and father, or other), number of siblings, parents’ highest level of education, total family income (above 50k or not), and home literacy resources. The controlled cluster-level covariate was the cluster size ().
We conducted the analysis using the developed multiply robust method. For the nuisance model estimation, we implemented the method described in the section on estimation (i.e., the method labeled as “nonparametric” and “cluster means + cluster dummies” in the simulations; the order of estimating mediator distributions was before ). In the implementation, we used fivefold cross-fitting to obtain estimates of the nuisance models. The estimation of each nuisance model was done by the super learner ensemble with generalized additive model, neural network, and generalized linear model (i.e., “SL.glm,”“SL.gam,” and “SL.nnet” via R package “SuperLearner”; van der Laan et al., 2007). Using the nuisance model estimates, we calculated the multiply robust estimator and the Wald-type 95% confidence interval for each of the estimands. Table 1 shows the results.
Results of the Multiple-Mediator Analysis in the Illustrative Example
Interventional Effect
Individual-Average
Cluster-Average
Estimate
95% CI
Estimate
95% CI
Direct ()
1.561
[0.151, 2.971]
1.511
[0.087, 2.934]
Indirect via educational expectation ()
0.563
[0.236, 0.891]
0.565
[0.235, 0.895]
Indirect via importance of good grades ()
0.276
[0.006, 0.546]
0.271
[0.000, 0.542]
Indirect via the mediators’ mutual dependence ()
0.037
[−0.047, 0.121]
0.034
[−0.058, 0.126]
Note. CI = confidence interval; IDE = interventional direct effect; = interventional indirect effects.
For the individual-average effects, there were positive estimates for the IIEs of extracurricular activity participation () on math achievement () via educational expectation and via the importance of good grades (i.e., and ). This suggests that for an average adolescent, an increase in the math achievement outcome would be achieved, by changing the probability distribution of each of the mediators (e.g., educational expectation, ) from what it would be if the adolescent had no-participation (i.e., treatment status ) to what it would be if the adolescent had participated (i.e., ) (given the adolescent’s pretreatment characteristics and school membership) while fixing the treatment status to no-participation and fixing the probability distribution of the other mediator (i.e., the importance of good grades, ). For the IDE of extracurricular activity participation on math achievement (i.e., ), the positive estimate indicates that, for an average adolescent, an increase in math achievement would be achieved by changing the adolescent’s treatment status from no-participation () to participation (), while fixing the mediators’ joint probability distribution to what it would be under no-participation (given the adolescent’s pretreatment characteristics and school membership).
For the cluster-average effects, the results were similar to those for the individual-average effects. Nonetheless, we would interpret these cluster-average effects as the effects of an average school. For example, consider the cluster-average indirect effect via educational expectation (). The positive estimate suggests that, for an average school, an increase in the school mean math achievement outcome would be achieved, by changing the probability distribution of educational expectation () for each adolescent in the school, from what it would be if the adolescent had no-participation () to what it would be if the adolescent had participated (i.e., ) (given the adolescent’s pretreatment characteristics and school membership) while fixing the treatment status at no-participation and fixing the probability distribution of the other mediator () at what it would be under no-participation (given the adolescent’s pretreatment characteristics and school membership).
For comparison, we also obtained estimation results ignoring the clustering in the nuisance model estimation (Table D1 in Appendix D of the Supplemental Materials in the online version of the journal). That is, we omitted all the cluster dummies and cluster means and did not perform cluster-mean centering in estimating the nuisance models. In these results omitting the clusters (Table D1), the effect estimates were smaller (in magnitudes) than the corresponding estimates in Table 1 controlling for the clusters (e.g., the estimate of the cluster-average direct effect was 1.511 in Table 1, whereas it was 0.823 in Table D1; similar patterns also apply to the estimates of the other effects). By omitting the clusters (i.e., school membership of an adolescent), the validity of these estimates (Table D1) is threatened by the possibility that certain school-level variables (e.g., school environment) could influence the adolescent’s participation in extracurricular activities (treatment), educational expectation and perceived importance of good grades (mediators), and the achievement outcome, even after conditioning on measured baseline covariates. Compared to Table 1 (which accounts for unmeasured school-level confounders), the results in Table D1 underestimated the interventional (in)direct effects of extracurricular activity participation on the achievement outcome, which could be resulting from bias due to uncontrolled school-level confounders in obtaining Table D1.
Discussion
For multiple-mediator analysis with clustered data, a common scenario in behavioral sciences, there has been limited methodological research on assessing the causal mediation effects. This study reduces this methodological gap by developing and evaluating the method to assess the cluster-average and individual-average causal mediation effects with preexisting clusters where the individual-level treatment, mediators, and outcome may be affected by unmeasured cluster-level confounders. Through simulations, we recommend using the multiply robust estimator with both the cluster means and cluster dummies included in the nuisance models, to adequately control for the unmeasured cluster-level confounder. In addition, when there is no prior knowledge of correct model specification, we recommend using data-adaptive methods (e.g., machine learning algorithms) to estimate the nuisance models. Below, we discuss considerations and extensions of the methods presented in this study.
Choosing Between the Estimands
We have provided methods for both the cluster-average and individual-average versions of the effects. In practice, the choice between the different versions would depend on which version is more relevant to the research question in the substantive context (Kahan et al., 2023; Miratrix et al., 2021). Suppose in a substantive application, the investigator is interested in the effect (one of the causal mediation effects) for all individuals (i.e., considering individuals equally), the individual-average effects would be more relevant. For instance, when the treatment is a policy affecting all individuals, a policy investigator may be interested in the policy’s impact on a typical individual (i.e., individual-average effects). In comparison, when the substantive investigator is interested in the effect of a cluster (e.g., how the school mean outcome would be affected by a school’s decision about intervention assignments for individuals in the school), the cluster-average effects would be more relevant, as they capture the effect for a typical cluster.
Limitations and Future Directions
The current study has limitations, and several topics await future research. First, for estimating the mediator distributions, it is possible to circumvent the estimation by extending recent methods (e.g., alternative representations of multiply robust estimators) in the literature on nonparametric mediation estimation (R. Liu et al., 2024; Rudolph et al., 2024). Second, future research can further evaluate and improve upon the multiply robust method with clustering presented here. For instance, alternative options for handling unmeasured cluster-level confounders await investigation, such as grouping similar clusters (Lee et al., 2021). For the variance estimation using correct parametric nuisance models, the finite sample performance could be potentially improved by incorporating uncertainty in estimating the parametric nuisance models (e.g., by stacking estimating equations of the model parameters); on this topic, recent progress has been made with weighting-based estimation (Bein et al., 2018), and future research may explore extensions to the multiply robust estimator. For the cross-fitting procedure that combines the multiply robust estimator with machine learning estimation of nuisance models, the impact of the number of folds could be worthwhile for future evaluation; while recent research found that the impact tends to be minor (Ellul et al., 2024), it is unclear whether this is generalizable to the multiple-mediator analysis with clustered data.
Third, with clustered data, the no-interference assumption would be violated when individuals sharing the same cluster membership may interact and influence each other’s outcomes (Hong & Raudenbush, 2006). In this case, we may make a cluster-level version of the SUTVA that allows for interference within the same clusters while assuming no interference between different clusters. Under the cluster-level SUTVA, it can be interesting to examine a cluster-level version of the estimands, such as causal mediation effects of the cluster-mean treatment via cluster means mediators (VanderWeele et al., 2013). Furthermore, the handling of missing data in clustered designs (Enders et al., 2020; X. Liu, 2024b) for multiple-mediator analyses could await more research.
Fourth, sensitivity analysis could be an important future research area in multiple-mediator analysis with clustered data. For unclustered data, fruitful developments have been made in quantifying the sensitivity to unmeasured confounders in mediation analyses (X. Liu et al., 2024; Park & Esterling, 2021; Rubinstein, Branson, & Kennedy, 2023; Zhang & Ding, 2023); for example, Zhang and Ding (2023) derived omitted-variable bias formulas using partial correlations, and extended the robustness value approach (Cinelli & Hazlett, 2020) to mediation analysis. It could be useful to extend the sensitivity analysis literature to the interventional (in)direct effects with clustered data.
Fifth, exploring the inference for the sample version of the effects could be an interesting direction, such as in the context of randomized experiments (Ding et al., 2017; Miratrix et al., 2021). For the sample effects, we make inferences viewing the sample at hand as the inferential population. Correspondingly, we would consider the clusters in the sample as fixed, and the uncertainty quantification would no longer incorporate the randomness due to cluster sampling. To quantify uncertainty in inference on the sample effects, we could take the randomization-based perspective (e.g., Part II in Ding, 2024). For instance, a (hypothetical) randomized experiment for assessing the interventional (in)direct effects (Moreno-Betancur & Carlin, 2018) could entail randomizing the treatment and then randomizing the mediators given the treatment, covariates, and cluster membership. Then, in making inferences for the sample effects, we would consider quantifying uncertainty in the observed outcomes with respect to randomness due to random assignments of the treatment and mediators. A related direction is to examine the inference of mediation for a finite population; for example, Abadie et al. (2023) proposed an alternative inferential framework for clustered data, allowing the sampled clusters to constitute a possibly large fraction of a finite population at hand (e.g., states in the United States). Furthermore, we have focused on two-level clustering (i.e., assuming the clusters as independent units). Nonetheless, designs with three or more levels of clustering and partially clustered designs also occur frequently in educational and behavioral research (Hedges & Borenstein, 2014; X. Liu, 2024a; X. Liu et al., 2023; Lohr et al., 2014). With more complex clustering structures, developing and evaluating methods for causal inference in multiple-mediator analyses await future studies.
Supplemental Material
sj-pdf-1-jeb-10.3102_10769986251318093 – Supplemental material for Estimating Causal Mediation Effects in Multiple-Mediator Analyses With Clustered Data
Supplemental material, sj-pdf-1-jeb-10.3102_10769986251318093 for Estimating Causal Mediation Effects in Multiple-Mediator Analyses With Clustered Data by Xiao Liu in Journal of Educational and Behavioral Statistics
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
ORCID iD
Xiao Liu
Author
XIAO LIU is an assistant professor in the Department of Educational Psychology, The University of Texas at Austin, 1912 Speedway, Austin, TX 78712; e-mail: xiao.liu@austin.utexas.edu. Her research interests include quantitative methods for causal inference, mediation analysis, and clustered data analysis.
References
1.
AbadieA.AtheyS.ImbensG. W.WooldridgeJ. M. (2023). When should you adjust standard errors for clustering?The Quarterly Journal of Economics, 138(1), 1–35. https://doi.org/10.1093/qje/qjac038
BaronR. M.KennyD. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1882. https://doi.org/10.1037/0022-3514.51.6.1173
4.
BeinE.DeutschJ.HongG.PorterK. E.QinX.YangC. (2018). Two-step estimation in ratio-of-mediator-probability weighted causal mediation analysis. Statistics in Medicine, 37(8), 1304–1324. https://doi.org/10.1002/sim.7581
5.
BenkeserD.RanJ. (2021). Nonparametric inference for interventional effects with multiple mediators. Journal of Causal Inference, 9(1), 172–189. https://doi.org/10.1515/jci-2020-0018
6.
ChangT.-H.StuartE. A. (2022). Propensity score methods for observational studies with clustered data: A review. Statistics in Medicine, 41(18), 3612–3626. https://doi.org/10.1002/sim.9437
7.
ChernozhukovV.ChetverikovD.DemirerM.DufloE.HansenC.NeweyW.RobinsJ. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68. https://doi.org/10.1111/ectj.12097
8.
CinelliC.HazlettC. (2020). Making sense of sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1), 39–67. https://doi.org/10.1111/rssb.12348
DingP.LiX.MiratrixL. W. (2017). Bridging finite and super population causal inference. Journal of Causal Inference, 5(2), 20160027. https://doi.org/10.1515/jci-2016-0027
13.
DongN.MaynardR. (2013). Powerup!: A tool for calculating minimum detectable effect sizes and minimum required sample sizes for experimental and quasi-experimental design studies. Journal of Research on Educational Effectiveness, 6(1), 24–67. https://doi.org/10.1080/19345747.2012.673143
14.
EllulS.CarlinJ. B.VansteelandtS.Moreno-BetancurM. (2024, May). Causal machine learning methods and use of sample splitting in settings with high-dimensional confounding [arXiv:2405.15242 [stat]]. https://doi.org/10.48550/arXiv.2405.15242
15.
EndersC. K.DuH.KellerB. T. (2020). A model-based imputation procedure for multilevel regression models with random coefficients, interaction effects, and nonlinear terms. Psychological Methods, 25(1), 88–112. https://doi.org/10.1037/met0000228
16.
FarbmacherH.HuberM.LafférsL.LangenH.SpindlerM. (2022). Causal mediation analysis with double machine learning. The Econometrics Journal, 25(2), 277–300. https://doi.org/10.1093/ectj/utac003
17.
HaghighatM. D.KnifsendC. A. (2019). The longitudinal influence of 10th grade extracurricular activity involvement: Implications for 12th grade academic practices and future educational attainment. Journal of Youth and Adolescence, 48(3), 609–619. https://doi.org/10.1007/s10964-018-0947-x
18.
HedgesL. V.BorensteinM. (2014). Conditional optimal design in three- and four-level experiments. Journal of Educational and Behavioral Statistics, 39(4), 257–281. https://doi.org/10.3102/1076998614534897
19.
HongG. (2015). Causality in a social world: Moderation, mediation and spill-over. John Wiley & Sons.
20.
HongG.RaudenbushS. W. (2006). Evaluating kindergarten retention policy. Journal of the American Statistical Association, 101(475), 901–910. https://doi.org/10.1198/016214506000000447
21.
ImaiK.KeeleL.TingleyD. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4), 309–334. https://doi.org/10.1037/a0020761
22.
ImaiK.YamamotoT. (2013). Identification and sensitivity analysis for multiple causal mechanisms: Revisiting evidence from framing experiments. Political Analysis, 21(2), 141–171. https://doi.org/10.1093/pan/mps040
23.
IngelsS. J.PrattD. J.WilsonD.BurnsL. J.CurrivanD.RogersJ. E.Hubbard-BednaszS. (2007). Education longitudinal study of 2002 (ELS: 2002): Base-year to second follow-up data file documentation (NCES 2008-347). National Center for Education Statistics.
24.
KahanB. C.LiF.CopasA. J.HarhayM. O. (2023). Estimands in cluster-randomized trials: Choosing analyses that answer the right question. International Journal of Epidemiology, 52(1), 107–118. https://doi.org/10.1093/ije/dyac131
25.
KennedyE. H. (2023). Semiparametric doubly robust targeted double machine learning: A review. http://arxiv.org/abs/2203.06469
26.
LeeY.NguyenT. Q.StuartE. A. (2021). Partially pooled propensity score models for average treatment effect estimation with multilevel data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4), 1578–1598. https://doi.org/10.1111/rssa.12741
27.
LiF.ZaslavskyA. M.BethM. (2013). Propensity score weighting with multilevel data. Statistics in Medicine, 32(19), 3373–3387. https://doi.org/10.1002/sim.5786
28.
LiuR.WilliamsN. T.RudolphK. E.DíazI. (2024, August). General targeted machine learning for modern causal mediation analysis. https://doi.org/10.48550/arXiv.2408.14620
29.
LiuX. (2024a). Assessing heterogeneous causal effects across clusters in partially nested designs. Psychological Methods. Advanced online publication. https://doi.org/10.1037/met0000723
30.
LiuX. (2024b). Propensity score weighting with missing data on covariates and clustered data structure. Multivariate Behavioral Research, 59(3), 411–433. https://doi.org/10.1080/00273171.2024.2307529
31.
LiuX.EddyJ. M.MartinezC. R. (2025). Causal estimands and multiply robust estimation of mediated-moderation. Multivariate Behavioral Research, 1–27. Advanced Online Publication https://doi.org/10.1080/00273171.2024.2444949
32.
LiuX.LiuF.Miller-GraffL.HowellK. H.WangL. (2023). Causal inference for treatment effects in partially nested designs. Psychological Methods, 29(3), 457–479. https://doi.org/10.1037/met0000565
33.
LiuX.WangL. (2024). Causal mediation analysis with the parallel process latent growth curve mediation model. Structural Equation Modeling: A Multidisciplinary Journal, 31(6), 983–1004. https://doi.org/10.1080/10705511.2024.2347588
34.
LiuX.ZhangZ.ValentinoK.WangL. (2024). The impact of omitting confounders in parallel process latent growth curve mediation models: Three sensitivity analysis approaches. Structural Equation Modeling: A Multidisciplinary Journal, 31(1), 132–150. https://doi.org/10.1080/10705511.2023.2189551
35.
LohW. W.MoerkerkeB.LoeysT.VansteelandtS. (2022). Disentangling indirect effects through multiple mediators without assuming any causal structure among the mediators. Psychological Methods, 27(6), 982–999. https://doi.org/10.1037/met000314
36.
LohW. W.RenD. (2022). Improving causal inference of mediation analysis with multiple mediators using interventional indirect effects. Social and Personality Psychology Compass, 16(10), e12708. https://doi.org/10.1111/spc3.12708
37.
LohrS.SchochetP.SandersE. (2014). Partially nested randomized controlled trials in education research: A guide to design and analysis. National Center for Education Research. https://doi.org/https://ies.ed.gov/ncer/pubs/20142000/index.asp
38.
MacKinnonD. P. (2008). Introduction to statistical mediation analysis. Taylor & Francis Group/Lawrence Erlbaum Associates.
39.
MiratrixL. W.WeissM. J.HendersonB. (2021). An applied researcher’s guide to estimating effects from multisite individually randomized trials: Estimands, estimators, and estimates. Journal of Research on Educational Effectiveness, 14(1), 270–308. https://doi.org/10.1080/19345747.2020.1831115
40.
Moreno-BetancurM.CarlinJ. B. (2018). Understanding interventional effects: A more natural approach to mediation analysis?Epidemiology, 29(5), 614. https://doi.org/10.1097/EDE.0000000000000866
41.
MorrisD. S. (2016). Extracurricular activity participation in high school: Mechanisms linking participation to math achievement and 4-year college attendance. American Educational Research Journal, 53(5), 1376–1410. https://doi.org/10.3102/0002831216667579
42.
ParkS.EsterlingK. M. (2021). Sensitivity analysis for pretreatment confounding with multiple mediators. Journal of Educational and Behavioral Statistics, 46(1), 85–108. https://doi.org/https://doi.org/10.3102/1076998620934500
43.
PashleyN. E.MiratrixL. W. (2022). Block what you can, except when you shouldn’t. Journal of Educational and Behavioral Statistics, 47(1), 69–100. https://doi.org/10.3102/10769986211027240
44.
PearlJ. (2001). Direct and indirect effects. In BreeseJ.KollerD. (Eds.), UAI’01: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 411–420). Morgan Kaufmann Publishers. http://dl.acm.org/citation.cfm?id=2074022.2074073
QinX.DeutschJ.HongG. (2021). Unpacking complex mediation mechanisms and their heterogeneity between sites in a job corps evaluation. Journal of Policy Analysis and Management, 40(1), 158–190. https://doi.org/10.1002/pam.22268
47.
RaudenbushS. W.BloomH. S. (2015). Learning about and from a distribution of program impacts using multisite trials. American Journal of Evaluation, 36(4), 475–499. https://doi.org/10.1177/1098214015600515
48.
RaudenbushS. W.BrykA. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1). Sage.
RosenbaumP. R.RubinD. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55. https://doi.org/10.1093/biomet/70.1.41
51.
RubinD. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. https://doi.org/10.1037/h0037350
52.
RubinD. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, 6(1), 34–58. https://doi.org/http://www.jstor.org/stable/2958688
53.
RubinD. B. (1980). Randomization analysis of experimental data: The Fisher randomization test comment. Journal of the American Statistical Association, 75(371), 591–593. https://doi.org/10.2307/2287653
54.
RubinsteinM.BransonZ.KennedyE. H. (2023). Heterogeneous interventional effects with multiple mediators: Semiparametric and nonparametric approaches. Journal of Causal Inference, 11(1), 20220070. https://doi.org/10.1515/jci-2022-0070
55.
RubinsteinM.HavilandA.BreslauJ. (2023). The effect of COVID-19 vaccinations on self-reported depression and anxiety duringFebruary2021. Statistics and Public Policy, 10(1), 2190008. https://doi.org/10.1080/2330443X.2023.2190008
SukY.KangH. (2023). Tuning random forests for causal inference under cluster-level unmeasured confounding. Multivariate Behavioral Research, 58(2), 408–440. https://doi.org/10.1080/00273171.2021.1994364
58.
Tchetgen TchetgenE. J.ShpitserI. (2012). Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness and sensitivity analysis. The Annals of Statistics, 40(3), 1816–1845. https://doi.org/10.1214/12-aos990
59.
van der LaanM. J.PolleyE. C.HubbardA. E. (2007). Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(1). https://doi.org/10.2202/1544-6115.1309
60.
VanderWeeleT. J. (2015). Explanation in causal inference: Methods for mediation and interaction. Oxford University Press.
61.
VanderWeeleT. J.HongG.JonesS. M.BrownJ. L. (2013). Mediation and spillover effects in group-randomized trials: A case study of the 4Rs educational intervention. Journal of the American Statistical Association, 108(502), 469–482. http://www.jstor.org/stable/24246456
62.
VanderWeeleT. J.VansteelandtS. (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and its Interface, 2(4), 457–468. https://doi.org/10.4310/SII.2009.v2.n4.a7
VanderWeeleT. J.VansteelandtS.RobinsJ. M. (2014). Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology, 25(2), 300–306. https://doi.org/10.1097/EDE.0000000000000034
65.
VansteelandtS.DanielR. M. (2017). Interventional effects for mediation analysis with multiple mediators. Epidemiology, 28(2), 258–265. https://doi.org/10.1097/EDE.0000000000000596
66.
WangB.ParkC.SmallD. S.LiF. (2024). Model-robust and efficient covariate adjustment for cluster-randomized experiments. Journal of the American Statistical Association, 119(548), 2959–2971. https://doi.org/10.1080/01621459.2023.2289693
67.
ZetterqvistJ.VansteelandtS.PawitanY.SjölanderA. (2016). Doubly robust methods for handling confounding by cluster. Biostatistics, 17(2), 264–276. https://doi.org/10.1093/biostatistics/kxv041
68.
ZhangM.DingP. (2023). Interpretable sensitivity analysis for the Baron-Kenny approach to mediation with unmeasured confounding (arXiv:2205.08030). arXiv. https://doi.org/10.48550/arXiv.2205.08030
69.
ZhengW.van der LaanM. J. (2011). Cross-validated targeted minimum-loss-based estimation. In van der LaanM. J.RoseS. (Eds.), Targeted learning: Causal inference for observational and experimental data (pp. 459–474). Springer. https://doi.org/10.1007/978-1-4419-9782-1
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.