Sage Journals: Discover world-class research

Abstract

Causal inference on populations embedded in social networks poses technical challenges, since the typical no-interference assumption frequently does not hold. Existing methods developed in the context of network interference rely upon the assumption of no unmeasured confounding. However, when faced with multilevel network data, there may be a latent factor influencing both the exposure and the outcome at the cluster level. We propose a Bayesian inference approach that combines a joint mixed-effects model for the outcome and the exposure with direct standardisation to identify and estimate causal effects in the presence of network interference and unmeasured cluster confounding. In simulations, we compare our proposed method with linear mixed and fixed effects models and show that unbiased estimation is achieved using the joint model. Having derived valid tools for estimation, we examine the effect of home environment on adolescent school performance using data from the National Longitudinal Study of Adolescent Health.

Keywords

Causal inference network interference Bayesian inference unmeasured confounding

1. Introduction

Much of the theoretical development in causal inference has been within the counterfactual framework, which typically employs a ‘no interference’ assumption, in that the potential outcome of an individual is not affected by the treatments of another. In many sociological settings, the ‘no interference’ assumption is implausible, and traditional methods will fail whenever social interaction influences the mechanisms relating a treatment to an outcome. In such cases, the treatment or exposure assigned to an individual will also affect their close neighbours’ outcomes, yielding spillover or indirect effects. A highly relevant example is in studies of adolescents, where the influence of peers takes on a much greater role than in early childhood or later life. For instance, access to drugs in the home (exposure) may affect that teen’s HIV risk and that of their friends, as adolescents are particularly susceptible to peer influence. A tobacco cessation intervention in a high school setting might not only affect the uptake of its direct recipients but also that of their contacts through word of mouth and social influence. Similarly, it is entirely plausible that an adolescent’s academic performance is influenced both by their home environment and that of their peers.

Peer influence has long been of interest to social scientists, since it can potentially inform the optimal organisation of schools, neighbourhoods, workplaces, and other structures whereby individuals interact. In educational research, there has been mounting evidence suggesting that classroom composition, notably in terms of ability or achievement distribution, can have an influence on learning outcomes.^1,2 For instance, using administrative public school data gathered by the Texas Education Agency, Hoxby³ found that a change in one point in peers’ reading scores is associated with an increase in a student’s own score between 0.15 and 0.40. Leveraging data from the National Longitudinal Study of Adolescent Health (Add Health), Bifulco et al.⁴ found that having a higher proportion of classmates with college-educated mothers was associated with a lower risk of dropping out and a higher likelihood of attending college. Bifulco et al.⁴ also hypothesised about a spillover effect of maternal educational attainment on students’ post-secondary health behaviors, such as binge drinking, cigarette smoking, and marijuana use, but the data did not provide evidence of such an effect. Fletcher et al.⁵ made use of the friendship nominations data in Add Health and found that a greater proportion of friends with college-educated mothers is associated with a higher grade point average (GPA), with evidence of a differential effect by sex. This body of evidence supports the idea of spillover effects of home environment on academic performance, and possibly health behaviour outcomes.

Social network data are often collected when such interference is at play, and many of the settings in which spillover effects are being estimated are multilevel (e.g. social networks of students nested within schools). The Add Health study is one example of such a multilevel structure. When faced with multilevel network data, such as one might find in schools or geographical communities, for example, there may be an unmeasured, latent factor influencing both the exposure and the outcome at the cluster level. This tendency for peers, such as schoolmates or neighbourhood residents, to share some unobserved contextual factors is termed contextual confounding.⁶ In the context of adolescent school performance, there could be a latent trait influencing both the average maternal education as well as the adolescents’ overall performance at the school level. If it is unmeasured, such a latent trait could be geographical location, for instance, since it may encompass area-level violence, pollution, or access to quiet learning spaces. Another potential source of network confounding is homophily, which is defined as the increased tendency of units with similar characteristics to form ties. As opposed to contextual confounding, homophily confounding arises at the individual rather than the cluster level and pertains to the network structure within clusters. In this paper, the main focus is on unmeasured contextual confounding, and latent homophily will either be assumed to be not present or removed by conditioning on all the variables driving tie formation in simulations.

Causal inference is inherently a missing data problem, and the problem of unmeasured contextual confounding can be conceived as a non-ignorable missingness mechanism.⁷ Non-ignorable missingness or treatment assignment mechanisms are typically addressed by imposing additional parametric assumptions. In the frequentist paradigm, joint mixed-effects models (JMMs) were originally proposed to handle non-ignorable missing data when estimating the parameters of an outcome data-generating model.^8,9 Shardell and Ferrucci¹⁰ combined parametric JMMs for a study outcome and an exposure with g-computation to identify and estimate causal effects in longitudinal settings under unmeasured confounding. In the Bayesian paradigm, Papadogeorgou and Samanta¹¹ proposed an approach based on simultaneous modelling of the exposure and outcome to account for the presence of spatially structured unmeasured confounders. Nobre et al.¹² used a Bayesian multilevel approach to study the effect of directly observed therapy on the rate of cure of tuberculosis while mitigating the impact of unobserved contextual factors shared by individuals within different municipalities across the state of São Paulo, Brazil. Xu et al.¹³ proposed Bayesian multivariate generalised linear mixed-effects models (LMMs) to assess dynamic treatment regimes while accounting for unmeasured patient heterogeneity. The aforementioned methods employ an outcome modelling strategy while leveraging the treatment assignment to learn about the cluster-level confounders. Other methods based on propensity score modelling have been proposed to reduce bias from unmeasured cluster-level confounding,^14–17 but these approaches are suitable for multilevel designs where there are no spillover effects through interference.

Existing methods that accommodate multilevel networks, be it inverse probability-of-treatment weighting, outcome regression, or doubly robust estimation, predominantly rely upon the assumption of no unmeasured confounding.^18–20 In this work, we address an important gap by relaxing the unconfoundedness assumption and providing a Bayesian framework for causal inference in the multilevel network setting, in which heterogeneity is due to clustering of individuals in different subgraphs. We propose an approach based on a JMM combined with Bayesian standardisation to estimate direct and indirect effects of a binary treatment on a continuous outcome in the presence of unmeasured contextual confounding. We study frequentist properties of the JMM approach in simulation studies, which show that valid inference can be obtained in comparison with more naive approaches that either assume ignorability of the treatment assignment (LMM) or ignore the clustered nature of the data altogether (fixed effects model (FEM)).

The paper is structured as follows. In Section 2, we introduce the Add Health study and our motivating question examining the direct and indirect effect of maternal education on school performance. In Sections 3 and 4, we introduce the counterfactual framework and causal assumptions under the network interference setting and the relevant causal estimands for this setting. In Section 5, we describe the proposed JMM approach along with the Bayesian standardisation procedure used to perform posterior inference for causal effects. In Section 6, we assess frequentist properties of the proposed estimator in simulation studies. The method is then utilised to assess the direct and spillover effects of maternal college education on adolescent school performance using the Add Health network data in Section 7. We close with a general discussion in Section 8.

2. The Add Health study

Add Health is a longitudinal study of a nationally representative sample of 90,118 adolescent students in grades 7 to 12 in the United States in 1994–1995 who were followed throughout adolescence.²¹ The original purpose of the study was to understand the determinants of adolescent health and health behaviour. Add Health used a school-based design that selected 80 high schools and associated feeder schools (typically a middle school) using a sampling frame derived from the Quality Education Database. The final sample comprised 132 schools with communities located in urban, suburban, and rural areas of the country.

In addition to socio-demographic and academic achievement characteristics, peer network data were obtained in the in-school questionnaire, in which adolescents nominated up to five best male and five best female friends from the school roster. We mapped an undirected network to friendship nomination data from the first wave of data collection in 1994–1995 of the Add Health. Following a reciprocal definition of friendship, we drew an edge between students $i$ and $j$ if student $i$ listed $j$ as a friend in the in-school survey, or $j$ listed $i$ as a friend, or both. In other words, friendships that were not mutual were retained in the graph. For the purpose of this analysis, we selected a random sample of 16 schools, which allowed us to reduce the computational burden. Figure 1 represents the three smallest schools in the subsample, with students with a college-educated mother represented by black nodes. The network characteristics and distribution of attributes are summarised in Tables S4 and S5, respectively, which can be found in Supplemental Appendix D. The Add Health subnetwork has an average degree of 6.65 (standard deviation = 3.88) and a density of 0.0008. The transitivity, which measures the extent to which nodes tend to cluster together, is 0.171. Perfect transitivity implies that if two nodes $x$ and $z$ are connected to another node $y$ , then systematically $x$ and $z$ are also connected. The assortativity by maternal education, which is 0.123 globally, is the extent to which nodes with matching values of maternal education tend to form ties in the network.

Figure 1.

Three schools in the Add Health network of 71, 159, and 160 students, respectively, where an edge between two students indicates that either one nominated the other as a friend, and black nodes represent students with college-educated mothers. Note that ties that exist across different schools are not shown in this visualisation.

Previous work has examined the causal effect of peers on academic success under a counterfactual framework, focusing on the indirect (spillover) effects of having a college-educated mother on GPA.²⁰ As individual-level data are available for each school in Add Health, it is expected that students attend the same school and thus reside in the same area share latent traits affecting both their propensity to have a college-educated mother and their overall grades. Thus, we suspect unmeasured subgraph-level confounders influencing both the overall socio-economic status and the overall academic performance at the school level. This raises the question of whether subgraph-level random effects should be included when modelling the quantities of interest. Therefore, we follow the Bayesian paradigm and discuss a procedure that jointly models propensity score and the outcome to estimate direct and indirect effects of maternal education on academic performance. Throughout the text, we will refer to maternal education as a (self-selected) treatment.

3. Notation and causal assumptions

Consider an undirected network $G = (N, E)$ , where $N = {1, 2, \dots, N}$ is a set of $N$ nodes and $E$ is a set of edges $E$ with generic element ${i, j} = {j, i}$ denoting the presence of an edge between nodes $i$ and $j$ . The sociomatrix $A$ associated with $G$ has an entry $A_{i j} = 1$ if ${i, j}$ is in $E$ , and 0 otherwise. We assume that the set of nodes in this network is clustered in that it can be written as a partition of $m$ sets, such that $N = \cup_{ν = 1}^{m} N_{ν},$ where $N_{ν} \cap N_{υ} = \emptyset$ for $ν \neq υ$ and each set is of order $| N_{ν} | = N_{ν}$ . The $m$ disjoint node sets could, for instance, correspond to distinct schools or classrooms. We define $G_{ν} = (N_{ν}, E_{ν})$ to be the induced subgraph associated with the set of nodes $N_{ν}$ . Note that this definition does not preclude the existence of ties across the different disjoint node sets $N_{ν}$ , which implies that the induced subgraphs ${G_{ν}, ν = 1, \dots, m}$ are not necessarily disjoint. We define the neighbourhood $N_{ν i}$ of node $i$ in cluster $ν$ , $i = 1, 2, \dots, N_{ν}$ , as the set of nodes $j$ for which ${i, j} \in E_{ν}$ and the degree of node $i$ is denoted $d_{ν i} = | N_{ν i} |$ . Since the induced subgraphs ${G_{ν}, ν = 1, \dots, m}$ are not necessarily disjoint, nodes in the immediate neighbourhood $N_{ν i}$ may belong to a different subset than $N_{ν}$ , say $N_{υ}$ , $υ \neq ν$ . Additionally, we denote $N_{ν i}^{*} = N_{ν i} \cup i$ as the union of the node $i$ and the nodes in $N_{ν i}$ . It follows that each node $i$ is associated with a partition of $N$ , that is, ${i, N_{ν i}, N_{ν - i}}$ , where $N_{ν - i} = N ∖ N_{ν i}^{*}$ .

A node $i$ in cluster or subgraph $ν$ has an observed outcome $Y_{ν i}$ , an assigned binary treatment $Z_{ν i}$ , and a vector of covariates $X_{ν i}$ . Covariates may include an intercept, unit $i$ ’s individual covariates, or covariates corresponding to unit $i$ ’s neighbourhood or subgraph. Let $Z_{N_{ν i}} = (Z_{j_{1}}, \dots, Z_{j_{d_{ν i}}})^{'}$ denote the observed neighbourhood treatment vector for node $i$ in subgraph $ν$ , where $j_{k} \in N_{ν i}, k = 1, \dots, d_{ν i}$ . We extend typical causal assumptions to include the presence of network interference. Let $y_{ν i} (z_{ν i}, z_{N_{ν i}}, z_{N_{ν - i}})$ denote the potential outcome for individual $i$ in subgraph $ν$ if they (and their neighbours and non-neighbours) receive the treatment assignment $(z_{ν i}, z_{N_{ν i}}, z_{N_{ν - i}})$ . While in principle this allows the potential outcome to depend on the entire treatment assignment in the network, we will soon restrict the number of potential outcomes using the known individual connections and assuming an exposure mapping; see Assumption 2 below.^19,22 We first make the consistency assumption, which states that there cannot be multiple versions of a treatment.

Assumption 1 (Consistency)

For $ν \in {1, \dots, m}$ , $i \in N_{ν}$ , $Y_{ν i} = y_{ν i} (Z_{ν i}, Z_{N_{ν i}}, Z_{N_{ν - i}})$ .

Assumption 2 (Neighbourhood interference)

For $ν \in {1, \dots, m}$ , $i \in N_{ν}$ , $\forall z_{N_{ν - i}}$ , $z_{N_{ν - i}}^{'}$ , and $\forall z_{N_{ν i}}$ , $z_{N_{ν i}}^{'}$ such that $ϕ (z_{N_{ν i}}) = ϕ (z_{N_{ν i}}^{'})$ given a function $ϕ : {0, 1}^{d_{ν} i} \to Φ_{ν i}$ , then, $y_{ν i} (z_{ν i}, z_{N_{ν i}}, z_{N_{ν - i}}) = y_{ν i} (z_{ν i}, z_{N_{ν i}}, z_{N_{ν - i}}^{'})$

This assumption is akin to Assumption 2 from Forastiere et al.,¹⁹ which is that the outcome for an individual depends on the exposures of other individuals only through a function of the neighbourhood treatment vector. For simplicity, in what follows we assume that the function $ϕ$ corresponds to the total exposure value among the neighbours, that is, $ϕ : {0, 1}^{d_{ν i}} \to Φ_{ν i}$ defined by $ϕ (z_{N_{ν i}}) = \sum_{j \in N_{ν i}} z_{ν j}$ , where $Φ_{ν i} \in {0, 1, \dots, d_{ν i}}$ . Because of Assumption 2, we can drop the dependence of $y_{ν i} (\cdot)$ on the nodes in $N_{ν - i}$ and write more simply $y_{ν i} (z_{ν i}, z_{N_{ν i}})$ .

In the presence of unmeasured contextual confounding, the covariates $X_{ν i}$ are no longer a sufficient conditioning set for unconfoundedness of the treatment assignment, that is, $y_{ν i} (z_{ν i}, z_{N_{ν i}})$ $Z_{ν i}, Z_{N_{ν i}} | X_{ν i} .$ We assume, however, that exchangeability holds conditional on measured covariates and the subgraph-level value of the unmeasured covariate. For a given subgraph $ν$ , we represent these subgraph-level unmeasured covariates as $b_{ν} = (b_{ν}^{y}, b_{ν}^{z})$ , where $b_{ν}^{y}$ is an unmeasured covariate pertaining to the outcome data generating mechanism, while $b_{ν}^{z}$ informs the treatment assignment mechanism. The following assumption states that, for node $i$ in subgraph $ν$ , potential outcomes and treatment assignments are independent conditional on the value of the measured covariate vector, $X_{ν i}$ , and the unmeasured confounder $b_{ν}^{y}$ .

Assumption 3 (Latent ignorability)

There exists unmeasured subgraph-level covariates $b^{y} = (b_{1}^{y}, b_{2}^{y}, \dots, b_{m}^{y})$ , each of dimension $q_{1}$ , such that for all $ν \in {1, \dots, m}$ , $i \in N_{ν}$ , $z_{ν i} \in {0, 1}$ , $z_{N_{ν i}} \in {0, 1}^{d_{ν i}},$ $y_{i} (z_{ν i}, z_{N_{ν i}}) ⊥ ⊥ Z_{ν i}, Z_{N_{ν i}} | X_{ν i}, b_{ν}^{y}$

Although inference in this work relies primarily upon the outcome model, we must also ensure appropriate covariate overlap, which tends to reduce the sensitivity of causal estimates to correct model specification²³ and further ensures that the estimated treatment effect is not extrapolated to subregions of the covariate space for which there is little or no information. As the propensity score is central to the overlap or positivity assumption, we define the joint propensity score as $e (z_{ν i}, z_{N_{ν} i} | X_{ν i}, X_{N_{ν i}}, b_{ν}^{z}) = P (Z_{ν i} = z_{ν i}, Z_{N_{ν i}} = z_{N_{ν i}} | X_{ν i}, X_{N_{ν i}}, b_{ν}^{z})$ , which corresponds to the probability of observing the joint treatment $(z_{ν i}, z_{N_{ν i}})$ conditional on measured individual and neighbourhood covariates and the unmeasured subgraph-level covariate $b_{ν}^{z}$ . Thus, we further require:

Assumption 4 (Positivity)

For $ν \in {1, \dots, m}, i \in N$ and for all $z_{ν i}, X_{ν i}, X_{N_{ν i}}$ , $b_{ν}^{z}$ , $e (z_{ν i}, z_{N_{ν i}} | X_{ν i}, X_{N_{ν i}}, b_{ν}^{z}) > 0$

4. Estimands

To define the causal estimands of interest, we adopt the Bernoulli exposure allocation standardisation.^18,24 Let $π (Σ z_{N_{ν i}}; α) = (\binom{d_{ν i}}{Σ z_{N_{ν i}}}) α^{Σ z_{N_{ν i}}} (1 - α)^{d_{ν i} - Σ z_{N_{ν i}}}$ denote the probability of node $i$ in subgraph $ν$ receiving neighbourhood treatment $Σ z_{N_{i}}$ under Bernoulli allocation strategy or treatment coverage $α$ , which corresponds to the counterfactual probability of directly receiving treatment. In this counterfactual scenario, potential outcomes are standardised with respect to a population in which all individuals in a neighbourhood are exposed to treatment independently with the same probability $α$ , which corresponds to a type B parameterisation.²⁵ While independent exposure selection among neighbours seems plausible in the context of maternal education in Add Health, note that generalisations of the type B estimands have been proposed to allow for within-cluster dependence in treatment uptake in contexts where it would be warranted, such as those of vaccination and air pollution.^26,27 Define the $i$ -th individual’s average potential outcome given individual exposure $z$ and allocation strategy $α$ conditional on covariate value $x$ as $μ_{z α}^{ν i} (x) = E [{\bar{y}}_{ν i} (z; α) | X_{ν i} = x] = E [\sum_{Σ z_{N_{ν i}} = 0}^{d_{ν i}} y_{ν i} (z, Σ z_{N_{ν i}}) π (Σ z_{N_{ν i}}; α) | X_{ν i} = x]$ This second equality highlights the dependence of the potential outcomes on the individual covariates $x$ , that is, $y_{ν i} (z, Σ z_{N_{ν i}}) \equiv y_{ν i} (z, Σ z_{N_{ν i}}, x_{ν i})$ , which will be made more explicit when specifying the generating model for the potential outcomes in Section 5. The average potential outcome in subgraph $ν$ corresponds to the average of $μ_{ν i}^{z α} (x)$ over the empirical subgraph covariate distribution ${\hat{F}}_{X}^{ν}$ , that is, $μ_{z α}^{ν} = E [μ_{z α}^{ν i} (x)] = \int μ_{z α}^{ν i} (x) {\hat{F}}_{X}^{ν} (x) d x = N_{ν}^{- 1} \sum_{i \in N_{ν}} μ_{z α}^{ν i} (x_{ν i})$ . The population average potential outcome is defined as $μ_{z α} = E [μ_{z α}^{ν}] = \int μ_{z α}^{ν} \hat{H} (ν) d ν$ , where $\hat{H}$ represents the empirical distribution of the subgraph-level average potential outcomes. This assigns equal weights to every subgraph-level potential outcome, such that $μ_{z α} = m^{- 1} \sum_{ν = 1}^{m} μ_{z α}^{ν}$ . Note that $μ_{z α}^{ν}$ and $μ_{z α}$ , respectively, correspond to subgraph-level and population-level average potential outcomes; Li et al.²³ refer to these as mixed average potential outcomes to emphasise the fact that these models condition on the observed data (as is typical in the Bayesian framework) though we do marginalise over the empirical distribution of the observed covariates $X$ , as can be seen in the expression for $μ_{z α}^{ν}$ . In this work, our primary interest lies in the inference for $μ_{z α}$ and marginal causal contrasts, rather than subgraph-specific estimands. We can also define the population-level marginal average potential outcome, which takes an additional expectation over the individual treatment allocation $Z : μ_{α} = α μ_{1 α} + (1 - α) μ_{0 α}$ .

We consider various contrasts of the average potential outcomes to define the causal estimands.^18,20,24,28 The direct exposure effect is defined as the difference between the average potential outcomes of untreated and treated individuals for a fixed allocation strategy $α$ , that is, $DE (α) = μ_{1 α} - μ_{0 α}$ . The spillover effect, also referred to as the indirect effect, is defined as the difference between the average potential outcomes under counterfactual scenarios $α$ and $α^{'}$ among the untreated, that is, $IE (α, α^{'}) = μ_{0 α} - μ_{0 α^{'}}$ . Note that it would be possible to define a spillover effect for the treated individuals as well. For allocation strategies $α$ and $α^{'}$ , we also define the total effect $TE (α, α^{'}) = μ_{1 α} - μ_{0 α^{'}}$ as well as the overall effect $OE (α, α^{'}) = μ_{α} - μ_{α^{'}}$ . As the name suggests, $TE (α, α^{'})$ can be interpreted as the total effect of treatment, since $TE (α, α^{'}) = DE (α) + IE (α, α^{'})$ , whereas the overall effect corresponds to the difference in average potential outcomes under one allocation strategy compared to another allocation strategy.

A summary of the different causal estimands presented along with the distributions being averaged over can be found in Table S1 in Supplemental Appendix A.

5. Proposed JMM

The proposed joint model allows for the possibility of unmeasured contextual confounding by assuming that the latent factors influencing the study outcome and the exposure selection in the social network are correlated, as illustrated in the directed acyclic graph (DAG) in Figure 2.

Figure 2.

Directed acyclic graph (DAG) for the joint linear mixed-effects model displaying interference and unmeasured heterogeneity in both treatment assignment and outcome mechanisms within a subgraph. Baseline characteristics $X_{ν i}$ and $X_{ν j}$ are excluded from the figure for simplicity.

5.1. Outcome model

We assume that the potential outcomes arise according to $\begin{aligned} y_{ν i} (z_{ν i}, Σ z_{N_{ν i}}) & = β_{Z} z_{ν i} + h_{1} (Σ z_{N_{ν i}}; β_{Z_{N}}) + h_{2} (z_{ν i} Σ z_{N_{ν i}}; β_{Z \times Z_{N}}) \\ + h_{3} (X_{ν i}^{y}; β_{X}) + W_{ν i}^{⊤} b_{ν}^{y} + δ_{ν i} \end{aligned}$ (1)where $h_{1}, h_{2}$ are known functions, $X_{ν i}^{y}$ is a vector of nodal covariates that are contained in $X_{ν i}$ , $W_{ν i}$ is a vector of random effect covariates, and $δ_{ν i}$ is the random error of node $i$ in subgraph $ν$ . We further assume that $δ = (δ_{1}, δ_{2}, \dots, δ_{m}) \sim N (0, Φ)$ , where $δ_{ν} = (δ_{ν 1}, \dots, δ_{ν N_{ν}})$ , $ν = 1, 2, \dots, m$ . In related work, the function $h_{1}$ is often assumed to be linear,^18,19 though non-linear functions such as spline representations could easily be accommodated. Similarly, $h_{2}$ and $h_{3}$ could either be linear or non-linear. According to this model, $β_{Z}$ and (potentially vector-valued) $β_{Z_{N}}$ are tied to the direct and spillover effects of the exposure, respectively. The parameter $β_{Z \times Z_{N}}$ captures the interaction between $z$ and $Σ z_{N_{ν i}}$ and modifies the direct and interference effects with the level at which we fix the individual or neighbourhood exposure. In the absence of interactions between the exposure variables and the covariates/random effects, the causal effects do not vary as a function of characteristics of the node or the subgraph.

We assume a simultaneous autoregressive (SAR) model for the vector of random errors $δ$ , such that if $U = {I_{N} - τ A}^{- 1}$ exists, the covariance matrix of $δ$ is given by Cressie²⁹ $Φ = σ_{ε}^{2} U U^{⊤}$ (2)

Due to its dependence on the adjacency matrix $A$ , the SAR covariance matrix is such that the potential outcomes of neighbouring individuals are more correlated than those of non-neighbours.³⁰ According to this model, the parameter $τ$ controls the strength of the interaction between neighbours.³¹ A simplifying assumption we make in simulations is that the subgraphs $G_{ν}$ are disjoint, such that the sociomatrix $A$ associated with $G$ is a block-diagonal matrix with $m$ blocks denoted $A_{ν}$ , $ν = 1, \dots m$ , each corresponding to a subgraph $G_{ν}$ . This implies that $Φ$ is also block-diagonal with blocks $Φ_{ν} = σ_{ε}^{2} U_{ν} U_{ν}^{⊤}, ν = 1, \dots, m$ , assuming that $U_{ν} = {I_{N_{ν}} - τ A_{ν}}^{- 1}$ exists for all $ν$ . If this assumption is not met, the computation might be infeasible in the presence of a large graph order $N$ due to difficulties with large-scale matrix inversion.

5.2. Exposure model

We posit a Bernoulli distribution for the individual assigned treatment, such that $Z_{ν i} | p_{ν i} \sim Bernoulli (p_{ν i})$ with $logit (p_{ν i}) = f (X_{ν i}^{z}; γ) + U_{ν i}^{⊤} b_{ν}^{z}$ (3)where $p_{ν i} = P (Z_{ν i} = 1 | X_{ν i}^{z}, b_{ν}^{z})$ , $f$ is a known function, and $U_{ν i}$ is a vector of random effect covariates. Note that the vector of covariates $X_{ν i}^{z}$ in the exposure selection model need not be identical to $X_{ν i}^{y}$ . In a standard causal setting, it is typically preferable to avoid including instruments in the treatment model, notably when implementing inverse weighting approaches, since it can lead to extreme weights and decreased precision in the estimation of causal effects.³² However, since we do not assume conditional exchangeability, a correctly specified model that follows the data-generating procedure (accounting for more than simply the confounders) for the treatment assignment mechanism is required for unbiased inference. Thus, $X_{ν i}^{z}$ is not necessarily a subset of $X_{ν i}^{y}$ in the proposed joint modelling methodology.

In what follows, similar to Lee et al.¹⁸ and McNealis et al.,²⁰ we make the simplifying assumption that conditional on the cluster-level random effect $b_{ν}^{z}$ , for $ν = 1, \dots, m$ , $i = 1, \dots, N_{ν}$ , the joint propensity score can be expressed as $\begin{aligned} e (z_{ν i}, z_{N_{ν i}} | X_{ν i}, X_{N_{ν i}}, b_{ν}^{z}) & = P (Z_{ν i} = z_{ν i}, Z_{N_{ν i}} = z_{N_{ν i}} | X_{ν i}, X_{N_{ν i}}, b_{ν}^{z}) \\ = \prod_{j \in {i, N_{i}}} P (Z_{ν j} = z_{ν j} | X_{ν j}, b_{ν}^{z}) \end{aligned}$ When describing the full data likelihood in Section 5.4, this assumption allows us to express the likelihood for the exposure model parameters as the product of individual propensity scores.

5.3. Random effect distribution

The subgraph-level random effects of the above models, $b_{ν} = (b_{ν}^{y}, b_{ν}^{z})$ , are a $q_{1} + q_{2}$ dimensional vector that captures the unobserved dependence among observations within each subgraph. We denote $b = (b_{1}, \dots, b_{m})$ the collection of subgraph-level random effects across the whole graph $G$ . We assume that the random effects follow a multivariate normal distribution $b_{ν} \sim N_{q_{1} + q_{2}} (0, Σ)$ , where $Σ = (\begin{matrix} Σ_{11} & Σ_{12} \\ Σ_{21} & Σ_{22} \end{matrix})$ (4)

The correlations of the random effects in the two models represent unobserved dependence among the outcome and the exposure assignment mechanisms. Groups defined by different treatment assignments are not, in the presence of such latent effects, conditionally exchangeable given observed covariates. This means that merely incorporating random effects in the outcome regression and ignoring the treatment assignment will not lead to valid inference of causal effects. Although based on Figure 2 and Assumption 3, conditioning on $b^{y}$ would appear to block confounding, these are unobserved covariates and their posterior distribution may only be accurately estimated by accounting for the dependence between the outcomes and treatment assignments, the latter being informative for the imputation of missing potential outcomes.⁸ In other words, latent ignorability conditional on $b^{y}$ can only be achieved if $b^{y}$ is modelled in conjunction with $b^{z}$ in a joint inference procedure which accounts for the dependence in the outcome and treatment-specific random effects. A model that estimates the random effects in the outcome without accounting for their correlation with the random effects of the treatment model will be poorly estimated, and this measurement error in the latent variables may lead to bias in the treatment effect estimators.

The use of a latent Gaussian structure to relate the outcome and exposure assignment mechanisms is similar to the joint modelling strategies used by Tseng et al.³³ in the context of non-ignorable missing data and Xu et al.¹³ in the context of non-ignorable exposure assignment in longitudinal data. A different strategy, which is arguably more restrictive and more challenging to fit in comparison, is the use of a shared parameter model as the working model, where the outcome and the model for the exposure share common random effects. Examples of their use include Follmann and Wu⁸ in the context of non-ignorable missingness and Shardell and Ferrucci¹⁰ in the context of non-ignorable treatment assignment.

5.4. Bayesian inference

With interference, only one of multiple possible potential outcomes is observed for each node, with the number of potential outcomes for one unit depending on its degree by virtue of Assumption 2. Under the Bayesian paradigm, unobserved potential outcomes are viewed as missing data, and inference about causal estimands is obtained from their posterior distributions.^7,11,23,34 Let $y (\cdot) = {y_{i} (\cdot) for all i \in N}$ denote the collection of potential outcomes for all nodes in $G$ and possible individual and neighbourhood exposures. We can then define the partition $y (\cdot) = (Y, \tilde{y})$ , where $Y = (Y_{1}, Y_{2}, \dots, Y_{m})$ is the collection of observed outcomes across all the subgraphs and $\tilde{y}$ is the collection of missing potential outcomes for all graph units. We denote the collection of covariate vectors for all the nodes in $G$ as $X$ . As we do not assign a probability distribution for the multi-dimensional $X$ , we consider a conditional likelihood in what follows. Following the specifications discussed in the previous section, the joint distribution of $(y (\cdot), Z)$ conditional on $X$ , denoted $p (y (\cdot), Z | X, θ)$ , is governed by the parameter vector $θ = (β, σ_{ε}^{2}, τ, γ, Σ)$ . Further, we can distinguish the parameters of the outcome model, $θ_{Y} = (β, σ_{ε}^{2}, τ, Σ)$ , and those of the exposure model, $θ_{Z} = (γ, Σ)$ , where $Σ$ is shared between the two models. Bayesian causal inference proceeds by specifying a prior $p (θ)$ and imputing missing potential outcomes from the posterior predictive distribution $\begin{aligned} p (\tilde{y} | O) & = \int_{θ} p (\tilde{y} | θ, O) p (θ | O) d θ \\ \propto \int_{θ} [\int_{b_{ν}} p (\tilde{y} | b, θ, O) p (b_{ν} | θ, O) d b] L (θ | O) p (θ) d θ \end{aligned}$ where $p (\tilde{y} | θ, O)$ is the sampling distribution of the missing potential outcomes given the parameters and the observed data, $p (θ | O)$ is the posterior distribution of $θ$ , $L (θ | O) = p (O | θ)$ is the observed data likelihood, and $O = (Y, Z, X)$ is the observed data. Since our interest lies in marginal causal estimands, we perform inference from the marginal posterior predictive distribution of the potential outcome with respect to the random effects, which is highlighted by the proportionality statement.

Typically, the central component in Bayesian inference for causal effects is the specification of the outcome model.²³ In instances where the propensity score model is ignorable, unbiased inference can be obtained about causal inference without having to specify a model for the propensity of treatment. However, the presence of dependent latent effects among the outcome and exposure models threatens the inference, since the exposure assignment mechanism is informative for the imputation of missing potential outcomes. To see this, we turn to the observed data likelihood and write $\begin{aligned} L (θ | O) & = \prod_{ν = 1}^{m} \int_{b_{ν}} p (Y_{ν}, Z_{ν} | X_{ν}, b_{ν}, θ) p (b_{ν} | θ) d b_{ν} \\ = \prod_{ν}^{m} \int_{b_{ν}} \prod_{i = 1}^{N_{ν}} p (Y_{ν i} | Z_{ν i} = z_{ν_{i}}, Σ Z_{N_{ν i}} = Σ z_{N_{ν i}}, X_{ν i}^{y}, b_{ν}^{y}, θ) p (Z_{ν i} | X_{ν i}^{z}, b_{ν}^{z}, θ) p (b_{ν} | θ) d b_{ν} \\ = \prod_{ν}^{m} \int_{b_{ν}} \prod_{i = 1}^{N_{ν}} p (y_{ν i} (z_{ν_{i}}, Σ z_{N_{ν i}}) | X_{ν i}^{y}, b_{ν}^{y}, θ_{Y}) p (Z_{ν i} | X_{ν i}^{z}, b_{ν}^{z}, θ_{Z}) p (b_{ν} | Σ) d b_{ν} \end{aligned}$ where the third equality follows from counterfactual consistency, latent exchangeability, and positivity. This means that having access to the latent confounders $b$ would render the treatment assignment ignorable. However, since $b$ is unobserved and yet a component of the treatment assignment mechanism through $Σ$ , the treatment assignment is informative for the imputation of missing potential outcomes and needs to be accounted for in the Bayesian procedure.^11,13,35,36 Hence, in the presence of unobserved contextual confounding, the propensity score model is not ignorable in the inference for $μ_{z α}$ , $μ_{α}$ and functions of these quantities.

For node $i$ in subgraph $ν$ , ${\tilde{y}}_{ν i} (z_{ν i}, Σ z_{N_{ν i}})$ acts like a missing variable that can be imputed after setting exposure to some value $(z_{ν i}, Σ z_{N_{ν i}})$ and sampling from the marginal sampling distribution for the replicated potential outcomes given parameters, assigned treatments and observed outcomes $p ({\tilde{y}}_{ν i} (z_{ν i}, Σ z_{N_{ν i}}) | θ, O) = \int_{b_{ν}} p ({\tilde{y}}_{ν i} | Z_{ν i} = z_{ν i}, Σ Z_{N_{ν i}} = Σ z_{N_{ν i}}, b_{ν}, θ, O) p (b_{ν} | θ, O) d b$ where the equality holds under latent ignorability because of the inclusion of the propensity score model in the posterior inference for $θ$ .

5.5. Prior specification

We adopt a combination of vague and weakly informative priors for model parameters. These parameters are the coefficients of the outcome model in (1), the coefficients of the exposure model in (3), the residual variance and the autocorrelation parameters of the SAR covariance matrix in (2), and the parameters of the random effect covariance matrix in (4).

For the coefficients of the outcome and exposure models, $β$ and $γ$ , we adopt vague independent $N (0, σ_{prior}^{2})$ prior distributions with $σ_{prior}^{2} = 100$ . We specify $τ \sim Uniform [- 1, 1]$ and $σ_{ε} \sim Half-Cauchy (α_{prior})$ with the recommended hyperparameter of $α_{prior} = 25$ , which results in a reasonably flat prior distribution.^37,38

For the covariance matrix of the random effects, we propose using the decomposition $Σ = S R S$ , where $S$ is a $(q_{1} + q_{2}) \times (q_{1} + q_{2})$ diagonal matrix with the standard deviations of the random effects as its diagonal elements and $R$ is a correlation matrix, and assigning the prior $R \sim LKJ (η_{prior})$ with $η_{prior} = 1$ , where LKJ refers to the Lewandowski–Kurowicka–Joe distribution.³⁹ This choice of prior assigns equal density to all correlation matrices of order $q_{1} + q_{2}$ . Finally, we assign independent $Half-Cauchy (α_{prior})$ priors to the diagonal elements of $S$ . In the simulations and application that follow, we consider the joint random intercept model, that is, $b_{ν} = (b_{ν}^{y}, b_{ν}^{z})$ , such that $S = (\begin{matrix} σ_{b^{y}} & 0 \\ 0 & σ_{b^{z}} \end{matrix}), R = (\begin{matrix} 1 & ρ \\ ρ & 1 \end{matrix})$ In that case, the LKJ prior reduces to assigning a $Uniform [- 1, 1]$ prior to $ρ$ .

The resultant posterior distributions are approximated through computational approaches. We employ Markov chain Monte Carlo methods.

5.6. Bayesian standardisation

In this section, we describe a procedure to recover the posterior distributions of $μ_{z α}$ , $μ_{α}$ , and relevant contrasts from the posterior distribution of $θ$ . In a point treatment setting, the Bayesian g-formula or standardisation typically entails marginalising the individual average potential outcome over the distribution of covariates, which requires obtaining the posterior distributions of $θ$ and $θ_{X}$ . However, recall that in this work, we do not model the multi-dimensional covariates $X$ and instead rely upon mixed average causal effects, which are reasonable approximations to population average causal effects.²³

Samples from the posterior distributions of $μ_{z α}$ , $μ_{α}$ , and relevant contrasts are obtained by marginalising the posterior distribution of $μ_{z α} (x) \equiv μ_{z α} (x; θ)$ over the empirical distribution of $X$ and the subgraph distribution, where the notation highlights the dependence on the outcome model parameter. The standardisation procedure essentially consists of computing the following integral: ${\tilde{μ}}_{z α} = \int_{ν} \int_{X} μ_{z α}^{ν i} (x; θ_{Y}) {\hat{F}}_{X}^{ν} (x) \hat{H} (ν) d x d ν = m^{- 1} \sum_{ν = 1}^{m} N_{ν}^{- 1} \sum_{i = 1}^{N_{ν}} {\tilde{μ}}_{z α}^{ν i} (x_{ν i}; θ)$ where ${\tilde{μ}}_{z α}^{ν i} (x_{ν i}; θ) = \sum_{Σ z_{N_{ν i}} = 0}^{d_{ν i}} {\tilde{y}}_{ν i} (z, Σ z_{N_{ν i}}) π (Σ z_{N_{ν i}}; α)$ is the imputed individual average potential outcome and ${\tilde{y}}_{ν i} (z, Σ z_{N_{ν i}}) \equiv {\tilde{y}}_{ν i} (z, Σ z_{N_{ν i}}, x_{ν i})$ is a random draw from the marginal sampling distribution for the replicated potential outcomes given individual treatment $z$ , neighbourhood treatment $Σ z_{N_{ν i}}$ , and covariate vector $x_{ν i}$ . We use Monte Carlo integration to obtain samples from the marginal distribution for the potential outcomes with respect to the random effects. The Bayesian standardisation procedure is summarised as follows:

Set an exposure value $z$ and an allocation strategy $α$ .

For each posterior draw $θ^{(s)}$ , $s = 1, \dots, S$ , perform the following steps:

For each subgraph $ν \in {1, \dots m}$ , sample a random effect vector ${\tilde{b}}_{ν}^{b (s)} \sim N (0, Σ^{(s)})$ .

For each subgraph $ν \in {1, \dots m}$ , each node $i \in {1, \dots, N_{ν}}$ , and each neighbourhood exposure value $Σ z_{N_{ν i}} \in {0, \dots, d_{ν i}}$ , draw the corresponding missing potential outcome from the conditional posterior predictive distribution $\begin{aligned} {\tilde{y}}_{ν i}^{b (s)} (z, Σ z_{N_{ν i}}) & = β_{Z}^{(s)} z + h_{1} (Σ z_{N_{ν i}}; β_{Z_{N}}^{(s)}) + h_{2} (z Σ z_{N_{ν i}}; β_{Z \times Z_{N}}^{(s)}) \\ + h_{3} (x_{ν i}; β_{X}^{(s)}) + W_{ν i}^{⊤} {\tilde{b}}_{ν}^{y, b (s)} + δ_{ν i}^{(s)} \end{aligned}$

Repeat steps (a) and (b) $B$ times for each exposure value $(z, Σ z_{N_{ν i}})$ to obtain a draw from the sampling distribution of the missing potential outcomes conditional on $θ$ (marginalised with respect to $b$ ) for all nodes and all subgraphs using ${\tilde{y}}_{ν i}^{(s)} (z, Σ z_{N_{ν i}}) = \frac{1}{B} \sum_{b = 1}^{B} {\tilde{y}}_{ν i}^{b (s)} (z, Σ z_{N_{ν i}})$

Using the imputed potential outcomes in Step 2, for each posterior draw $s = 1, \dots, S$ , subgraph $ν \in {1, \dots, m}$ , and node $i \in {1, \dots, N_{ν}}$ , obtain the posterior individual average potential outcome as ${\tilde{μ}}_{z α}^{ν i (s)} (x_{ν i}) = \sum_{Σ z_{N_{ν i}} = 0}^{d_{ν i}} {\tilde{y}}_{ν i}^{(s)} (z, Σ z_{N_{ν i}}) π (Σ z_{N_{ν i}}; α)$

Finally, obtain the posterior population average potential outcome for treatment $z$ and coverage $α$ as ${\tilde{μ}}_{z α}^{(s)} = m^{- 1} \sum_{ν = 1}^{m} N_{ν}^{- 1} \sum_{i = 1}^{N_{ν}} {\tilde{μ}}_{z α}^{ν i (s)} (x_{ν i}),$ as well as the posterior population marginal average potential outcome, ${\tilde{μ}}_{α}^{(s)} = α {\tilde{μ}}_{1 α}^{(s)} + (1 - α) {\tilde{μ}}_{0 α}^{(s)}$ .

Approximations of the posterior distributions for relevant causal contrasts can be obtained by applying relevant functions (

DE (α)

IE (α, α^{'})

TE (α, α^{'})

OE (α, α^{'})

) to each posterior draw

{\tilde{μ}}_{z α}^{(s)}

and

{\tilde{μ}}_{α}^{(s)}

s = 1, \dots, S

6. Simulations

We present simulation results under the joint random intercept model, where $b_{ν} = (b_{ν}^{y}, b_{ν}^{z})$ to assess frequentist properties of the proposed Bayesian estimator based on the JMM in finite samples compared to estimators based on LMM and FEM, respectively. The LMM approach incorporates a cluster-level random intercept in the outcome model, but ignores the treatment assignment mechanism completely in the sense that the propensity score component is dropped from the likelihood altogether, while the FEM approach also drops treats the outcome data as unclustered and thus does not account for within-cluster dependence. Additionally, we compare the joint modelling approach to an augmented linear mixed-effects model approach that includes the joint propensity score as an additional covariate (LMM-augmented (LMM-aug)), which is described in more detail in Supplemental Appendix B. The default simulation is conducted according to the following steps with $m = 50$ disjoint subgraphs, such that $A$ is block-diagonal.

Generation of the network: For $ν \in {1, \dots, 30}$ , we sampled $N_{ν}$ from the distribution $Poisson (35)$ , and for $ν \in {31, \dots, 50}$ , we sampled $N_{ν}$ from the distribution $Poisson (12)$ . For each subgraph, we generated the random vector $H_{ν}$ of size $N_{ν},$ whose elements are from a $Bernoulli (0.5)$ distribution. The $m$ subgraphs were generated according to an exponential-family random graph model in which matching $H$ values increase the probability of a tie between two nodes, inducing network homophily. The final network $G$ was taken as the union of the $m$ components.

For $ν = 1, \dots, m$ , we sampled $(b_{ν}^{y}, b_{ν}^{z})$ from $N_{2} (0, Σ)$ , with $Σ = (\begin{matrix} σ_{b^{y}}^{2} & σ_{b^{y}} σ_{b^{z}} ρ \\ σ_{b^{y}} σ_{b^{z}} ρ & σ_{b^{z}}^{2} \end{matrix})$ and sampled the SAR errors $δ_{ν}$ from $N_{N_{ν}} (0, σ_{ε}^{2} U_{ν} U_{ν}^{⊤})$ .

For $ν = 1, \dots, m, i = 1, \dots, N_{ν}$ , two baseline covariates were generated as $X_{1 ν i} \sim N (0, 1)$ and $X_{2 ν i} \sim Bernoulli (0.5)$ . Define the proportion of treated nodes in a neighbourhood as $ψ (z_{N_{ν i}}) = Σ z_{N_{ν i}} / d_{ν i}$ . Potential outcomes for all $z_{ν i} \in {0, 1}$ , $Σ z_{N_{ν i}} \in {0, 1, \dots, d_{ν i}}$ were given by $y_{ν i} (z_{ν i}, z_{N_{ν i}}) = 2 + 2 z_{ν i} + ψ (z_{N_{ν i}}) + z_{ν i} ψ (z_{N_{ν i}}) - 1.5 | X_{1 i} | + 2 X_{2 i} - 3 | X_{1 i} | X_{2 i} + b_{ν}^{y} + δ_{ν i}$ where $δ_{ν i}$ is the $i$ -th entry of the SAR errors vector $δ_{ν}$ .

The treatment was generated as $Z_{ν i} = Bernoulli (p_{ν i})$ , where $p_{ν i} = {logit}^{- 1} [0.1 + 0.2 | X_{1 i} | + 0.2 X_{2 i} | X_{1 i} | - H_{ν i} + b_{ν}^{z}]$ Under this model, nodes that share ties tended to have similar treatment assignments.

Based on the potential outcomes defined in Step 2, observed outcomes were set equal to $Y_{ν i} = y_{ν i} (Z_{ν i}, Σ Z_{ν i})$ .

In the standard simulation scenario, simulations were carried out 500 times by repeating steps 2–4 with $Var (b_{ν}^{y}) = Var (b_{ν}^{z}) = 1$ and a correlation coefficient $ρ = 0, 0.2, 0.5, 0.8$ between the outcome and exposure model random intercepts. For simplicity, we first set conditionally independent random errors for observations within each component with $τ = 0$ and $σ_{ε}^{2} = 1$ , such that the covariance matrix of the random error vector within a component was equal to $σ_{ε}^{2} U_{ν} U_{ν}^{⊤} = I_{N_{ν}}$ . The true parameters were obtained by averaging the potential outcomes that we generated in Step 2 over the simulated datasets. For each simulated dataset and each model (JMM, LMM, LMM-aug, and FEM), we fit four chains with 10,000 iterations used as a burn-in period, and thinning by 50 iterations. The posterior distributions of $μ_{z α}$ and $μ_{α}$ for $z = 0, 1$ and $α \in {0.1, 0.2, \dots, 0.9}$ as well as relevant causal contrasts were obtained from the JMM, LMM, LMM-aug, and FEM parameter posterior distributions, respectively, using the procedure detailed in Subsection 5.6 with $B = 50$ Monte Carlo samples. For all estimands of interest, the Bayesian estimator was defined as the posterior mean, and we calculated the posterior standard deviation and a 95% equi-tailed credible interval as measures of uncertainty.

Table 1 shows the relative bias (RB, %), the average of posterior standard deviations (ASDs), the empirical standard deviation (ESD) of Bayesian estimators, and the coverage of 95% credible intervals (empirical coverage probability (ECP, %) associated with JMM, LMM, LMM-aug, and FEM for the estimation of $μ_{0, 0.7}$ , $μ_{1, 0.7}$ , and $DE (0.7) = μ_{1, 0.7} - μ_{0, 0.7}$ . Our discussion of results focuses on the direct effect $DE (0.7)$ . In general, JMM provides valid inference in comparison to LMM, LMM-aug, or FEM, and as expected, the benefit associated with accounting for contextual confounding becomes clearer as the correlation $ρ$ increases. The relative bias associated with LMM or LMM-aug, albeit never very large, increases with $ρ$ , while the relative bias associated with JMM remains small. In general, the posterior standard deviations accurately estimate the variability of the regression estimators when either JMM, LMM, or LMM-aug is used, whereas the actual variability of the causal estimators is drastically underestimated when the multilevel structure of the data is not accounted for in FEM.

Table 1.
Finite sample properties of the Bayesian estimator based on correctly specified JMM, LMM, and FEM for 500 simulations.

$μ_{0, 0.7}$ ^a $μ_{1, 0.7}$ ^a $DE (0.7)$ ^a

$ρ$ Method RB ASD ESD ECP RB ASD ESD ECP RB ASD ESD ECP

0 Main scenario

JMM −0.32 0.16 0.17 92.8 −0.06 0.16 0.17 94.6 0.07 0.08 0.08 96.2

LMM −0.21 0.16 0.17 93.2 −0.02 0.16 0.17 94.6 0.08 0.08 0.08 96.6

LMM-aug −0.11 0.16 0.16 93.2 0.01 0.16 0.17 94.4 0.07 0.08 0.08 96.8

FEM −0.46 0.09 0.19 64.8 0.11 0.08 0.21 53.6 0.39 0.10 0.13 85.8

0.2 Main scenario

JMM −0.35 0.16 0.16 95.2 0.04 0.16 0.16 94.8 0.23 0.08 0.07 97.4

LMM −0.05 0.16 0.16 94.4 0.29 0.16 0.16 94.2 0.45 0.08 0.07 97.4

LMM-aug −0.08 0.16 0.16 95.0 0.28 0.16 0.16 94.4 0.45 0.08 0.07 97.8

FEM 1.83 0.09 0.19 62.6 2.93 0.08 0.21 49.6 3.46 0.10 0.13 77.8

SAR errors ( $τ = 0.1$ )

JMM −0.92 0.15 0.17 91.5 −0.28 0.16 0.17 91.0 0.03 0.08 0.08 95.5

LMM −0.89 0.15 0.17 91.5 −0.04 0.15 0.17 91.0 0.37 0.08 0.08 95.5

LMM-aug −0.94 0.15 0.17 91.5 −0.04 0.16 0.17 92.5 0.40 0.08 0.08 94.5

FEM −0.37 0.08 0.17 63.0 1.28 0.08 0.17 58.0 2.08 0.08 0.10 87.0

Lower sample size ( $m = 30$ )

JMM −0.29 0.22 0.22 94.4 −0.08 0.22 0.23 93.8 0.03 0.10 0.09 97.6

LMM 0.01 0.21 0.22 94.2 0.16 0.21 0.23 93.4 0.24 0.10 0.09 97.6

LMM-aug 0.18 0.22 0.22 93.6 0.19 0.21 0.23 92.4 0.19 0.10 0.09 97.6

FEM 1.58 0.11 0.25 58.4 2.87 0.10 0.27 49.6 3.49 0.12 0.16 81.0

Larger random effect variance ( $σ_{b^{y}}^{2} = σ_{b^{z}}^{2} = 4$ )

JMM −0.17 0.30 0.30 95.0 −0.09 0.30 0.30 94.8 −0.06 0.10 0.09 97.6

LMM 0.35 0.30 0.30 95.0 0.25 0.30 0.30 94.0 0.20 0.10 0.09 97.6

LMM-aug 0.18 0.30 0.30 95.2 0.16 0.30 0.30 94.6 0.15 0.10 0.09 97.0

FEM 2.50 0.15 0.39 53.0 6.27 0.11 0.38 35.0 8.07 0.16 0.29 60.8

No treatment effect^b

JMM −0.34 0.16 0.17 93.8 0.23 0.16 0.17 93.0 - 0.08 0.08 95.0

LMM −0.18 0.16 0.17 94.4 1.44 0.16 0.17 91.8 - 0.08 0.08 94.8

LMM-aug −0.01 0.16 0.17 93.6 1.59 0.16 0.17 92.0 - 0.08 0.08 94.6

FEM 2.78 0.09 0.19 64.8 19.71 0.08 0.20 47.4 - 0.10 0.14 70.6

Bivariate exponential random effect distribution

JMM −0.35 0.15 0.08 99.8 −0.05 0.15 0.09 100.0 0.10 0.09 0.08 97.8

LMM −0.64 0.15 0.08 99.6 0.09 0.15 0.09 100.0 0.43 0.09 0.08 97.6

LMM-aug −0.27 0.15 0.09 99.6 0.14 0.15 0.09 100.0 0.34 0.09 0.08 97.4

FEM −4.42 0.09 0.12 80.8 0.85 0.09 0.12 80.4 3.35 0.11 0.14 79.6

Truncated normal outcome distribution

JMM −0.20 0.15 0.15 96.2 −0.09 0.15 0.15 95.8 −0.03 0.08 0.07 96.4

LMM −0.09 0.15 0.15 95.6 0.12 0.15 0.15 96.0 0.22 0.08 0.07 96.6

LMM-aug 2.16 0.15 0.15 95.2 −0.24 0.15 0.15 96.2 −1.46 0.08 0.07 93.2

FEM 1.49 0.08 0.17 61.0 2.68 0.07 0.18 48.0 3.28 0.10 0.13 82.2

Misspecified outcome model

JMM 13.24 0.21 0.18 91.4 −4.50 0.21 0.19 88.4 −12.93 0.17 0.17 44.6

LMM 14.01 0.21 0.18 90.8 −3.48 0.20 0.19 92.0 −11.79 0.17 0.17 52.2

LMM-aug 14.07 0.21 0.18 91.6 −3.47 0.21 0.19 91.6 −11.80 0.17 0.17 53.4

FEM 15.89 0.15 0.20 67.8 −0.82 0.14 0.22 77.4 −8.76 0.17 0.19 71.0

0.5 Main scenario

JMM 0.13 0.16 0.17 92.8 0.10 0.16 0.17 93.4 0.09 0.08 0.08 96.0

LMM 0.41 0.16 0.17 93.0 0.61 0.16 0.16 93.2 0.71 0.08 0.08 95.0

LMM-aug 0.48 0.16 0.17 92.4 0.65 0.16 0.16 92.6 0.73 0.08 0.08 94.8

FEM 4.98 0.09 0.18 62.2 7.57 0.08 0.20 20.4 8.81 0.10 0.13 35.2

0.8 Main scenario

JMM −0.59 0.16 0.17 93.0 −0.08 0.16 0.17 93.2 0.16 0.08 0.08 95.4

LMM 0.11 0.16 0.16 93.0 0.83 0.16 0.16 92.6 1.17 0.08 0.08 93.8

LMM-aug 0.22 0.16 0.17 92.2 0.87 0.16 0.16 92.8 1.18 0.08 0.08 93.6

FEM 7.21 0.08 0.16 61.2 12.02 0.07 0.18 3.0 14.32 0.09 0.14 7.2

JMM: joint mixed-effects model; LMM: linear mixed model; LMM-aug: LMM-augmented; FEM: fixed effects model; RB: relative bias; ASD: average of posterior standard deviation; ESD: empirical standard deviation; ECP: empirical coverage probability.
^a
For the main simulation scenario, the true estimand values were $μ_{0, 0.7} = 1.27$ , $μ_{1, 0.7} = 3.93$ , and $DE (0.7) = 2.66$ .
^b
For this scenario, the small value for the true direct effect ( $DE (0.7) = 0.00051$ ) precluded the meaningful calculation of a relative bias. The average Bayesian estimates of $DE (0.7)$ were $0.00397$ , $0.010431$ , and $0.10381$ for JMM, LMM, and FEM, respectively.

		$μ_{0, 0.7}$ ^a	$μ_{1, 0.7}$ ^a	$DE (0.7)$ ^a
0	Main scenario
	JMM	−0.32	0.16	0.17	92.8	−0.06	0.16	0.17	94.6	0.07	0.08	0.08	96.2
	LMM	−0.21	0.16	0.17	93.2	−0.02	0.16	0.17	94.6	0.08	0.08	0.08	96.6
	LMM-aug	−0.11	0.16	0.16	93.2	0.01	0.16	0.17	94.4	0.07	0.08	0.08	96.8
	FEM	−0.46	0.09	0.19	64.8	0.11	0.08	0.21	53.6	0.39	0.10	0.13	85.8
0.2	Main scenario
	JMM	−0.35	0.16	0.16	95.2	0.04	0.16	0.16	94.8	0.23	0.08	0.07	97.4
	LMM	−0.05	0.16	0.16	94.4	0.29	0.16	0.16	94.2	0.45	0.08	0.07	97.4
	LMM-aug	−0.08	0.16	0.16	95.0	0.28	0.16	0.16	94.4	0.45	0.08	0.07	97.8
	FEM	1.83	0.09	0.19	62.6	2.93	0.08	0.21	49.6	3.46	0.10	0.13	77.8
	SAR errors ( $τ = 0.1$ )
	JMM	−0.92	0.15	0.17	91.5	−0.28	0.16	0.17	91.0	0.03	0.08	0.08	95.5
	LMM	−0.89	0.15	0.17	91.5	−0.04	0.15	0.17	91.0	0.37	0.08	0.08	95.5
	LMM-aug	−0.94	0.15	0.17	91.5	−0.04	0.16	0.17	92.5	0.40	0.08	0.08	94.5
	FEM	−0.37	0.08	0.17	63.0	1.28	0.08	0.17	58.0	2.08	0.08	0.10	87.0
	Lower sample size ( $m = 30$ )
	JMM	−0.29	0.22	0.22	94.4	−0.08	0.22	0.23	93.8	0.03	0.10	0.09	97.6
	LMM	0.01	0.21	0.22	94.2	0.16	0.21	0.23	93.4	0.24	0.10	0.09	97.6
	LMM-aug	0.18	0.22	0.22	93.6	0.19	0.21	0.23	92.4	0.19	0.10	0.09	97.6
	FEM	1.58	0.11	0.25	58.4	2.87	0.10	0.27	49.6	3.49	0.12	0.16	81.0
	Larger random effect variance ( $σ_{b^{y}}^{2} = σ_{b^{z}}^{2} = 4$ )
	JMM	−0.17	0.30	0.30	95.0	−0.09	0.30	0.30	94.8	−0.06	0.10	0.09	97.6
	LMM	0.35	0.30	0.30	95.0	0.25	0.30	0.30	94.0	0.20	0.10	0.09	97.6
	LMM-aug	0.18	0.30	0.30	95.2	0.16	0.30	0.30	94.6	0.15	0.10	0.09	97.0
	FEM	2.50	0.15	0.39	53.0	6.27	0.11	0.38	35.0	8.07	0.16	0.29	60.8
	No treatment effect^b
	JMM	−0.34	0.16	0.17	93.8	0.23	0.16	0.17	93.0	-	0.08	0.08	95.0
	LMM	−0.18	0.16	0.17	94.4	1.44	0.16	0.17	91.8	-	0.08	0.08	94.8
	LMM-aug	−0.01	0.16	0.17	93.6	1.59	0.16	0.17	92.0	-	0.08	0.08	94.6
	FEM	2.78	0.09	0.19	64.8	19.71	0.08	0.20	47.4	-	0.10	0.14	70.6
	Bivariate exponential random effect distribution
	JMM	−0.35	0.15	0.08	99.8	−0.05	0.15	0.09	100.0	0.10	0.09	0.08	97.8
	LMM	−0.64	0.15	0.08	99.6	0.09	0.15	0.09	100.0	0.43	0.09	0.08	97.6
	LMM-aug	−0.27	0.15	0.09	99.6	0.14	0.15	0.09	100.0	0.34	0.09	0.08	97.4
	FEM	−4.42	0.09	0.12	80.8	0.85	0.09	0.12	80.4	3.35	0.11	0.14	79.6
	Truncated normal outcome distribution
	JMM	−0.20	0.15	0.15	96.2	−0.09	0.15	0.15	95.8	−0.03	0.08	0.07	96.4
	LMM	−0.09	0.15	0.15	95.6	0.12	0.15	0.15	96.0	0.22	0.08	0.07	96.6
	LMM-aug	2.16	0.15	0.15	95.2	−0.24	0.15	0.15	96.2	−1.46	0.08	0.07	93.2
	FEM	1.49	0.08	0.17	61.0	2.68	0.07	0.18	48.0	3.28	0.10	0.13	82.2
	Misspecified outcome model
	JMM	13.24	0.21	0.18	91.4	−4.50	0.21	0.19	88.4	−12.93	0.17	0.17	44.6
	LMM	14.01	0.21	0.18	90.8	−3.48	0.20	0.19	92.0	−11.79	0.17	0.17	52.2
	LMM-aug	14.07	0.21	0.18	91.6	−3.47	0.21	0.19	91.6	−11.80	0.17	0.17	53.4
	FEM	15.89	0.15	0.20	67.8	−0.82	0.14	0.22	77.4	−8.76	0.17	0.19	71.0
0.5	Main scenario
	JMM	0.13	0.16	0.17	92.8	0.10	0.16	0.17	93.4	0.09	0.08	0.08	96.0
	LMM	0.41	0.16	0.17	93.0	0.61	0.16	0.16	93.2	0.71	0.08	0.08	95.0
	LMM-aug	0.48	0.16	0.17	92.4	0.65	0.16	0.16	92.6	0.73	0.08	0.08	94.8
	FEM	4.98	0.09	0.18	62.2	7.57	0.08	0.20	20.4	8.81	0.10	0.13	35.2
0.8	Main scenario
	JMM	−0.59	0.16	0.17	93.0	−0.08	0.16	0.17	93.2	0.16	0.08	0.08	95.4
	LMM	0.11	0.16	0.16	93.0	0.83	0.16	0.16	92.6	1.17	0.08	0.08	93.8
	LMM-aug	0.22	0.16	0.17	92.2	0.87	0.16	0.16	92.8	1.18	0.08	0.08	93.6
	FEM	7.21	0.08	0.16	61.2	12.02	0.07	0.18	3.0	14.32	0.09	0.14	7.2

Figure S1, which can be found in Supplemental Appendix C, displays the average of the 500 JMM, LMM, LMM-aug, and FEM estimators of the direct effect $DE (α)$ as well as the corresponding true dose–response curve, where each panel corresponds to one of the $ρ$ values considered. Figure S2 in Supplemental Appendix C displays the average of the 500 JMM, LMM, LMM-aug, and FEM estimators of the indirect effect $IE (α, α^{'})$ as well as the corresponding true surface. LMM, LMM-aug, and FEM estimators of the direct effect sustain a positive bias as soon as there is the presence of unmeasured contextual confounding ( $ρ > 0$ ), and this bias increases as the contextual confounding intensifies. The same conclusions pertain to the indirect effect, which takes the form of a surface as a function of $α$ and $α^{'}$ .

In addition to the simulation scenario above, we generated datasets under a variety of conditions with a fixed $ρ = 0.2$ . We considered the following seven scenarios for which results can be found in Table 1:

We assumed a SAR structure for the random errors with $τ = 0.1$ and modelled them as such across the different methods (JMM, LMM, LMM-aug, and FEM). Comparing this scenario to the standard scenario, JMM, LMM, and LMM-aug’s performance remain largely similar.

We reduced the number of subgraphs to $m = 30$ . For $ν \in {1, \dots, 20}$ , we sampled $N_{ν}$ from the distribution $Poisson (35)$ , and for $ν \in {21, \dots, 30}$ , we sampled $N_{ν}$ from the distribution $Poisson (12)$ . With fewer clusters, the ASD and ESD of the estimators show a slight increase.

We generated datasets using a larger random effect variance with $σ_{b^{y}}^{2} = σ_{b^{z}}^{2} = 4$ , which also caused the ASD and ESD of the estimators to increase. Note the worsened performance of FEM in this setting.

In an additional scenario, we generated data under no treatment effect with the following outcome model $y_{ν i} (z_{ν i}, z_{N_{ν i}}) = 2 - 1.5 | X_{1 i} | + 2 X_{2 i} - 3 | X_{1 i} | X_{2 i} + b_{ν}^{y} + δ_{ν i}$ However, we included terms for the treatment, the neighbourhood treatment, and the interaction at the modelling stage. The average Bayesian estimate of $DE (0.7)$ is closest to 0 with JMM, followed by LMM, LMM-aug, and FEM.

In another scenario, $(b_{ν}^{z}, b_{ν}^{y})$ were simulated from a bivariate exponential distribution with scale parameters $λ_{b^{y}} = λ_{b^{z}} = 1$ for the marginal distributions and a correlation of 0.2 using the R package MDBED.⁴⁰ The random effects were then centred to have a mean of 0. At the modelling stage, we used the regular method based on a multivariate normal likelihood to investigate the impact of misspecifying the random effect distribution. The performance of JMM, LMM, LMM-aug, and FEM appears to be largely unaffected by the misspecification of the random effect distribution for the set simulation parameters.

To mimic certain aspects of the Add Health study, we also generated the potential outcomes under a truncated normal distribution with a conditional mean $2 + 2 z_{ν i} + p (z_{N_{ν i}}) + z_{ν i} p (z_{N_{ν i}}) - 1.5 | X_{1 i} | + 2 X_{2 i} - 3 | X_{1 i} | X_{2 i} + b_{ν}^{y}$ standard deviation of 1, a lower truncation bound of $- 5$ , and an upper truncation bound of 9. Again, we obtain comparable results to the main simulation scenario under this simulation scheme.

To assess the sensitivity of the different methods to model misspecification, we analysed generated data under the standard simulation scenario using the following incorrect outcome model: $E [y_{ν i} (z_{ν i}, z_{N_{ν i}})] = β_{0} + β_{1} z_{ν i} + β_{2} p (z_{N_{ν i}}) + β_{3} z_{ν i} p (Z_{N_{i}}) + β_{4} X_{1 i} + β_{5} X_{2 i}$

As observed across a wide range of scenarios, JMM outperforms the competitor approaches of LMM, LMM-aug, and FEM whenever contextual confounding is present. However, none of the methods evaluated are robust to misspecification of the outcome model in the presence of unmeasured confounding.

Additionally, similar to Shardell and Ferrucci,¹⁰ we considered an additional scenario of shared random intercepts (i.e. $b_{ν}^{z} = b_{ν}^{y} = b_{ν} \forall ν$ ), where the outcome and treatment assignments were generated as $y_{ν i} (z_{ν i}, z_{N_{ν i}}) = 2 - 1.5 | X_{1 i} | + 2 X_{2 i} - 3 | X_{1 i} | X_{2 i} + λ b_{ν} + δ_{ν i}$ and $\begin{aligned} Z_{ν i} = Bernoulli ({logit}^{- 1} [0.1 + 0.2 | X_{1 i} | + 0.2 X_{2 i} | X_{1 i} | - H_{ν i} + ω b_{ν}]) \end{aligned}$ with $Var (b_{ν}) = ϕ_{b}$ . We considered values of $ϕ_{b} \in {1, 2, 3, 4}$ for the random effect variance in combination with the random effect coefficients $(λ, ω) \in {(1, 1), (3, 12)}$ . Results for $DE (0.1)$ and $DE (0.7)$ under this scenario can be found in Supplemental Tables S2 and S3, respectively. Since the random effects in the outcome and treatment models are identical in this scenario, we obtain the strongest form of contextual confounding and see more differences between JMM and the competitor approaches as a result. Again, JMM provides valid inference for $DE (0.1)$ and $DE (0.7)$ , while the other methods are generally associated with high relative variance and undercoverage. In particular, for $ϕ_{b} = 1$ and $(λ, ω) = (3, 12)$ , JMM has a relative bias of 0.24% and an empirical coverage probability of 94.0% for $DE (0.1)$ , while these values are 14.91% and 77.5% for LMM. Since LMM, LMM-aug, and FEM assume ignorability of the treatment assignment, their poor performance under shared random effect scenarios, where this assumption is violated, is expected. In contrast, JMM does not rely on strong ignorability and continues to yield valid inference even under strong unmeasured confounding in the form of shared random effects.

7. Evaluation of spillover effects of maternal education in Add Health

We apply the estimators proposed in Section 5 to estimate the causal effects of college maternal education on academic performance. We investigate this question while aiming to mitigate bias from potentially missing subgraph-level confounders. In the following, the outcome is a student’s GPA, and the exposure is defined as the binary indicator of whether their mother has completed four years of college. The students who did not report living with their biological mother, stepmother, foster mother, or adoptive mother at the time of the survey were excluded from the analytic sample. We also excluded isolates from the sample for computational reasons. There was partial actor non-response in the dataset, which occurs when some, but not all, attribute information is available for one or more nodes in the network.⁴¹ To remedy this, a single imputation of the nodal attributes was performed using imputation by chained equations with predictive mean matching and 20 cycles. The second column of Table S5 in Supplemental Appendix D provides descriptive statistics for the nodal attributes after the imputation. The network structure of the Add Health subsample of $m = 16$ schools had 29 connected components with 7,931 students and 26,384 shared connections (average degree was 6.65) after the exclusions. Among the 7,931 students, 2,873 (36.2%) had a college-educated mother. Figure S3 in Supplemental Appendix D displays a histogram of the observed proportion of friends with a college education in the analytical sample, which shows a right-skewed distribution. The following baseline covariates were included in the outcome model: sex, age, student’s race (as a proxy for the race of the mother), whether the student was adopted, whether the mother was born in the US, whether the father is at home, household size, whether the mother works for pay, student’s perception of how much their mother cares for them, club participation, motivation at school, trouble at school, sense of belonging, attendance, health status, physical fitness, and screen time. The model for the exposure only included four of the baseline covariates: the student’s race (as a proxy for the race of the mother), whether the mother was born in the US, whether the father is at home, and household size. Checks for positivity were performed by comparing the distribution of the individual propensity score for the exposure model, including the four potential confounders across the two groups defined by maternal education status. Figure S4 in Supplemental Appendix D shows reasonable covariate overlap across the two exposure groups.

We considered three analyses (JMM, LMM, and FEM), all of which adjust for the same baseline covariates enumerated above and include all variables linearly in the predictors of the outcome model (and of the propensity score model in the case of JMM). To conform with the observed support of the GPA distribution, we considered a truncated Gaussian likelihood for the outcome, with the lower and upper truncation bounds set to 1 and 4, respectively. The main analyses do not impose a SAR structure on the random errors, since the adjacency matrix in the Add Health subnetwork is not block-diagonal (there are ties between students from different schools) and its inversion would be prohibitively computationally expensive considering its large dimension ( $N = 7, 931$ ).

Estimates based on the JMM, LMM, and FEM approaches for $DE (α)$ , $IE (α, 0.3)$ , $TE (α, 0.3)$ and $OE (α, 0.3)$ along with 95% credible intervals are shown in Figure 3 and Table S6 in Supplemental Appendix D. In general, JMM, LMM, and FEM estimators give similar conclusions, in that the directions of the dose–response curves align. The point estimates suggest that academic performance is increased not only directly by having a college-educated mother, but also indirectly by having a greater proportion of friends with a college-educated mother. We provide an interpretation for $DE (0.2)$ . If every student in this network were to be “assigned” a college-educated mother versus a mother with a non-college education, we would expect the average GPA difference to be 0.0252 (95% CrI: [-0.0291, 0.0792]) points under a counterfactual probability of receiving treatment $α = 20$ %. Using the JMM method, we obtain an estimate of 0.0025 (95% CrI: [-0.0373, 0.0428]) for $IE (0.8, 0.3)$ . This implies that if we held the maternal education indicator to 0, the GPA of students with 80% of friends with college-educated mothers would be on average higher by 0.0025 points compared to a setting in which the maternal education indicator setting is again set to 0 with, however, 30% of friends having college-educated mothers, assuming both groups’ covariate distribution matched that of the overall population.

Figure 3.

JMM, LMM, and FEM estimates of $DE (α)$ , $IE (α, 0.3)$ , $TE (α, 0.3)$ , and $OE (α, 0.3)$ effects with 95% credible intervals. JMM: joint mixed-effects model; LMM: linear mixed model; FEM: fixed effects model.

Most credible intervals shown in Supplemental Table S6 contain zero, with the exception of $DE (0.4)$ and $TE (α, 0.3)$ for $α = 0.4, 0.6, 0.8$ when FEM is used. In general, effect estimates are closer to zero based on JMM and LMM compared to the FEM approach, which ignores the clustering in the graph. Effect estimates based on our approach (JMM) are also lower in magnitude than those based on LMM, which assumes ignorability of the exposure assignment. In the JMM approach, the posterior median for $ρ$ was $0.58$ (95% CrI: $[0.09, 0.84]$ ). These results suggest that there might exist unmeasured contextual confounding, and our approach partially mitigates the bias due to the unmeasured latent traits. Also, the credible intervals for $IE (α, 0.3)$ based on JMM are slightly narrower than those for LMM, illustrating that accounting for unmeasured contextual confounding can also lead to efficiency gains in the estimation of interference effects.

8. Discussion

In this paper, we developed a Bayesian approach for causal inference with clustered network-based observational studies, estimating the direct and indirect effects of a binary exposure on a continuous outcome while accounting for potential unmeasured contextual confounding. We discussed the inherent challenges that arise in causal inference with multilevel network data, which are prevalent in education, and discussed the relevance of accounting for potential non-ignorability of the treatment assignment. We proposed to simultaneously model the outcome and treatment assignment by JMM, which reduces unmeasured contextual confounding when models are correctly specified. Our proposed methodology can also address potential network autocorrelation in the outcome. We described a Bayesian standardisation procedure that can be used to obtain the posterior distributions of causal estimands of interest from the posterior distributions of the JMM parameters. We note that JMM has a specific representation of unmeasured contextual confounding, the level of which is governed by the level of heterogeneity in the outcome and treatment assignment mechanisms and the correlation between the treatment assignment and the outcome. In the case of the joint random intercept model, this correlation operates through the correlation parameter $ρ$ . Large random effect variances or $ρ$ values may lead to significant bias due to unmeasured confounding. The simulation study demonstrated that JMM provides valid inference in a wide variety of settings for average potential outcomes and causal contrasts, where LMM and FEM may not. A particular advantage of our proposed approach is that it performs well with either the presence or absence of unmeasured contextual confounding.

Our proposal fits more widely into the literature of Bayesian causal inference methods, incorporating a model for the propensity score. Existing ways of combining propensity score and outcomes models include the propensity score as a covariate in the outcome model, specifying dependent priors or shared parameters between the propensity score and outcome models, or posterior predictive inference through inverse probability weighting and doubly robust estimation.^23,42,43 As we used a latent Gaussian structure to relate the outcome and propensity score models, our method falls under the second category of methods.¹³

As previously mentioned, JMM relies on parametric assumptions regarding the joint distribution of outcomes and treatment assignments. Unbiased estimation requires correct specification of the outcome model, the treatment assignment mechanism, and the random effects distribution. Under the assumption of strong ignorability, parametric model assumptions would not necessarily be required for valid causal inference. However, in the context of a school-based design such as Add Health, unmeasured school-level confounders are highly likely, as students attending the same school may share latent characteristics affecting both their outcome and exposure. To relax the unconfoundedness assumption, the trade-off here is to make additional assumptions on model specifications. In particular, we assumed that random effects follow a normal distribution in both the outcome model and the propensity score model. As seen in simulation studies, a violation of the Gaussian assumption in the form of exponentially distributed random effects can impact the coverage probability of credible intervals for average potential outcomes more severely, while inference for causal contrasts seems to be mildly affected for the data-generating mechanism considered. Importantly, similar to LMM and LMM-aug, JMM also yields biased inference under outcome model misspecification, which aligns with theoretical expectations. As it is more restrictive and also computationally expensive, the gain provided by JMM may not be worthwhile in larger networks or when contextual confounding is weak. Another caveat specific to our analysis of spillover effects of maternal education in Add Health is that we had to use a smaller subset of the dataset for computational reasons, potentially leading to an underpowered analysis. In addition, we considered a linear functional form to relate the outcome to the predictors in our analysis of Add Health. While posterior predictive checks and the deviance information criterion supported the use of a linear model over a generalised additive model in this particular case, in general, non-linear such as spline representations, as the general outcome model formulation in (1) should be investigated, as failure to capture true non-linear relationships may lead to invalid causal inferences.

Despite the potential merits of this work, methodological challenges remain in order to improve inference for causal effects in the presence of network interference. We performed approximate checks for positivity by assessing covariate overlap for the individual exposure only. However, the context of interference requires sufficient overlap across groups defined by the different combinations of the individual and neighbourhood exposures. Designing positivity assessments for the network interference setting should be included in future research. Also, the impact of the network topology on the covariance matrix $Φ$ in the SAR model remains to be investigated. For instance, a sparse or inhomogeneous graph may have an effect with regard to the computation of $Φ$ , particularly on the matrix inversions involved. Although the methodology in this paper is developed under a SAR structure, the analysis of the Add Health study was done under $τ = 0$ corresponding to independent random errors, avoiding the SAR structure for this large-scale empirical network of $N = 7, 931$ nodes. In theory, for large sparse networks, one can avoid inversion entirely by parameterising the model in terms of the precision matrix rather than the covariance matrix. This approach, in combination with an incomplete Cholesky factorisation, would allow for scalable inference without ever forming or inverting the full covariance matrix. While our implementation relied on the covariance form, a precision-based parameterisation is a natural avenue for handling larger networks in future work.

Of the different sources of unmeasured network confounding,⁶ our approach only addresses contextual confounding. However, node-level unmeasured confounders could still introduce bias in causal effect estimates. One direction for future research could entail developing novel sensitivity analysis techniques to assess the impact of node-level unmeasured confounding on the inference for direct and spillover effects, such as a method based on the confounding method, which has the advantage of avoiding introducing a hypothetical unmeasured confounder.^44,45

Another challenge inherent to this context is homophily bias. In sociological settings, this can lead to latent variable dependence due to latent traits that both drive the formation of network ties and predict an individual’s treatment or outcome. In all scenarios we considered here, we assumed that all variables driving homophily were measured and conditioned upon in analyses, which is unlikely to be the case in practice. Imposing a SAR structure on the random errors of the outcome model might account for some of the latent dependence, but more work has to be done to better understand how homophily can affect inference in these settings. Latent homophily poses a threat to the identification of causal effects in various ways. Shalizi and Thomas⁴⁶ showed that unless all of the nodal attributes driving both social-tie formation and the behaviour of interest are observed, then peer effects are generally unidentified. Assuming that the social network is generated by a latent community, McFowland III and Shalizi⁴⁷ proposed asymptotically unbiased estimators of peer influence under homophily by controlling the latent location of each node. Another avenue for future work would be to use their methodology and incorporate estimates of latent positions in the outcome and exposure models to control for potential latent homophily when inferring for interference effects.

Another limitation of this work is that it is assumed that the adjacency matrix $A$ is fully observed and without error. A common strategy consists of treating missing edges as absent or dropping vertices with missing edge information, which can lead to dubious results.^48–50 Li et al.⁵¹ have studied the impact of noisy networks for experiments on networks, but the problem of missing data remains largely understudied in the network interference literature. To our knowledge, the only example of a use of network imputation to assess causal effects with interference is in the context of experiments on networks.⁵² This could be another important avenue for future work in the context of network-based observational studies.

In general, cross-sectional social network data are limited, since it remains a challenge to distinguish whether associations in peers’ outcomes are due to interference or homophily.⁶ Future developments should entail extending methods presented in this paper to discrete-time dynamic networks and considering the co-evolution of network ties, $A_{i j}^{t}$ , and vertex attributes, $(Y_{ν i}^{t}, Z_{ν i}^{t}, X_{ν i}^{t})$ , through time $t$ for more nuanced causal inferences. A more straightforward path would be to consider discrete-time processes, as discrete-time models for time-varying effects, such as structural nested models or marginal structural models estimated via the g-formula, g-estimation, or related methods, do not generalise easily to continuous time, due to the entanglement of uncountably infinite variables. Stochastic actor-oriented models are appealing in that they allow for joint modelling of social influence and social ties, and can hence provide adequate control for network endogeneity in longitudinal social network data.^6,53 Despite their potential for interference analyses, these models have yet to be formulated in a counterfactual framework.⁶ Special considerations would be needed to account for potential time-varying confounding by $X_{ν i}^{t}$ as well as by $A_{i j}^{t}$ , since these variables could also be on causal paths for subsequent time points. This represents a promising yet challenging avenue for future work. Lastly, future extensions of the JMM could consider categorical and count outcomes as well as survival outcomes.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802251403355 - Supplemental material for Joint mixed-effects models for causal inference in clustered network-based observational studies

Supplemental material, sj-pdf-1-smm-10.1177_09622802251403355 for Joint mixed-effects models for causal inference in clustered network-based observational studies by Vanessa McNealis, Erica EM Moodie and Nema Dean in Statistical Methods in Medical Research

Footnotes

ORCID iDs

Vanessa McNealis

Erica EM Moodie

Ethical approval statement

Ethical approval to use data from Add Health was obtained from the McGill Institutional Review Board (A02-E11-22A).

Funding

The authors disclosed receipt of the following financial support for the research,authorship,and/or publication of this article: This research was enabled in part by support provided by Calcul Québec ( https://www.calculquebec.ca/ ) and the Digital Research Alliance of Canada (

). The case study in this paper uses data from Add Health,funded by grant P01 HD31921 (Harris) from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD),with cooperative funding from 23 other federal agencies and foundations. Add Health is currently directed by Robert A. Hummer and funded by the National Institute on Aging cooperative agreements U01 AG071448 (Hummer) and U01AG071450 (Aiello and Hummer) at the University of North Carolina at Chapel Hill. Add Health was designed by J. Richard Udry,Peter S. Bearman,and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill. Vanessa McNealis is supported by doctoral fellowships from Natural Sciences and Engineering Research Council of Canada (NSERC) and the Fonds de Recherche du Québec (FRQ)—Nature et Technologie. Erica E. M. Moodie acknowledges support from an NSERC Discovery Grant. Erica E. M. Moodie is a Canada Research Chair (Tier 1) in Statistical Methods for Precision Medicine and acknowledges the support of a Chercheur-boursier de mérite career award from the FRQ—Santé.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research,authorship,and/or publication of this article.

Data availability

The data from Add Health used in this study were provided by the Carolina Population Center and are not publicly available. Researchers can request access at

. The simulation code and datasets are available from the corresponding author on reasonable request.

Supplemental material

Supplemental materials for this article are available online.

References

Hattie

. Classroom composition and peer effects. Int J Educ Res 2002; 37: 449–481.

Jones

. How does classroom composition affect learning outcomes in Ugandan primary schools? Int J Educ Dev 2016; 48: 66–78.

Hoxby

. Peer effects in the classroom: Learning from gender and race variation (NBER Working Paper 7867). National Bureau of Economic Research, 2000.

Bifulco

Fletcher

Ross

. The effect of classmate characteristics on post-secondary outcomes: Evidence from the Add Health. Am Econ J Econ Policy 2011; 3: 25–53.

Fletcher

Ross

Zhang

. The consequences of friendships: Evidence on the effect of social relationships in school on academic achievement. J Urban Econ 2020; 116: 103241.

VanderWeele

. Social networks and causal inference. In: Morgan SL (ed) Handbook of causal analysis for social research. Dordrecht, The Netherlands, Springer, 2013, pp. 353–374.

Ding

. Causal inference: A missing data perspective. Stat Sci 2018; 33: 214–237.

Follmann

. An approximate generalized linear model with random effects for informative missing data. Biometrics 1995; 51: 151–168.

Little

Rubin

. Statistical analysis with missing data 793. Hoboken, NJ: John Wiley & Sons, 2019.

10.

Shardell

Ferrucci

. Joint mixed-effects models for causal inference with longitudinal data. Stat Med 2018; 37: 829–846.

11.

Papadogeorgou

Samanta

. Spatial causal inference in the presence of unmeasured confounding and interference. arXiv preprint arXiv:230308218, 2023.

12.

Nobre

Schmidt

Moodie

EEM

, et al. The impact of directly observed therapy on the efficacy of tuberculosis treatment: A Bayesian multilevel approach. J R Stat Soc Ser C Appl Stat 2023; 72: 889–911.

13.

Kim

Hummers

, et al. Causal inference using multivariate generalized linear mixed-effects models with longitudinal data. arXiv preprint arXiv:230302201, 2023.

14.

. Inverse conditional probability weighting with clustered data in causal inference. arXiv preprint arXiv:180801647, 2018.

15.

Lee

Nguyen

Stuart

. Partially pooled propensity score models for average treatment effect estimation with multilevel data. J R Stat Soc Ser A Stat Soc 2021; 184: 1578–1598.

16.

Salditt

Nestler

. Parametric and nonparametric propensity score estimation in multilevel observational studies. Stat Med 2023; 42: 4147–4176.

17.

Suk

Kang

. Robust machine learning for treatment effects in multilevel observational studies under cluster-level unmeasured confounding. Psychometrika 2022; 87: 310–343.

18.

Lee

Buchanan

Katenka

, et al. Estimating causal effects of non-randomized HIV prevention interventions with interference in network-based studies among people who inject drugs. Ann Appl Stat 2023; 17: 2165–2191.

19.

Forastiere

Airoldi

Mealli

. Identification and estimation of treatment and interference effects in observational studies on networks. J Am Stat Assoc 2021; 116: 901–918.

20.

McNealis

Moodie

EEM

Dean

. Revisiting the effects of maternal education on adolescents’ academic performance: Doubly robust estimation in a network-based observational study. J R Stat Soc Ser C Appl Stat 2024; 73: 715–734.

21.

Harris

. The add health study: Design and accomplishments. Chapel Hill: Carolina Population Center, University of North Carolina at Chapel Hill, 2013.

22.

Aronow

Samii

. Estimating average causal effects under general interference, with application to a social network experiment. Ann Appl Stat 2017; 11: 1912–1947.

23.

Ding

Mealli

. Bayesian causal inference: A critical review. Philos Trans R Soc A 2023; 381: 20220153.

24.

Liu

Hudgens

Saul

, et al. Doubly robust estimation in observational studies with partial interference. Stat 2019; 8: e214.

25.

Tchetgen

EJT

VanderWeele

. On causal inference in the presence of interference. Stat Methods Med Res 2012; 21: 55–75.

26.

Papadogeorgou

Mealli

Zigler

. Causal inference with interfering units for cluster and population level treatment allocation programs. Biometrics 2019; 75: 778–787.

27.

Barkley

Hudgens

Clemens

, et al. Causal inference from observational studies with clustered interference, with application to a cholera vaccine study. Ann Appl Stat 2020; 14: 1432–1448.

28.

Hudgens

Halloran

. Toward causal inference with interference. J Am Stat Assoc 2008; 103: 832–842.

29.

Cressie

. Statistics for spatial data. New York: John Wiley & Sons, 2015.

30.

Yauck

Moodie

EEM

Apelian

, et al. General regression methods for respondent-driven sampling data. Stat Methods Med Res 2021; 30: 2105–2118.

31.

Dormann

McPherson

Araújo

, et al. Methods to account for spatial autocorrelation in the analysis of species distributional data: A review. Ecography 2007; 30: 609–628.

32.

Kilpatrick

Gilbertson

Brookhart

, et al. Exploring large weight deletion and the ability to balance confounders when using inverse probability of treatment weighting in the presence of rare treatment decisions. Pharmacoepidemiol Drug Saf 2013; 22: 111–121.

33.

Tseng

Elashoff

, et al. Longitudinal data analysis with non-ignorable missing data. Stat Methods Med Res 2016; 25: 205–220.

34.

Rubin

. Bayesian inference for causal effects: The role of randomization. Ann Stat 1978; 6: 34–58.

35.

Ricciardi

Mattei

Mealli

. Bayesian inference for sequential treatments under latent sequential ignorability. J Am Stat Assoc 2020; 115: 1498–1517.

36.

McCandless

Gustafson

Levy

. Bayesian sensitivity analysis for unmeasured confounding in observational studies. Stat Med 2007; 26: 2331–2347.

37.

Gelman

. Prior distributions for variance parameters in hierarchical models. Bayesian Anal 2006; 1: 515–533.

38.

Polson

Scott

. On the half-cauchy prior for a global scale parameter. Bayesian Anal 2012; 7: 887–902.

39.

Lewandowski

Kurowicka

Joe

. Generating random correlation matrices based on vines and extended onion method. J Multivar Anal 2009; 100: 1989–2001.

40.

Duque

. MDBED: Moran–Downton bivariate exponential distribution (R package version 1.0.0). https://CRAN.R-project.org/package=MDBED (2020).

41.

Krause

Huisman

Steglich

, et al. Missing network data: A comparison of different imputation methods. In: 2018 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), IEEE, 28–31 August 2018, Barcelona, Spain, pp.159–163. New York, NY, USA: IEEE.

42.

Saarela

Stephens

Moodie

EEM

, et al. On Bayesian estimation of marginal structural models. Biometrics 2015; 71: 279–288.

43.

Saarela

Belzile

Stephens

. A Bayesian view of doubly robust causal inference. Biometrika 2016; 103: 667–681.

44.

Ennis

, et al. A flexible approach for causal inference with multiple treatments and clustered survival outcomes. Stat Med 2022; 41: 4982–4999.

45.

Zou

, et al. A flexible sensitivity analysis approach for unmeasured confounding with multiple treatments and a binary outcome with application to SEER-Medicare lung cancer data. Ann Appl Stat 2022; 16: 1014.

46.

Shalizi

Thomas

. Homophily and contagion are generically confounded in observational social network studies. Sociol Methods Res 2011; 40: 211–239.

47.

McFowland

IIIE

Shalizi

. Estimating causal peer influence in homophilous social networks by inferring latent locations. J Am Stat Assoc 2023; 118: 707–718.

48.

Ghani

Donnelly

Garnett

. Sampling biases and missing data in explorations of sexual partner networks for the spread of sexually transmitted diseases. Stat Med 1998; 17: 2079–2097.

49.

Kossinets

. Effects of missing data in social networks. Soc Netw 2006; 28: 247–268.

50.

Huisman

Steglich

. Treatment of non-response in longitudinal network studies. Soc Netw 2008; 30: 297–308.

51.

Sussman

Kolaczyk

. Causal inference under network interference with noise. arXiv preprint arXiv:210504518, 2021.

52.

Kao

. Causal inference under network interference: A framework for experiments on social networks. PhD Thesis, Harvard University, 2017.

53.

Snijders

. Stochastic actor-oriented models for network change. J Math Sociol 1996; 21: 149–172.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.93 MB

0.00 MB

		$μ_{0, 0.7}$ ^a				$μ_{1, 0.7}$ ^a				$DE (0.7)$ ^a
$ρ$	Method	RB	ASD	ESD	ECP	RB	ASD	ESD	ECP	RB	ASD	ESD	ECP
0	Main scenario
	JMM	−0.32	0.16	0.17	92.8	−0.06	0.16	0.17	94.6	0.07	0.08	0.08	96.2
	LMM	−0.21	0.16	0.17	93.2	−0.02	0.16	0.17	94.6	0.08	0.08	0.08	96.6
	LMM-aug	−0.11	0.16	0.16	93.2	0.01	0.16	0.17	94.4	0.07	0.08	0.08	96.8
	FEM	−0.46	0.09	0.19	64.8	0.11	0.08	0.21	53.6	0.39	0.10	0.13	85.8
0.2	Main scenario
	JMM	−0.35	0.16	0.16	95.2	0.04	0.16	0.16	94.8	0.23	0.08	0.07	97.4
	LMM	−0.05	0.16	0.16	94.4	0.29	0.16	0.16	94.2	0.45	0.08	0.07	97.4
	LMM-aug	−0.08	0.16	0.16	95.0	0.28	0.16	0.16	94.4	0.45	0.08	0.07	97.8
	FEM	1.83	0.09	0.19	62.6	2.93	0.08	0.21	49.6	3.46	0.10	0.13	77.8
	SAR errors ( $τ = 0.1$ )
	JMM	−0.92	0.15	0.17	91.5	−0.28	0.16	0.17	91.0	0.03	0.08	0.08	95.5
	LMM	−0.89	0.15	0.17	91.5	−0.04	0.15	0.17	91.0	0.37	0.08	0.08	95.5
	LMM-aug	−0.94	0.15	0.17	91.5	−0.04	0.16	0.17	92.5	0.40	0.08	0.08	94.5
	FEM	−0.37	0.08	0.17	63.0	1.28	0.08	0.17	58.0	2.08	0.08	0.10	87.0
	Lower sample size ( $m = 30$ )
	JMM	−0.29	0.22	0.22	94.4	−0.08	0.22	0.23	93.8	0.03	0.10	0.09	97.6
	LMM	0.01	0.21	0.22	94.2	0.16	0.21	0.23	93.4	0.24	0.10	0.09	97.6
	LMM-aug	0.18	0.22	0.22	93.6	0.19	0.21	0.23	92.4	0.19	0.10	0.09	97.6
	FEM	1.58	0.11	0.25	58.4	2.87	0.10	0.27	49.6	3.49	0.12	0.16	81.0
	Larger random effect variance ( $σ_{b^{y}}^{2} = σ_{b^{z}}^{2} = 4$ )
	JMM	−0.17	0.30	0.30	95.0	−0.09	0.30	0.30	94.8	−0.06	0.10	0.09	97.6
	LMM	0.35	0.30	0.30	95.0	0.25	0.30	0.30	94.0	0.20	0.10	0.09	97.6
	LMM-aug	0.18	0.30	0.30	95.2	0.16	0.30	0.30	94.6	0.15	0.10	0.09	97.0
	FEM	2.50	0.15	0.39	53.0	6.27	0.11	0.38	35.0	8.07	0.16	0.29	60.8
	No treatment effect^b
	JMM	−0.34	0.16	0.17	93.8	0.23	0.16	0.17	93.0	-	0.08	0.08	95.0
	LMM	−0.18	0.16	0.17	94.4	1.44	0.16	0.17	91.8	-	0.08	0.08	94.8
	LMM-aug	−0.01	0.16	0.17	93.6	1.59	0.16	0.17	92.0	-	0.08	0.08	94.6
	FEM	2.78	0.09	0.19	64.8	19.71	0.08	0.20	47.4	-	0.10	0.14	70.6
	Bivariate exponential random effect distribution
	JMM	−0.35	0.15	0.08	99.8	−0.05	0.15	0.09	100.0	0.10	0.09	0.08	97.8
	LMM	−0.64	0.15	0.08	99.6	0.09	0.15	0.09	100.0	0.43	0.09	0.08	97.6
	LMM-aug	−0.27	0.15	0.09	99.6	0.14	0.15	0.09	100.0	0.34	0.09	0.08	97.4
	FEM	−4.42	0.09	0.12	80.8	0.85	0.09	0.12	80.4	3.35	0.11	0.14	79.6
	Truncated normal outcome distribution
	JMM	−0.20	0.15	0.15	96.2	−0.09	0.15	0.15	95.8	−0.03	0.08	0.07	96.4
	LMM	−0.09	0.15	0.15	95.6	0.12	0.15	0.15	96.0	0.22	0.08	0.07	96.6
	LMM-aug	2.16	0.15	0.15	95.2	−0.24	0.15	0.15	96.2	−1.46	0.08	0.07	93.2
	FEM	1.49	0.08	0.17	61.0	2.68	0.07	0.18	48.0	3.28	0.10	0.13	82.2
	Misspecified outcome model
	JMM	13.24	0.21	0.18	91.4	−4.50	0.21	0.19	88.4	−12.93	0.17	0.17	44.6
	LMM	14.01	0.21	0.18	90.8	−3.48	0.20	0.19	92.0	−11.79	0.17	0.17	52.2
	LMM-aug	14.07	0.21	0.18	91.6	−3.47	0.21	0.19	91.6	−11.80	0.17	0.17	53.4
	FEM	15.89	0.15	0.20	67.8	−0.82	0.14	0.22	77.4	−8.76	0.17	0.19	71.0
0.5	Main scenario
	JMM	0.13	0.16	0.17	92.8	0.10	0.16	0.17	93.4	0.09	0.08	0.08	96.0
	LMM	0.41	0.16	0.17	93.0	0.61	0.16	0.16	93.2	0.71	0.08	0.08	95.0
	LMM-aug	0.48	0.16	0.17	92.4	0.65	0.16	0.16	92.6	0.73	0.08	0.08	94.8
	FEM	4.98	0.09	0.18	62.2	7.57	0.08	0.20	20.4	8.81	0.10	0.13	35.2
0.8	Main scenario
	JMM	−0.59	0.16	0.17	93.0	−0.08	0.16	0.17	93.2	0.16	0.08	0.08	95.4
	LMM	0.11	0.16	0.16	93.0	0.83	0.16	0.16	92.6	1.17	0.08	0.08	93.8
	LMM-aug	0.22	0.16	0.17	92.2	0.87	0.16	0.16	92.8	1.18	0.08	0.08	93.6
	FEM	7.21	0.08	0.16	61.2	12.02	0.07	0.18	3.0	14.32	0.09	0.14	7.2