
Abstract

Background/Aims:

Power and sample size calculation formulas for stepped-wedge trials with two levels (subjects within clusters) are available. However, stepped-wedge trials with more than two levels are possible. An example is the CHANGE trial which randomizes nursing homes (level 4) consisting of nursing home wards (level 3) in which nurses (level 2) are observed with respect to their hand hygiene compliance during hand hygiene opportunities (level 1) in the care of patients. We provide power and sample size methods for such trials and illustrate these in the setting of the CHANGE trial.

Methods:

We extend the original sample size methodology derived for stepped-wedge trials based on a random intercepts model, to accommodate more than two levels of clustering. We derive expressions that can be used to determine power and sample size for p levels of clustering in terms of the variances at each level or, alternatively, in terms of intracluster correlation coefficients. We consider different scenarios, depending on whether the same units in a particular level are repeatedly measured as a cohort sample or whether different units are measured cross-sectionally.

Results:

A simple variance inflation factor is obtained that can be used to calculate power and sample size for continuous and by approximation for binary and rate outcomes. It is the product of (1) variance inflation due to the multilevel structure and (2) variance inflation due to the stepped-wedge manner of assigning interventions over time. Standard and non-standard designs (i.e. so-called “hybrid designs” and designs with more, less, or no data collection when the clusters are all in the control or are all in the intervention condition) are covered.

Conclusions:

The formulas derived enable power and sample size calculations for multilevel stepped-wedge trials. For the two-, three-, and four-level case of the standard stepped wedge, we provide programs to facilitate these calculations.

Keywords

Stepped-wedge trials, hybrid (stepped wedge) design, power, sample size, multilevel, variance inflation factor

Introduction

Hussey and Hughes¹ and Girling and Hemming² derived a power formula for the standard stepped-wedge cluster-randomized design (see Figure 1) with two levels of clustering (i.e. subjects within clusters), where cross-sectional samples are taken at the lowest (subject) level, that is, different subjects are measured in every period. In this article, we derive and demonstrate power and sample size calculations for stepped-wedge cluster trials with more than two levels, in which the lowest level is cross-sectional. One such example is the CHANGE trial (ClinicalTrials.gov NCT02817282), which aims to improve nurses’ level of compliance with hand hygiene guidelines. This trial has four levels of clustering, with nurses (level 2) in wards (level 3) of several nursing homes (level 4). Nurses are followed in sessions where different opportunities for hand hygiene arise and observations (level 1) on compliance with the guideline are made.

Figure 1.

Cluster-randomized parallel group design and different stepped-wedge-like designs with $s = 4$ sequences. Each row corresponds to a sequence in the design with the number of clusters in that sequence at the right side of the row. The background color of a cell indicates the treatment (white for control and black for intervention) and the number within a cell gives the number of repeated measurements. The total number of measurements is indicated below the design. Further details are provided in the Supplementary Files (SF3, 4, and 5).

If clusters consist of more than two levels, different scenarios are possible. For example, in the CHANGE trial that has four levels, the following scenarios are possible (Figure 2):

Figure 2.

Scenarios in four-level stepped-wedge design (CHANGE trial setting): (a) only nursing homes NH (level 4) followed as cohort, (b) wards W (level 3) within nursing homes (level 4) followed as cohort, and (c) nurses Nu (level 2) within wards (level 3) in nursing homes (level 4) followed as cohort. The boxed parts of the multilevel data are measured cross-sectionally. In particular, the observations O (level 1) are always measured cross-sectionally.

Level 4 repeated: the same nursing homes are repeatedly measured (i.e. as a cohort) but in every measurement period, different wards are measured, implying also that different nurses and hygiene observations are made (i.e. cross-sectional measurement at the lower levels).

Levels 4 and 3 repeated: the same wards within nursing homes are repeatedly measured, but in every measurement period, different nurses and hygiene observations are made cross-sectionally over time.

Levels 4 and 3 and 2 repeated: the same nurses within wards within nursing homes are repeatedly measured (cohort design at these levels); in each measurement period, different (i.e. cross-sectional) hygiene observations are made.

As the scenarios above illustrate, the highest level (referred to as a cluster in this article) is always measured repeatedly and the lowest level is always measured cross-sectionally. In general, there is a cut-off level such that all levels below it are measured cross-sectionally and all levels from it upward are followed as a cohort.

Our method covers both “standard” stepped-wedge designs (i.e. designs where all clusters start in the control and end in the intervention condition) and non-standard designs (i.e. stepped-wedge designs with more, less, or no data collection before and/or after roll-out³ and hybrid designs;² see Figure 1 with $s = 4$ stepped-wedge sequences).

Methods

In order to support the flow of arguments, technical derivations are provided in the Supplementary Files (SF) and notations are given in Table 1. At time t, cluster i is either in the control condition $(X_{it} = 0)$ or in the intervention condition $(X_{it} = 1)$. For power calculations, we make the simplifying assumption that the difference between conditions, $δ$, is the same wherever and whenever the intervention is introduced and is maintained at this level. Hussey and Hughes¹ modeled the clustering of subjects within clusters by a random intercept for cluster (level 2 random effect). For more than two levels, we extend this idea by incorporating random effects for each clustering level. For example, for four levels, the outcome $Y_{itjkm}$ of “observation” (level 1 unit) $m = 1, \dots, n_{1}$ of “subject” (level 2 unit) $k = 1, \dots, n_{2}$ within “sub-cluster” (level 3 unit) $j = 1, \dots, n_{3}$ within “cluster” (level 4 unit) $i = 1, \dots, I$ in measurement/period $t = 1, \dots, T$ is

$\begin{matrix} Y_{itjkm} = μ + u_{000 i} + u_{00 i (t) j} + u_{0 i (t) jk} + β_{t} + δ X_{it} + e_{itjkm}, \\ u_{000 i}, u_{00 i (t) j}, u_{0 i (t) jk} \text{ random effects at levels 4, 3, and 2 with variances } σ_{4}^{2}, σ_{3}^{2}, \text{ and } σ_{2}^{2}, \text{ respectively}; \\ e_{itjkm} \text{ random effect (residual) at level 1, with variance } σ_{1}^{2}; \\ u_{000 i}, u_{00 i (t) j}, u_{0 i (t) jk}, e_{itjkm} \text{ mutually independent}; \\ u_{00 i (t) j}, u_{00 i (t') j} \text{ are equal (unequal) for } t \neq t' \text{ if level 3 is measured as cohort (cross-sectionally)}; \\ u_{0 i (t) jk}, u_{0 i (t') jk} \text{ are equal (unequal) for } t \neq t' \text{ if level 2 is measured as cohort (cross-sectionally)} \end{matrix}$ (1)

Table 1.

Notations in this article illustrated in the CHANGE trial setting.

Parameter	Meaning (in the four-level CHANGE trial)
$Y_{it •}$	The average of outcome Y in cluster i at time t, that is, the dot means averaging over all sub-units
$δ$	Treatment effect
$β_{t}$	Time effect at measurement time/period t
$X_{it}$	Design matrix: $X_{it} = 1$ if cluster i has intervention at time t, and $X_{it} = 0$ if it is in control condition
$σ_{tot}^{2}$	Total variance of level 1 units unconditional, that is, regardless of the cluster they belong to
$ρ_{12}$	True (population) value of correlation of level-1 units (observations) within a level-2 unit (nurse)
$ρ_{23}$	True (population) value of correlation of level-2 units (nurse) within a level-3 unit (ward)
${\tilde{ρ}}_{23}$	Sample estimated value of correlation of level-2 units (nurse) within a level-3 unit (ward)
$ρ_{34}$	True (population) value of correlation of level-3 units (ward) within a level-4 unit (nursing home)
${\tilde{ρ}}_{34}$	Sample estimated value of correlation of level-3 units (ward) within a level-4 unit (nursing home)
$n_{1}$	Number of level-1 units (observations) per level-2 unit (nurse)
$n_{2}$	Number of level-2 units (nurses) per level-3 unit (ward)
$n_{3}$	Number of level-3 units (wards) per level-4 unit (nursing home)
s	Number of sequences in a stepped wedge (also if part of a larger design)
c	Number of clusters in a sequence of a stepped-wedge design
T	Number of measurement times/periods (including the baseline)
I	Total number of clusters (nursing homes)
$τ^{2}$	$Cov (Y_{it •}, Y_{is •})$ : covariance between averages of the same cluster at different times t and s
$τ^{2} + σ^{2}$	$Var (Y_{it \cdot})$ : variance of a cluster average at a time t
$σ_{1}^{2}$	Variance at level 1, that is, variance of level-1 units (observations) within their level-2 unit (nurse)
$σ_{2}^{2}$	Variance at level 2, that is, variance of level-2 units (nurses) within their level-3 unit (ward)
$σ_{3}^{2}$	Variance at level 3, that is, variance of level-3 units (wards) within their level-4 unit (nursing home)
$σ_{4}^{2}$	Variance at level 4, that is, variance between level-4 units (nursing homes)
$VI F_{p}$	Variance inflation factor due to the multilevel structure of the data having p levels
$ρ$	$Corr (Y_{it •}, Y_{is •})$ correlation between averages of the same cluster at different times t and s

If an intermediate level is measured as cohort, the index $(t)$ can be dropped. In this article, we assume that at every measurement time/period $(t = 1, 2, \dots, T)$

All clusters $(i = 1, 2, \dots, I)$ are measured;

Each level-2 unit (e.g. nurse) has the same number $n_{1}$ of level-1 units (e.g. observations); each level-3 unit (e.g. ward) has the same number $n_{2}$ of level-2 units (e.g. nurses), and so on.

Randomization is always on the highest level.

In terms of the cluster averages $Y_{it •}$ at each time point/period (so $Y_{it •} = (\sum_{j, k, m} Y_{itjkm}) / (n_{1} n_{2} n_{3})$ for four levels), we have a repeated measurement design, and the above model implies equal covariance $τ^{2} = Cov (Y_{it •}, Y_{it' •})$ between averages of the same cluster over time, and equal variance $Var (Y_{it •}) = σ^{2} + τ^{2}$ of the clusters across all time/period (SF1). The variance of the weighted least-squares estimator $\hat{δ}$ for the intervention effect is (Hussey & Hughes, 2007)

$\begin{matrix} var (\hat{δ}) = \frac{I σ^{2} (σ^{2} + T τ^{2})}{f (X) σ^{2} + g (X) τ^{2}} \\ f (X) = S \cdot I - C, g (X) = S^{2} + S \cdot I \cdot T - R \cdot I - C \cdot T \\ S = Σ_{it} X_{it}, C = Σ_{t} {(Σ_{i} X_{it})}^{2}, R = Σ_{i} {(Σ_{t} X_{it})}^{2} \end{matrix}$ (2)

where S is the sum of matrix elements, C is the sum of squared column sums, and R is the sum of squared row sums of $X = (X_{it})$ .
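As a minimal sketch of equation (2), the Python snippet below builds the design matrix of a standard stepped wedge and evaluates $var (\hat{δ})$ from S, C, and R. The function names (`standard_sw_design`, `var_delta_hh`) and the use of NumPy are illustrative assumptions and are not taken from the SAS or Excel programs in the SF; the numerical inputs are the rounded values of $σ^{2}$ and $τ^{2}$ from Example 1 of this article.

```python
import numpy as np

def standard_sw_design(s, c=1):
    """Design matrix X (clusters x periods) of a standard stepped wedge.

    Assumes s sequences with c clusters each and T = s + 1 periods;
    with periods indexed 0..s, sequence k (k = 1..s) is treated from index k on.
    """
    periods = np.arange(s + 1)
    starts = np.arange(1, s + 1)
    X = (periods[None, :] >= starts[:, None]).astype(float)
    return np.repeat(X, c, axis=0)

def var_delta_hh(X, sigma2, tau2):
    """Variance of the treatment-effect estimator, equation (2)."""
    X = np.asarray(X, dtype=float)
    I, T = X.shape
    S = X.sum()                        # sum of all matrix elements
    C = (X.sum(axis=0) ** 2).sum()     # sum of squared column sums
    R = (X.sum(axis=1) ** 2).sum()     # sum of squared row sums
    f = S * I - C
    g = S**2 + S * I * T - R * I - C * T
    return I * sigma2 * (sigma2 + T * tau2) / (f * sigma2 + g * tau2)

# Four sequences with one cluster each (I = 4, T = 5), Example 1 inputs
X = standard_sw_design(s=4, c=1)
v = var_delta_hh(X, sigma2=46.313e-4, tau2=33.345e-4)
v_doubled = var_delta_hh(standard_sw_design(s=4, c=2), 46.313e-4, 33.345e-4)
print(f"var(delta_hat) = {v:.4e}")
```

With these inputs the result agrees with the value of about $26.967 \cdot 10^{- 4}$ reported in Example 1, and doubling the number of clusters per sequence halves the variance.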

In terms of the correlation $ρ = corr (Y_{it •}, Y_{it' •})$ between averages of the same cluster over time, we can reformulate this as (SF2)

$var (\hat{δ}) = \frac{I \cdot (1 - ρ) \cdot [1 + (T - 1) ρ]}{f (X) \cdot (1 - ρ) + g (X) \cdot ρ} \cdot var (Y_{it •})$ (3)

or in equivalent formulation by Girling and Hemming² (SF2)

$\begin{matrix} var (\hat{δ}) = \frac{(1 - ρ)}{I \cdot T \cdot (a_{D} (X) - b_{D} (X) \cdot R)} \cdot var (Y_{it •}) \\ a_{D} (X) = \frac{1}{I \cdot T} \cdot Σ_{it} {(X_{it} - X_{• t})}^{2}, \\ b_{D} (X) = \frac{1}{I} \cdot Σ_{i} {(X_{i •} - X_{• •})}^{2}, \\ R = \frac{T \cdot ρ}{1 + (T - 1) \cdot ρ}, \\ X_{• t} = Σ_{i} X_{it} / I, X_{i •} = Σ_{t} X_{it} / T, X_{• •} = Σ_{it} X_{it} / (I \cdot T) \end{matrix}$ (4)

where $a_{D}$ is the within-column variance of $(X_{it})$ and $b_{D}$ is the between-row variance. Note that $ρ$ is not an intracluster correlation coefficient, but it can be expressed in terms of intracluster correlations of the multilevel design (Table 2).
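The equivalence of equations (3) and (4) can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, NumPy is an assumed dependency, and $b_{D}$ is computed as the variance of the row means (a sum over clusters i divided by I), which is the reading under which both formulas agree.

```python
import numpy as np

def var_delta_eq3(X, rho, var_y):
    """Equation (3): the design enters through f(X) and g(X)."""
    X = np.asarray(X, dtype=float)
    I, T = X.shape
    S = X.sum()
    C = (X.sum(axis=0) ** 2).sum()
    R = (X.sum(axis=1) ** 2).sum()
    f = S * I - C
    g = S**2 + S * I * T - R * I - C * T
    return I * (1 - rho) * (1 + (T - 1) * rho) / (f * (1 - rho) + g * rho) * var_y

def var_delta_eq4(X, rho, var_y):
    """Equation (4): the design enters through a_D(X) and b_D(X)."""
    X = np.asarray(X, dtype=float)
    I, T = X.shape
    a_D = ((X - X.mean(axis=0)) ** 2).sum() / (I * T)    # within-column variance
    b_D = ((X.mean(axis=1) - X.mean()) ** 2).sum() / I   # between-row variance
    R = T * rho / (1 + (T - 1) * rho)                    # R as defined in equation (4)
    return (1 - rho) / (I * T * (a_D - b_D * R)) * var_y

# Standard stepped wedge with s = 4 sequences, one cluster per sequence
X = np.triu(np.ones((4, 5)), k=1)
v3 = var_delta_eq3(X, rho=0.3, var_y=1.0)
v4 = var_delta_eq4(X, rho=0.3, var_y=1.0)
print(v3, v4)
```

Both functions take the same arguments, so either can be used once $ρ$ and $var (Y_{it •})$ have been obtained from Table 2.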

Table 2.

Formulas for standard stepped-wedge trials with two, three, or four levels.

Stepped-wedge scenarios	Conversion formulas
Two levels	$σ_{2}^{2} = ρ_{12} \cdot σ_{tot}^{2}$ $σ_{1}^{2} = (1 - ρ_{12}) \cdot σ_{tot}^{2}$	$var (Y_{it •}) = \frac{σ_{tot}^{2}}{n_{1}} VI F_{2}, σ_{tot}^{2} = σ_{2}^{2} + σ_{1}^{2}$ $VI F_{2} = [1 + (n_{1} - 1) ρ_{12}], ρ_{12} = \frac{σ_{2}^{2}}{σ_{2}^{2} + σ_{1}^{2}}$
	Covariance $τ^{2}$ and variance $τ^{2} + σ^{2}$ of cluster-time averages	Correlation $ρ = corr (Y_{it •}, Y_{is •})$ and variance $var (Y_{it •})$ of cluster-time averages
Level 2 (cluster) repeatedly measured; level 1 (e.g. subject) cross-sectionally measured	$τ^{2} = σ_{2}^{2}$ $σ^{2} = \frac{σ_{1}^{2}}{n_{1}}$	$ρ = \frac{n_{1} ρ_{12}}{[1 + (n_{1} - 1) ρ_{12}]}$
Three levels	$σ_{3}^{2} = ρ_{23} ρ_{12} \cdot σ_{tot}^{2}$ $σ_{2}^{2} = (1 - ρ_{23}) ρ_{12} \cdot σ_{tot}^{2}$ $σ_{1}^{2} = (1 - ρ_{12}) \cdot σ_{tot}^{2}$	$var (Y_{it •}) = \frac{σ_{tot}^{2}}{n_{1} n_{2}} VI F_{3}, σ_{tot}^{2} = σ_{3}^{2} + σ_{2}^{2} + σ_{1}^{2}$ $VI F_{3} = [1 + (n_{1} - 1) ρ_{12}] [1 + (n_{2} - 1) {\tilde{ρ}}_{23}]$ $ρ_{12} = \frac{σ_{3}^{2} + σ_{2}^{2}}{σ_{3}^{2} + σ_{2}^{2} + σ_{1}^{2}}$ ${\tilde{ρ}}_{23} = {\tilde{ρ}}_{23} (n_{1}) = \frac{σ_{3}^{2}}{[σ_{3}^{2} + σ_{2}^{2} + \frac{σ_{1}^{2}}{n_{1}}]} = ρ_{23} \frac{n_{1} ρ_{12}}{[1 + (n_{1} - 1) ρ_{12}]}$ $ρ_{23} = \frac{σ_{3}^{2}}{σ_{3}^{2} + σ_{2}^{2}}$
	Covariance $τ^{2}$ and variance $τ^{2} + σ^{2}$ of cluster-time averages	Correlation $ρ = corr (Y_{it •}, Y_{is •})$ and variance $var (Y_{it •})$ of cluster-time averages
Level 3 (cluster) repeatedly measured; levels 2 and 1 cross-sectionally (e.g. subjects and sub-clusters or observations and subjects)	$τ^{2} = σ_{3}^{2}$ $σ^{2} = \frac{σ_{2}^{2}}{n_{2}} + \frac{σ_{1}^{2}}{n_{1} n_{2}}$	$ρ = \frac{ρ_{12} ρ_{23} n_{1} n_{2}}{[1 + (n_{1} - 1) ρ_{12}] [1 + (n_{2} - 1) {\tilde{ρ}}_{23}]}$
Level 3 (cluster) and level 2 (subject) repeatedly measured; level 1 (observation) measured cross-sectionally	$τ^{2} = σ_{3}^{2} + \frac{σ_{2}^{2}}{n_{2}}$ $σ^{2} = \frac{σ_{1}^{2}}{n_{1} n_{2}}$	$ρ = \frac{ρ_{12} n_{1} [1 + (n_{2} - 1) ρ_{23}]}{[1 + (n_{1} - 1) ρ_{12}] [1 + (n_{2} - 1) {\tilde{ρ}}_{23}]}$
Four levels	$σ_{4}^{2} = ρ_{34} ρ_{23} ρ_{12} \cdot σ_{tot}^{2}$ $σ_{3}^{2} = (1 - ρ_{34}) ρ_{23} ρ_{12} \cdot σ_{tot}^{2}$ $σ_{2}^{2} = (1 - ρ_{23}) ρ_{12} \cdot σ_{tot}^{2}$ $σ_{1}^{2} = (1 - ρ_{12}) \cdot σ_{tot}^{2}$	$var (Y_{it •}) = \frac{σ_{tot}^{2}}{n_{1} n_{2} n_{3}} VI F_{4}, σ_{tot}^{2} = σ_{4}^{2} + σ_{3}^{2} + σ_{2}^{2} + σ_{1}^{2}$ $VI F_{4} = [1 + (n_{1} - 1) ρ_{12}] [1 + (n_{2} - 1) {\tilde{ρ}}_{23}] [1 + (n_{3} - 1) {\tilde{ρ}}_{34}]$ $ρ_{12} = \frac{σ_{4}^{2} + σ_{3}^{2} + σ_{2}^{2}}{σ_{4}^{2} + σ_{3}^{2} + σ_{2}^{2} + σ_{1}^{2}}$ ${\tilde{ρ}}_{23} = {\tilde{ρ}}_{23} (n_{1}) = \frac{σ_{4}^{2} + σ_{3}^{2}}{σ_{4}^{2} + σ_{3}^{2} + σ_{2}^{2} + σ_{1}^{2} / n_{1}} = ρ_{23} \frac{n_{1} ρ_{12}}{[1 + (n_{1} - 1) ρ_{12}]}, ρ_{23} = \frac{σ_{4}^{2} + σ_{3}^{2}}{σ_{4}^{2} + σ_{3}^{2} + σ_{2}^{2}}$ ${\tilde{ρ}}_{34} = {\tilde{ρ}}_{34} (n_{2}, n_{1}) = \frac{σ_{4}^{2}}{[σ_{4}^{2} + σ_{3}^{2} + \frac{σ_{2}^{2}}{n_{2}} + \frac{σ_{1}^{2}}{n_{1} n_{2}}]} = ρ_{34} \frac{n_{2} {\tilde{ρ}}_{23}}{[1 + (n_{2} - 1) {\tilde{ρ}}_{23}]}, ρ_{34} = \frac{σ_{4}^{2}}{σ_{4}^{2} + σ_{3}^{2}}$
	Covariance $τ^{2}$ and variance $τ^{2} + σ^{2}$ of cluster-time averages	Correlation $ρ = corr (Y_{it •}, Y_{is •})$ and variance $var (Y_{it •})$ of cluster-time averages
Level 4 (cluster) repeatedly measured; levels 3 and 2 and 1 sampled cross-sectionally (e.g. sub-clusters and subjects and observations)	$τ^{2} = σ_{4}^{2}$ $σ^{2} = \frac{σ_{3}^{2}}{n_{3}} + \frac{σ_{2}^{2}}{n_{3} n_{2}} + \frac{σ_{1}^{2}}{n_{3} n_{2} n_{1}}$	$ρ = \frac{ρ_{12} ρ_{23} ρ_{34} n_{1} n_{2} n_{3}}{[1 + (n_{1} - 1) ρ_{12}] [1 + (n_{2} - 1) {\tilde{ρ}}_{23}] [1 + (n_{3} - 1) {\tilde{ρ}}_{34}]}$
Levels 4 (cluster) and 3 repeatedly measured; levels 2 and 1 sampled cross-sectionally	$τ^{2} = σ_{4}^{2} + \frac{σ_{3}^{2}}{n_{3}}$ $σ^{2} = \frac{σ_{2}^{2}}{n_{3} n_{2}} + \frac{σ_{1}^{2}}{n_{3} n_{2} n_{1}}$	$ρ = \frac{ρ_{12} ρ_{23} n_{1} n_{2} [1 + (n_{3} - 1) ρ_{34}]}{[1 + (n_{1} - 1) ρ_{12}] [1 + (n_{2} - 1) {\tilde{ρ}}_{23}] [1 + (n_{3} - 1) {\tilde{ρ}}_{34}]}$
Levels 4, 3, and 2 units repeatedly measured; level 1 sampled cross-sectionally	$τ^{2} = σ_{4}^{2} + \frac{σ_{3}^{2}}{n_{3}} + \frac{σ_{2}^{2}}{n_{3} n_{2}}$ $σ^{2} = \frac{σ_{1}^{2}}{n_{3} n_{2} n_{1}}$	$ρ = 1 - \frac{1 - ρ_{12}}{[1 + (n_{1} - 1) ρ_{12}] [1 + (n_{2} - 1) {\tilde{ρ}}_{23}] [1 + (n_{3} - 1) {\tilde{ρ}}_{34}]}$

See Table 1 for the definition of parameters and see the section “Variance inflation due to the multilevel structure” for their explanation.
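The four-level rows of Table 2 can be sketched in a few lines of Python. The helper names and the argument `lowest_cohort_level` (the lowest level that is followed as a cohort: 4, 3, or 2) are hypothetical conveniences introduced here, not notation from the article; the inputs are those of Example 1 of this article.

```python
def variance_components(rho12, rho23, rho34, sigma_tot2):
    """Four-level conversion formulas of Table 2: ICCs -> (s1, s2, s3, s4)."""
    s4 = rho34 * rho23 * rho12 * sigma_tot2
    s3 = (1 - rho34) * rho23 * rho12 * sigma_tot2
    s2 = (1 - rho23) * rho12 * sigma_tot2
    s1 = (1 - rho12) * sigma_tot2
    return s1, s2, s3, s4

def tau2_sigma2(components, n1, n2, n3, lowest_cohort_level):
    """Split the variance of a cluster-period mean into tau^2 and sigma^2.

    Levels at or above `lowest_cohort_level` are repeated over time and
    contribute to tau^2; lower levels are cross-sectional and go to sigma^2.
    """
    s1, s2, s3, s4 = components
    contrib = {4: s4, 3: s3 / n3, 2: s2 / (n3 * n2), 1: s1 / (n3 * n2 * n1)}
    tau2 = sum(v for level, v in contrib.items() if level >= lowest_cohort_level)
    sigma2 = sum(v for level, v in contrib.items() if level < lowest_cohort_level)
    return tau2, sigma2

# Example 1 inputs: wards (level 3) and nursing homes (level 4) repeated
comps = variance_components(0.6, 0.05, 0.01, sigma_tot2=0.534375)
tau2, sigma2 = tau2_sigma2(comps, n1=5, n2=15, n3=5, lowest_cohort_level=3)
rho = tau2 / (tau2 + sigma2)   # correlation of cluster averages over time
print(tau2, sigma2, rho)
```

Changing `lowest_cohort_level` to 4 or 2 moves the corresponding terms between $τ^{2}$ and $σ^{2}$, which is exactly how the three four-level scenarios in Table 2 differ.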

Taking $f, g$ corresponding to a standard stepped-wedge design, we get

$var (\hat{δ}) = \frac{6}{I \cdot (s - \frac{1}{s})} \cdot σ^{2} \cdot [1 + \frac{\frac{s}{2} \cdot τ^{2}}{σ^{2} + (1 + \frac{s}{2}) τ^{2}}]$ (5a)

$var (\hat{δ}) = \frac{6 \cdot (1 - ρ)}{I \cdot (s - \frac{1}{s})} \cdot \frac{[1 + s ρ]}{[1 + \frac{s}{2} ρ]} \cdot var (Y_{it •})$ (5b)

For two levels, equation (5b) reduces to the variance formula in the appendix of the article by Woertman et al.⁴

For $f, g$ of the other designs, see SF4 and 5.

Impact of design and multilevel structure

The design (i.e. the specification of intervention/control condition for each cluster at each time) influences $var (\hat{δ})$ via $f, g$ or $a_{D}, b_{D}$ , while the data generating model (1) influences $var (\hat{δ})$ via $ρ$ and $var (Y_{it •})$ or, equivalently, $σ^{2}$ and $τ^{2}$ . Specifically, the number of levels and the sample size at each level determine $var (Y_{it •})$ , while the specification of which levels are measured as a cohort and which levels cross-sectionally determines $ρ$ (see Table 2).

As illustrated for the CHANGE trial in the section “Introduction,” various scenarios can arise because up to a certain level, all units of lower levels are measured cross-sectionally, and from that level upward, all levels have their units measured repeatedly as cohort. Relevant formulas for each possible scenario with two, three, and four levels are provided in Table 2. Derivation and implementation of these formulas in SAS® and Excel® programs are in the SF, which also contains the results for more than four levels.

Variance inflation due to the multilevel structure

The factor $var (Y_{it •})$ in equations (3) and (4) is calculated in the same way as in cluster-randomized trials with a parallel group post-test (i.e. with one measurement) design. For two levels, the treatment effect estimator in such a design has variance $[1 + (n - 1) ICC] \cdot 4 σ_{tot}^{2} / N_{tot}$, where n is the cluster size, $N_{tot}$ is the total number of subjects, $ICC = ρ_{12}$ is the intracluster correlation of subjects within clusters, and $[1 + (n - 1) ICC]$ is the variance inflation factor $(VIF)$, also known as design effect⁵ (SF1.2). For more than two levels, variance inflation factors due to the multiple levels of clustering can also be used, and there are several ways to define these. One is to define separate variance inflation factors for the correlation of level 1 units in level 2 units, for the correlation of level 2 units in level 3 units, and so on;⁶,⁷ another is to define separate variance inflation factors based on the correlation of level 1 units in the same level 2 units, the correlation of level 1 units in the same level 3 units but different level 2 units, and so on.⁸,⁹ Both types of intracluster correlations and variance inflation factors can be expressed in terms of the other (SF1.1). Here, we use only the first-mentioned type. Then, the variance inflation for p levels is

$VI F_{p} = [1 + (n_{1} - 1) ρ_{12}] \cdot [1 + (n_{2} - 1) {\tilde{ρ}}_{23}] \dots [1 + (n_{p - 1} - 1) {\tilde{ρ}}_{p - 1, p}]$ (6)

To clarify the meaning of this in the CHANGE trial setting, the intracluster correlation $ρ_{12}$ is the true (population) correlation between any pair of observations within the same nurse; the intracluster correlation $ρ_{23}$ is the correlation between true outcomes of two nurses within the same ward; and so on. Because we only have a sample of $n_{1}$ observations per nurse, the true outcome of the nurses can only be approximated by taking the average of the observations per nurse, and therefore, the correlation between the outcomes of two nurses within the same ward is attenuated to ${\tilde{ρ}}_{23}$. The same holds for the other correlations. More on the estimation, interpretation, and attenuation of these intracluster correlations can be found in the article by Teerenstra et al.⁶
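The attenuation and the resulting $VI F_{p}$ of equation (6) can be computed recursively. The sketch below assumes that the pattern shown in Table 2 for three and four levels (each attenuated correlation is the true correlation times $n {\tilde{ρ}} / [1 + (n - 1) {\tilde{ρ}}]$ of the level below) continues for higher p; the function names are hypothetical.

```python
def attenuated_iccs(ns, rhos):
    """Attenuated ICCs [rho12, rho23~, rho34~, ...].

    ns   = [n1, n2, ...]         units per next-higher unit
    rhos = [rho12, rho23, ...]   true (population) ICCs
    """
    tilde = [rhos[0]]            # rho12 is not attenuated
    for k in range(1, len(rhos)):
        prev, n = tilde[-1], ns[k - 1]
        tilde.append(rhos[k] * n * prev / (1 + (n - 1) * prev))
    return tilde

def vif_multilevel(ns, rhos):
    """Equation (6): product of [1 + (n - 1) * ICC] over the levels."""
    vif = 1.0
    for n, r in zip(ns, attenuated_iccs(ns, rhos)):
        vif *= 1 + (n - 1) * r
    return vif

# CHANGE-like setting of Example 1: n1 = 5, n2 = 15, n3 = 5
tilde = attenuated_iccs([5, 15, 5], [0.6, 0.05, 0.01])
vif4 = vif_multilevel([5, 15, 5], [0.6, 0.05, 0.01])
print(tilde, vif4)
```

In this setting ${\tilde{ρ}}_{23}$ is about 0.044 rather than 0.05, which shows how a small number of observations per nurse attenuates the nurse-within-ward correlation.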

Variance inflation factor for stepped-wedge designs

Using equation (3) or (4) and the research by Girling and Hemming² and Thompson et al.,³ we provide variance inflation factors for the p-level “standard” cluster-randomized stepped-wedge design with s sequences, the stepped wedge with more/fewer/no observations before and/or after roll-out, and the hybrid design (SF8). We formulate these relative to a p-level cluster-randomized parallel group design with one measurement $(cPG 1)$:

$VI F_{rm : cPG 1} = \begin{cases} VI F_{S W_{s} : cPG 1} = \frac{3}{2} \cdot \frac{(1 - ρ) \cdot (1 + s \cdot ρ)}{(s - \frac{1}{s}) \cdot (1 + \frac{s}{2} \cdot ρ)} \\ VI F_{S W_{s} (a, b) : cPG 1} = \frac{3}{2} \cdot \frac{(1 - ρ) \cdot (1 + [a + b - 2 + s] \cdot ρ)}{(s - \frac{1}{s}) \cdot (1 + [a + b - 2 + \frac{s}{2}] \cdot ρ)} \\ VI F_{H (β, s) : cPG 1} = \frac{(1 - ρ)}{T} \cdot \frac{1}{1 - \frac{β^{2}}{3} (1 + \frac{2}{s^{2}}) + R \cdot (1 - \frac{β}{3} [2 + \frac{1}{s^{2}}])} \end{cases}$ (7)

and thus, the variance inflation factor compared to a parallel group individually randomized design with one measurement (using a t-test) is then

$VIF = VI F_{rm : cPG 1} \cdot VI F_{p}$ (8)

where $VI F_{p}$ is the variance inflation factor due to multilevel structure as explained above.

From equation (8), we can see that the total variance inflation comes from two aspects of the design: the manner of assigning intervention over the measurement times and the multilevel structure at each measurement time.
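As an illustration of equations (7) and (8), the sketch below evaluates only the first line of equation (7), the standard stepped wedge, and multiplies it by a user-supplied $VI F_{p}$. The function names are hypothetical, and the non-standard and hybrid variants are deliberately left out.

```python
def vif_sw_standard(s, rho):
    """First line of equation (7): standard stepped wedge versus cPG1."""
    return 1.5 * (1 - rho) * (1 + s * rho) / ((s - 1 / s) * (1 + s / 2 * rho))

def vif_total(s, rho, vif_p):
    """Equation (8): design-over-time inflation times multilevel inflation."""
    return vif_sw_standard(s, rho) * vif_p

# Tabulate the stepped-wedge factor for a few s and rho values
for s in (2, 4, 10):
    print(s, [round(vif_sw_standard(s, r), 3) for r in (0.0, 0.3, 0.6)])
```

The factor decreases as the number of sequences s grows and tends to zero as $ρ$ approaches 1, in line with the curves described for Figure 3.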

Sample size and power calculation

As sample size formulas and programs for a parallel group individually randomized design with one measurement (i.e. post-test design) are readily available, sample size calculation for the stepped-wedge trial with p levels can easily be performed by first calculating the total sample size $N_{tot, PG 1}$ (to detect a prespecified effect $δ$ with prespecified power of $(1 - β) \cdot 100 \%$ at a significance level $α$). Note that most programs and formulas give the number of subjects per arm, so for the total sample size, this needs to be doubled. After that, we multiply this total sample size by the variance inflation factors to account for the multilevel stepped-wedge design. For a “standard” stepped-wedge design, the total sample size at each measurement time $N_{tot, t}$ (i.e. the total required number of level-1 units across all clusters and arms at each measurement time/period) is

$N_{tot, t} = VIF \cdot N_{tot, PG 1} = VI F_{rm} \cdot VI F_{p} \cdot N_{tot, PG 1}$ (9)

and dividing this by the number of level-1 units per cluster at each measurement yields the total required number of clusters $(I)$ . Dividing this total number of clusters by the number of steps gives the number of clusters per sequence $c = I / s$ (in the hybrid design after accounting for the fraction $β$ ). The parameters $ρ$ and $VI F_{p}$ needed to calculate VIF follow from Table 2 for three-level and four-level designs or from the arguments used in the SF for p-levels designs.
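The steps above can be sketched as follows. The normal-approximation formula used for $N_{tot, PG 1}$ is the usual two-sample formula and is an assumption here (the article only requires that some standard formula or program is used); the function names and the illustrative inputs in the usage lines are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_total_pg1(delta, sigma_tot2, alpha=0.05, power=0.80):
    """Total sample size (both arms) of an individually randomized
    parallel group design, normal approximation (assumed formula)."""
    z = NormalDist().inv_cdf
    return 4 * sigma_tot2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta**2

def clusters_needed(n_pg1, vif_rm, vif_p, units_per_cluster, s):
    """Equation (9), then clusters in total and per sequence.

    units_per_cluster = n1 * n2 * ... * n_{p-1} level-1 units per cluster
    and period; the number of clusters is rounded up to a multiple of s.
    """
    n_tot_t = vif_rm * vif_p * n_pg1          # level-1 units per period
    c = ceil(n_tot_t / units_per_cluster / s) # clusters per sequence
    return n_tot_t, c * s, c

# Purely illustrative inputs
n_pg1 = n_total_pg1(delta=0.5, sigma_tot2=1.0)
n_tot_t, I, c = clusters_needed(100, vif_rm=0.5, vif_p=2.0,
                                units_per_cluster=10, s=4)
print(round(n_pg1, 1), n_tot_t, I, c)
```

Rounding up to a whole number of clusters per sequence is a design choice of this sketch; it keeps the stepped wedge balanced at the cost of a slightly larger trial.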

Instead of calculating the total sample size (or number of clusters needed), power for a range of feasible configurations (i.e. number of clusters, sample size at different levels, and intracluster correlations) could be calculated to see which configuration, if any, provides sufficient power. This can be done using the usual power formula

$Power (δ) = Φ (\frac{δ}{\sqrt{var (\hat{δ})}} - z_{1 - α / 2})$

where $Φ$ is the cumulative distribution function of the standard normal distribution and $z_{1 - α / 2}$ is its $100 \% \cdot (1 - α / 2)$ percentile.
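A minimal sketch of this power formula, using only the Python standard library, is given below. The function name `power` is an assumption, and the loop over the number of clusters relies on $var (\hat{δ})$ being inversely proportional to I for a fixed standard stepped-wedge layout (equation (5a)); the starting variance is the one obtained in Example 1 of this article for I = 4.

```python
from statistics import NormalDist

def power(delta, var_delta, alpha=0.05):
    """Two-sided power from the normal approximation in the text."""
    nd = NormalDist()
    return nd.cdf(delta / var_delta**0.5 - nd.inv_cdf(1 - alpha / 2))

# Scan a few feasible numbers of clusters (multiples of s = 4)
var_4 = 26.967e-4
for I in (4, 8, 12):
    print(I, round(power(0.15, var_4 * 4 / I), 3))
```

Scanning configurations in this way makes it easy to see which of them, if any, reach the targeted power before committing to a more detailed simulation study.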

To calculate $var (\hat{δ})$ , equation (5a) with $σ^{2}$ and $τ^{2}$ can be applied or equation (5b) with $ρ$ and $var (Y_{it •})$ using the appropriate formulas for $σ^{2}, τ^{2}, ρ, var (Y_{it •})$ in Table 2. The latter comes down to using the variance inflation factors, that is, $var (\hat{δ}) = VI F_{rm} \cdot VI F_{p} \cdot 4 σ_{tot}^{2} / N_{tot, t}$ where $N_{tot, t}$ is the total number of level-1 units in the trial at each measurement time/period. For the standard stepped wedge, we can rewrite this to

$\begin{matrix} var (\hat{δ}) = 4 σ_{tot}^{2} \cdot \frac{3}{2} \cdot \frac{(1 - ρ) \cdot [1 + s ρ]}{(s - \frac{1}{s}) \cdot [1 + \frac{s}{2} ρ]} \cdot \frac{1}{I} \cdot \frac{[1 + (n_{1} - 1) ρ_{12}]}{n_{1}} \\ \cdot \frac{[1 + (n_{2} - 1) {\tilde{ρ}}_{23}]}{n_{2}} \dots \frac{[1 + (n_{p - 1} - 1) {\tilde{ρ}}_{p - 1, p}]}{n_{p - 1}} \end{matrix}$ (10)

in order to investigate the impact of various design parameters on the power. Figure 3 shows $VI F_{SW : cPG 1} (s, ρ)$ for increasing values of $ρ$ for various values of s, the number of sequences.

Figure 3.

Variance inflation factor for the standard stepped wedge as a function of the correlation $ρ$ between cluster averages over time. From top to bottom, the curves for the number of sequences $s = 2, 3, 4, 5, 6, 10, 20$ are shown.

For a small number of clusters, the sample size and power formulas hold only approximately. For continuous, normally distributed outcomes, this is because of the low degrees of freedom, while for binary/rate outcomes, this is because formulas (2) and (4) depend on approximating the statistical distribution of cluster averages by a normal distribution using the central limit theorem. Therefore, we recommend the use of simulation studies to check power and also type I error for designs with a small number of clusters. However, the formulas in this article can be used to see whether feasible designs (i.e. in terms of number of clusters and/or number of measurements) would be worth such further investigation.

Binary and incidence outcomes

As the argumentation underlying the formulas relies on approximating the statistical distribution of cluster averages by the normal distribution using the central limit theorem, the formulas can be used for binary and incidence outcomes as well, provided the number of clusters is sufficiently large. We now discuss what value for $σ_{tot}^{2}$ could be taken for non-small and small samples.

If we take a two-level design and a binary outcome as an example, we can model the trial hierarchically as follows. Each subject j in cluster i has a binary outcome $B_{ij}$ that is 1 with probability $p_{i}$, when cluster i is in the control condition, and with probability $p_{i} + δ$, when cluster i is in the intervention condition. The probabilities $p_{i}$ vary over the clusters according to some distribution with mean $μ$ and variance $s_{c}^{2}$. Then, the within-cluster variance in cluster i is $p_{i} (1 - p_{i})$ in the control condition. Over all clusters in the control condition, the expected total variance, that is, the variance of a level 1 unit regardless (unconditional) of the cluster it comes from, is $σ_{tot}^{2} = μ (1 - μ)$, which can be decomposed into an expected within-cluster variance of $σ_{1}^{2} = E [p_{i} (1 - p_{i})] = μ (1 - μ) - s_{c}^{2}$ and a between-cluster variance of $σ_{2}^{2} = var (p_{i}) = s_{c}^{2} = ρ_{12} \cdot σ_{1}^{2} / (1 - ρ_{12})$ (SF9). Because these expectations are averages that hold when the number of clusters is sufficiently large, it may make sense to adopt the following small-sample strategy. If we think that cluster-specific probabilities $p_{i}$ will in practice mostly be between $p_{\min}$ and $p_{\max}$, we take within that range the value $p_{close}$ that is closest to $0.5$, and set $σ_{1}^{2} = p_{close} (1 - p_{close})$, because that is the maximum value of the within-cluster variances $p_{i} (1 - p_{i})$ in the clusters in control condition. Noting that $σ_{1}^{2} = (1 - ρ_{12}) \cdot σ_{tot}^{2}$, we set the total variance to $σ_{tot, control}^{2} = p_{close} (1 - p_{close}) / (1 - ρ_{12})$. The same reasoning could be applied when clusters are in the intervention condition, and thus, the largest (or average) of the two could be taken as $σ_{tot}^{2}$. This result also holds when there are more than two levels.

For a rate (incidence) outcome, the count (or rate) outcome of subject j in cluster i is $R_{ij}$, which has expected value (average) $λ_{i}$, and these $λ_{i}$ have mean $λ$ and variance $s_{c}^{2}$. For a cluster in the control condition, the expected total variance, that is, the variance of a level-1 unit unconditional of the cluster it comes from, is $σ_{tot}^{2} = λ + s_{c}^{2} = λ / (1 - ρ_{12})$ with $σ_{1}^{2} = E [λ_{i}] = λ$ the expected (i.e. average over the clusters) within-cluster variance and $σ_{2}^{2} = var (λ_{i}) = s_{c}^{2} = ρ_{12} \cdot σ_{1}^{2} / (1 - ρ_{12})$ the between-cluster variance. A conservative small-sample strategy could then be to take $σ_{1}^{2} = λ_{\max}$, and thus set $σ_{tot, control}^{2} = λ_{\max} / (1 - ρ_{12})$, if we think that the cluster-specific rates $λ_{i}$ will in practice mostly fall between $λ_{\min}$ and $λ_{\max}$. A similar reasoning applies when a cluster is in the intervention condition, and the average or maximum of these two could be taken as $σ_{tot}^{2}$.
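The conservative choices of $σ_{tot}^{2}$ described above can be sketched as two small helpers. The function names are hypothetical, and the lower bounds of the probability ranges in the usage lines (0.15 and 0.30) are made-up values for illustration; only the upper bounds 0.25 and 0.40 and $ρ_{12} = 0.6$ correspond to Example 1 of this article.

```python
def sigma_tot2_binary(p_min, p_max, rho12):
    """Conservative total variance for a binary outcome.

    Uses the probability in [p_min, p_max] closest to 0.5, which
    maximizes the within-cluster variance p * (1 - p).
    """
    p_close = min(max(0.5, p_min), p_max)   # clamp 0.5 into the range
    return p_close * (1 - p_close) / (1 - rho12)

def sigma_tot2_rate(lam_max, rho12):
    """Conservative total variance for a rate outcome (Poisson-type)."""
    return lam_max / (1 - rho12)

# Average over the control and intervention conditions
ctl = sigma_tot2_binary(0.15, 0.25, rho12=0.6)
exp = sigma_tot2_binary(0.30, 0.40, rho12=0.6)
sigma_tot2 = (ctl + exp) / 2
print(sigma_tot2)
```

Taking the maximum of the two conditions instead of the average would be the more conservative of the two options mentioned in the text.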

To illustrate sample size versus power calculations, for different endpoints, and small versus large sample considerations, we present two examples in the setting of the CHANGE trial. These were not the final calculations for this trial but similar to those performed.

Example 1: binary outcome in four-level standard stepped wedge

As a first example, we calculate power for hand hygiene compliance (a binary outcome) in a four-level standard stepped wedge using the following assumptions. The duration of the trial only allows four sequences $(s = 4)$. The target effect size is an improvement from 20% to 35% $(δ = 0.15)$. It is assumed that the correlation among measurements within a nurse would be rather high $(ρ_{12} = 0.6)$, while the correlation among nurses within a ward would be smaller $(ρ_{23} = 0.05)$ and that of wards within a nursing home even smaller $(ρ_{34} = 0.01)$. Based on feasibility, around five observations $(n_{1} = 5)$ per nurse, 15 nurses $(n_{2} = 15)$ per ward, maximally five wards per nursing home $(n_{3} = 5)$, and four nursing homes $(I = n_{4} = 4)$ would be possible. Given the small number of clusters (four nursing homes), it could make sense to take a conservative approach for the total variance $σ_{tot}^{2}$ as was discussed above. If the level-1 probabilities are closest to 0.5 at $p_{close} = 0.25$ (instead of 0.20) in the control condition and at $p_{close} = 0.40$ (instead of 0.35) in the experimental condition, we take the average of the corresponding variances $σ_{1}^{2} = (0.40 \cdot 0.60 + 0.25 \cdot 0.75) / 2 = 0.21375$ and, given that $(1 - ρ_{12}) σ_{tot}^{2} = σ_{1}^{2}$, the total variance is then $σ_{tot}^{2} = 0.21375 / (1 - 0.6) = 0.534375$. If different nurses are sampled in each measurement time/period, level-2 and level-1 units (nurses and measurements) are not repeated, and using the formulas in Table 2 (second scenario of the four-level standard stepped wedge)

$\begin{matrix} τ^{2} = σ_{4}^{2} + \frac{σ_{3}^{2}}{n_{3}} = (ρ_{34} ρ_{23} ρ_{12} + \frac{(1 - ρ_{34}) ρ_{23} ρ_{12}}{n_{3}}) σ_{tot}^{2} \\ = (0.0003 + \frac{0.0297}{5}) \cdot 0.534375 ≅ 33.345 \cdot 10^{- 4} \end{matrix}$

and

$\begin{matrix} σ^{2} = \frac{σ_{2}^{2}}{n_{3} n_{2}} + \frac{σ_{1}^{2}}{n_{3} n_{2} n_{1}} = (\frac{(1 - ρ_{23}) ρ_{12}}{n_{3} n_{2}} + \frac{1 - ρ_{12}}{n_{3} n_{2} n_{1}}) \\ \cdot σ_{tot}^{2} = (\frac{0.57}{5 \cdot 15} + \frac{0.4}{5 \cdot 15 \cdot 5}) \cdot 0.534375 ≅ 46.313 \cdot 10^{- 4} \end{matrix}$

so that

$\begin{matrix} var (\hat{δ}) = \frac{6}{I \cdot (s - \frac{1}{s})} \cdot σ^{2} \cdot [1 + \frac{\frac{s}{2} \cdot τ^{2}}{σ^{2} + (1 + \frac{s}{2}) τ^{2}}] \\ ≅ \frac{6}{4 \cdot (4 - \frac{1}{4})} \cdot 46.313 \cdot 10^{- 4} \\ \cdot [1 + \frac{\frac{4}{2} \cdot 33.345 \cdot 10^{- 4}}{46.313 \cdot 10^{- 4} + (1 + \frac{4}{2}) \cdot 33.345 \cdot 10^{- 4}}] \\ ≅ 26.967 \cdot 10^{- 4} \end{matrix}$

and $\mathrm{Power} (δ) = Φ (δ / \sqrt{var (\hat{δ})} - z_{1 - α / 2}) = Φ (0.15 / \sqrt{26.967 \cdot 10^{- 4}} - 1.96) = Φ (0.928515) = 0.8234$. Figure 4 gives an impression of the sensitivity of power when one of the sample sizes or intracluster correlations is varied while the others are kept constant.
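The arithmetic of Example 1 can be retraced with a short script. The following is a minimal Python sketch, not the SAS or Excel programs provided with the article; the function name `power_four_level_sw` and its argument names are hypothetical, and it covers only the scenario used above (nursing homes and wards followed as cohorts, nurses and observations sampled cross-sectionally).

```python
from math import sqrt
from statistics import NormalDist


def power_four_level_sw(delta, sigma_tot2, rho12, rho23, rho34,
                        n1, n2, n3, n_clusters, s, alpha=0.05):
    """Power for a four-level standard stepped wedge (hypothetical helper).

    Assumes the second four-level scenario of Table 2: levels 4 and 3 are
    repeatedly measured, levels 2 and 1 are cross-sectional.
    """
    # tau^2 = sigma_4^2 + sigma_3^2 / n3: the part shared across periods.
    tau2 = (rho34 * rho23 * rho12
            + (1 - rho34) * rho23 * rho12 / n3) * sigma_tot2
    # sigma^2 = sigma_2^2 / (n3 n2) + sigma_1^2 / (n3 n2 n1).
    sigma2 = ((1 - rho23) * rho12 / (n3 * n2)
              + (1 - rho12) / (n3 * n2 * n1)) * sigma_tot2
    # Variance of the treatment effect estimator in the standard stepped wedge.
    var_delta = (6 / (n_clusters * (s - 1 / s)) * sigma2
                 * (1 + (s / 2) * tau2 / (sigma2 + (1 + s / 2) * tau2)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    power = NormalDist().cdf(delta / sqrt(var_delta) - z_crit)
    return power, var_delta, tau2, sigma2


# Conservative total variance as derived in Example 1.
sigma1_sq = (0.40 * 0.60 + 0.25 * 0.75) / 2
sigma_tot_sq = sigma1_sq / (1 - 0.6)

power, var_delta, tau2, sigma2 = power_four_level_sw(
    delta=0.15, sigma_tot2=sigma_tot_sq,
    rho12=0.6, rho23=0.05, rho34=0.01,
    n1=5, n2=15, n3=5, n_clusters=4, s=4)

print(f"tau^2 = {tau2:.6f}, sigma^2 = {sigma2:.6f}")
print(f"var(delta_hat) = {var_delta:.6f}, power = {power:.4f}")
```

The script uses the exact normal quantile rather than the rounded value 1.96, so the last digits of the power may differ marginally from the hand calculation.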

Figure 4.

Impact of cluster size and intracluster correlations at different levels in a “standard” stepped wedge. Power of the four-level “standard” stepped-wedge trial of Example 1 when varying either one sample size (parts a–c) or one intracluster correlation (parts d–f) at the specified level while keeping the other sample sizes and intracluster correlations constant. The vertical reference lines indicate the values of sample size and intracluster correlation as in Example 1 $(ρ_{12} = 0.6, ρ_{23} = 0.05, ρ_{34} = 0.01, n_{1} = 5, n_{2} = 15, n_{3} = 5, n_{4} = I = 4)$.

Example 2: rate outcome in a three-level standard stepped wedge

As a second example, we use the variance inflation factor to calculate sample size for infection incidence (a rate). These rates are measured on patients within wards in nursing homes; hence, this is a three-level design. We would expect the correlation of infection rates within wards to be high $(ρ_{12} = 0.7)$, while infections in one ward would not automatically increase infections in another ward within the same nursing home, so we assume a low correlation of ward-infection rates within a nursing home $(ρ_{23} = 0.01)$. The effect of interest is a decrease from 11 to 5 infections per 1000 resident days $(δ = 6 \cdot 10^{- 3})$. Anticipating a large number of clusters, we do not set $σ_{1}^{2} = λ_{\max}$, the maximum of the cluster-specific rates per condition, but instead use the average cluster-specific rate $λ$ for each condition. Thus, $σ_{1}^{2} = (λ_{ctl} + λ_{\exp}) / 2 = 8 \cdot 10^{- 3}$ and $σ_{tot}^{2} = σ_{1}^{2} / (1 - ρ_{12}) ≅ 26.67 \cdot 10^{- 3}$. The total sample size needed in an equal-size, parallel-group, individually randomized design to detect this difference with 0.8 power at a significance level of 0.05 is

$\begin{matrix} N_{tot, PG 1} = 2 \cdot 2 \cdot {(z_{1 - \frac{0.05}{2}} + z_{0.8})}^{2} \cdot σ_{tot}^{2} / δ^{2} \\ ≅ 2 \cdot 2 \cdot 7.85 \cdot 26.67 \cdot 10^{- 3} / {(6 \cdot 10^{- 3})}^{2} ≅ 23{,}262 \end{matrix}$

With $n_{1} = 10$ patients per ward and $n_{2} = 4$ wards per nursing home, the variance inflation due to clustering is $VI F_{3} = [1 + (n_{1} - 1) ρ_{12}] \cdot [1 + (n_{2} - 1) \cdot ρ_{23} \cdot n_{1} \cdot ρ_{12} / (1 + (n_{1} - 1) ρ_{12})] = [7.3] \cdot [1 + 3 \cdot 0.01 \cdot 10 \cdot 0.7 / 7.3] ≅ 7.51$. If we assume that only patients are cross-sectionally measured, we are in the second three-level scenario (Table 2) and $ρ = ρ_{12} \cdot n_{1} \cdot [1 + (n_{2} - 1) ρ_{23}] / VI F_{3} = 0.7 \cdot 10 \cdot [1.03] / [7.51] ≅ 0.96$. Thus, the variance inflation due to the stepped-wedge design is

$\begin{matrix} VI F_{SW : cPG 1} = \frac{3}{2} \cdot \frac{(1 - ρ) \cdot [1 + s ρ]}{(s - \frac{1}{s}) \cdot [1 + \frac{s}{2} ρ]} \\ = \frac{3}{2} \cdot \frac{(1 - 0.96) \cdot [1 + 4 \cdot 0.96]}{(4 - \frac{1}{4}) \cdot [1 + \frac{4}{2} \cdot 0.96]} ≅ 0.026 \end{matrix}$

and the total variance inflation is $VI F_{SW} = 0.026 \cdot 7.51 ≅ 0.20$. Then, the total sample size needed per measurement time/period is $N_{tot, PG 1} \cdot VI F_{SW} = 23{,}262 \cdot 0.20 ≅ 4652$ and the number of nursing homes (clusters) needed is $N_{tot, PG 1} \cdot VI F_{SW} / (n_{1} n_{2}) ≅ 4652 / (10 \cdot 4) ≅ 116$, so four groups of 29 clusters should suffice.
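The steps of Example 2 can likewise be chained in a few lines. This is a minimal Python sketch under the same assumptions as the example (second three-level scenario, standard stepped wedge with $s = 4$); all variable names are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist

# Inputs taken from Example 2 (rates per resident day).
lam_ctl, lam_exp = 11e-3, 5e-3
rho12, rho23 = 0.7, 0.01
n1, n2, s = 10, 4, 4
alpha, target_power = 0.05, 0.8

delta = lam_ctl - lam_exp
sigma1_sq = (lam_ctl + lam_exp) / 2        # average rate as level-1 variance
sigma_tot_sq = sigma1_sq / (1 - rho12)

# Total sample size for an individually randomized parallel-group design.
nd = NormalDist()
z_sum_sq = (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(target_power)) ** 2
n_parallel = 2 * 2 * z_sum_sq * sigma_tot_sq / delta ** 2

# Variance inflation due to the three-level structure.
inner = 1 + (n1 - 1) * rho12
vif3 = inner * (1 + (n2 - 1) * rho23 * n1 * rho12 / inner)

# Correlation between cluster means at different periods (patients cross-sectional).
rho = rho12 * n1 * (1 + (n2 - 1) * rho23) / vif3

# Variance inflation due to the stepped-wedge design.
vif_sw = 1.5 * (1 - rho) * (1 + s * rho) / ((s - 1 / s) * (1 + s / 2 * rho))

n_per_period = n_parallel * vif3 * vif_sw
n_clusters = ceil(n_per_period / (n1 * n2))

print(f"N parallel = {n_parallel:.0f}, VIF_3 = {vif3:.2f}, rho = {rho:.3f}")
print(f"VIF_SW:cPG1 = {vif_sw:.4f}, clusters needed = {n_clusters}")
```

Because the sketch carries unrounded intermediate values, `n_parallel` and `n_per_period` come out slightly below the rounded figures in the text, while the number of clusters after rounding up is the same.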

Programs (SAS® and MS Excel®) to facilitate calculations are provided via https://github.com/steventeerenstra/multilevel-stepped-wedge and in the SF (SAS® program only).

Discussion

Power and sample size formulas for stepped-wedge designs are typically restricted to two or three levels.^7,9 In this article, these formulas were extended to designs with more levels and it was demonstrated that they can either be expressed in terms of variance components or intracluster correlations. The latter expression clearly shows the separate effect of the multilevel structure within time and the stepped-wedge structure over time, similar to what has been shown for other designs but with two levels.^10,11

From the formulas, it can be seen that the different design parameters have the following impact on power and sample size:

$(I)$ : Increasing the number of clusters $I$ increases power (SF7.1).

$(s)$ : Increasing the number of sequences $s$ increases power,^1,4,9 except for the case of the hybrid design and when the total cluster size over all measurements is constant (SF7.1).

$(n_{i})$ : Increasing the sample size at any level increases power (SF7.2, Figure 4). We can achieve any desired power by sufficiently increasing the sample size at any of the levels that are measured as cohort and also by increasing the sample size of the first “cross-sectional” level that is below those levels (SF7.3). In particular, this also applies to the two-level stepped-wedge design, so by increasing the number of cross-sectionally measured subjects, we can reach any power level. This is in contrast with the parallel arm cluster-randomized trial that can plateau (potentially below 80%) if the number of subjects is increased indefinitely.¹² As a consequence, a lack of power due to a limited number of clusters can be compensated by increasing the sample size at particular lower levels. As one can see in Figure 4, not only the sample size at level 3, but also at level 2 can increase the power to 1, but power plateaus below 0.9 when increasing the sample size at level 1. This behavior can most easily be understood in a two-level stepped-wedge trial. As the random effect of a cluster is assumed not to vary over time, the within-cluster comparison is actually a comparison of all subjects before switching to the intervention and after, because the random effect of cluster drops out of the equation. This means that the within-cluster comparisons can become arbitrarily precise with increasing level 1 sample size and this drives the power to 1.

$(ρ_{u, u + 1})$ : Unlike in parallel group cluster-randomized trials, an increase in the intracluster correlation coefficients does not necessarily mean a decrease in power, but may actually increase power in some situations, as can be observed in Figure 4. This is because increasing an intracluster correlation $ρ_{u, u + 1}$ influences the power both via the variance inflation factor due to the multilevel structure, $VI F_{p}$, and via the stepped-wedge design variance inflation factor, $VI F_{SW : cPG 1}$. The first factor, $VI F_{p}$, will linearly increase with $ρ_{u, u + 1}$ (Formula (6)). However, $VI F_{SW : cPG 1}$ will generally first increase and then decrease when an intracluster correlation $ρ_{u, u + 1}$ increases. This is because with increasing $ρ_{u, u + 1}$, the correlation $ρ$ between averages of the same cluster at different times/periods will increase as well (SF7.4), but $VI F_{SW (s)} (ρ)$ will first increase with increasing $ρ$ until some turning point and then decrease, as is illustrated in Figure 3. Intuitively, this decrease can be understood because the standard stepped wedge depends on between- and within-cluster comparisons. The between-cluster comparisons will become less precise when the correlation $ρ_{u, u + 1}$ increases, but the precision will be dominated by the within-cluster comparisons for larger $ρ_{u, u + 1}$. In the within-cluster comparisons, the random effects for clustering drop out, and so increasing $ρ_{u, u + 1}$ will mean that the units at level $u$ before and after the switch will be better correlated, so the within-cluster comparison will be more precise. All in all, an increasing intracluster correlation $ρ_{u, u + 1}$ can thus give different patterns for the variance inflation and power.
For example, when the increasing behavior of $VI F_{p}$ dominates for small $ρ_{u, u + 1}$ , while for larger $ρ_{u, u + 1}$ the decreasing behavior of $VI F_{SW (s)}$ dominates, then we would see power first decrease and then increase as a function of $ρ_{u, u + 1}$ . Another typical behavior is that power decreases with increasing $ρ_{u, u + 1}$ , because the increasing behavior of $VI F_{p}$ dominates that of $VI F_{SW (s)}$ for all values of $ρ_{u, u + 1}$ . Both behaviors can be seen in Figure 4.
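Two of the behaviors described in this list can be checked numerically in the setting of Example 1. The sketch below is illustrative only: the helper names `vif_sw` and `power_ex1` are hypothetical, the very large sample sizes stand in for "increasing indefinitely", and the scenario is again the one with cross-sectional nurses and observations.

```python
from math import sqrt
from statistics import NormalDist


def vif_sw(rho, s):
    """Stepped-wedge variance inflation VIF_SW:cPG1 as a function of rho."""
    return 1.5 * (1 - rho) * (1 + s * rho) / ((s - 1 / s) * (1 + s / 2 * rho))


def power_ex1(n1=5, n2=15, n3=5, n_clusters=4, s=4, delta=0.15,
              rho12=0.6, rho23=0.05, rho34=0.01,
              sigma_tot2=0.534375, alpha=0.05):
    """Power in the Example 1 setting (levels 2 and 1 cross-sectional)."""
    tau2 = (rho34 * rho23 * rho12
            + (1 - rho34) * rho23 * rho12 / n3) * sigma_tot2
    sigma2 = ((1 - rho23) * rho12 / (n3 * n2)
              + (1 - rho12) / (n3 * n2 * n1)) * sigma_tot2
    var_delta = (6 / (n_clusters * (s - 1 / s)) * sigma2
                 * (1 + (s / 2) * tau2 / (sigma2 + (1 + s / 2) * tau2)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / sqrt(var_delta) - z_crit)


# VIF_SW first rises and then falls as rho grows (s = 4).
vif_values = {rho: vif_sw(rho, s=4) for rho in (0.0, 0.1, 0.5, 0.96)}

# Level 1 sits below the first cross-sectional level, so power plateaus;
# level 2 is the first cross-sectional level, so power can approach 1.
power_large_n1 = power_ex1(n1=10**6)
power_large_n2 = power_ex1(n2=10**6)

for rho, value in vif_values.items():
    print(f"rho = {rho:4.2f}: VIF_SW = {value:.3f}")
print(f"power with very large n1: {power_large_n1:.3f}")
print(f"power with very large n2: {power_large_n2:.3f}")
```

In this setting, the level-1 limit stays below 0.9 because the nurse-level variance term $σ_{2}^{2} / (n_{3} n_{2})$ in $σ^{2}$ does not shrink when only $n_{1}$ grows.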

Both increasing sample size and increasing intracluster correlation coefficients can have unexpected power properties due to the random effects canceling out. Therefore, one may question how realistic it is to assume that the random effects (of a cluster) do not vary over time. This assumption implies that the correlation of two subjects within a cluster is the same whether they are measured at the same time $t$ or at different times. It also implies that the correlation $ρ$ of cluster means at different times only depends on intracluster correlations $ρ_{u, u + 1}$, that is, correlations at a fixed time (Table 2). For some outcomes in type-2 diabetes, Martin et al.¹³ found this not to be the case in a two-level setting. More empirical research is needed to see whether and when an assumption of constant correlation over time is reasonable; if this is not the case, then power will be lower than what is calculated from our formulas.^11,14

The variance components or intracluster correlation coefficients needed for the calculations should preferably be estimated from studies with similar outcomes and context. These studies should have the same number of levels, but do not need to be stepped wedge, prospective, or randomized. In the absence of such studies, content-matter specialists could provide plausible values, and they could do so either in terms of variance components or intracluster correlations. Given the uncertainties in these educated guesses, we recommend that a range of plausible values for each of these parameters be considered.
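One way to act on this recommendation is to evaluate power over a grid of plausible intracluster correlations. The sketch below does so for the design of Example 1; the grid values are hypothetical placeholders rather than estimates from any study, the helper name `power_ex1` is invented for illustration, and the level-1 variance is held fixed at 0.21375 so that the total variance changes with $ρ_{12}$ (an assumption of this sketch).

```python
from itertools import product
from math import sqrt
from statistics import NormalDist


def power_ex1(rho12, rho23, rho34, sigma1_sq=0.21375,
              n1=5, n2=15, n3=5, n_clusters=4, s=4, delta=0.15, alpha=0.05):
    """Power in the Example 1 design for given intracluster correlations."""
    # Assumption: level-1 variance is fixed, total variance follows from rho12.
    sigma_tot2 = sigma1_sq / (1 - rho12)
    tau2 = (rho34 * rho23 * rho12
            + (1 - rho34) * rho23 * rho12 / n3) * sigma_tot2
    sigma2 = ((1 - rho23) * rho12 / (n3 * n2)
              + (1 - rho12) / (n3 * n2 * n1)) * sigma_tot2
    var_delta = (6 / (n_clusters * (s - 1 / s)) * sigma2
                 * (1 + (s / 2) * tau2 / (sigma2 + (1 + s / 2) * tau2)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / sqrt(var_delta) - z_crit)


# Hypothetical ranges around the values assumed in Example 1.
rho12_values = (0.4, 0.6, 0.8)
rho23_values = (0.01, 0.05, 0.10)
rho34_values = (0.005, 0.01, 0.05)

results = {
    (r12, r23, r34): power_ex1(r12, r23, r34)
    for r12, r23, r34 in product(rho12_values, rho23_values, rho34_values)
}

worst = min(results, key=results.get)
best = max(results, key=results.get)
print(f"lowest power  {results[worst]:.3f} at rho12, rho23, rho34 = {worst}")
print(f"highest power {results[best]:.3f} at rho12, rho23, rho34 = {best}")
```

Reporting the lowest power over the grid, rather than only the power at the central guess, gives a more cautious basis for choosing the number of clusters.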

Supplemental Material

180914_WEBAPPENDIX_Sample_size_for_cRCT_stepped_wedge_trials_with_more_than_2_levels – Supplemental material for Sample size calculation for stepped-wedge cluster-randomized trials with more than two levels of clustering

Supplemental material, 180914_WEBAPPENDIX_Sample_size_for_cRCT_stepped_wedge_trials_with_more_than_2_levels for Sample size calculation for stepped-wedge cluster-randomized trials with more than two levels of clustering by Steven Teerenstra, Monica Taljaard, Anja Haenen, Anita Huis, Femke Atsma, Laura Rodwell and Marlies Hulscher in Clinical Trials

Supplemental Material

number_of_clusters_and_power_for_2_3_4_level_designs_with_continuous_binary_incidence_outcomes – Supplemental material for Sample size calculation for stepped-wedge cluster-randomized trials with more than two levels of clustering

Supplemental material, number_of_clusters_and_power_for_2_3_4_level_designs_with_continuous_binary_incidence_outcomes for Sample size calculation for stepped-wedge cluster-randomized trials with more than two levels of clustering by Steven Teerenstra, Monica Taljaard, Anja Haenen, Anita Huis, Femke Atsma, Laura Rodwell and Marlies Hulscher in Clinical Trials

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

This work was supported in part by the Netherlands Organisation for Health Research and Development (ZonMw; grant no. R522002009).

Supplemental material

Supplemental material for this article is available online.

Trial described

CHANGE trial (NCT02817282).

References

1. Hussey, Hughes. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials 2007; 28: 182–191.

2. Girling, Hemming. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med 2016; 35: 2149–2166.

3. Thompson, Fielding, Hargreaves, et al. The optimal design of stepped wedge trials with equal allocation to sequences and a comparison to other trial designs. Clin Trials 2016; 14: 639–647.

4. Woertman, De Hoop, Moerbeek, et al. Stepped wedge designs could reduce the required sample size in cluster randomized trials. J Clin Epidemiol 2013; 66: 752–758.

5. Donner, Klar. Design and analysis of cluster randomization trials in health research. London: Arnold, 2000.

6. Teerenstra, Moerbeek, van Achterberg, et al. Sample size calculations for 3-level cluster randomized trials. Clin Trials 2008; 5: 486–495.

7. Hemming, Lilford, Girling AJ. Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs. Stat Med 2015; 34: 181–196.

8. Teerenstra, Preisser, et al. Sample size considerations for GEE analyses of three-level cluster randomized trials. Biometrics 2010; 66: 1230–1237.

9. Heo, Kim, Rinke, et al. Sample size determinations for stepped-wedge clinical trials from a three-level data hierarchy perspective. Stat Methods Med Res 2018; 27: 480–489.

10. Teerenstra, Eldridge, Graff, et al. A simple sample size formula for analysis of covariance in cluster randomized trials. Stat Med 2012; 31: 2169–2178.

11. Hooper, Teerenstra, De Hoop, et al. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Stat Med 2016; 35: 4718–4728.

12. Guittet, Giraudeau, Ravaud. A priori postulated and real power in cluster randomized trials: mind the gap. BMC Med Res Methodol 2005; 5: 25.

13. Martin, Girling, Nirantharakumar, et al. Intra-cluster and inter-period correlation coefficients for cross-sectional cluster randomised controlled trials for type-2 diabetes in UK primary care. Trials 2016; 17: 402.

14. Kasza, Hemming, Hooper, et al. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Stat Methods Med Res. Epub ahead of print 1 January 2017. DOI: 10.1177/0962280217734981.
