Abstract
Meta-analysis is sometimes seen as occupying the top of the “pyramid of evidence,” and random effects (RE) is the canonical meta-analysis model of psychological research (Ioannidis, 2016; Owens et al., 2010). Large-scale surveys and preregistered multilab replications (PMRs) have revealed that publication-selection bias, high heterogeneity, and low statistical power are the central challenges to the credibility of psychological research (Fraley & Vazire, 2014; Klein et al., 2018; Open Science Collaboration, 2015; Stanley et al., 2018, 2022). Simulation studies establish that RE typically produces large biases and high rates of false positives when there is publication-selection bias (Bom & Rachinger, 2019; Carter et al., 2018; Henmi & Copas, 2010; Stanley, 2017; Stanley & Doucouliagos, 2014, 2015; Stanley et al., 2017; van Assen & van Aert, 2015). RE’s large biases and high rates of false positives are corroborated in applications when RE results are compared with PMRs (Kvarven et al., 2020). The central purpose of this article is to demonstrate that the unrestricted weighted least squares (UWLS) weighted average should routinely replace RE in psychology meta-analyses, regardless of whether there is publication bias.
If, as we show below, small-sample studies are more heterogeneous, then three major challenges to psychology (heterogeneity, low power, and publication-selection bias) emanate largely from a single source. Furthermore, when heterogeneity is correlated with a study’s standard errors, we show that RE estimates are dominated, statistically, by an alternative meta-analysis weighted average—the UWLS (Stanley & Doucouliagos, 2015, 2017). Unlike RE, UWLS better accommodates correlated heterogeneity because it is built on a model of multiplicative heterogeneity.
To make this case, we need to show that UWLS is expected to have superior statistical properties relative to RE even when there is no publication bias. Our simulations show that if standard errors and heterogeneity are correlated in a meta-analysis, then UWLS will dominate RE in all cases, with or without publication bias (see Table 3). But what evidence is there that standard error and heterogeneity are typically correlated in an area of psychological research? We offer preregistered meta-research evidence that standard errors and heterogeneity are typically correlated within a psychology meta-analysis and across dozens of meta-analyses. However, before we can conduct this meta-meta-analysis and gather evidence of widespread correlation of standard error with heterogeneity, we must first introduce the meta-regression tests (variance ratio meta-regression analysis [VR-MRA]) that can identify whether standard errors and heterogeneity are in fact correlated in a meta-analysis.
After an illustration, we introduce these new meta-regression tests for a correlation of standard errors and heterogeneity. We then apply these new tests to dozens of meta-analyses to evaluate whether there is evidence of a predominant correlation between standard errors and heterogeneity in psychology. Only after offering evidence supporting these two important lines of reasoning do we directly address our main thesis: that UWLS statistically dominates RE in typical application.
As an illustrative example, consider the once highly regarded theory of ego depletion. Ego depletion posits that people have a limited supply of willpower that decreases with overuse. Ego depletion is one of the theories that have come into question because it failed to replicate in a PMR.
Hagger et al.’s (2016) PMR found a scientifically and statistically negligible ego-depletion effect.
However, conventional meta-analysis misses the underlying weakness of ego-depletion experiments altogether; for example, it reports strong evidence of a medium to large ego-depletion effect.
In this article, we develop new meta-regression methods to identify whether heterogeneity is associated with standard errors and thereby with sample size, both within an area of research and across many meta-regressions when combined into a meta-meta-regression. If heterogeneity is correlated with standard errors, the RE model is no longer valid because it assumes that random heterogeneity is independent of sampling errors. We applied these new meta-regression methods to a preregistered group of 53 meta-analyses and found clear and robust evidence that heterogeneity and standard errors are generally correlated in psychology. Finally, we offer new simulations grounded in the correlation revealed by this meta-research evidence that show UWLS statistically dominating RE whether or not there is publication-selection bias. These results have substantial implications for practice because they compel the replacement of RE by UWLS as the conventional method to summarize systematic reviews and meta-analyses of psychological research.
RE Versus Correlated Heterogeneity
The RE model assumes that effect sizes, such as Cohen’s d, are the sum of an overall mean effect, random heterogeneity, and sampling error.
Random heterogeneity is assumed to be normally distributed with mean zero and variance τ² and, crucially, to be independent of each study’s sampling error.
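In standard notation (a sketch; the symbols follow common meta-analysis usage rather than the article’s original typesetting), the additive RE model that the article labels Equation 1 can be written as follows:

```latex
% Additive random-effects (RE) model (Equation 1)
d_i = \delta + \nu_i + \varepsilon_i, \qquad
\nu_i \sim N(0,\ \tau^2), \qquad
\varepsilon_i \sim N(0,\ SE_i^2)
```

where νᵢ (random heterogeneity) and εᵢ (sampling error) are assumed mutually independent and independent of SEᵢ. Correlated heterogeneity, in which τ varies with SEᵢ, violates exactly this independence assumption.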
Regression analyses of squared deviations have a long history as tests of the assumptions of a statistical model. Examples include the White, the Park, and the Glejser tests of homoskedasticity (Glejser, 1969; Park, 1966; White, 1980). Tests of individual variances are based on specific regression models. A systematic pattern among these squared deviations is treated as evidence that the assumed model is invalid (heteroskedastic) and its standard errors biased. We use meta-regression analysis (MRA) to investigate whether the observed heterogeneity of reported effect sizes is systematically related to their standard errors.
These considerations lead to a meta-regression of the square root of the variance ratio (VR) on a study’s standard error.
For the derivation and statistical rationale of this VR-MRA (Equation 2), see Section I of the Supplemental Material available online. Next, we report simulations that establish the validity of VR-MRA as a test of correlated heterogeneity and thereby a test of the validity of the RE model. In this article, the central role of this new test, VR-MRA, is to establish the widespread correlation of standard errors and heterogeneity by applying VR-MRA to a preregistered meta-meta-analysis (see Meta-research Evidence section below).
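Equation 2’s exact construction appears in the Supplemental Material; as a hedged sketch (our reading, with our own function and variable names), the test can be implemented as a regression of the square root of each study’s variance ratio on its standard error, where we take the variance ratio to be the squared deviation from the RE mean divided by the RE model’s implied total variance:

```python
import numpy as np

def vr_mra(d, se):
    """Sketch of a variance-ratio meta-regression (VR-MRA) test.

    Assumptions (ours, not necessarily the article's exact Equation 2):
    VR_i = (d_i - RE mean)^2 / (SE_i^2 + tau^2), with tau^2 estimated by
    DerSimonian-Laird; sqrt(VR_i) is then regressed on SE_i.  A positive
    slope signals heterogeneity that grows with the standard error.
    """
    d, se = np.asarray(d, float), np.asarray(se, float)
    w = 1.0 / se**2
    d_fe = np.sum(w * d) / np.sum(w)                    # fixed-effect mean
    q = np.sum(w * (d - d_fe) ** 2)                     # Cochran's Q
    tau2 = max(0.0, (q - (len(d) - 1)) /
               (np.sum(w) - np.sum(w**2) / np.sum(w)))  # DL heterogeneity
    w_re = 1.0 / (se**2 + tau2)
    d_re = np.sum(w_re * d) / np.sum(w_re)              # RE mean
    vr = (d - d_re) ** 2 / (se**2 + tau2)               # variance ratios
    X = np.column_stack([np.ones_like(se), se])         # intercept + SE
    (intercept, slope), *_ = np.linalg.lstsq(X, np.sqrt(vr), rcond=None)
    return float(intercept), float(slope)
```

Under a valid RE model, every VRᵢ has expectation near 1 regardless of SEᵢ, so the slope should be near zero; heterogeneity concentrated among small (large-SE) studies pushes the slope positive.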
Simulations
We conduct several simulations in which the key research dimensions (heterogeneity, the distribution of sample sizes, and mean effect sizes) are set to reflect the typical values found in large surveys of psychological research (Fraley & Vazire, 2014; Stanley et al., 2019). Supplement II.A in the Supplemental Material provides full details of simulations of VR-MRA using the core of past simulations’ code and design, previously posted on OSF (https://osf.io/eh974/; Stanley, 2019) and employed in studies of psychology (Stanley & Doucouliagos, 2022; Stanley et al., 2021) using other meta-analysis methods.
The distribution of sample sizes in the primary studies {15, 35, 50, 100, or 200} mirrors a large survey of personality and social-psychology experiments (Fraley & Vazire, 2014). Following Stanley et al. (2021), each simulated meta-analysis has its mean effect (in terms of Cohen’s d) and heterogeneity calibrated to values representative of these surveys.
Table 1 reports the mean, Type I error rate, and power of the estimated VR-MRA slope coefficient.
VR-MRA Simulations: 10,000 Replications
Note that these simulations confirm the validity of VR-MRA as a test of correlated heterogeneity.
Nonetheless, VR-MRA has an important limitation when applied to individual meta-analyses—low statistical power. Column 3 (Correlated Heterogeneity) reports the results of the same simulation experiment but with heterogeneity forced to be moderately correlated with standard errors. In particular, τ is set at {.4, .3, .3, .3, .15} for the respective sample sizes {15, 35, 50, 100, and 200}.
Thus, VR-MRA should not be applied to the average meta-analysis alone, but only to large meta-analyses or across many meta-analyses. Nevertheless, knowing that a test has low power and interpreting the findings accordingly may still allow using the test more broadly. There is precedent for this practice in tests that probe selective-publication bias. For example, the Egger test has comparable power, and it is frequently used (Egger et al., 1997; Stanley et al., 2021). However, results of the Egger test do not permit conclusive statements (Lau et al., 2006). We likewise caution against overinterpretation of the VR-MRA test when it is applied to single meta-analyses.
Illustration
Returning to Carter et al.’s (2015) ego-depletion meta-analysis, VR-MRA provides clear evidence that heterogeneity is correlated with standard errors.
The practical implication of this example is not to trust RE but instead to use alternative methods that are not based on the RE model. The UWLS is such a meta-analysis summary estimator. It is neither fixed effect (FE) nor RE. UWLS and FE always give the same point estimate, but UWLS automatically accommodates heterogeneity when present. Like RE and FE, UWLS is an inverse-variance weighted average. However, RE’s inverse variance weights are 1/(SE² + τ̂²), which add the same estimated heterogeneity variance to every study and thereby give small, unreliable studies relatively more weight than UWLS’s 1/SE² weights do.
The WAAP (weighted average of adequately powered studies) is a version of UWLS that down-weights small studies even further. Not only are small ego-depletion studies inadequately powered, but VR-MRA offers evidence that they are also less reliable. Only one ego-depletion study is adequately powered (power > 80%), and only 10 of 116 have power greater than 50%, retrospectively calculated.
WAAP uses Cohen’s (1988) widely accepted convention of 80% to define adequate power (Stanley et al., 2017). WAAP = UWLS when UWLS is calculated only on those studies with retrospective power greater than 80%.
For ego depletion, WAAP = 0.100 (CI = [−0.096, 0.295]). Likewise, UWLS calculated on only those ego-depletion studies with at least 50% power is not statistically significant.
Meta-research Evidence
To overcome VR-MRA’s low power in individual meta-analyses, we conducted a meta-analysis of many VR-MRA results. Meta-analysis is often regarded as the best way to increase the statistical power of individual studies and to resolve the ambiguity of mixed findings across studies. Jackson and Turner (2017) showed that five or more studies “reasonably consistently achieve powers from random-effects meta-analyses that are greater than the studies that contribute to them” (p. 280). To increase the power of individual VR-MRA results, we combined the VR-MRA findings from these 53 “preregistered” meta-analyses and used RE meta-analysis to summarize their aggregate evidence. Our purpose for seeking meta-research evidence of the correlation of standard errors and heterogeneity is to establish that this correlation is widespread among meta-analyses and to gauge its magnitude to accurately calibrate simulations that compare the statistical properties of RE and UWLS—see UWLS section below.
Consistent with our simulations and correlated heterogeneity (Column 3 of Table 1), we found that 18 (or 34%) of these VR-MRAs have a statistically positive estimate of the VR-MRA slope coefficient.
Meta-meta-analysis simulations
To the same simulation design and structure used for VR-MRA estimates and reported in Table 1, we added a loop that collects 50 random VR-MRA findings at a time and calculates a conventional RE estimate of these 50 meta-regression estimates of the slope coefficient.
Column 1 of Table 2 reports the average RE estimate and its statistical properties.
Simulations of 10,000 Random-Effects Meta-Analyses of VR-MRAs
Robustness of the meta-research evidence
For the sake of robustness and further independent validation, we investigate a second set of meta-analyses. Kvarven et al. (2020) conducted a systematic review of all meta-analyses that have an associated PMR and found 15 such pairs. An RE summary of only 15 VR-MRA estimates will have much less power. Nonetheless, the RE estimate of the VR-MRA slope coefficient remains positive and statistically significant in this second set.
Finally, as another robustness check, we provide further meta-research evidence that heterogeneity is correlated with standard errors from an alternate MRA model of RE variances in Section III of the Supplemental Material (Equation 3).
See Section III of the Supplemental Material for a discussion of the total variance meta-regression analysis (TV-MRA) model (Equation 3), its application to these sets of meta-analyses, and the corresponding simulation findings of 10,000 RE meta-analyses of collections of both 50 and 16 randomly generated meta-regressions of this alternative model of RE’s variance. Evidence from the TV-MRA model supports the above evidence of a correlation between heterogeneity and SE in psychology.
Discussion
Combining evidence across meta-regression tests consistently supports the hypothesis that heterogeneity is correlated with standard errors and is thereby inconsistent with the RE model. This correlation is also corroborated in the aggregate by a correlation between the median standard error and RE-estimated heterogeneity across meta-analyses.
But why would small studies be found to be more heterogeneous? There are several likely and overlapping reasons. Researcher flexibility in choosing methods, protocols, and outcome measures provides the variation across which a statistically significant result can be selected. Such researcher flexibility generates heterogeneity, clearly seen in the large differences in heterogeneity found between tightly controlled multilab replications and meta-analyses (Klein et al., 2018; Kvarven et al., 2020; Linden & Hönekopp, 2021). As seen in many simulations, small studies require more intensive selection across this heterogeneity to achieve statistical significance (Stanley & Doucouliagos, 2014). When 50% of the reported results have been selected to be statistically significant, the average value of VR-MRA’s slope coefficient rises notably in our simulations.
However, other forces are also likely to be at work. By their very nature, exploratory studies are likely to find notably different effect sizes from one exploration to the next. Small studies may employ lower-quality standards with higher risk of bias (IntHout et al., 2015, p. 866), and less reliability generates higher heterogeneity. Correlated heterogeneity may be caused by a mixture of different types of “replications” that typically comprise meta-analyses. Several researchers have classified replications as “conceptual” versus “direct” (or “close”; Hedges & Schauer, 2019; Linden & Hönekopp, 2021; Schauer & Hedges, 2020; S. Schmidt, 2009). Direct or close replications involve the use of the same experimental procedures in an effort “to replicate an earlier study as faithfully as possible” (Linden & Hönekopp, 2021, p. 360). In contrast, studies that are regarded as conceptual replications use different methods to explore the boundaries of theory, widen the field’s understanding, and assist in developing new theory (S. Schmidt, 2009). Thus, the results from conceptual replications are expected to produce higher heterogeneity than direct replications (Linden & Hönekopp, 2021).
Furthermore, direct replications often use large sample sizes (e.g., the Open Science Collaboration and Many Labs projects) to ensure adequate power. When not adequately powered, a lack of replication success would be quickly dismissed as the expected result of low power rather than attributed to the original experiment. Conversely, conceptual replications, which are more numerous, face no such demands, as demonstrated by the low power that many surveys of psychology have found (Cohen, 1962; Fraley & Vazire, 2014; Maxwell, 2004; F. L. Schmidt & Oh, 2016; Stanley et al., 2018). In fact, small samples might be advantageous for conceptual replications:
A safer strategy might be to “salami-slice” one’s resources to generate more studies which, with sufficient analytical flexibility, will almost certainly produce a number of publishable studies. . . . Authors may therefore (consciously or unconsciously) conduct a larger number of smaller studies, . . . rather than risk investing their limited resources in a smaller number of larger studies. (Vankov et al., 2014, pp. 1–2)
Thus, meta-analyses that include largely conceptual replications along with a few direct replications would be expected to produce higher heterogeneity in small studies than in large ones.
Needless to say, VR-MRA has limitations beyond the low power in single meta-analyses discussed above. Low power will be exacerbated in fields that have few studies per meta-analysis, as seen in some fields of medicine and health psychology. VR-MRA, as a regression, requires notable variation of its independent variable (standard errors or sample sizes) in a meta-analysis to be estimated reliably. Nevertheless, combining many VR-MRAs can be informative if most have notable variation in sample sizes.
Implications for Practice
Since Cohen (1988), statistical power has been universally acknowledged as a central determinant of a study’s scientific contribution. “Studies with low statistical power produce inherently ambiguous results because they often fail to replicate” (Psychonomic Society, 2012, p. 1). “You should routinely provide evidence that your study has sufficient power to detect effects of substantial interest.”

Unless psychologists begin to incorporate methods for increasing the power of their studies, the published literature is likely to contain a mixture of apparent results buzzing with confusion. . . . Not only do underpowered studies lead to a confusing literature but they also create a literature that contains biased estimates of effect sizes. (Maxwell, 2004, p. 161)
When small studies systematically produce highly heterogeneous findings, their scientific contribution further erodes. Recent surveys of psychology meta-analyses found substantial heterogeneity among study findings, average τ > 0.3.
Heterogeneity correlated with standard errors also has implications for the practice of systematic reviews and meta-analysis. It has long been known that RE overweights small studies and is thereby highly biased when there is publication-selection bias (Carter et al., 2018; Henmi & Copas, 2010; Poole & Greenland, 1999; Stanley & Doucouliagos, 2014, 2015). When heterogeneity is correlated with standard errors, the RE model is invalid, and RE will further overweight unreliable small-study findings. Fortunately, there is a simple alternative meta-analysis approach, the UWLS, that automatically accommodates correlated heterogeneity and gives unreliable and potentially biased small studies less weight.
Unrestricted Weighted Least Squares
As discussed above, UWLS is a simple weighted average that allows heterogeneity to be correlated with standard errors. UWLS and FE have identical point estimates, but UWLS standard errors and CIs are larger when there is heterogeneity (Stanley & Doucouliagos, 2015, 2022). UWLS is easily calculated by a simple regression of the standardized effect size (t = d/SE) on precision (1/SE) without an intercept.
UWLS is the estimated slope coefficient on precision, and its conventional regression standard error automatically reflects any excess heterogeneity.
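Concretely, UWLS is ordinary least squares of tᵢ = dᵢ/SEᵢ on precision 1/SEᵢ with no intercept (Stanley & Doucouliagos, 2015). A minimal sketch (function name ours):

```python
import numpy as np

def uwls(d, se):
    """Unrestricted weighted least squares (UWLS) weighted average (sketch).

    The no-intercept slope of t_i = d_i/SE_i on 1/SE_i equals the
    inverse-variance (FE) point estimate, while the conventional regression
    standard error scales multiplicatively with the residual dispersion
    (it is "unrestricted" because that scale is not forced to 1).
    """
    d, se = np.asarray(d, float), np.asarray(se, float)
    t, x = d / se, 1.0 / se
    b = np.sum(x * t) / np.sum(x * x)   # slope = inverse-variance mean
    k = len(d)
    # residual scale; with a single study, fall back to the FE standard error
    s2 = np.sum((t - b * x) ** 2) / (k - 1) if k > 1 else 1.0
    return float(b), float(np.sqrt(s2 / np.sum(x * x)))
```

For example, for d = (0.2, 0.4) with SE = (0.1, 0.2), both UWLS and FE give a point estimate of 0.24, but the UWLS standard error (0.08) reflects the residual dispersion rather than the FE formula.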
Stanley et al. (2017) offered a variation of UWLS that uses only those studies with 80% or higher power, thereby giving the smallest studies no weight at all. Simulations show that this WAAP is less biased than other weighted averages (specifically, RE, FE, and UWLS) when there is publication-selection bias, and the bias reduction can be quite large in application (Ioannidis et al., 2017; Stanley et al., 2017).
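A sketch of WAAP under these conventions (retrospective power from the two-sided normal approximation, with the inverse-variance mean of all studies as the proxy for the true effect; the function names and the single-study fallback are ours):

```python
import numpy as np
from math import erf, sqrt

def _norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def waap(d, se, z_crit=1.96, power_cut=0.80):
    """Weighted average of the adequately powered (WAAP) studies (sketch).

    1. Inverse-variance mean of all studies as a proxy for the true effect.
    2. Each study's retrospective power to detect that proxy effect.
    3. UWLS computed only on studies with power >= 80% (Cohen's convention).
    Returns None when no study is adequately powered.
    """
    d, se = np.asarray(d, float), np.asarray(se, float)
    w = 1.0 / se**2
    proxy = np.sum(w * d) / np.sum(w)
    z = np.abs(proxy) / se
    power = np.array([1.0 - _norm_cdf(z_crit - zi) + _norm_cdf(-z_crit - zi)
                      for zi in z])
    keep = power >= power_cut
    if not np.any(keep):
        return None
    t, x = d[keep] / se[keep], 1.0 / se[keep]
    b = np.sum(x * t) / np.sum(x * x)
    k = int(keep.sum())
    s2 = np.sum((t - b * x) ** 2) / (k - 1) if k > 1 else 1.0
    return float(b), float(np.sqrt(s2 / np.sum(x * x)))
```

Returning None when no study reaches 80% power mirrors the article’s caution that meta-analyses consisting only of small studies should be interpreted with great care.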
However, when the RE model is imposed on the simulation structure and there is no publication bias, these simulations show that RE has slightly lower mean squared error (MSE) than UWLS (Bom & Rachinger, 2019; Stanley & Doucouliagos, 2014; Stanley et al., 2017). When there is publication-selection bias, these same simulations show that UWLS has notably smaller MSE than RE. What remains to be investigated is whether UWLS will dominate RE in all cases when heterogeneity is correlated with standard errors, as typically seen in psychology. Next, we present a new simulation study that considers the consequences of correlated heterogeneity on the statistical properties of RE, UWLS, and WAAP.
Simulations
Our final simulation study closely followed the simulation design of VR-MRA, reported above and detailed in Section II of the Supplemental Material. The code for the core of the design was posted online in 2019 and used in other studies (Stanley, 2019; Stanley & Doucouliagos, 2022; Stanley et al., 2021). The most influential research dimensions are calibrated from large surveys of psychological research (Fraley & Vazire, 2014; Stanley et al., 2018)—for greater details, see Section II of the Supplemental Material.
The central difference between these simulations and those previously published is that we assume that heterogeneity, τ, is correlated with standard errors to the same degree as seen in our meta-meta-analysis results. In particular, heterogeneity, τ = {.4, .3, .3, .3, .15}, is assumed to be associated with the respective sample sizes {15, 35, 50, 100, and 200}.
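This data-generating step can be sketched as follows (the SE formula is the standard large-sample approximation for Cohen’s d with two equal groups, evaluated at the mean effect for simplicity; the τ-to-n mapping is the calibration stated above, and all names are ours):

```python
import numpy as np

# Heterogeneity calibrated to total sample size, as in the text:
TAU_BY_N = {15: 0.40, 35: 0.30, 50: 0.30, 100: 0.30, 200: 0.15}

def sample_study(n, delta=0.0, rng=None):
    """Draw one simulated study with heterogeneity correlated with SE.

    SE uses the usual approximation for Cohen's d with two groups of n/2:
    SE = sqrt(4/n + delta^2 / (2n)).  Small n gets a larger tau, so small
    studies are both noisier and more heterogeneous.
    """
    if rng is None:
        rng = np.random.default_rng()
    tau = TAU_BY_N[n]
    study_effect = delta + rng.normal(0.0, tau)     # study's own true effect
    se = np.sqrt(4.0 / n + delta**2 / (2.0 * n))    # approximate SE of d
    return study_effect + rng.normal(0.0, se), se   # reported d and its SE
```

With δ = 0, reported effects from n = 15 studies have total standard deviation √(0.4² + 4/15) ≈ 0.65, versus roughly 0.21 for n = 200, reproducing the small-studies-more-heterogeneous pattern.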
In the upper half of Table 3, we report the bias, MSE, and Type I error rate (or power) for RE, UWLS, and WAAP when there is no publication bias; the lower half includes the same information after an assumption that 50% of the reported results have gone through a process of selection for statistical significance. As Table 3 shows, UWLS has smaller MSE and Type I errors than RE in all cases in which heterogeneity is correlated with standard errors and there is no publication-selection bias. Biases are inconsequential rounding errors. When there is publication-selection bias (Table 3, bottom half), UWLS’s improvement over RE is much greater. With 50% publication-selection bias, UWLS’s MSE is only 59% as large as RE’s, its bias is 73% of RE’s (Table 3, bottom row), and WAAP is better still.
Correlated Heterogeneity: Statistical Properties of RE, UWLS, and WAAP
Note: Mean effect δ = 0, measured as Cohen’s d.
Discussion
When the heterogeneity variance is correlated with sampling error variance (or sample size), simulations show that UWLS dominates RE, and WAAP does even more to reduce bias and MSE when there is publication-selection bias. Because we find robust meta-research evidence that heterogeneity and standard errors are typically correlated in psychology, UWLS (and, whenever possible, preferably its WAAP variant) should be adopted as the conventional meta-analysis estimate of mean effects and summary of systematic reviews. Even if heterogeneity and standard errors are independent and the RE model is entirely valid, simulations show that there is practically nothing to gain by using RE over UWLS when there is no publication-selection bias; however, there is much to lose if there is publication-selection bias (Bom & Rachinger, 2019; Stanley & Doucouliagos, 2014; Stanley et al., 2017). When correlated heterogeneity is common, the choice is clear—UWLS. If the systematic reviewer fears the effect of publication-selection bias and wishes to reduce it more aggressively, then there are versions of UWLS that also accomplish this goal—WAAP and weighted and iterative least squares (WILS). WILS uses UWLS to identify whether there is an excess of statistical significance in an area of research and discards those studies most responsible (Stanley & Doucouliagos, 2022; Stanley et al., 2021). Often, the remaining exaggeration is scientifically and practically insignificant (Stanley & Doucouliagos, 2022). When all studies in a meta-analysis are small, then any meta-analytic estimate should be interpreted with great caution (Ioannidis, 2005; Stanley et al., 2022).
Conclusion
We introduce new meta-regression methods, VR-MRA and TV-MRA, that can identify whether the magnitude of heterogeneity across study findings is correlated with their standard errors. The meta-analysis of 53 “preregistered” meta-analyses (as well as a separate set of 15 meta-analyses) provides clear and robust evidence of this correlation and shows that small-sample studies typically have higher heterogeneity. Such variable heterogeneity violates the RE model’s assumption of additive and independent heterogeneity—recall Equation 1. Both findings have important implications for practice.
For decades, there has been wide recognition that the low power (and small sample size) of the typical psychology study compromises reliable scientific inference (APA, 2010; Cohen, 1962, 1988; Maxwell, 2004; Psychonomic Society, 2012; Rossi, 1990). When small studies have not only inadequate statistical power but also high heterogeneity, their scientific contribution is dubious. Our results, therefore, further expose the necessity of preregistration and preanalysis plans so long as typical sample sizes remain small.
The meta-research evidence presented in this article also serves as a test of the RE model. When the heterogeneity variance is correlated with the sampling-error variance to the degree found in dozens of VR-MRAs, simulations show that RE is dominated by an alternative weighted average, the UWLS. With or without publication-selection bias, UWLS statistically dominates RE when heterogeneity is correlated. The advantage of UWLS over RE is quite notable when there is selection for statistical significance (i.e., publication-selection bias or questionable research practices). UWLS is built on a model of multiplicative heterogeneity and thereby easily accommodates correlated heterogeneity. It has long been known that UWLS dominates RE when there is publication bias. When the magnitude of heterogeneity is also correlated with standard errors, the UWLS advantage is absolute. Thus, there is a strong case for the UWLS weighted average with its WAAP and WILS variants to replace random effects as the conventional meta-analysis estimator of psychological research.
Supplemental Material
Supplemental material, sj-docx-1-amp-10.1177_25152459221120427, for “Beyond Random Effects: When Small-Study Findings Are More Heterogeneous” by T. D. Stanley, Hristos Doucouliagos, and John P. A. Ioannidis, in Advances in Methods and Practices in Psychological Science.
