Abstract
More often than not, real-life data may not satisfy the conditions assumed by the statistical models that are used (Blanca et al., 2013; Bono et al., 2017; Micceri, 1989; Sladekova & Field, 2024b). Especially in the context of linear regression, model assumptions are strict and not easily evaluated. Depending on their severity, violations can have dire consequences for the Type I error rate, the power, or even the bias of the parameter estimates (Field & Wilcox, 2017; Wilcox, 2022). Prebuilt packages and functions for even the most recent robust methods can be accessed through statistical programming languages such as R. Yet SPSS, which primarily supports conventional methods, remains a widely used software among applied researchers (Blanca et al., 2018; Masuadi et al., 2021). Hence, the goal of this tutorial is to present applied researchers with alternative inference methods in the scope of linear regression that are also available in IBM SPSS (Version 30). Most importantly, these methods do not require researchers to familiarize themselves with an entirely new type of analysis but simply have the benefit of potentially leading to more valid conclusions given the data at hand.
Linear Regression
Many hypotheses pertain to the relationship between two continuous variables, combined effects of multiple predictors on one outcome, or unique effects of a single variable on an outcome while statistically controlling for linear effects of other covariates. In these cases, linear regression analysis, using the ordinary-least-squares (OLS) method, is generally the method of choice. Mathematically, linear regression can be described by the following model:
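A generic form of this model (the notation here is chosen for illustration) is:

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n
```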
Here, an outcome variable is expressed as a linear combination of an intercept, one or more weighted predictor variables, and a random error term.
In most applied settings, researchers focus on whether predictors have a nonzero effect on the outcome. This is typically tested using standard t tests of the individual regression coefficients.
One problem with this approach is that when OLS assumptions are violated, the resulting p values and confidence intervals can no longer be trusted: Type I error rates may deviate from the nominal level, and statistical power may suffer.
So what conditions does OLS regression assume to produce results that are valid with respect to those two quality criteria? In OLS regression, the errors are assumed to be independent, normally distributed, and homoskedastic, that is, to have a constant variance across all levels of the predictors.
Violations of homoskedasticity will often lead to larger than anticipated (i.e., inflated) Type I error rates (Astivia & Zumbo, 2019; Cribari-Neto, 2004; Long & Ervin, 2000; Rajh-Weber et al., 2025). Simultaneously, depending on the variance pattern, the statistical power can be lower under heteroskedasticity compared with scenarios in which assumptions are not violated (Hayes & Cai, 2007; Long & Ervin, 2000; Rajh-Weber et al., 2025).
In this tutorial, we focus solely on violations of the normality and homoskedasticity assumptions. This is because in such cases, researchers can easily use alternative methods that remain within the familiar framework of OLS regression while obtaining corrected p values and confidence intervals.
Present Work
Because many applied researchers continue to rely on the functionalities provided by the SPSS software (Blanca et al., 2018; Masuadi et al., 2021), in this tutorial, we focus on robust inference methods available in IBM SPSS (Version 30). Because the performance of an inference method generally varies depending on the type and severity of the assumption violation, it is important to enable researchers to flexibly choose a method fitting a given scenario. Therefore, in this tutorial, we present eight alternative inference methods beyond the classical OLS-regression method that are accessible in SPSS: using either an HC3 or HC4 standard error for inference, or using a pairs or wild bootstrap, each in combination with either a percentile or a bias-corrected and accelerated (BCa) confidence interval or a bootstrap p value.
For best use of this tutorial, we prepared step-by-step instructions (including many screenshots and detailed elaborations of the procedures) that can be found on OSF (https://osf.io/7du4t/) alongside example data and the complete SPSS syntax to replicate the results. This setup allows the online step-by-step guides to be updated for new SPSS releases independently of this tutorial, which focuses on the general procedures. In addition, custom R functions reflecting SPSS’s functionality were created for this tutorial. R code for these custom functions, R code showing how the sample data were created, and an R tutorial file are also provided on OSF.
Example Data
To showcase the different inference methods covered in this tutorial, we simulated one example data set. This hypothetical data set contains 95 data points and four variables. The “id” variable denotes identifiers of some fictitious participants. The other three continuous variables are called “TV,” “reading,” and “focus.” In this made-up example, we are interested in how the hours spent watching TV or reading a book on an average day can predict the focus of a person, measured on some imaginary metric scale with a lower bound of 5.
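With the variables of this example, the regression model takes the form (symbols chosen here for illustration):

```latex
\text{focus}_i = \beta_0 + \beta_1\,\text{TV}_i + \beta_2\,\text{reading}_i + \varepsilon_i
```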
where we want to estimate the intercept parameter as well as the regression weights of TV and reading on focus.
Because we simulated the data ourselves, we have an advantage that we do not have in any real-life experiment: We know the true data-generating process, that is, we know the true values of the regression parameters. In particular, the data were generated such that watching TV has no true effect on the ability to focus.
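The actual generating code is provided on OSF; purely as an illustration of how such a heteroskedastic process might be set up, a sketch could look like this (all coefficients, ranges, and distributions below are hypothetical, not the authors' values):

```python
import random

def simulate_focus_data(n=95, seed=42):
    # Hypothetical data-generating process (the authors' actual code is on OSF):
    # reading has a true positive effect, TV has none, but the error spread
    # grows with TV, which produces funnel-shaped heteroskedasticity.
    rng = random.Random(seed)
    data = []
    for i in range(1, n + 1):
        tv = rng.uniform(0, 6)        # hours of TV per day (hypothetical range)
        reading = rng.uniform(0, 4)   # hours of reading per day
        error = rng.gauss(0, 1 + tv)  # error SD increases with TV
        focus = 20 + 0 * tv + 3 * reading + error  # TV's true weight is zero
        data.append({"id": i, "TV": tv, "reading": reading, "focus": focus})
    return data
```

Crucially, because the error standard deviation depends on TV, a classical OLS analysis of such data would use a misleading single error-variance estimate.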
Initial Analysis and Visualization of the Residuals
For the initial analysis, we recommend saving the model residuals from the original OLS-regression model so that a histogram or P-P plot of the (studentized) residuals can be produced (see Fig. S1 in the supporting information for SPSS provided on OSF). We also recommend inspecting a scatterplot visualizing the relationship between the (standardized) predicted values and the (studentized) residuals (Schützenmeister et al., 2012) and all the partial (residual) plots. Here, one must differentiate between two types of plots: bivariate scatterplots in which each predictor is plotted against the residuals of the model, as recommended by textbooks (see Cohen et al., 2003), and the partial plots of SPSS, in which the partial correlation between each predictor and the outcome is plotted after removing the influence of all other predictors. The former is more in line with the classic predicted-values-versus-residuals plot, in which the spread around a horizontal line intercepting the y-axis at zero is examined.
As previously discussed, the regression assumptions about homoskedasticity and normality pertain to the (usually) unknown errors. Because these errors cannot be observed directly, the model residuals serve as their empirical counterparts when checking the assumptions.

Visual inspection of the model residuals. (a) Histogram of the model residuals, (b) normal P-P plot of model residuals, (c) scatterplot of the predicted values against the model residuals, (d) scatterplot of the values of predictor TV against the model residuals, and (e) scatterplot of the values of predictor reading against the model residuals.
In our example data, the distribution of the residuals does not appear normal. Instead, it looks like there are a lot of data points close to the mean of zero and quite a few others spread out much farther than would be expected under a normal distribution. This is also referred to as “heavy tails,” reflected in the difference between expected and observed probabilities for low and high values in Figure 1b.
Homoskedasticity, or constant variance of the errors for all levels of the set of predictors, can be visualized through multiple scatterplots. In the scatterplot of the predicted values against the residuals, the data points are thus expected to be evenly distributed around zero for all levels of the predicted values (see Fig. 1c). This is not the case in our example because we can see that the data points spread farther apart for small (negative) predicted values and spread less for larger predicted values. Again, in a model with perfectly homoskedastic errors, one would expect to see the data points equally spread out around the zero line for all levels of the predictor in the partial (residual) plots as well. In our example, for the predictor TV, a clear funnel shape is visible (see Fig. 1d). For the predictor reading, no distinct shape can be observed (see Fig. 1e), but the residual variance seems to be larger around average reading values, hinting at an inverse butterfly shape (Sladekova & Field, 2024a). Based on the visual inspection of the sample data alone, we can see that using the classical inference method for the unique effect of TV on focus, maybe even reading on focus, is likely not going to be valid with respect to power and Type I error rate.
Note that significance tests of assumption violations, such as the Breusch-Pagan test for heteroskedasticity or the Shapiro-Wilk test for nonnormality, also exist and are easily accessible in SPSS. However, consistent with many sources (Field & Wilcox, 2017; Long & Ervin, 2000; Sanchis-Segura & Wilcox, 2024), we do not recommend relying solely on significance tests of assumption violations.
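To illustrate what such a test computes, here is a sketch of a studentized (Koenker-type) Breusch-Pagan statistic for a single predictor (pure Python, hypothetical function names; SPSS and R provide ready-made implementations):

```python
import math

def ols_fit(x, y):
    # closed-form OLS for a simple regression y = b0 + b1 * x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def breusch_pagan(x, y):
    # Koenker's studentized Breusch-Pagan test, single predictor:
    # regress the squared residuals on x; under homoskedasticity,
    # LM = n * R^2 follows a chi-square distribution with 1 df.
    n = len(x)
    b0, b1 = ols_fit(x, y)
    e2 = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    mx, me = sum(x) / n, sum(e2) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    see = sum((ei - me) ** 2 for ei in e2)
    sxe = sum((xi - mx) * (ei - me) for xi, ei in zip(x, e2))
    r2 = sxe ** 2 / (sxx * see) if see > 0 else 0.0
    stat = n * r2
    p = 1.0 - math.erf(math.sqrt(stat / 2.0))  # chi-square(1) survival function
    return stat, p
```

A large statistic (small p value) signals that the residual spread varies systematically with the predictor, which is exactly why such tests should complement, not replace, the visual checks described above.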
HC Standard Errors
To perform a significance test for a linear regression model, the uncertainty around the estimated regression coefficient must be quantified through its standard error. The classical inference method in OLS computes a valid standard error only when it can be assumed that among other things (see Berry, 1993), the errors are homoskedastic and normally distributed.
Unlike classical standard errors in OLS regression, HC standard errors do not assume homoskedasticity. Instead of using a single estimate of the error variance for computing the standard errors, HC standard errors use the information contained in the variability of the residuals. In their first version (today known as HC0), the squared residuals of the individual cases were used directly to estimate the variance of the coefficient estimates.
Today, there are multiple versions of HC standard errors, with HC0 through HC4 being the best known; these are readily available in many software programs. HC0, HC1, and HC2 showed increased Type I error rates compared with the newer versions (Long & Ervin, 2000). Therefore, the versions often recommended (Cribari-Neto, 2004; Hayes & Cai, 2007) and demonstrated in this tutorial are HC3 and HC4.
The HC3 and HC4 standard errors are usually preferred because they do not simply use the raw residuals but instead transform them based on each case's leverage, thus accommodating differences in influence. Both HC3 and HC4 standard errors have been found to result in Type I error rates closer to the nominal value (e.g., 5%) even under strong heteroskedasticity (Cribari-Neto, 2004; Long & Ervin, 2000; Rajh-Weber et al., 2025).
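To make the computation concrete for the single-predictor case, the following sketch contrasts the classical and the HC3 standard error of the slope (pure Python, hypothetical helper names; real analyses would rely on SPSS or the R functions on OSF):

```python
def ols_fit(x, y):
    # closed-form OLS for a simple regression y = b0 + b1 * x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def classical_se_slope(x, y):
    # textbook OLS standard error: one pooled error-variance estimate
    n = len(x)
    b0, b1 = ols_fit(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return (sse / (n - 2) / sxx) ** 0.5

def hc3_se_slope(x, y):
    # HC3 sandwich standard error: each case contributes its own squared
    # residual, inflated by 1 / (1 - leverage)^2
    n = len(x)
    b0, b1 = ols_fit(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    var = 0.0
    for xi, yi in zip(x, y):
        e = yi - (b0 + b1 * xi)            # raw residual of this case
        h = 1 / n + (xi - mx) ** 2 / sxx   # leverage of this case
        c = (xi - mx) / sxx                # weight of this case in the slope
        var += c ** 2 * (e / (1 - h)) ** 2
    return var ** 0.5
```

Under homoskedasticity the two standard errors are very similar; when the residual spread varies with the predictor, they diverge, and the HC3 value is the more trustworthy one.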
How to run the analysis in IBM SPSS (Version 30)
So far, SPSS does not allow for the computation of HC standard errors via the linear-regression window (“Analyze >> Regression >> Linear”). Currently, however, there are two different ways to still obtain the desired standard errors and associated p values.
One way is to specify the regression model via a window typically used for analysis of variance (“Analyze >> General Linear Model >> Univariate”; see Fig. S5 on OSF) and to categorize all continuous predictors as “covariates.” In this menu, the versions available for robust standard errors are HC0 up to HC4. Note that using any HC standard error via the analysis-of-variance window will result in robust inference only for the estimated regression coefficients. The overall null hypothesis that all regression coefficients are simultaneously zero is still tested in the classical, nonrobust way.
If a robust
Bootstrap Methods
Compared with the robust-standard-error methods, the bootstrap methods rely on a different approach that does not assume a specific shape for the distribution of a test statistic. Instead, the theoretical distribution of regression coefficients that would result if samples of size n were repeatedly drawn from the population is approximated empirically by resampling from the observed data.
Once the bootstrap sampling distribution is obtained by either method, it can be used for statistical inference. For instance, the bootstrap sampling distribution can be used to compute confidence-interval limits or p values for the regression coefficients.
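Given any collection of bootstrap coefficient estimates, both quantities can be computed schematically as follows (illustrative helper names; SPSS performs these computations internally):

```python
def percentile_ci(estimates, alpha=0.05):
    # percentile bootstrap CI: cut off alpha/2 of the bootstrap
    # estimates in each tail of the sorted bootstrap distribution
    s = sorted(estimates)
    b = len(s)
    lower = s[int(b * alpha / 2)]
    upper = s[int(b * (1 - alpha / 2)) - 1]
    return lower, upper

def bootstrap_p(estimates, null_value=0.0):
    # two-sided bootstrap p value for H0: parameter == null_value,
    # approximated as twice the smaller tail proportion
    b = len(estimates)
    below = sum(e <= null_value for e in estimates) / b
    above = sum(e >= null_value for e in estimates) / b
    return min(1.0, 2.0 * min(below, above))
```

Because the confidence interval and the p value summarize the same bootstrap distribution in different ways, they usually, but not necessarily, lead to the same binary decision.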
Pairs bootstrap
The pairs bootstrap approximates the theoretical distribution of regression coefficients by drawing many bootstrap samples of size n with replacement from the observed data and reestimating the regression coefficients in each bootstrap sample.
The name “pairs bootstrap” is meant to convey that entire cases, that is, pairs of predictor(s) and outcome, are resampled in the bootstrap process (Flachaire, 2005). In SPSS, this sampling method is known as “simple.” The wild bootstrap, a different resampling method, introduces randomness only through a transformation of the residuals and is discussed in the next section.
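For a single predictor, the resampling logic can be sketched in a few lines (a pure-Python illustration with hypothetical function names; SPSS performs the equivalent computation internally):

```python
import random

def ols_slope(x, y):
    # OLS slope of a simple regression y = b0 + b1 * x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def pairs_bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=1):
    # pairs ("simple") bootstrap: resample entire cases with replacement,
    # refit the model each time, take percentile cutoffs of the slopes
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([x[i] for i in idx],
                                [y[i] for i in idx]))
    slopes.sort()
    return (slopes[int(n_boot * alpha / 2)],
            slopes[int(n_boot * (1 - alpha / 2)) - 1])
```

Because whole cases are resampled, any dependence between the predictor values and the error spread is carried into every bootstrap sample, which is what makes the method robust to heteroskedasticity.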
How to run the analysis in IBM SPSS (Version 30)
The pairs-bootstrap method can currently be accessed in SPSS via “Analyze >> Regression >> Linear . . .” (see Fig. S9 on OSF). In this menu, SPSS performs a pairs bootstrap as described above if the keyword “simple” is selected from the bootstrap-sampling options. Moreover, either the percentile or the BCa method can be selected for the bootstrap confidence interval. A bootstrap p value for each coefficient is reported in the output as well.
Wild bootstrap
Some bootstrap methods, such as the wild bootstrap, were specifically developed to counteract problems with heteroskedasticity in regression models (Chernick & LaBudde, 2011; MacKinnon, 2006). In contrast to the pairs bootstrap, the wild bootstrap does not resample cases but only adds some random perturbation to the residuals. In particular, a regression model is fit to the original data, and the residuals (optionally transformed; see below) from this model are saved. Then, in each bootstrap iteration, each residual is multiplied by a random number drawn from a distribution with mean 0 and variance 1. These new residuals are used to compute a new outcome variable, which is then used to fit a new linear-regression model using the original set of predictors. Repeating this procedure results in a distribution of regression coefficients that approximates the respective parameter’s sampling distribution. For further details, see, for example, MacKinnon (2013) or Rajh-Weber et al. (2025).
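In code, the wild-bootstrap loop can be sketched as follows for a single predictor (a pure-Python illustration with hypothetical helper names; it uses leverage-adjusted residuals and Rademacher sign flips, one common choice for the mean-0, variance-1 perturbation):

```python
import random

def ols_fit(x, y):
    # closed-form OLS for a simple regression y = b0 + b1 * x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def wild_bootstrap_slopes(x, y, n_boot=2000, seed=7):
    # wild bootstrap: keep the design fixed, perturb only the residuals
    rng = random.Random(seed)
    n = len(x)
    b0, b1 = ols_fit(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    fitted = [b0 + b1 * xi for xi in x]
    # leverage-adjusted ("deleted") residuals, e_i / (1 - h_ii)
    resid = [(yi - fi) / (1 - (1 / n + (xi - mx) ** 2 / sxx))
             for xi, yi, fi in zip(x, y, fitted)]
    slopes = []
    for _ in range(n_boot):
        # Rademacher weights: random signs with mean 0 and variance 1
        ystar = [fi + ei * rng.choice((-1.0, 1.0))
                 for fi, ei in zip(fitted, resid)]
        mys = sum(ystar) / n
        slopes.append(sum((xi - mx) * (yi - mys)
                          for xi, yi in zip(x, ystar)) / sxx)
    return slopes
```

Because each case keeps its own residual magnitude, the variance pattern of the original data is preserved in every bootstrap sample, mirroring the logic of the HC standard errors.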
How to run the analysis in IBM SPSS (Version 30)
Running the wild-bootstrap procedure in IBM SPSS (Version 30) is currently slightly more complicated than for the pairs-bootstrap procedure. First, a regression model must be fit to the original data to acquire the residuals required in the next step. Here, we recommend saving the deleted residuals instead of the unstandardized residuals. This is a transformation of the residuals using leverage values that mirrors the HC3 procedure (MacKinnon, 2013).
In a second step, another regression analysis must be performed, now using the previously saved deleted residuals for the wild-bootstrap procedure (see Fig. S10 on OSF). A step-by-step guide on how to implement this in SPSS is provided in the supporting material on OSF. For R users, this two-step process was automated in the custom R functions provided in the additional resources for this tutorial on OSF (lm_wild_p, lm_wild_percentile, and lm_wild_bca).
Comparison of the Results
For the
Comparison of All Considered Methods Regarding
Note: For values formatted in bold, the null hypothesis would be rejected at the .05 significance level. CI = confidence interval; HC = heteroskedasticity-consistent standard error; BCa = bias-corrected and accelerated.
In fact, a
To compare all methods, a common binary conclusion scheme is used in which the null hypothesis that no effect exists in the population is either rejected or not rejected. The null hypothesis is rejected if the p value is smaller than .05 or, equivalently, if the 95% confidence interval does not include zero.
If only the classical-inference approach were considered, the conclusion would be to reject the null hypothesis for the unique effect of TV on focus and of reading on focus in this example. Thus, one would (cautiously) infer that watching TV has a negative effect on the ability to focus, for a constant level of reading, because the 95% confidence interval ranges from −4.92 to −0.65 and would further conclude that reading has a positive effect on the ability to focus, for a constant level of watching TV, because the 95% confidence interval ranges from 0.46 to 5.04. However, as mentioned before when describing the example data, we know for a fact that in truth, watching TV has no effect on the ability to focus because this is how the data were generated. This means that basing our conclusions on the classical method would lead us to falsely reject the null hypothesis for this effect in this example (i.e., a Type I error).
Based on the visual inspection of the scatterplots, we already suspected that there might be some heteroskedasticity present, specifically, heteroskedasticity related to the predictor TV. Indeed, both the HC3 and the HC4 methods would instead lead us to not reject the null hypothesis for the unique effect of TV on focus. For the unique effect of reading on the ability to focus, HC3 and HC4 agree with the classical method to reject the null hypothesis. Likewise, most of the bootstrap methods, except for the wild bootstrap with the BCa confidence interval, would also encourage us to reject the null hypothesis only for the effect of reading and not for watching TV.
Discussion
The goal of this tutorial was to inform applied researchers that robust standard errors and different types of bootstrap methods are viable alternatives to classical inference that are readily available in commercial software such as SPSS.
All the methods presented here were recently tested for a variety of data scenarios, including different combinations of heteroskedasticity and nonnormality. In many instances, the HC standard errors and the wild-bootstrap-resampling method (especially combined with the percentile confidence interval) were shown to perform satisfactorily (Rajh-Weber et al., 2025) regarding both Type I error rate and power. However, apart from computer simulations, the true data-generating processes are hardly ever known in real-life settings. Generally, there are an infinite number of combinations of assumption violations by type and degree, so deducing which method delivers the most valid results for a specific observed data situation is virtually impossible. Here, simulation studies that assess a variety of data scenarios can help with the choice of some methods over others for at least some specific scenarios (Cribari-Neto, 2004; Long & Ervin, 2000; Rajh-Weber et al., 2025).
A limitation of the methods presented here is that they affect only the inference associated with the regression coefficients, not the estimation of the coefficients themselves. These methods are suitable under nonnormality and/or heteroskedasticity, but different estimators should be sought out when dealing with outliers, for example (Wilcox, 2022).
Practical Recommendations
As often recommended in the literature (Field & Wilcox, 2017; Wagenmakers et al., 2021; Wilcox, 2022), we encourage researchers to familiarize themselves with their own data by assessing assumption violations visually (at least in addition to commonly used significance tests) and to compare the results of classical and robust methods.
Even though the desire for definite guidelines in this context is understandable, it is not possible to recommend one method over all others without further insight into the specific research scenario. Still, some general recommendations regarding the methods covered in this tutorial might be summarized as follows: (a) Use an HC standard error if there are signs of or if one wants to protect against heteroskedasticity. Out of the available HC standard errors, HC3 or HC4 should be preferred over HC0 to HC2 (Cribari-Neto, 2004; Hayes & Cai, 2007). HC4 is especially recommended if a few cases exhibit high leverage (Hayes & Cai, 2007), but HC4 is not always better than HC3 (MacKinnon, 2013). (b) Bootstrap methods can provide results that are free of distributional assumptions. Both the pairs and the wild bootstrap have been shown to also work well under heteroskedasticity but should be combined with the percentile instead of the BCa confidence interval (Rajh-Weber et al., 2025). Inference based on bootstrap confidence intervals and inference based on bootstrap p values need not always coincide because the two are computed differently.
For working with commercial software such as SPSS, good practice also means making sure analyses are reproducible by using seeds for bootstrap methods or saving SPSS syntax files and data sets in nonproprietary forms (e.g., txt and csv, respectively). In addition, being transparent about the types of analyses performed and reporting if and how they differed are generally good practice.
Finally, discussing why different methods produced different results can give insight into potential processes that may generate the data. To paraphrase Ly et al. (2020), if different methods agree with each other, the confidence in the conclusions is strengthened; if they clash, one’s confidence may be weakened, but the sensitivity of results on methodological choices may itself convey valuable information for the research problem at hand. “Either way, something useful has been learned” (Ly et al., 2020, p. 160).
Acknowledgements
The example data, the SPSS syntax, and the R code, including all methods as custom functions and an R markdown tutorial, can be found on OSF: https://osf.io/7du4t/. The article has also been uploaded to OSF as a preprint.