Abstract
A study design is said to be factorial in nature if participants are randomized into two or more groups and if participants in each of these groups are further randomized into two or more subgroups. As an example, we conduct a study in which 96 adults with major depressive disorder (MDD) are randomized to receive escitalopram or placebo, and patients in each of these two groups are randomized to receive cognitive behavioral therapy (CBT) or waitlisted CBT. Because drug (escitalopram vs. placebo) and therapy (CBT vs. waitlist) each have two categories, this is a 2 × 2 factorial design.
Table 1 presents endpoint Hamilton Rating Scale for Depression (HAM-D) scores in our hypothetical study. The data file from which Table 1 was generated is made available in the supplementary materials so that readers can run the analyses on their own, if they wish.
Table 1. Endpoint Depression Ratings in a Hypothetical 2 × 2 Factorially Designed Study.
Data in cells are mean (standard deviation) Hamilton Rating Scale for Depression scores and sample size (
Two-way ANOVA
Analysis of variance (ANOVA) is a statistical procedure used to compare the means of two or more groups. We analyze the Table 1 data using a statistical test known as two-way ANOVA. It is called “two-way” because, as the table shows, there are two factors. One factor is drug, presented in rows in the table, and the other factor is therapy, presented in columns in the table (the rows and the columns are the two “ways”). Each factor has two levels, making it, as already stated, a two row × two column (2 × 2) design with four groups in the study, represented by four boxes (cells) in the table.
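For readers who want to see the arithmetic behind a two-way ANOVA, the following is a minimal sketch for a balanced design (equal numbers in every cell). The function name `two_way_anova` and the twelve toy scores are assumptions made for illustration; they are not the study data from Table 1. The sketch stops at the F ratios; obtaining P values requires an F distribution, which a statistical program would supply.

```python
from statistics import mean

def two_way_anova(cells):
    """Balanced two-way ANOVA (hypothetical helper, equal cell sizes only).

    cells maps (row_level, column_level) -> list of scores.
    Returns {effect: (sum_of_squares, degrees_of_freedom, F_ratio)}.
    """
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    n = len(next(iter(cells.values())))
    if any(len(scores) != n for scores in cells.values()):
        raise ValueError("this sketch handles equal cell sizes only")

    grand = mean(x for scores in cells.values() for x in scores)
    row_mean = {r: mean(x for c in cols for x in cells[(r, c)]) for r in rows}
    col_mean = {c: mean(x for r in rows for x in cells[(r, c)]) for c in cols}
    cell_mean = {key: mean(scores) for key, scores in cells.items()}

    # Sums of squares for the two main effects, the interaction, and error.
    ss_row = n * len(cols) * sum((row_mean[r] - grand) ** 2 for r in rows)
    ss_col = n * len(rows) * sum((col_mean[c] - grand) ** 2 for c in cols)
    ss_int = n * sum(
        (cell_mean[(r, c)] - row_mean[r] - col_mean[c] + grand) ** 2
        for r in rows for c in cols
    )
    ss_err = sum(
        (x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores
    )

    df_row, df_col = len(rows) - 1, len(cols) - 1
    df_int = df_row * df_col
    df_err = len(rows) * len(cols) * (n - 1)
    ms_err = ss_err / df_err

    return {
        "rows": (ss_row, df_row, (ss_row / df_row) / ms_err),
        "columns": (ss_col, df_col, (ss_col / df_col) / ms_err),
        "interaction": (ss_int, df_int, (ss_int / df_int) / ms_err),
    }

# Toy scores invented for this sketch: drug in rows, therapy in columns.
toy_cells = {
    ("placebo", "waitlist"): [19, 20, 21],
    ("placebo", "CBT"): [18, 19, 20],
    ("escitalopram", "waitlist"): [14, 15, 16],
    ("escitalopram", "CBT"): [9, 10, 11],
}
results = two_way_anova(toy_cells)
for effect, (ss, df, f_ratio) in results.items():
    print(f"{effect}: SS = {ss}, df = {df}, F = {f_ratio}")
```

The three entries in `results` correspond to the two "ways" (rows and columns) and their interaction. Unequal cell sizes need a different partitioning of the sums of squares and are deliberately rejected here.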
The two-way ANOVA, performed using a hand calculator or a statistical program, gives us three results. These are a main effect for drug (
Main Effects
The significant main effect for drug tells us that,
Similarly, the significant main effect for therapy tells us that,
Interaction Effect
The significant drug × therapy interaction tells us that the extent of improvement with drug depended on what therapy patients received. In the table, we see that placebo patients fared only marginally better with CBT relative to waitlist (endpoint HAM-D means, 18.8 vs. 19.9), whereas escitalopram patients fared noticeably better with CBT relative to waitlist (endpoint HAM-D means 10.3 vs. 14.9). The marked advantage for the escitalopram–CBT group is visually depicted in the supplementary materials; in the line diagram, the noticeable difference in the slopes of the placebo and escitalopram lines is due to the interaction (Supplementary Figure 1).
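The interaction can be checked by hand from the four cell means quoted above. The short sketch below computes the effect of CBT within each drug arm and then the difference between those two effects (a "difference of differences"). The variable names are illustrative assumptions; only the four means come from the text.

```python
# Endpoint HAM-D cell means quoted in the text (lower scores = less depression).
means = {
    ("placebo", "waitlist"): 19.9,
    ("placebo", "CBT"): 18.8,
    ("escitalopram", "waitlist"): 14.9,
    ("escitalopram", "CBT"): 10.3,
}

# Effect of therapy within each drug arm: CBT mean minus waitlist mean.
cbt_effect = {
    drug: round(means[(drug, "CBT")] - means[(drug, "waitlist")], 1)
    for drug in ("placebo", "escitalopram")
}

# Interaction contrast: how much larger the CBT effect is with escitalopram.
interaction_contrast = round(cbt_effect["escitalopram"] - cbt_effect["placebo"], 1)

print(cbt_effect)
print(interaction_contrast)
```

A contrast of zero would mean that CBT helped equally in both drug arms, that is, parallel lines in the line diagram. Whether a nonzero contrast is statistically significant is what the interaction term of the ANOVA tests.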
Summary
The two-way ANOVA tells us that, in our study of patients with MDD, escitalopram was superior to placebo (main effect for drug), CBT was superior to waitlist (main effect for therapy), and CBT improved outcomes with escitalopram more than it improved outcomes with placebo (drug × therapy interaction).
Specific Notes
Instead of randomizing and then subrandomizing, as described in the opening paragraph of this article, patients can be directly randomized into the four groups shown in Table 1.
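Direct randomization into the four cells can be sketched as follows. This is an illustration only: the function name `randomize_to_cells`, the seed, and the equal allocation of 24 participants per cell are assumptions, and a real trial would add safeguards such as allocation concealment.

```python
import random
from collections import Counter

def randomize_to_cells(participant_ids, cells, seed=None):
    """Shuffle participants, then deal them round-robin across the cells."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: cells[i % len(cells)] for i, pid in enumerate(ids)}

# The four groups of the 2 x 2 design: (drug, therapy).
cells = [
    ("placebo", "waitlist"),
    ("placebo", "CBT"),
    ("escitalopram", "waitlist"),
    ("escitalopram", "CBT"),
]

# 96 participants, as in the hypothetical study; equal cell sizes are assumed.
allocation = randomize_to_cells(range(1, 97), cells, seed=2024)
group_sizes = Counter(allocation.values())
print(group_sizes)
```

Dealing the shuffled list round-robin guarantees equal group sizes, which simple coin-toss randomization at two successive stages would not.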
In the worked example in this article, the endpoint HAM-D score was the outcome variable. Endpoint scores of other rating instruments could be analyzed in the same way, using two-way ANOVA, provided that the ratings are continuous (measured along an interval or ratio scale) and not categorical.
If we actually conducted a study as described in this article, the method of analysis would be more elaborate than that presented here. Whereas the scenario and analysis presented here are technically correct, they are meant to explain concepts in the simplest possible way, and not to recommend a plan of analysis.
General Notes
Here is a technical point for geeks. The main effects are not merely the equivalent of
Here is the same message for non-geeks. How main effects are independent of the interaction effect can be visually understood from the line diagram in Supplementary Figure 1. As an example for the main effect for drug, the escitalopram line is wholly below the placebo line. As an example for the main effect for therapy, the CBT circles are below the waitlist circles. As a cautionary note, what appears likely from visual inspection needs to be confirmed in the statistical analysis.
Other visual examples of different combinations of significant and nonsignificant main and interaction effects are presented in Supplementary Figures 2–6. Readers are urged to view the supplementary materials to obtain a fuller understanding of what is explained in this article.
We can have a 3 × 2 factorial design if drug has three levels (e.g., escitalopram, bupropion, and placebo) and therapy has two levels (CBT and waitlist). We can have a 3 × 3 design if drug has three levels and therapy also has three levels (e.g., CBT, art therapy, and waitlist). However, the statistical test used to analyze the data is still a two-way ANOVA because there are still only two “ways”: drug (rows) and therapy (columns). We will still get only three results: a main effect for drug, a main effect for therapy, and a drug × therapy interaction. If any result is statistically significant and we want to know which drug level is better than which other drug level, or which therapy level is better than which other therapy level, and from where a significant interaction arises, we would need to do post hoc analyses. This is conceptually similar to performing post hoc analyses after a one-way ANOVA yields a significant
A two-way ANOVA can also be applied to nonrandomized designs, such as when we want to see whether there is a main effect for sex (men vs. women), a main effect for quantity of alcohol consumed (one drink vs. two drinks), and a sex × quantity of alcohol interaction on performance on various cognitive tasks. Whereas we can randomize subjects into one drink vs. two drinks groups, sex is fixed; we cannot randomize subjects to be men or women. 1
The concepts described in this article can be applied to analyses of longitudinal data. Consider a study in which MDD patients randomized to escitalopram or placebo are rated on the HAM-D at baseline, at 2 weeks, at 4 weeks, and at 6 weeks. The data are analyzed using two-way
Finally, more complex factorial designs are possible. For example, in a three-way ANOVA, we could examine treatment outcomes based on sex (male vs. female), drug (escitalopram vs. placebo), and therapy (CBT vs. waitlist). We would get three main effects: for sex, for drug, and for therapy. We would get three two-way interactions: for sex × drug, drug × therapy, and sex × therapy. And we would get one three-way interaction: for sex × drug × therapy. Such higher-order ANOVAs are seldom performed because interpretation of the different interactions is difficult.
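The way the number of results grows with the number of factors, but not with the number of levels, can be made concrete with a short enumeration. The sketch below lists every main effect and interaction in a full factorial ANOVA together with its degrees of freedom (the product of levels minus one over the factors involved). The function name `anova_effects` is an assumption made for illustration.

```python
from itertools import combinations
from math import prod

def anova_effects(levels):
    """List every main effect and interaction in a full factorial ANOVA.

    levels maps factor name -> number of levels.
    Returns a list of (effect_label, degrees_of_freedom).
    """
    factors = list(levels)
    effects = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            df = prod(levels[f] - 1 for f in combo)
            effects.append((" × ".join(combo), df))
    return effects

# A 3 x 3 design is still two-way: three results, with more degrees of freedom.
two_way = anova_effects({"drug": 3, "therapy": 3})

# A 2 x 2 x 2 design is three-way: seven results.
three_way = anova_effects({"sex": 2, "drug": 2, "therapy": 2})

for label, df in two_way + three_way:
    print(f"{label}: df = {df}")
```

With k factors there are 2^k − 1 effects in all, which is one way of seeing why higher-order designs quickly become hard to interpret.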
Supplemental Material
Supplemental material for this article is available online.