Abstract
Earlier articles in this series described classifications in research design, 1 prospective and retrospective studies, cross-sectional and longitudinal studies, 2 and cohort studies. 3 This article considers a research design that is often used in present-day research in medicine and psychiatry: the case-control study.
Case-Control Study: General Description
A case-control study is one in which cases are compared with controls to identify historical exposures that are significantly associated with a current state or, stated differently, variables that are significantly associated with caseness. In case-control studies, cases are subjects with a particular characteristic. The characteristic that defines caseness may be a clinical diagnosis (e.g., schizophrenia [Sz]), a treatment outcome (e.g., treatment-resistance), a side effect (e.g., tardive dyskinesia), or any other characteristic that is the subject of interest. Controls are subjects who do not have the characteristic that defines caseness. For Sz, controls may be healthy subjects; for treatment-resistance, controls would be subjects with the same diagnosis who are treatment-responsive; for tardive dyskinesia, controls would be subjects who received the same treatment but did not develop this adverse outcome. Controls are commonly selected by matching them with cases on variables such as age, sex, and site of recruitment. Matching may be 1:1, but when data are drawn from large electronic databases, it is often possible to match five or even 10 controls with each case. In such studies, there may be thousands of cases and tens or even hundreds of thousands of controls.
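The matching step can be sketched as a greedy 1:k procedure. This is a minimal illustration, not a method taken from the article: the `Subject` record, the `match_controls` function, exact matching on sex and site, and the 2-year age caliper are all hypothetical choices made for the example. Controls are drawn without replacement so that no control is counted against two cases.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    # Hypothetical record: identifier, sex, age in years, and recruitment site
    sid: str
    sex: str
    age: int
    site: str


def match_controls(cases, pool, k=5, age_caliper=2, seed=0):
    """Greedy 1:k matching without replacement (illustrative sketch).

    A control is eligible for a case if it has the same sex and site and an
    age within +/- age_caliper years. Each control is used at most once, and
    a case may receive fewer than k controls if too few are eligible.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    available = list(pool)
    matched = {}
    for case in cases:
        eligible = [
            c for c in available
            if c.sex == case.sex
            and c.site == case.site
            and abs(c.age - case.age) <= age_caliper
        ]
        rng.shuffle(eligible)  # choose at random among eligible controls
        chosen = eligible[:k]
        for c in chosen:
            available.remove(c)  # without replacement
        matched[case.sid] = chosen
    return matched


# Usage with made-up subjects
cases = [Subject("case1", "F", 30, "S1"), Subject("case2", "M", 45, "S2")]
pool = [Subject(f"ctrl{i}", "F" if i % 2 == 0 else "M", 28 + i,
                "S1" if i < 10 else "S2") for i in range(20)]
for case_id, controls in match_controls(cases, pool, k=3).items():
    print(case_id, [c.sid for c in controls])
```

A greedy pass is simple but order-dependent: cases processed first get first pick of the pool. Studies drawing on large databases may use more elaborate matching schemes, which this sketch does not attempt.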
As a real-world example of a case-control study, children with autism spectrum disorder (ASD) may be compared with typically developing children to determine whether a history of maternal antidepressant use during pregnancy is more frequent among cases than among controls; if it is, and if the association remains statistically significant after adjustment for confounding variables, one may speculate that gestational exposure to antidepressants predisposes to ASD. 4 Here, readers may note that there is only one exposure of interest: gestational exposure to antidepressant drugs.
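Comparing how often an exposure occurs among cases and among controls can be summarized, before any adjustment, as a crude odds ratio from a 2 × 2 table. The sketch below uses Woolf's log-based 95% confidence interval; the function name `odds_ratio_2x2` and the counts are hypothetical and are not data from the cited study. A crude odds ratio ignores confounders and matching, which is why the association must also survive adjustment.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a = exposed cases,    b = unexposed cases
    c = exposed controls, d = unexposed controls
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Woolf interval undefined")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts: 40 of 400 cases and 100 of 2000 controls were exposed
or_, lower, upper = odds_ratio_2x2(a=40, b=360, c=100, d=1900)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these made-up counts, the odds of exposure are about 2.1 times higher among cases than among controls.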
As a hypothetical example of a case-control study, patients with Sz may be compared with healthy controls to determine whether a family history of Sz, viral infection during pregnancy, season of birth, obstetric complications during pregnancy, brain insults in early childhood, and other variables are associated with Sz in the sample. Here, readers may note that all the variables listed are exposures of interest, so corrections for multiple hypothesis testing are desirable to protect against the inflated risk of Type I statistical error. 5
In summary, in case-control studies, there are cases and there are controls that are matched with cases. Researchers then “look back” to ascertain what past events (exposures) are associated with caseness. The exposures of interest may be one or many.
Analysis of Case-Control Studies
Case-control studies are usually analyzed using logistic regression. The dependent variable is the dichotomous grouping variable: case vs. control. The independent variables are the exposure(s) of interest plus the confounding variables that must be adjusted for in the regression in order to estimate the unique effect of the exposure variable(s). The logistic regression yields an odds ratio and a statistical significance (P) value for each independent variable; these indicate whether or not each independent variable is significantly associated with caseness and, if it is, the effect size, expressed as the odds ratio. Readers may note that an observational study cannot determine whether a significant association is a marker of risk or a cause of the risk; this was explained in an earlier article. 3
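To show where the odds ratios come from, the sketch below fits a logistic regression by Newton-Raphson in plain Python; in practice, one would use a statistical package. Each exponentiated coefficient is an odds ratio. The functions `solve` and `fit_logistic` are written for illustration, and the data are hypothetical, constructed so that the exposure-caseness odds ratio is exactly 2.0 within each level of a binary confounder that is itself related to both exposure and caseness. This is ordinary (unconditional) logistic regression and does not account for matched sets.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: list of predictor rows (an intercept is added); y: 1 = case, 0 = control.
    Returns [intercept, b1, b2, ...] on the log-odds scale.
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p                     # gradient of the log-likelihood
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(bj * v for bj, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                score[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, score)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical counts keyed by (exposure, confounder, is_case)
counts = {
    (1, 0, 1): 10, (0, 0, 1): 90, (1, 0, 0): 50, (0, 0, 0): 900,
    (1, 1, 1): 60, (0, 1, 1): 40, (1, 1, 0): 30, (0, 1, 0): 40,
}
X, y = [], []
for (exposure, confounder, is_case), n in counts.items():
    X += [[exposure, confounder]] * n
    y += [is_case] * n

crude = fit_logistic([[row[0]] for row in X], y)  # exposure only
adjusted_fit = fit_logistic(X, y)                 # exposure + confounder
print("crude OR:   ", round(math.exp(crude[1]), 2))
print("adjusted OR:", round(math.exp(adjusted_fit[1]), 2))
```

In this constructed example, the crude odds ratio is about 6.33, whereas adjusting for the confounder recovers the within-stratum value of 2.0, which illustrates why confounders are entered into the model alongside the exposure.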
As a special note, when cases and controls are well matched on many important variables, a procedure known as
Characteristics of Case-Control Studies
How do case-control studies fit into classifications of research design described in an earlier article? 1 Case-control studies are empirical studies that are based on samples, not individual cases or case series. They are cross-sectional because cases and controls are identified and evaluated for caseness, historical exposures, and confounding variables at a single point in time. They are observational; there is no intervention. They are prospective when cases and controls are identified and interviewed in real time, such as in an outpatient department, and retrospective when they are identified in and studied from medical records or electronic health care databases. Strengths and limitations of prospectively vs. retrospectively ascertained data were described in an earlier article. 3
The
Parting Notes
There are two reasons why, in case-control studies, large samples are desirable, and why many controls may be matched to a single case. One reason is that patients are not randomized to be cases or controls. In such circumstances, as in quasi-controlled studies, 9 there is bound to be confounding. With larger samples, statistical power to adjust for confounding will improve. The other reason is that, in case-control studies, data are usually drawn from medical records or databases. Information extracted from such sources is very unlikely to have been collected and recorded with the expectation of use in future research. So, there are bound to be inaccuracies. When data are blurred (inaccurate), there is statistical noise. When the sample size is large, it becomes easier to see a signal through the noise.
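The benefit of adding controls can be made concrete with the Woolf approximation for the standard error of a log odds ratio, SE = sqrt(1/a + 1/b + 1/c + 1/d), where a and b are exposed and unexposed cases and c and d are exposed and unexposed controls. The sketch below assumes 200 cases, hypothetical exposure prevalences of 15% among cases and 10% among controls, and an unmatched analysis; the function `expected_se_log_or` is illustrative and is not a formal power calculation. More controls per case shrink the standard error, and so narrow the confidence interval, but the case terms 1/a + 1/b set a floor that no number of controls can break.

```python
import math


def expected_se_log_or(n_cases, k, p_exp_cases, p_exp_controls):
    """Woolf SE of log(OR) computed from expected cell counts.

    n_cases cases, k controls per case; p_exp_* are assumed exposure
    prevalences among cases and controls (hypothetical inputs).
    """
    n_controls = k * n_cases
    a = n_cases * p_exp_cases          # exposed cases
    b = n_cases - a                    # unexposed cases
    c = n_controls * p_exp_controls    # exposed controls
    d = n_controls - c                 # unexposed controls
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


for k in (1, 2, 5, 10):
    se = expected_se_log_or(200, k, 0.15, 0.10)
    print(f"1:{k} matching -> SE(log OR) = {se:.3f}")

# Lower bound, approached only as the number of controls grows without limit
print(f"floor set by the cases -> {math.sqrt(1 / 30 + 1 / 170):.3f}")
```

Under these assumptions, the standard error falls from about 0.31 at 1:1 to about 0.21 at 1:10, against a floor of about 0.20, so most of the gain comes from the first few controls per case.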
Cohort and case-control study designs are not “opposites” as are prospective vs. retrospective, or cross-sectional vs. longitudinal, or controlled vs. uncontrolled research designs. Rather, like the randomized controlled and quasi-controlled designs, these designs are special kinds of research design in the controlled vs. uncontrolled classification. Note that whereas a case-control study is always a special kind of controlled study, a cohort study can be classified under controlled or uncontrolled, depending on whether or not there is a comparison group for the group of interest.
Case-control studies in India tend to be poor in quality because they are based on small sample sizes. Small samples do not have sufficient statistical power to adjust for the multitude of confounding variables that bedevil research in psychiatry. Large samples cannot be identified because India does not as yet have large electronic health care databases as a source of data.
Finally, case-control studies, like cohort studies, are observational in nature, and authors who conduct and report such studies should follow the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.
