Abstract
Imagine that I have a cohort of bipolar patients. In this cohort, I identify patients who are receiving either lithium or valproate in monotherapy. I follow these patients in order to determine which drug is better at preventing relapse into a mood episode. I add new patients to the cohort, if they fulfill my study selection criteria, as and when they present to my center. Eighteen months later, I end the study and examine the proportion of lithium and valproate patients who have relapsed. Can I use a chi-square test to determine whether or not the relapse rate differs significantly between the lithium and valproate groups?
The answer is “No” because of the possibility that, regardless of whether or not the relapse rates are similar, relapse may have occurred substantially earlier in one group than in the other. So, can I use an independent-samples t test to compare the mean time to relapse in the two groups?
The answer is again “No”, and for two reasons. First, different patients entered the cohort at different times, but the study ended on the same date for everybody; so, many patients may not have relapsed by the end of the study only because they were followed for less time than others. Second, some patients who had not relapsed may have dropped out because, for example, they moved house or withdrew consent; others may have been lost to follow-up. Data from these patients should not be discarded because these patients are known to have remained free of relapse, at least until the time of dropout or loss to follow-up.
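The two complications described above can be made concrete with a minimal sketch. It assumes that each patient is recorded with an entry date, an optional relapse date, and an optional dropout date; the function name, the study end date, and all patient dates are invented for illustration.

```python
from datetime import date

STUDY_END = date(2024, 6, 30)  # hypothetical common end date for all patients


def follow_up(entry, relapse=None, dropout=None, study_end=STUDY_END):
    """Return (days_followed, relapse_observed) for one patient."""
    if relapse is not None:
        # Relapse was observed: time runs from entry to relapse.
        return (relapse - entry).days, True
    # No relapse observed: time runs to dropout, or else to the study end.
    last_seen = dropout if dropout is not None else study_end
    return (last_seen - entry).days, False


# Invented patients: (entry date, relapse date, dropout date)
patients = [
    (date(2023, 1, 1), date(2023, 7, 1), None),   # relapsed
    (date(2023, 1, 1), None, None),               # relapse-free to study end
    (date(2024, 1, 1), None, None),               # late entrant, short follow-up
    (date(2023, 3, 1), None, date(2023, 9, 1)),   # dropped out, relapse-free
]

records = [follow_up(*p) for p in patients]
for days, relapsed in records:
    print(days, "relapsed" if relapsed else "relapse not observed")
```

In this invented data set, the first and third patients were both followed for 181 days, but only the first relapsed. Averaging follow-up times as if they were all times to relapse would therefore mislead, and discarding the patients without relapse would throw away information.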
Survival Analysis
Data such as these are analyzed using survival analysis.
For survival analysis,
For time to event, which is the variable of interest in analysis, there are two possibilities: the event occurred, resulting in a classification at the time the event occurred, or the event did not occur, resulting in “censoring.”
Censoring merely means that data availability ends at the point of censoring; had censoring not happened, the expectation is that the event would occur, given enough time, but we cannot know when. In this context, a
Kaplan-Meier Curve
The Kaplan-Meier curve displays the probability of survival (event did not occur) as a function of time. Time is plotted on the X-axis and the probability of survival on the Y-axis. So, the graph starts at probability = 1.0 (100%) because, at the start of the study, when time = 0, nobody has experienced the event; that is, the probability of survival is 100%.
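As the study progresses, the probability of survival is re-estimated each time a patient experiences an event, and the curve steps down at that point. A patient who is censored (e.g., because of dropout) does not produce a step, and is usually shown as a tick mark on the curve; however, censoring leaves fewer patients based on whom the probability of survival is estimated, and so each subsequent event lowers the curve by a larger amount.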
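The way in which the curve is built up can be sketched with the standard Kaplan-Meier (product-limit) calculation: at each time at which events occur, the current survival probability is multiplied by (1 − events / number at risk). The code below is a minimal illustration in plain Python, not the procedure used for the supplementary figures; the function name and the five-patient data set are invented. In this sketch, the estimate changes only at event times; censored patients simply leave the set of patients at risk.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival) steps.

    times  -- follow-up time for each patient
    events -- True if the event was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = [(0, 1.0)]  # everybody is event-free at time 0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            # Patients censored at time t still count as at risk at t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the at-risk set.
        at_risk -= n_leaving
    return curve


# Invented example: follow-up in months for five patients
times = [2, 3, 3, 5, 8]
events = [True, False, True, True, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

With these invented data, the first event lowers the curve by 0.2 (1 of 5 at risk), whereas the last event lowers it by 0.3 (1 of 2 at risk), because censoring and earlier events have shrunk the at-risk group.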
Figures 1 and 2 in the supplementary materials display Kaplan-Meier curves for a single group and for two groups, respectively. Accompanying notes explain the curves. The spreadsheet from which Figure 2 was generated is also included in the supplementary materials.
Cox Regression
Cox proportional hazards regression, or just Cox regression, is conceptually similar to multivariable linear or logistic regression. Cox regression examines survival as a function of several different independent variables (IVs), and the statistical significance of each of these IVs is assessed for the outcome of interest (occurrence of the event). More usually, we are interested in just one IV, and the remaining IVs are covariates, the effects of which are “adjusted for” in the analysis. In my hypothetical study of lithium vs. valproate in bipolar patients, treatment (lithium vs. valproate) is the IV of interest, and other IVs, such as age, sex, illness duration, and the number of previous episodes, can be adjusted for because I believe that these may also influence relapse.
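For readers who prefer to see the model written out, the standard form of the Cox proportional hazards model is given below. The symbols are introduced here for illustration and do not come from the article: h_0(t) is an unspecified baseline hazard, x_1 is the IV of interest (e.g., coded 1 for lithium and 0 for valproate, an arbitrary choice), and x_2 to x_p are the covariates.

```latex
% Hazard at time t for a patient with covariate values x_1, ..., x_p
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right)

% Adjusted hazard ratio for x_1 (x_1 = 1 vs. x_1 = 0, other covariates held fixed)
\mathrm{HR}_1 = \frac{h(t \mid x_1 = 1, x_2, \dots, x_p)}{h(t \mid x_1 = 0, x_2, \dots, x_p)} = \exp(\beta_1)
```

Because h_0(t) cancels in the ratio, the hazard ratio does not depend on time; this is the “proportional hazards” assumption from which the method takes its name.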
In Cox regression, the analysis yields a hazard ratio (HR) that is interpreted like a relative risk. Thus, values below 1 indicate a lower risk of occurrence of the event relative to the comparison group, values above 1 indicate a higher risk, and a value of 1 indicates an identical risk. Here, 95% confidence intervals and a
Parting Notes
More detailed discussions (with examples) on survival analysis and related concepts are available in the supplementary materials accompanying this article as well as elsewhere.1,2
Supplemental Material
Supplemental material for this article is available online.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship and/or publication of this article.
References
