INTRODUCTION
In occupational epidemiology, exposure-response analyses are central to evaluating the role of chemical and physical exposures in the etiology of disease. Furthermore, the results of such analyses may provide the basis for quantitative risk assessment in the process of determining regulatory exposure standards. The development of statistical methods for the evaluation of exposure-response patterns has focused on the estimation of the rate ratio, risk ratio or odds ratio based on internal comparison analyses. Examples of such methods include categorical analysis, spline regression, fractional polynomial regression, and the use of linear models [Boucher et al. 1998; Greenland 1995; Harrell et al. 1988; Witte and Greenland 1997]. The standardized mortality or morbidity ratio (SMR) based on external comparisons has been frequently used in occupational epidemiology, typically in situations where occupational exposure data are not available (i.e., a comparison of mortality in a cohort of workers to mortality in the general population). However, SMRs have also been used to explore exposure-response relationships by stratifying an occupational cohort into subgroups defined by duration of employment or exposure level. Statistical approaches to evaluating exposure-response patterns using SMRs have mostly been limited to categorical analyses. In this paper, a graphical method for evaluating exposure-response patterns is presented based on SMR calculations using moving exposure windows.
METHODS
Data display
The approach described here can be applied to occupational cohort studies in which disease risk is evaluated in relation to quantitative measures of exposure, such as duration of employment or cumulative exposure. The first step is to create exposure categories that each contain one death or incident case. That is, exposure categories are created based on the distribution of exposure among the N observed cases, with cut-off points at each 100/N percentile. For example, in a study with 10 deaths this would lead to a cut-off at each tenth percentile of the exposure distribution among the cases, resulting in 10 exposure categories under the assumption that all workers have been exposed. If a proportion of the population was not exposed, an unexposed category (with possibly multiple cases) would be added to these exposure categories. An exposure level such as the mean, median or midpoint corresponding to each exposure category is based on the exposure distribution of the person-time units in the category. Next, for each exposure group the number of person-years and the expected number of cases based on external reference rates are calculated. For each individual, the amount of person-time is calculated as the time elapsed from exposure onset until the individual experiences the disease, is lost to follow-up, or reaches the end of follow-up [Checkoway et al. 1989b]. Subsequently, the amount of person-time the individual spent in each exposure group can be derived. The total amount of person-time contributed to each exposure category by the entire study population is computed as the sum of individual person-times. Finally, the amount of person-time is multiplied by the external reference rate to yield the expected number of events.
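The data display step can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that the text does not specify: all workers are exposed, case exposures are distinct, each category is closed at the exposure of the case it contains, person-time above the highest case exposure is assigned to the last category, and a single external reference rate applies (in practice rates would be specific to age, sex and calendar period). The function names, record layout and numbers are hypothetical.

```python
from bisect import bisect_left


def case_based_cutoffs(case_exposures):
    """Upper bounds of exposure categories that each contain exactly one case.

    With N distinct case exposures, the k-th sorted value is the
    (100*k/N)th percentile of exposure among cases.
    """
    return sorted(case_exposures)


def tabulate(cutoffs, person_time, reference_rate):
    """Sum person-years per category and derive the expected number of cases.

    person_time: iterable of (exposure_level, person_years) records.
    reference_rate: external rate per person-year (assumed constant here).
    """
    n = len(cutoffs)
    pyrs = [0.0] * n
    weighted = [0.0] * n
    for exposure, years in person_time:
        # Category k covers (cutoffs[k-1], cutoffs[k]]; person-time above the
        # highest case exposure goes to the last category (assumption).
        k = min(bisect_left(cutoffs, exposure), n - 1)
        pyrs[k] += years
        weighted[k] += exposure * years
    rows = []
    for k in range(n):
        rows.append({
            "category": k + 1,
            "observed": 1,  # one case per category by construction
            "person_years": pyrs[k],
            # person-year-weighted mean exposure of the category
            "mean_exposure": weighted[k] / pyrs[k] if pyrs[k] else float("nan"),
            "expected": pyrs[k] * reference_rate,
        })
    return rows


# Hypothetical example: three cases and five person-time records.
example_cutoffs = case_based_cutoffs([0.2, 0.1, 0.4])
example_person_time = [(0.05, 100), (0.1, 50), (0.15, 200), (0.3, 120), (0.5, 80)]
example_rows = tabulate(example_cutoffs, example_person_time, reference_rate=0.002)
```

Each row of `example_rows` then corresponds to one line of a display such as Table 1: one observed case, the person-years, an exposure level, and the expected number of cases.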
Moving exposure windows analysis
The results for the exposure categories are combined by summing the observed and expected numbers of exposed cases across moving exposure windows. If there is a single unexposed case, it is included in the moving exposure windows analysis; otherwise, the unexposed cases are excluded. Creating moving exposure windows based on at least five cases may result in a smoother, more stable exposure-response curve. The average exposure corresponding to each SMR is the mean of the exposure levels (i.e., mean, median or midpoint) of the collapsed exposure categories within the window, weighted by the number of person-years in each category.
To demonstrate the moving exposure window calculation, the results of two hypothetical cohort studies are presented in Table 1. These artificial data were chosen after an iterative search to find two distinct exposure-response patterns based on different numbers of expected events. Both studies observed ten cases of a certain disease. The calculation of the moving exposure window curve is arbitrarily based on windows with five observed cases. The first SMR1–5 is calculated by combining the observed and expected number of cases for the first five exposure categories, and calculating the corresponding average exposure level. The subsequent exposure window combines exposure categories 2–6, and the accompanying SMR2–6 and exposure level are computed as above. Hence, this SMR2–6 and the SMR1–5 have four deaths in common. The SMR3–7 can be calculated by combining exposure categories 3–7, and so on. The estimates for the lower and upper two exposure categories are based on the remaining exposure windows with fewer than five cases. This approach results in six exposure groups with five observed cases, two groups with four observed cases, and two groups with three observed cases. The results from these calculations for both hypothetical scenarios are presented in Table 2.
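The window calculation can be sketched as follows. The sketch reads the description as a centred window of odd width that is truncated at both ends of the exposure range; this interpretation is an assumption, chosen because for ten categories and a width of five it yields six windows of five cases, two of four and two of three. The function name, field names and data below are hypothetical and are not the values of Table 1.

```python
def moving_window_smrs(categories, width=5):
    """SMR and person-year-weighted exposure for each moving exposure window.

    categories: list of dicts with keys 'observed', 'expected', 'exposure'
    and 'person_years', ordered by increasing exposure.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("this sketch assumes an odd window width")
    half = width // 2
    n = len(categories)
    results = []
    for centre in range(n):
        lo = max(0, centre - half)        # truncate at the low end
        hi = min(n, centre + half + 1)    # truncate at the high end
        window = categories[lo:hi]
        observed = sum(c["observed"] for c in window)
        expected = sum(c["expected"] for c in window)
        pyrs = sum(c["person_years"] for c in window)
        # exposure level of the window: category levels weighted by person-years
        exposure = sum(c["exposure"] * c["person_years"] for c in window) / pyrs
        results.append({
            "categories": (lo + 1, hi),   # 1-based, inclusive
            "observed": observed,
            "expected": expected,
            "smr": observed / expected,
            "exposure": exposure,
        })
    return results


# Hypothetical data: ten categories with one observed case each.
hypothetical_expected = [1.2, 1.0, 0.9, 0.8, 0.8, 0.7, 0.7, 0.6, 0.5, 0.4]
example_categories = [
    {"observed": 1, "expected": e, "exposure": 0.05 * (i + 1), "person_years": 1000.0 * e}
    for i, e in enumerate(hypothetical_expected)
]
example_windows = moving_window_smrs(example_categories, width=5)
```

Plotting `smr` against `exposure` for the entries of `example_windows` gives the moving exposure window curve; rerunning with another `width` shows how the choice of window affects its smoothness.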
Disease risk in relation to exposure in two hypothetical scenarios based on epidemiological observation
Expected number of deaths in bold print differs between the two scenarios
Standardized mortality ratio (SMR) in relation to exposure in two hypothetical scenarios based on rolling SMR and categorical analysis
Expected number of deaths in bold print differs between the two scenarios
Categorical analysis
To compare the results of the moving exposure window analysis with the conventional approach to evaluating exposure-response relationships, SMRs are calculated based on a categorization of exposure. Exposure groups are formed based on percentiles (e.g., tertiles or quartiles) of the exposure distribution among cases. For example, in the two hypothetical cohort studies with ten exposed cases, exposure groups are formed based on 3 (SMR1–3), 3 (SMR4–6) and 4 (SMR7–10) cases, and corresponding SMRs are computed (Table 2). Confidence intervals (95% CI) are calculated under the assumption of a Poisson distribution [Bailar and Ederer 1964].
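A categorical SMR with a Poisson-based confidence interval can be computed as sketched below. Exact Poisson limits for the observed count are found here by bisection, which is a substitution for the tabulated factors of Bailar and Ederer; the function names and the expected counts are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _root_of_increasing(func, lo, hi, tol=1e-10):
    """Root of a function that increases from negative to positive on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_limits(observed, alpha=0.05):
    """Exact two-sided confidence limits for the mean of a Poisson count."""
    search_max = observed + 10.0 * math.sqrt(observed + 1.0) + 10.0
    if observed == 0:
        lower = 0.0
    else:
        # lower limit: P(X >= observed | mu) = alpha / 2
        lower = _root_of_increasing(
            lambda mu: (1.0 - poisson_cdf(observed - 1, mu)) - alpha / 2.0,
            0.0, search_max)
    # upper limit: P(X <= observed | mu) = alpha / 2
    upper = _root_of_increasing(
        lambda mu: alpha / 2.0 - poisson_cdf(observed, mu), 0.0, search_max)
    return lower, upper


def smr_with_ci(observed, expected, alpha=0.05):
    """SMR and its confidence interval, treating the expected count as fixed."""
    lower, upper = exact_poisson_limits(observed, alpha)
    return observed / expected, lower / expected, upper / expected


# Hypothetical expected counts for groups with 3, 3 and 4 observed cases.
example_groups = [(3, 3.1), (3, 2.3), (4, 1.5)]
example_smrs = [smr_with_ci(obs, exp) for obs, exp in example_groups]
```

With only three or four cases per group the intervals are wide, which illustrates why a categorical analysis of sparse data gives limited information on the shape of the exposure-response curve.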
Poisson regression
A straight line fitted to the categorical SMR results is visually compared with the exposure-response pattern derived from the moving exposure window SMR analysis. For the calculation of the linear slope, we assume that the number of events follows a Poisson distribution. A linear nonthreshold multiplicative model is fit to the SMR results from the categorical analysis (three categories; see above) using iteratively re-weighted least-squares estimation [Hanley and Liddell 1985; Hertz-Picciotto and Smith 1993]:
where E[] indicates the expectation of a random variable (in this case from a Poisson distribution),
In addition to the linear relative risk model, for each scenario we fit a fractional polynomial model [Royston et al. 1999] to the observed and expected events presented in Table 1 using Poisson regression:
where
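For orientation, the two models named in this section can be written in the following general form. This is a sketch only: it assumes the linear parameterization SMR = α*[1+β*exposure] quoted in the Discussion, a log link with the expected count as offset for the fractional polynomial model, and a generic polynomial with up to two terms. The symbols Obs_i, Exp_i, x_i, p_1 and p_2 are notation introduced here and may differ from the authors' own, and the powers actually selected for the two scenarios are not stated in the text.

```latex
% Linear relative risk model for exposure category i (assumed form)
\mathrm{E}[\mathrm{Obs}_i] = \mathrm{Exp}_i \,\alpha\,(1 + \beta x_i),
\qquad \mathrm{Obs}_i \sim \mathrm{Poisson}

% Generic fractional polynomial Poisson model with offset (assumed form)
\log \mathrm{E}[\mathrm{Obs}_i] = \log \mathrm{Exp}_i + \beta_0
  + \beta_1 x_i^{(p_1)} + \beta_2 x_i^{(p_2)},
\qquad p_1, p_2 \in \{-2, -1, -0.5, 0, 0.5, 1, 2, 3\},
\quad x^{(0)} \equiv \log x
```

In both forms the ratio of E[Obs_i] to Exp_i is the model-based SMR at exposure level x_i, which is the quantity compared with the moving exposure window curve.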
RESULTS
Table 1 presents the crude data for the two hypothetical cohort studies. The two scenarios differ only in the expected number of cases assumed for five of the exposure categories; the data are otherwise identical. The overall SMRs for scenarios 1 and 2 are 1.24 (95% CI = 0.60–2.29) and 1.09 (95% CI = 0.52–2.01), respectively. The results of the moving exposure window and categorical analyses are presented in Table 2, and displayed in Figures 1 and 2.

Standardized mortality ratio (SMR) in relation to exposure based on moving exposure window and regression analysis: Scenario 1. Goodness of fit chi-square for linear relative risk model = 0.01 (p-value = 0.91).

Standardized mortality ratio (SMR) in relation to exposure based on moving exposure window and regression analysis: Scenario 2. Goodness of fit chi-square for linear relative risk model = 0.62 (p-value = 0.47).
Using the three exposure categories, a linear slope estimated by iteratively re-weighted least-squares estimation provides an adequate fit to both exposure-response situations. The fit of the linear model was somewhat better for the first scenario (goodness of fit p-value = 0.91) than for the second scenario (goodness of fit p-value = 0.47). For scenario 1, a monotonic exposure-response pattern was seen based on visual inspection, the moving exposure window analysis, and fractional polynomial regression. However, the assumption of linearity appeared inappropriate for the second scenario, which showed little indication of an exposure-response association except at relatively high levels of exposure. The moving exposure window analysis showed an abrupt increase in risk between the cumulative exposure values 0.3 and 0.4, and the fractional polynomial model showed a J-shaped exposure-response pattern. The categorical SMR was slightly elevated only in the highest exposure category.
DISCUSSION
It is well recognized that exposure groups selected for categorical analysis are often arbitrary and can lead to misleading results [Greenland 1995; Schulz et al. 2001]. Alternative methods of exposure-response analysis have been developed based on internal comparisons to improve upon these limitations [Boucher et al. 1998; Greenland 1995; Harrell et al. 1988; Witte and Greenland 1997]. Nevertheless, results of internal comparison analyses can be imprecise when the study population is small with a limited number of exposed cases [Marsh et al. 2001]. Exposure-response analysis incorporating an external reference population may improve the precision of the risk estimates [Rice et al. 2001], and may account for geographic variation in cultural or socioeconomic factors [Doll 1985].
The calculation of SMRs using moving exposure windows may be useful for exploring exposure-response relationships when the number of observed cases is small, as well as for selecting appropriate cut-points for defining exposure categories for categorical analyses of exposure-response patterns. Furthermore, the technique easily accommodates the evaluation of the exposure-response relationship using excess rate models [Jarvholm 1997]. It can also be extended to internal comparison analyses by first estimating SMRs (or corresponding excess rates) using the disease rate of all risk groups combined to compute the expected number of cases and subsequently computing the ratio of SMRs using a specified referent group (e.g., least exposed) [Frome and Checkoway 1985], which is appropriate when certain criteria are satisfied [Armstrong 1995].
It is recognized that the more advanced methods of exposure-response analysis developed for internal comparisons, such as cubic spline regression, could also be applied to SMR data [Rice et al. 2001]. However, it is unlikely that these models would substantially improve the fit to the data as compared to an intercept model (e.g., SMR = α), or conventional linear (e.g., SMR = α*[1+β*exposure]) or log-linear (e.g., SMR = e^(α+β*exposure)) models if the number of observed cases is small, since the number of parameters in the regression model may approach or exceed the number of cases of disease. Nonetheless, it would be informative to compare the exposure-response curve derived using moving exposure windows with those obtained from flexible regression models when the number of observed cases is sufficiently large. In the hypothetical examples presented here, the exposure-response patterns obtained from moving exposure window analysis and fractional polynomial models were quite similar.
Several limitations of the approach outlined here need to be acknowledged. First, specification of the width of the exposure window is required, and different widths may result in different shapes of the exposure-response curve. That is, the curve will become smoother with increasing width of the moving exposure windows. In addition, no parameters are estimated and since SMRs in adjacent exposure windows share observed cases, measures of variability are not easily computed. Therefore, the rolling SMR approach is most suitable for a graphical inspection of the exposure-response curve, and should be considered a preliminary step to guide more detailed analyses. Finally, it is known that comparisons of SMRs (or corresponding excess rates) between exposure categories are invalid if their confounder distributions differ [Checkoway et al. 1989a; Checkoway et al. 1989b], although the amount of bias generally tends to be small [Breslow et al. 1983].
In conclusion, a straightforward approach to graphically explore non-linearity in sparse epidemiological data is proposed. Further evaluation of this method is needed using empirical data. Meanwhile, it is recommended that researchers follow the structure of Table 1 to display SMR estimates based on a small number of cases, which would allow the reader to explore the exposure-response curve under various specifications of the width of moving exposure windows. Furthermore, it may be helpful to graphically display the original data (from Table 1) and relationships fitted with categorical, linear, fractional polynomial or moving exposure window models (as was done in Figures 1 and 2) to evaluate the fit of such models to the data.
