Abstract
Introduction
The connection between environmental chemical exposures and human health/diseases (eg, cancer) is a complex multidimensional problem that is of great interest to public health researchers. Further, exposure to environmental chemical mixtures varies across location and behavior patterns. For example, exposure patterns to polycyclic aromatic hydrocarbons (PAHs) from automobile exhaust in dense urban sites such as Detroit differ from those in rural, agricultural sites such as Iowa. However, there are many sources of PAHs besides automobile exhaust, and different sources may explain different levels across space. For example, PAH concentrations have been found to be associated with residence age.1,2 Some environmentally persistent chemicals such as polychlorinated biphenyls (PCBs) may be relatively similar in urban and rural areas. Total PCB levels have been found to be positively associated with percentage of developed land or population density and housing age. 3 Regardless, exposure studies demonstrate complex correlation patterns among environmental chemicals that may vary across regions. 4
Because individuals are exposed to multiple chemicals simultaneously, it is important to examine the relationship between chemical mixtures and disease risk. Our research focus is on developing analysis strategies that incorporate larger sets of chemicals, which are more representative to actual human exposure. In such a high-dimensional framework, some factors may increase risk, some may diminish risk, and others may have no effect on risk. The goal is to use an analysis strategy that most efficiently detects the signals and deemphasizes the noise in the data. Furthermore, we aim to develop and assess methods that accommodate site-specific exposure patterns and have the ability to accurately discern whether or not a chemical is associated with the outcome of interest.
Through weighted quantile sum (WQS) regression, 5 we are able to estimate a body burden index within a set of correlated environmental chemicals, and further estimate the association between the index and an outcome of interest. Additionally, the estimated chemical weights allow us to make generalized inference concerning relative chemical importance. WQS regression is constrained to model associations between the outcome and chemicals that are in one direction (all non-negative or all non-positive), making it appropriate in a risk setting where the goal is to identify exposures that are positively associated with a health outcome. As such, WQS regression is designed for variable selection with less emphasis on risk prediction.
Few existing studies of chemical exposure and disease risk have made efforts to consider the impact of spatially varying exposure patterns on the effect of a chemical mixture. One exception is Czarnota et al. 4 , where the authors take a site-specific approach in a preliminary effort to assess the effects of spatially varying exposure patterns among chemical mixtures on the risk of non-Hodgkin lymphoma (NHL). The objective in that work was to apply statistical methods to detect bad actors in environmental chemical mixtures, while considering different exposure patterns based on the geographic site. The focus here is on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case-control study of NHL to achieve realistic exposure situations, while setting which chemicals were truly associated with the health outcomes. For comparison, we also evaluated the performance of several penalized regression methods in correctly classifying chemicals as bad actors or unrelated to the outcome.
Methods
NCI-SEER Study Population
The NCI-SEER NHL population-based case-control study design has been previously described.6,7 Briefly, the study was conducted in Iowa, Los Angeles County, and the metropolitan areas of Detroit (Macomb, Oakland, and Wayne counties) and Seattle (King and Snohomish counties). Eligible cases were aged 20–74 years, diagnosed with a first primary NHL between July 1998 and June 2000, and uninfected with HIV. In Seattle and Iowa, all consecutive cases were chosen. In Detroit and Los Angeles, all African American cases and a random sample of white cases were eligible for study, allowing for oversampling of African American cases. Of the 2,248 potentially eligible cases, 320 (14%) died before they could be interviewed, 127 (6%) were not located, 16 (1%) had moved away, and 57 (3%) had physician refusals. Of the 1,728 remaining cases, 1,321 (76%) participated. Controls were selected from Center for Medicare and Medicaid Services files (≥65 years) or the general population using random digit dialing (<65 years) and were frequency matched to cases by sex, age, race, and study site. Of the 2,409 potentially eligible controls, 2,046 were able to be located and contacted, and 1,057 (52%) of these subjects participated.
Computer-assisted personal interviews were conducted in the home of each participant. Interviewers asked about demographics, including race and education, age of the home, housing type, the presence of oriental rugs, pesticide use in the home and garden, residential and occupational histories, and other factors. As described in detail,6,8 dust was collected between February 1999 and May 2001 from vacuum cleaners of participants who gave permission (93% of cases, 95% of controls) and who had used their vacuum cleaner within the past year and owned at least half of their carpets or rugs for five years or more (695 cases (57%), 521 controls (52%)). Dust samples from 682 cases (98%) and 513 controls (98%) were successfully analyzed between September 1999 and September 2001.
A total of 27 chemicals were measured in house dust (5 PCBs, 7 PAHs, and 15 pesticides). The PCBs were congeners 105, 138, 153, 170, and 180. The PAHs were benz(a)anthracene, benzo(a)pyrene, benzo(b)fluoranthene, benzo(k)fluoranthene, chrysene, dibenz(a,h)anthracene, and indeno(1,2,3-cd)pyrene. The pesticides were α-chlordane, γ-chlordane, carbaryl, chlorpyrifos, cis-permethrin, transpermethrin, 2,4-dichlorophenoxyacetic acid (2,4-D), DDE, dichlorodiphenyltrichloroethane (DDT), diazinon, dicamba, methoxychlor, o-phenylphenol, pentachlorophenol, and propoxur. Extraction and analysis were performed on 2-g aliquots of dust samples using gas chromatography/mass spectrometry (GC/MS) in selected ion monitoring mode. Concentrations were quantified using the internal standard method. Usual detection limits were 20.8 ng/g of dust for α-chlordane, γ-chlordane, DDE, DDT, propoxur, o-phenylphenol, PAHs, and PCBs; 42–84 ng/g for chlorpyrifos, diazinon, cis-permethrin, dicamba, pentachlorophenol, and 2,4-D; and 121–123 ng/g for carbaryl and trans-permethrin. Changes in analytic procedures during the study resulted in increased detection limits for methoxychlor (from 20.7 to 62.5 ng/g). A small proportion of samples weighing less than 2 g had detection limits that were higher than the usual detection limits.
The laboratory measurements for the 27 analytes contained various types of missing data, primarily when the concentration was below the minimum detection level. To a lesser extent, missing data occurred when there was co-elution between the target chemical and interfering compounds. Chemical concentrations were assumed to follow a log-normal distribution, and data were imputed using a fill-in approach to create 10 complete data sets for each of the 27 analytes. Details about the imputation of analyte values have been published previously.6,9 A total of 1,180 participants had available measurements of all 27 chemicals. Of these participants, 508 (43%) were controls and 672 (57%) were cases. With respect to study site, 202 (17%) were from Detroit, 340 (29%) from Iowa, 292 (25%) from Los Angeles, and 346 (29%) from Seattle.
Our primary interest in the NCI-SEER NHL study is the chemical exposure patterns and the correlation structure of the exposures. As illustrated in Figure 1, the concentrations among the PCBs were similar across the four sites, while the concentrations of PAHs and pesticides varied considerably by site. More specifically, concentrations for all seven PAHs were notably elevated in Detroit, while elevated concentrations of pesticides were seen in Iowa (eg, 2,4-D and methoxychlor) and Los Angeles (eg, α- and γ-chlordane and propoxur).

Distribution of chemical concentrations in the NCI-SEER NHL study by chemical group: (
The site-specific distributions of the Pearson pairwise correlation coefficients among the log-transformed concentrations are shown in Figure 2. The observed pairwise correlation patterns are complex, with correlations ranging from slightly negative to nearly perfectly correlated within all four sites. When examining the correlations by chemical group, we see that for each site, the PAHs and PCBs demonstrated a high degree of intragroup correlation. The pesticides generally exhibited lesser intragroup correlation, with the exception of the pairwise correlations between metabolites and analogs. For each site, correlation within chemical group is further illustrated in Figure 3 and summarized in Table 1. We see that the PCBs were most highly correlated in Los Angeles, with 75% of the intragroup correlations greater than 0.81. Additionally, we see that the correlation among the PAHs was most pronounced in Detroit (pairwise correlations ranging from 0.95 to 0.99) where PAH exposure was the highest and least pronounced in Los Angeles (interquartile range (IQR) of 0.68–0.86) where PAH exposure concentration was the lowest. As demonstrated by the NHL data, chemical exposure patterns may vary in both concentration and correlation across space, illustrating the need to consider site-specific risk analyses in the context of environmental chemical exposure.

Distribution of Pearson pairwise correlation coefficients among chemical concentrations (on the log scale) by study site in the NCI-SEER NHL study.

Distribution of the observed intergroup correlations for PCBs, PAHs, and pesticides by study site and across the full-study population.
Correlations within chemical group by study site and across the full population in the NCI-SEER NHL study.
Simulation Study Design
Using each of the four site-specific correlation structures for the 27 chemicals in the NCI-SEER NHL study, we generated data on a site-specific basis using the observed mean concentrations (from the log scale) and standard deviations, for each of the following three correlation patterns: (1) 65% of the observed site-specific correlation structure (moderate correlation), (2) 30% of the observed site-specific correlation structure (mild correlation), and (3) 1% of the observed site-specific correlation structure (near independence). We did not include the observed correlation patterns to generate data as the resulting correlation matrices were generally singular and could not be inverted.
The following seven chemicals (one PCB, two PAHs, and four pesticides/insecticides) were set to be associated with the response variable: PCB180 (X5), benzo(k)fluoranthene (X8), benzo(a)pyrene (X9), 2,4-D (X19), DDE (X20), methoxychlor (X24), and propoxur (X27). These chemicals were chosen in an effort to represent a wide range of correlations between and among the bad actors and benign chemicals. Correlation within the seven selected chemicals ranged from −0.08 to 0.94, with a median correlation of 0.14. We also ensured that at least one chemical was selected from each chemical group in an attempt to capture the variation in exposure patterns across the study sites. PCB 180 was purposefully selected to represent the PCBs because of its known link to NHL.
Given that chemical exposure patterns differed across site, the association with the outcome variable was assumed true only under the condition that the observed site-specific concentrations were high enough to have a health effect. More specifically, we assumed the association was true within a site if and only if more than 25% of the site-specific concentrations were higher than the overall median concentration. This condition was satisfied for each of the seven specified chemicals in the Detroit, Iowa, and Seattle sites, and thus, all seven of the specified chemicals were set to be associated with the response when simulating data. For the Los Angeles site, over 75% of the observed concentrations for chemicals X8, X9, and X19 were below the overall median concentration. Therefore, only four chemicals (X5, X20, X24, and X27) were set to be correlated with the outcome in Los Angeles.
For each correlation pattern, we simulated data where the selected chemicals (ie, the pre-specified chemicals that satisfied the above condition) had a correlation of 0.1 with the response (weak association with outcome) as well as where the selected chemicals had a correlation of 0.3 with the response (strong association with outcome). To simulate the analyses of case-control study data, we calculated a disease indicator for subjects with the highest 30% of the simulated continuous response variable.
We simulated 100 data sets for each set of conditions (correlation pattern and outcome correlation combination), with a sample size of
Data were simulated for each site based upon the site-specific observed mean vector (on the log scale) and covariance matrix. The continuous response variable
Let
Next, calculate the Cholesky decomposition U of the covariance matrix Σ. That is, calculate U
WQS Regression
The primary method of risk analysis used in this study was WQS regression. The WQS approach estimates a weighted linear index in which the weights are empirically determined through the use of bootstrap sampling. The approach considers data with
In the above equation,
For each bootstrap sample, the significance of the estimated vector of weights was evaluated through the significance (
Simultaneous estimation of the unknown weights and parameters was achieved through the use of an optimization algorithm that maximized the nonlinear function in Equation (1), subject to the linear constraint
For each of the 100 simulated data sets for each set of simulation conditions (correlation pattern and outcome correlation), we performed WQS regression on the full data set (adjusted for study location) and separately at each site. The ranks used in WQS regression were calculated within each site for the site-specific analyses and overall for the site-adjusted analysis of the full data set. The process was performed twice, once using the continuous outcome variable and once using the binary outcome variable. Therefore, for each set of conditions, a total of five indices (four site-specific indices and one full-study index) were estimated for each outcome variable. The median number of correctly and incorrectly selected chemicals was calculated for each of the five indices across the 100 simulated studies. A chemical was identified as selected if it received a weight of at least 0.05. The significance of the five estimated indices in their respective validation data sets was also examined.
Comparison of WQS Regression with Lasso, Adaptive Lasso, and Elastic Net
Modern methods that address collinearity and high-dimensionality (eg, lasso, elastic net) have been demonstrated to be less accurate in the selection of potentially harmful chemicals compared with WQS regression.
5
To further assess the use of shrinkage regression models for evaluating effects of chemical exposures, we fitted lasso,
11
adaptive lasso,
12
and elastic-net
13
models to the 100 training data sets (of size
For the lasso, adaptive lasso, and elastic-net methods, chemicals related to the outcome variable were identified as correctly chosen if they were retained in the model with a positive coefficient, while chemicals not related to the outcome variable were identified as incorrectly chosen if they were retained in the model. The median and IQR for the number of correctly and incorrectly selected chemicals were reported, and the three methods were compared in terms of sensitivity and specificity.
Results
Sensitivity and Specificity of WQS Regression
The median number of correctly and incorrectly chosen chemicals across 100 samples for each setting is displayed in Tables 2 and 3, respectively. When association with the outcome was strong (
Median [IQR] number of correctly chosen chemicals for the five WQS indices across the 100 simulated data sets.
Median [IQR] number of incorrectly chosen chemicals for the five WQS indices across the 100 simulated data sets.
From the results for the Los Angeles site, all four chemicals were correctly selected at least half of the time for each setting, but the number of incorrectly chosen chemicals ranged from two to four across the different settings. Because fewer chemicals (four) were set to be associated with the outcome within this site, it may have been advantageous to define a criterion for chemical selection unique to this site.
In summary, WQS regression had good sensitivity and specificity at all settings for the site-specific and overall analysis for both the continuous and binary outcome variables. Performance of the site-specific analyses was comparable to that of the overall analysis. We caution against overinterpretation of a one or two chemical difference in sensitivity and/or specificity, as any perceived improved performance for the overall analyses in comparison to the site-specific analyses may be a result of the four-fold increase in sample size in the estimation data set for the index derived from the full-study population. Furthermore, the results presented in this analysis are dependent upon chemical selection as defined by a minimum value of 0.05 for the estimated chemical weight. It was decided a priori that a chemical must receive at least 5% of the weights to be considered important. While this may be reasonably applied in practice, the method for best choosing a cutoff value is still an open area of research. The choice of cutoff value may be affected by number of chemicals, correlation structure, signal strength, etc.
With respect to chemical selection, we generally expect to see an increase in sensitivity and a decrease in specificity as the threshold weight for chemical selection is lowered. Figure 4 shows modified receiver operating curves (ROC) for the three different correlation structures among the chemicals in the setting of weak association with the (a) continuous outcome and (b) binary outcome, with the cutoff weight for chemical selection varied. The true positive rate (sensitivity) was calculated as the average percentage of correctly selected chemicals across the 100 simulations, and the false positive rate (1-sensitivity) was calculated as the average percentage of incorrectly selected chemicals across the 100 simulations. As the cutoff weight for chemical selection is lowered, we see an increase in both the average true and false positive rates as expected. The a priori chosen cutoff of 0.05 (ie, 5% of the total chemical weights) performed well, regardless of the level of correlation among chemicals or strength of association with outcome. When association with the continuous outcome was weak (Fig. 4A), the average false positive rate for the cutoff weight of 0.05 ranged from 2.3% to 4.5% across the correlation structures, while the average true positive rate ranged from 83.9% to 85.6%. Similarly, for the binary outcome (Fig. 4B), the average false positive rate for the cutoff weight of 0.05 ranged from 2.6% to 6.3% across the correlation structures, while the average true positive rate ranged from 85.6% to 89.9%. Finally, when association with the outcome was strong (results not shown), the average false positive rate for the cutoff weight of 0.05 was at most 0.05%, while the average true positive rate was at least 92.6% across the three correlation structures and both the continuous and binary outcomes.

Modified ROC for the WQS index derived from the full-study population with varying weight thresholds for chemical selection for a continuous outcome (
Distribution of WQS Regression Weights
In practice, we also look at the distributions of the weights in deciding which chemicals are important. Figure 5 shows the distribution of the average weights across the 100 simulated samples for the seven chemicals assumed to be associated with outcome for each of the five indices. The plots focus on the setting in which there was weak correlation with the continuous outcome for (A) moderate correlation (65% of that observed) among chemicals and (B) correlation diminished to 1% of that observed. For both correlation structures, WQS appropriately placed considerable weight on the true bad actors and also placed negligible weight on chemicals uncorrelated with the outcome. The latter is demonstrated by the near zero weight placed on X8, X9, and X19 by the Los Angeles index. Also, as correlation among chemicals was diminished, reliability of the weights improved, as evident by the narrowed distributions in Figure 5(B).

Distribution of WQS index weights for the five WQS indices across the 100 simulated data sets for the seven chemicals associated with the outcome. Site 1 = Detroit, 2 = Iowa, 3 = Los Angeles, 4 = Seattle, and F = full-study population. (
When comparing the weights from the different indices, the index for Los Angeles tended to place greater emphasis on chemicals X5, X20, X24, and X27 compared with the other sites. This is likely because of the fact that these four chemicals were the only true bad actors in this site, and thus, the weights as a whole were divided over fewer components. Additionally, the weights for the full-study population analysis seem to demonstrate an averaging effect across the sites, as they appear to shift downward for the chemicals that were unassociated with outcome in Los Angeles (X8, X9, and X19). For the chemicals associated with outcome in all four sites (X5, X20, X24, and X27), the weights estimated by the overall analysis were slightly higher than those estimated in site-specific analysis. This may be attributable to the increased power (greater sample size) of the overall analysis.
When strongly correlated with the continuous outcome (data not shown), the findings were consistent with those discussed above, but the important chemicals (true bad actors) tended to receive higher average weights. With respect to the binary outcome (data not shown), the findings were again analogous.
Power of WQS Regression
The estimated weights were applied to the validation data sets, and significance was assessed through the
Summary of testing results for the five WQS indices across the 100 simulated data sets for the continuous outcome variable.
Summary of testing results for the five WQS indices across the 100 simulated data sets for the binary outcome variable.
Comparison of WQS Regression with Lasso, Adaptive Lasso, and Elastic Net Regression
Lasso, adaptive lasso, and elastic-net regressions were performed on only the full-study population for both the binary and continuous outcomes for the six different simulation settings. The median number of correctly and incorrectly chosen chemicals for the lasso, adaptive lasso, elastic-net, and WQS regression models across 100 samples is given in Tables 6 and 7. When the predictors were strongly associated with the outcome, WQS and the traditional shrinkage methods demonstrated a high degree of sensitivity for both the continuous and binary response variables, regardless of the level of correlation among predictors. For the continuous outcome, each of the shrinkage methods correctly selected all seven chemicals at least half of the time, while WQS regression correctly selected a median of at least six of the seven chemicals. In the case of the binary outcome, each of the methods correctly chose all seven chemicals at least half of the time.
Median [IQR] number of correctly selected chemicals for lasso, adaptive lasso, elastic-net, and WQs regressions across the 100 simulated data sets for the full-study population.
Median [IQR] number of incorrectly selected chemicals for lasso, adaptive lasso, elastic net, and WQs regressions across the 100 simulated data sets for the full-study population.
When considering the setting of weak association among the predictors and the outcome, WQS regression correctly selected six of the seven chemicals at least half of the time, for both the continuous and binary outcomes, regardless of the level of correlation among the predictors. Similarly, the median number of correctly chosen chemicals for elastic net ranged between six and seven for both the continuous and binary responses. In contrast, lasso and adaptive lasso demonstrated diminished sensitivity, with the median number of correctly chosen chemicals ranging from four to seven for the continuous outcome and three to five for the binary outcome.
While the penalized regression models may have exhibited sensitivity that was comparable to that of WQS regression in several settings, the sensitivity of these traditional shrinkage methods was often overshadowed by their lack of specificity. WQS regression was highly specific, choosing at most a median of one incorrect chemical, regardless of the degree of correlation among predictors and regardless of the strength of association with the response. In contrast, as correlation among the chemicals increased, the penalized regression methods demonstrated a loss of specificity. In particular, when the predictors were strongly associated with the response (both continuous and binary), the penalized regression models chose a median of at least 14 incorrect chemicals in the presence of moderate or mild correlation among chemicals. Most notably, the lasso and elastic net had a tendency to select almost all the chemicals when the chemicals were moderately correlated and strongly associated with the response. The relatively low specificity of these shrinkage methods appears to limit their role in risk evaluation of environmental chemical mixtures.
Discussion and Conclusion
WQS regression demonstrated good sensitivity and specificity for both site-specific models and the full-study population models across a variety of conditions considered in this study. WQS adequately detected important predictors, while simultaneously placing negligible weight on chemicals unassociated with outcome, for both continuous and binary response variables. Additionally, the WQS index was significantly and positively associated with the outcome when tested in the validation data sets, and generally demonstrated good power. Results improved as correlation among chemicals diminished and association with the outcome strengthened. In comparison to the shrinkage regression methods of lasso and elastic net, WQS performed well for sensitivity and specificity, while the lasso and elastic-net models exhibited good sensitivity but poor specificity. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. This suggests that these methods may be limited for use in risk assessment, as they are unable to discern which chemicals are unassociated with health risk.
The WQS index weights for the full-study population demonstrated an averaging effect, suggesting that chemical weights estimated in an overall analysis may not be representative of the true bad actors within a site. Three chemicals were deemed unassociated with outcome in the Los Angeles site, as they were not present in high enough concentrations to satisfy the imposed definition of health risk. The overall analysis consistently identified these three chemicals unassociated with outcome in Los Angeles as bad actors. While this is representative of the data as a whole (these chemicals were set as truly bad actors in three of the four sites), it is not an accurate representation of the chemicals posing risk in Los Angeles. Additionally, the average weights assigned to these three chemicals by the index in the full-study population were lower in comparison to the weights assigned by the indices in Detroit, Iowa, and Seattle. The non-association in Los Angeles, therefore, seems to result in an underestimation (by the full index) of the importance of these chemicals in the sites in which they truly were bad actors.
With the goal of identifying chemicals that pose a significant health risk, it is of great importance to consider the toxicological principle that “the dose makes the poison,” especially given that exposure patterns are spatially varying. Although a chemical may not be present in high enough concentrations to pose a health risk in one location, it may still pose a significant health risk at other locations. Though limited, these simulation studies suggest that use of an overall index may overstate the importance of a chemical in sites where the concentration is too low to constitute risk and may understate the importance of a chemical in locations where it is present in concentrations that are high enough to adversely affect health.
The simulation studies conducted in this analysis were largely reflective of the exposure patterns observed in the original NCI-SEER NHL study, incorporating the exposure concentrations and the complex correlation among chemicals on a site-specific basis. However, simulation of the data utilized Cholesky decomposition, which required that the covariance matrices be positive definite. As a result, the simulations only incorporated (at most) 65% of the observed correlation structures, as the covariance matrices became singular if they were any less diminished. Other studies have used methods such as ridging to allow the correlation to be persevered or even inflated, 16 which should be considered in future work. We expect that as correlation among chemicals increases, one will encounter cases in which WQS may not perform as well as was in this study. Finally, in the general context of risk assessment, it is a limitation that the observed chemical concentrations in the NCI-SEER NHL study were external measures of exposures, as what is found in house dust may not be truly reflective of an individual's absorption or ingestion of chemicals.
Author Contributions
Conceived and designed the experiments: JC, CG, DCW. Analyzed the data: JC. Wrote the first draft of the manuscript: JC. Contributed to the writing of the manuscript: JC, CG, DCW. Agree with manuscript results and conclusions: JC, CG, DCW. Jointly developed the structure and arguments for the paper: JC, CG, DCW. Made critical revisions and approved final version: JC, CG, DCW. All authors reviewed and approved of the final manuscript.
