Abstract
Keywords
Introduction
The National Toxicology Program (NTP) is a branch of the US Department of Health and Human Services. A major current emphasis of NTP is “The Toxicology in the 21st Century: The Role of the National Toxicology Program.” 1
NTP describes this program as follows: The Role of the National Toxicology Program is to support the evolution of toxicology from a predominantly observational science at the level of disease-specific models to a predominantly predictive science focused upon a broad inclusion of target-specific, mechanism-based, biological observations.
In the current study, each of the 470 chemicals was categorized from 1 to 48 by the level of “clear” neoplastic evidence in male and female rats and in male and female mice, and was given an ordinal rank from 1 to 135 following additional considerations regarding tumor site concordance and tumor multiplicity. The resultant tumorigenicity category score and ordinal rank score were examined for associations with results in the Ames Salmonella mutagenicity assay; presence or absence of structural alerts of carcinogenicity; and three Hansch Quantitative Structure-Activity Relationship (QSAR) parameters, namely, calculated base 10 logarithm of the octanol–water partition coefficient (ClogP), calculated molar refractivity (CMR), and molecular volume (MgVol).
In the present study, we calculated three important molecular parameters for each of the 470 chemical compounds in the NTP database. These molecular parameters (ClogP, 2 MgVol, 3,4 and CMR 5 ) represent hydrophobic, electronic, and steric effects of a chemical on its biological activity and are extremely useful in developing QSAR models to investigate the quantitative relationship between the biological activity of chemicals and their hydrophobic, electronic, steric, and other physical and chemical characteristics. 6
NTP considers results from the Ames assay test to be very important in its deliberations, as illustrated by the following statement from a recent Report on Carcinogens. 7
DNA reactivity combined with
Methods
Determination of neoplasticity categories 1–48
NTP classifies the level of evidence for neoplasia as Clear (Positive), Some, Equivocal, and Negative. Analysis of the entire NTP database demonstrated that only neoplastic evidence that rose to the level of “clear” was sufficiently robust to facilitate meaningful statistical analysis. 11,12 Each of the 470 chemicals for which final technical reports were available reported results for male rats, female rats, male mice, and female mice. In several cases, one of the four sex/species arms of a study was deemed “inadequate” due to technical problems with that arm, while the three other arms reported valid results. This situation was amenable to statistical analysis with “inadequate” ranked just higher than “negative,” because that arm, had it been completed without technical difficulty, might have shown a level of neoplasticity higher than “negative.” The descending order of categorical rank was as follows: Clear Evidence > Some Evidence > Equivocal Evidence > Inadequate Evidence > Negative Evidence. This ranking scheme resulted in a highest category of Clear (male rats), Clear (female rats), Clear (male mice), and Clear (female mice), and a lowest category of Negative (male rats), Negative (female rats), Negative (male mice), and Negative (female mice). Because species/sex arms ranked as “inadequate” occur sporadically, the number of categories is not fixed at 48 and may change as the NTP database grows; 48 is simply the number of categories that results from the outcomes for the 470 chemicals that currently have final technical reports. Online Appendix 1 shows the various combinations of Clear, Some, Equivocal, Inadequate, Negative, and the resultant categorical ranks. Figure 1 shows the number of chemicals tested per tumor potency category and Figure 2 shows the number of chemicals tested per tumor potency category (reverse order).

Number of chemicals tested per tumor potency category.

Number of chemicals tested per tumor potency category (reverse order).
Determination of ordinal rank numbers 1–135
Analysis of the entire NTP database across all routes of administration consistently showed that the highest hurdle of neoplastic evidence was tumor site concordance across species. 11,12 This result created a boundary condition under which ordinal rank could be further split within neoplasticity category (1–48), but a chemical in a lower category could not be assigned a higher ordinal rank than that of any chemical in a higher category. The second highest hurdle of neoplastic evidence was tumor site concordance across sex within species. The final criterion influencing ordinal rank was multiplicity of tumors that were not concordant by organ site. These non-concordant tumors are referred to in the ranking scheme as “single tumors.” Online Appendix 2 shows the ordinal ranking for each of the 470 chemicals resulting from simultaneous consideration of number of different tumors concordant by tumor site across species; number of different tumors concordant across sex within species; and number of discordant tumors. Figure 3 shows the number of chemicals tested per tumor potency ordinal and Figure 4 shows the number of chemicals tested per tumor potency ordinal (reverse order).
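The boundary condition and tie-breaking criteria described above amount to a lexicographic ordering. The following Python fragment is a minimal sketch of that idea, not the procedure used to build Online Appendix 2. It assumes that category 1 and ordinal rank 1 denote the strongest neoplastic evidence, and the `Chemical` fields and the example values in the test data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chemical:
    name: str
    category: int       # neoplasticity category (assumed: 1 = strongest evidence)
    cross_species: int  # tumor sites concordant across species
    cross_sex: int      # tumor sites concordant across sex within species
    single_tumors: int  # non-concordant ("single") tumors


def sort_key(chem):
    # Category dominates, so a chemical in a lower category can never
    # outrank one in a higher category. Within a category, more concordant
    # or more numerous tumors sort first (hence the negated counts).
    return (chem.category, -chem.cross_species, -chem.cross_sex, -chem.single_tumors)


def ordinal_ranks(chemicals):
    """Return {name: ordinal rank}; chemicals with identical keys share a rank."""
    ranks = {}
    rank = 0
    previous = None
    for chem in sorted(chemicals, key=sort_key):
        key = sort_key(chem)
        if key != previous:
            rank += 1
            previous = key
        ranks[chem.name] = rank
    return ranks
```

Because the category is the first element of the sort key, the secondary criteria can only split ranks within a category, which mirrors the boundary condition stated in the text.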

Number of chemicals tested per tumor potency ordinal.

Number of chemicals tested per tumor potency ordinal (reverse order).
Determination of tumorigenicity percentile rank
NTP currently classifies the overall level of neoplastic evidence for a particular chemical only qualitatively, using the categories “Known to Be a Human Carcinogen” and “Reasonably Anticipated to Be a Human Carcinogen.” These broad qualitative categories give no indication of the relative tumorigenicity ranking of the 470 chemicals tested to date for which interpretable final reports are extant. By defining the chemical with the highest ordinal rank as either 100% or 0%, a percentile rank of tumorigenicity can be assigned to any of the 470 chemicals tested to date or to any new chemical for which 2-year NTP test data are reported. In addition, each chemical can be assigned to either a quartile or a quintile of tumorigenicity. Online Appendix 3 shows the percentile ranking of all 470 chemical compounds based on ordinal ranking, with 2,3-dibromo-1-propanol (CASRN 96-13-9) defined as either 100% (quintile 5) or 0% (quintile 1) since this compound has the highest tumorigenicity score via ordinal ranking.
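As an illustration of how a percentile and quintile could follow from an ordinal rank, the sketch below uses a simple linear rescaling. The text does not state the exact formula behind Online Appendix 3, so the linear mapping, the assumption that ordinal rank 1 is the most tumorigenic, and the function names are all assumptions made for this example.

```python
import math


def percentile_rank(ordinal_rank, n_ranks=135, top_is_100=True):
    """Linearly rescale an ordinal rank (1..n_ranks) to a 0-100 percentile.

    Assumes rank 1 is the most tumorigenic chemical. With top_is_100=True
    that chemical is defined as 100%; otherwise it is defined as 0%.
    """
    if not 1 <= ordinal_rank <= n_ranks:
        raise ValueError("ordinal_rank must lie between 1 and n_ranks")
    pct = 100.0 * (n_ranks - ordinal_rank) / (n_ranks - 1)
    return pct if top_is_100 else 100.0 - pct


def quintile(pct):
    """Map a 0-100 percentile to a quintile from 1 to 5."""
    return min(5, max(1, math.ceil(pct / 20.0)))
```

Under these assumptions the top-ranked chemical maps to 100% (quintile 5) or, with `top_is_100=False`, to 0% (quintile 1), matching the two conventions mentioned in the text.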
Calculation of molecular parameters
Bio-Loom (version 1.6; Biobyte Corp., Claremont, CA, USA) 12 was used to compute the three parameters used in our QSAR analysis from the simplified molecular input line entry system (SMILES) representation of each chemical compound: ClogP, CMR, and MgVol (Online Appendix 4). The utility of Bio-Loom for comparative QSAR (C-QSAR) analysis has been discussed in Hansch and Leo. 5 The parameters used in this study are also discussed in detail in Hansch and Leo. 5 In brief, ClogP is the calculated logarithm of the partition coefficient in octanol/water and is a measure of hydrophobicity (or lipophilicity) of a chemical. 2,5 MgVol is the molar volume calculated by the method of Abraham and McGowan 3,4 and CMR is the calculated molar refractivity (MR) for the whole molecule. MR is calculated as follows:

$$MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{\delta}$$

where $n$ is the refractive index, $MW$ is the molecular weight, and $\delta$ is the density of the substance.
Statistical methods
The following tests were applied to assess the statistical significance of the differences in proportions. 25
Pooled test
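The null hypothesis is $H_0\colon p_1 = p_2$.
The formula for the pooled test statistic comparing two proportions is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE}$$

where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$ are the sample proportions, with $x_i$ events among $n_i$ observations in group $i$.
The standard error is

$$SE = \sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$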
Unpooled test
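The null hypothesis is $H_0\colon p_1 - p_2 = 0$. The test statistic is $z = (\hat{p}_1 - \hat{p}_2)/SE$, where $\hat{p}_i = x_i/n_i$ is the sample proportion in group $i$ and the standard error is estimated separately from each sample:

$$SE = \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$$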
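The pooled and unpooled tests for a difference in two proportions can be sketched in a few lines of Python. This is an illustrative implementation of the standard two-proportion z-test under the normal approximation, not the software used in the study; the function name is hypothetical and the two-sided p-value is obtained from the standard normal distribution via `math.erfc`.

```python
import math


def two_proportion_z(x1, n1, x2, n2, pooled=True):
    """Two-sided z-test for H0: p1 = p2.

    x1, x2: number of events in each group; n1, n2: group sizes.
    pooled=True uses the pooled standard error; pooled=False estimates
    the standard error separately from each sample.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    if pooled:
        p = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only: 30/100 versus 15/100.
z_pooled, p_pooled = two_proportion_z(30, 100, 15, 100, pooled=True)
z_unpooled, p_unpooled = two_proportion_z(30, 100, 15, 100, pooled=False)
```

The two variants differ only in how the standard error is estimated, so they usually give similar z values when the group sizes are comparable.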
Chi-squared statistic
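The chi-squared ($\chi^2$) statistic, $\chi^2 = \sum (O - E)^2 / E$, where $O$ and $E$ are the observed and expected counts in each cell of the contingency table, is distributed according to the $\chi^2$ distribution under the null hypothesis, with $(r - 1)(c - 1)$ degrees of freedom for a table with $r$ rows and $c$ columns.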
Pearson correlation statistic
The Pearson correlation coefficient is the most common measure of correlation in statistics. It measures the strength of the linear relationship between two interval or numeric variables, that is, how well two sets of data are linearly related.
The Pearson correlation coefficient, often referred to as the Pearson $r$, ranges from $-1$ to $+1$.
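For reference, the Pearson coefficient can be computed directly from its definition as the covariance of the two variables divided by the product of their standard deviations. The function below is a minimal sketch with a hypothetical name, written with the standard library only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A value near $+1$ or $-1$ indicates a strong linear relationship, and a value near 0 indicates little or no linear relationship.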
Mann–Whitney–Wilcoxon statistic
Generally, hypothesis testing uses techniques for testing the equality of means in two independent samples. An underlying assumption for appropriate use of the tests described was the presence of sufficiently large samples (usually
The Mann–Whitney–Wilcoxon test is a nonparametric test to compare outcomes between two independent groups. It is used to test whether two samples are likely to be derived from the same population (i.e. that the two populations have the same shape). Some interpret this test as comparing the medians between the two populations. A parametric test, by contrast, compares the means ($H_0\colon \mu_1 = \mu_2$) between independent groups.
The Mann–Whitney–Wilcoxon test is often performed as a two-sided test, in which the alternative hypothesis states that the populations are not equal without specifying directionality. A one-sided approach is used if interest lies in detecting a positive or negative shift in one population as compared to the other. The procedure for the test involves pooling the observations from the two samples into one combined sample, keeping track of which sample each observation comes from, and then ranking the observations from lowest to highest, from 1 to $n_1 + n_2$, where $n_1$ and $n_2$ are the two sample sizes.
The general assumptions are as follows: All the observations from both groups are independent of each other. The responses are ordinal (i.e. one can at least say, of any two observations, which is the greater). Under the null hypothesis, the distributions of the two groups are identical. Under the alternative hypothesis, the distributions of the two groups differ.
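The test involves the calculation of a statistic, usually called $U$, whose distribution under the null hypothesis is known. It is computed from the rank sums of the two samples:

$$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$

where $n_1$ and $n_2$ are the two sample sizes and $R_1$ and $R_2$ are the sums of the ranks in samples 1 and 2, respectively.
For any Mann–Whitney–Wilcoxon test, the theoretical range of $U_1$ and $U_2$ is from 0 to $n_1 n_2$, and $U_1 + U_2 = n_1 n_2$.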
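The ranking procedure described above can be sketched as follows. This is a minimal illustration of the rank-sum form of the statistic, with tied observations given the average of the ranks they span (midranks); the function name is hypothetical, and the code computes only the two $U$ values, not a p-value.

```python
def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) from the rank sums of two independent samples.

    U_i = n1*n2 + n_i*(n_i + 1)/2 - R_i, where R_i is the rank sum of
    sample i in the pooled ranking. Ties receive midranks.
    """
    pooled = [(value, 0) for value in sample1] + [(value, 1) for value in sample2]
    pooled.sort(key=lambda pair: pair[0])

    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    r2 = sum(ranks) - r1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2
```

When the two samples are completely separated, one of the two values is 0 and the other equals $n_1 n_2$; the two values always sum to $n_1 n_2$.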
Results
Relationships between Ames “positive” status, Ames “negative” status, categorical rank (1–48), and ordinal rank (1–135)
Table 1 and Figures 5 and 6 show the relationships between Ames “positive” status, Ames “negative” status, categorical rank (1–48), and ordinal rank (1–135). The Mann–Whitney–Wilcoxon rank sum test shows that the trend in Ames versus category ranking is highly significant (
Relationships between Ames “positive” status, Ames “negative” status, categorical rank (1–48), and ordinal rank (1–135).

Relationships between Ames “positive” status and categorical rank (1–48).

Relationships between Ames “positive” status and ordinal rank (1–135).
Relationships between structural alerts of carcinogenesis, categorical rank (1–48), and ordinal rank (1–135)
Table 2 and Figures 7 and 8 show the relationships between structural alerts of carcinogenesis, categorical rank (1–48), and ordinal rank (1–135). The Mann–Whitney–Wilcoxon rank sum test shows that the trend in structural alerts versus category ranking is highly significant (
Relationships between structural alerts of carcinogenesis, categorical rank (1–48), and ordinal rank (1–135).

Relationships between structural alerts of carcinogenesis and categorical rank (1–48).

Relationships between structural alerts of carcinogenesis and ordinal rank (1–135).
Relationships between ClogP, categorical rank (1–48), and ordinal rank (1–135)
Table 3 shows the relationships between ClogP, categorical rank (1–48), and ordinal rank (1–135). The Mann–Whitney–Wilcoxon rank sum test shows no apparent relationship between ClogP and category ranking of tumor potency. Similarly, the Mann–Whitney–Wilcoxon rank sum test shows no apparent relationship between ClogP and ordinal ranking of tumor potency.
Relationships between ClogP, Categorical Rank (1–48), and Ordinal Rank (1–135).
ClogP: calculated base 10 logarithm of the octanol–water partition coefficient.
Relationships between CMR, categorical rank (1–48), and ordinal rank (1–135)
Table 4 shows the relationships between CMR, categorical rank (1–48), and ordinal rank (1–135). The Mann–Whitney–Wilcoxon rank sum test shows no apparent relationship between CMR and category ranking of tumor potency. Similarly, the Mann–Whitney–Wilcoxon rank sum test shows no apparent relationship between CMR and ordinal ranking of tumor potency.
Relationships between CMR, Categorical Rank (1–48), and Ordinal Rank (1–135).
CMR: calculated molar refractivity.
Relationships between MgVol, categorical rank (1–48), and ordinal rank (1–135)
Table 5 and Figures 9 and 10 show the relationship between MgVol, categorical rank (1–48), and ordinal rank (1–135). On average, MgVol increased with both category rank and ordinal rank of tumor potency. Therefore, smaller molecular volumes were associated with higher levels of tumorigenicity.
Relationships between MgVol, Categorical Rank (1–48), and Ordinal Rank (1–135).
MgVol: McGowan molecular volume.

Relationships between MgVol and categorical rank (1–48).

Relationships between MgVol and ordinal rank (1–135).
Relationships between Ames Salmonella mutagenicity assay results and structural alerts of carcinogenicity
Table 6 shows the relationships between Ames Salmonella mutagenicity assay results and structural alerts of carcinogenicity. The contingency table shows that when structural alerts of carcinogenicity were present, the Ames test was positive for 127 chemicals and negative for 155 chemicals. In the absence of structural alerts of carcinogenicity, 26 chemicals were positive in the Ames test and 164 chemicals were negative.
Relationships between Ames Salmonella mutagenicity assay results and structural alerts of carcinogenicity: Contingency table for Ames test and structural alerts.
The null hypothesis is that the Ames test status does not correlate with structural alert status. The
The Pearson correlation
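The counts in the contingency table above (127 and 155 with structural alerts present; 26 and 164 with structural alerts absent) are enough to recompute a chi-squared statistic and the corresponding correlation for a 2×2 table. The sketch below uses the textbook shortcut formula without a continuity correction, so its output may differ slightly from the values reported in Table 6 if a correction or different software was used; the function name is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Chi-squared test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p_value, phi). No continuity correction is applied.
    phi is the Pearson correlation between the two binary variables.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    cross = a * d - b * c
    chi2 = n * cross ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    phi = cross / math.sqrt(row1 * row2 * col1 * col2)
    return chi2, p_value, phi


# Rows: structural alerts present / absent; columns: Ames positive / negative.
chi2, p_value, phi = chi2_2x2(127, 155, 26, 164)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, phi = {phi:.3f}")
```

For a 2×2 table, the squared phi coefficient equals the chi-squared statistic divided by the total count, which links the chi-squared test to the Pearson correlation.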
Relationships between ClogP and Ames Salmonella mutagenicity assay results
Table 7 shows the relationships between ClogP and Ames Salmonella mutagenicity assay results. The mean ClogP for Ames positive chemicals was 1.424 (154 observations). The mean ClogP for Ames negative chemicals was 2.046 (325 observations). The difference between the ClogP means for Ames negative and Ames positive chemicals is statistically significant (
Relationships between ClogP and Ames Salmonella mutagenicity assay results.
There is a significant difference in ClogP when there is a positive Ames test versus negative Ames test.
Relationships between ClogP and CMR and MgVol
Table 8 shows the relationships between ClogP and CMR and MgVol as calculated by the two-sample
Relationships between ClogP and CMR and MgVol.
ClogP: calculated base 10 logarithm of the octanol–water partition coefficient; CMR: calculated molar refractivity; MgVol: McGowan molecular volume.
Relationships between structural alerts of carcinogenicity and ClogP, CMR, and MgVol
Tables 9, 10, and 11 show the relationships between structural alerts of carcinogenicity and ClogP, CMR, and MgVol, respectively. Table 9 shows the relationship between structural alerts of carcinogenicity and ClogP. The mean ClogP when structural alerts are present is 2.170 (285 observations). The mean ClogP when structural alerts are absent is 1.393 (191 observations). The difference between the ClogP mean values for the presence and absence of structural alerts is highly statistically significant (
Relationships between structural alerts of carcinogenicity and ClogP.
There is a significant difference in ClogP when there is a structural alert versus no structural alert.
Relationships between structural alerts of carcinogenicity and CMR.
There is not a significant difference in CMR when there is a structural alert versus no structural alert.
Relationships between structural alerts of carcinogenicity and MgVol.
There is not a significant difference in MgVol when there is a structural alert versus no structural alert.
Table 10 shows the relationship between structural alerts of carcinogenicity and CMR. The mean CMR when structural alerts are present is 5.552 (281 observations). The mean CMR when structural alerts are absent is 5.192 (187 observations). The difference between the CMR mean values for the presence and absence of structural alerts is not statistically significant (
Table 11 shows the relationship between structural alerts of carcinogenicity and MgVol. The mean MgVol when structural alerts are present is 1.512 (285 observations). The mean MgVol when structural alerts are absent is 1.523 (191 observations). The difference between the MgVol mean values for the presence and absence of structural alerts is not statistically significant (
Table 12 shows a correlation matrix that summarizes the relationships noted in the text.
Correlation matrix for Ames test Results, Structural Alerts, ClogP, CMR and MgVol.
ClogP: calculated base 10 logarithm of the octanol–water partition coefficient; CMR: calculated molar refractivity; MgVol: McGowan molecular volume.
Relationships between MgVol, Ames results, and categorical ranking of carcinogenicity (1–48)
Because both Ames test results and MgVol appear to be correlated with carcinogenicity ranking, combining the two variables can improve the correlation with carcinogenicity.
Linear correlations were calculated for [Carcinogenicity, Ames Positive, Average MgVol], [Carcinogenicity, Ames Positive], and [Carcinogenicity, Average MgVol]. Adjusted
Relationships between carcinogenetic potential, positive Ames Salmonella mutagenicity assay results, and average MgVol.
Relationships between carcinogenetic potential and positive Ames Salmonella mutagenicity assay results.
Relationships between carcinogenetic potential and positive Ames Salmonella mutagenicity assay results, average MgVol times Ames positive, and average MgVol times Ames negative.
MgVol: McGowan molecular volume.
Discussion
The current system employed by NTP for the categorization of the neoplasticity of chemicals is qualitative. 27 Part of the qualitative nature of the NTP categorization process is intrinsic and is due to at least two factors: (1) the less than exact nature of pathological diagnosis of pre-neoplastic and neoplastic lesions 27 and (2) the practical inability to use an extremely large number of rats and mice for the purpose of increasing the statistical power of pathological observations. While these two factors necessarily introduce a qualitative aspect into the categorization of the neoplasticity observed in 2-year rodent bioassays, the large number of chemicals tested to date for which interpretable final reports are extant, that is, 470, facilitates the ability to rank these 470 chemicals and future chemical results relative to one another.
There are three different but interrelated methods for ranking these chemicals. First, neoplasticity results can be categorized from 1 to 48 at the present time by considering the various combinations of the levels of neoplastic evidence in the descending order of categorical rank: Clear Evidence > Some Evidence > Equivocal Evidence > Inadequate Evidence > Negative Evidence (Online Appendix 1). Second, an ordinal rank 1–135 can be determined using a boundary condition under which ordinal rank can be further split within neoplasticity category (1–48), but a chemical in a lower category cannot be assigned a higher ordinal rank than that of any chemical in a higher category. When tumor site concordance across species, tumor site concordance across sex within species, and multiplicity of tumors not concordant by organ site (referred to in the ranking scheme as “single tumors”) are considered in descending order as described in the “Methods” section and shown in Online Appendix 2, an ordinal rank number 1–135 can be readily assigned. Finally, if the most tumorigenic chemical of the 470 test results to date is defined as either 100% or 0%, a percentile ranking of each chemical ever tested or to be tested in the future logically follows (see the “Methods” section and Online Appendix 3).
The internal correlation of the categorical and ordinal ranking systems with various measures of biological activity or molecular parameters showed the expected results. The expected association of positive Ames test results with categorical and ordinal ranks of increased tumorigenicity is displayed in Table 1. 28 Similarly, Table 2 shows that the presence of structural alerts is strongly associated with categorical and ordinal ranks of increased tumorigenicity. 29–31 Also, Table 5 demonstrates that smaller molecular volumes were associated with higher levels of tumorigenicity as determined by categorical and ordinal ranks. 29–31
Table 7 shows the relationships between ClogP and Ames Salmonella mutagenicity assay results. The mean ClogP for Ames positive chemicals was 1.424 (154 observations). The mean ClogP for Ames negative chemicals was 2.046 (325 observations). The difference between the ClogP means for Ames negative and Ames positive chemicals is statistically significant (
There could be several other possible explanations for why the mean ClogP was lower for Ames positive chemicals than for Ames negative chemicals. First, the result might be artifactual since the criterion for determining whether a chemical was positive was based on whether a single positive Ames test result had been reported. Although this is a possibility, the large number of observations, that is, 154 observations for Ames positive chemicals and 325 observations for Ames negative chemicals, suggests that this is probably not the case. Second, the collinearity between molecular size and lipophilicity might be confounding the relationship between ClogP and Ames results. Specifically, as the number of hydrophobic groups on a molecule increases, the molecular size of the molecule increases. As discussed previously, smaller molecular size is associated with increased tumorigenicity (Table 5), and positive Ames test results are associated with increased tumorigenicity (Table 1). Third, the mean ClogP values for both Ames positive chemicals (1.424) and Ames negative chemicals (2.046) represent substantially greater solubility in lipid than in water, approximately 26.55 and 111.17 times greater, respectively.
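The solubility ratios quoted above follow directly from the definition of ClogP as a base 10 logarithm: the partition ratio is $10^{\text{ClogP}}$. The short Python check below reproduces that arithmetic; the function name is hypothetical.

```python
def partition_ratio(clogp):
    """Convert a base 10 log partition coefficient to the ratio itself."""
    return 10.0 ** clogp


for label, clogp in (("Ames positive", 1.424), ("Ames negative", 2.046)):
    ratio = partition_ratio(clogp)
    print(f"{label}: ClogP = {clogp} -> partition ratio = {ratio:.2f}")
```

A difference of about 0.62 log units between the two means therefore corresponds to roughly a fourfold difference in the partition ratio.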
Studies on several classes of chemicals have established that mutagenicity can be correlated with lipophilicity in a linear, parabolic, or bilinear fashion, depending upon the chemical class. 32–37 A parabolic dependence on lipophilicity indicates that the measured biological activity of a chemical first increases with increasing lipophilicity up to an optimum value and then decreases with increasing lipophilicity. It is possible that only a chemical that is sufficiently lipophilic would be able to cross the cellular membranes and facilitate molecular transfer and thus increase tumorigenicity. Many QSAR studies for predicting mutagenicity and carcinogenicity have highlighted how individual chemicals within classes may have specific mechanisms of action. 36,37 Considering the different classes of chemicals and the large number of chemicals (470) in the data set, the ClogP correlation with tumorigenicity in this study awaits a definitive explanation.
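To make the parabolic case concrete, a parabolic QSAR model can be written as activity $= a\,(\log P)^2 + b\,\log P + c$ with $a < 0$, in which case the optimum lipophilicity is $\log P_0 = -b/(2a)$. The sketch below evaluates such a model; the coefficients are hypothetical and chosen only for illustration, and they are not fitted to the NTP data.

```python
def parabolic_activity(logp, a, b, c):
    """Activity predicted by a parabolic model in logP."""
    return a * logp ** 2 + b * logp + c


def optimum_logp(a, b):
    """logP at which a parabolic model with a < 0 reaches its maximum."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2 * a)


# Hypothetical coefficients, for illustration only.
a, b, c = -0.5, 2.0, 1.0
logp0 = optimum_logp(a, b)
```

Activity rises with logP below the optimum and falls above it, which is the behavior described in the text for a parabolic dependence.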
This study represents the fourth in a series of evaluations of the entire NTP database of 594 studies, 470 of which resulted in final reports. The sequential analyses reviewed 60 inhalation studies, 38 212 feed studies, 11 124 studies by gavage, 21 via drinking water, 18 by dermal administration, and 11 by intraperitoneal injection. 35 Across the various routes of administration, the predictive power of a positive Ames test result predicting the development of tumors in male rats, female rats, male mice, or female mice was low at approximately 35%. Similarly, the predictive power of a negative Ames test result was also low across the various routes of administration at approximately 24%. Across the various routes of administration, the predictive power of positive Ames test results predicting the development of tumors from ubiquitously neoplastic chemicals in male rats, female rats, male mice, and female mice was very low at approximately 8.3%. Similarly, the predictive power of negative Ames test results predicting the development of tumors from ubiquitously neoplastic chemicals in male rats, female rats, male mice, and female mice was also very low across the various routes of administration at approximately 5.6%. The heterogeneity of the historical database of tests of genetic toxicity other than Ames renders precise statistical analysis of this metric problematic, that is, many different tests results are reported including results from older tests, for example, sister chromatid exchange, more modern tests, for example, chromosome aberration, and less commonly conducted tests.
Conclusions
A statistical analysis of the results from the entire NTP 2-year rodent carcinogenicity database suggests two readily implementable areas of improvement. First, reliance on historical tests of genotoxicity can cloud rather than clarify the issue. It would be more cost-effective and much more definitive for the interested party (usually the manufacturer of the chemical under review) to provide a highly purified sample of the test chemical documented by a certificate of analysis to a contract laboratory previously approved by NTP and United States Environmental Protection Agency (USEPA) for the purpose of conducting a genotoxicity test battery under Good Laboratory Practices (GLP) and employing the Organization for Economic Co-operation and Development (OECD) protocol relevant to the physicochemical properties of the compound. This result would be considered the definitive evaluation of the genotoxicity of the chemical compound in question. Second, following the completion of each new NTP 2-year study, the newly tested chemical should be assigned a tumorigenicity percentile rank prior to the expert panel evaluation of the potential hazards of the chemical. In this manner, the panelists would be able to provide a relative perspective on the potential carcinogenicity of the chemical.
These suggestions for improvement may seem idealistic in the current environment of toxicity testing and may not be implementable at this time due to the variety of methodologies used, inter-lab variations, reporting and evaluation differences, purity of substances tested, etc. However, in light of the present situation, efforts must be made to improve the testing and reporting methods currently in place.
Footnotes
Declaration of conflicting interests
Funding
Supplemental material
References