Data from the Collaborative Behavioral Teratology Study were reanalyzed. Reproducibility of the treatment effect was assessed as the magnitude of fluctuation in strength of association (η²) across the replication experiments. There was no evidence that treatment effects fluctuated more across experiments for behavioral measures than for nonbehavioral ones, which suggests that the conventional belief that low reproducibility in behavioral teratology reflects low reliability of the behavioral tests per se is unwarranted. Although the distributions of obtained treatment effects were nearly symmetric and unimodal for most measures, their ranges were considerable. Because the experiments were conducted under strictly standardized conditions, these considerable ranges indicate that inconsistency in results across experiments cannot be adequately remedied by standardizing methods alone. Relying mainly on conventional significance tests, the investigators in the collaborative study had concluded that excellent reproducibility of behavioral data was demonstrated. The logic underlying the original work's use of a statistically nonsignificant interaction was examined critically. A serious limitation of the significance test as a means of assessing reproducibility was identified, and a warning is issued against the customary reliance on significance tests alone.
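
As a rough illustration of the effect-size approach described above, the sketch below computes η² separately for each experiment and then reports how widely it ranges across experiments. It assumes a simple one-way layout (control versus treated) and defines η² as the between-group sum of squares divided by the total sum of squares; the reanalysis itself may have used a different design or estimator. The laboratory labels and scores are invented for illustration and are not data from the study.

```python
from statistics import mean


def eta_squared(groups):
    """Strength of association for a one-way layout: SS_between / SS_total."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Guard against a degenerate experiment in which every score is identical.
    return ss_between / ss_total if ss_total > 0 else 0.0


# Hypothetical scores: each experiment holds [control group, treated group].
experiments = {
    "lab_A": [[10, 12, 11, 13], [14, 15, 13, 16]],
    "lab_B": [[10, 11, 12, 13], [11, 12, 13, 14]],
    "lab_C": [[9, 12, 10, 13], [12, 16, 13, 17]],
}

# One effect size per experiment, then the spread across experiments.
effects = {lab: eta_squared(groups) for lab, groups in experiments.items()}
effect_range = max(effects.values()) - min(effects.values())

for lab, eta2 in effects.items():
    print(f"{lab}: eta^2 = {eta2:.3f}")
print(f"range across experiments = {effect_range:.3f}")
```

Summarizing each experiment by its own η² makes the spread of effects visible directly, whereas a nonsignificant interaction test on its own says little about how large that spread is.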