Sage Journals: Discover world-class research

Abstract

This study incorporates a scale purification procedure into the standard multiple indicators, multiple causes (MIMIC) method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. The MIMIC method with scale purification (denoted M-SP) outperforms the standard MIMIC method (denoted M-ST): it controls false-positive rates better and yields higher true-positive rates. M-ST performs as well as M-SP only when the DIF pattern is balanced between groups or when the test contains a small percentage of DIF items. Moreover, both methods yield higher true-positive rates under the two-parameter logistic model than under the three-parameter logistic model. M-SP is therefore preferable to M-ST, because DIF patterns in real tests are unlikely to be perfectly balanced and the percentage of DIF items may not be small.
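The abstract does not spell out the purification steps, so the following is only a minimal sketch of a generic iterative scale purification loop, under stated assumptions. The first pass tests every item against all other items as anchors (the unpurified approach). Each later pass re-tests every item using only the items not flagged in the previous pass as anchors, and the loop stops when the flagged set repeats. The names `purify`, `toy_dif_test`, and `SHIFTS` are hypothetical. `toy_dif_test` is a toy stand-in, not a MIMIC fit: it treats an item's "DIF estimate" as its shift relative to the mean shift of its anchors, which is enough to show how DIF items left in the anchor set can contaminate the results.

```python
from typing import Callable, FrozenSet, Iterable, Sequence

# Hypothetical interface: test_item(item, anchors) -> True if `item` is
# flagged as DIF when the items in `anchors` serve as the anchor set.
# In the study this role is played by a MIMIC fit; here it is any callable.
DifTest = Callable[[int, Sequence[int]], bool]


def purify(items: Iterable[int], test_item: DifTest, max_iter: int = 20) -> FrozenSet[int]:
    """Iterative scale purification (sketch, assumed procedure).

    Pass 1 tests every item against all other items (no purification).
    Each later pass re-tests every item using only the items that were
    not flagged in the previous pass as anchors. Stops when the flagged
    set repeats or after max_iter passes; max_iter=1 gives a single pass.
    """
    pool = list(items)
    flagged: FrozenSet[int] = frozenset()
    for _ in range(max_iter):
        new_flagged = frozenset(
            i for i in pool
            if test_item(i, [a for a in pool if a != i and a not in flagged])
        )
        if new_flagged == flagged:  # flagged set is stable: converged
            break
        flagged = new_flagged
    return flagged


# Hypothetical toy data: per-item shifts between groups. Items 0-2 carry
# DIF in the same direction (an unbalanced pattern); item 3 has a small,
# non-DIF deviation of -0.2; items 4-9 are clean.
SHIFTS = [1.0, 1.0, 1.0, -0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


def toy_dif_test(item: int, anchors: Sequence[int], threshold: float = 0.5) -> bool:
    """Toy stand-in for a DIF test (NOT a MIMIC model): flags the item if
    its shift differs from the mean shift of its anchors by > threshold."""
    if not anchors:  # no anchors left to compare against; do not flag
        return False
    reference = sum(SHIFTS[a] for a in anchors) / len(anchors)
    return abs(SHIFTS[item] - reference) > threshold


if __name__ == "__main__":
    print(sorted(purify(range(10), toy_dif_test, max_iter=1)))  # prints [0, 1, 2, 3]
    print(sorted(purify(range(10), toy_dif_test)))              # prints [0, 1, 2]
```

In this toy setup the single pass also flags item 3, because the three same-direction DIF items in its anchor set shift the reference point; once they are removed from the anchors, item 3 is no longer flagged. This mirrors, in a deliberately simplified form, the abstract's point that unpurified detection is vulnerable when the DIF pattern is unbalanced.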

Keywords

differential item functioning; scale purification; item response theory; confirmatory factor analysis; MIMIC

