Sage Journals: Discover world-class research

Abstract

The three-parameter logistic (3PL) model in item-response theory (IRT) has long been used to account for guessing in multiple-choice assessments through a fixed item-level parameter. However, this approach treats guessing as a property of the test item rather than the individual, potentially misrepresenting the cognitive processes underlying the examinee’s behavior. This study evaluates a novel alternative, the Two-Parameter Logistic Extension (2PLE) model, which re-conceptualizes guessing as a function of a person’s ability rather than as an item-specific constant. Using Monte Carlo simulation and empirical data from the PIRLS 2021 reading comprehension assessment, we compared the 3PL and 2PLE models on the recovery of latent ability, predictive fit (Leave-One-Out Information Criterion [LOOIC]), and theoretical alignment with test-taking behavior. The simulation results demonstrated that although both models performed similarly in terms of root-mean-squared error (RMSE) for ability estimates, the 2PLE model consistently achieved superior LOOIC values across conditions, particularly with longer tests and larger sample sizes. In an empirical analysis involving the reading achievement of 131 fourth-grade students from Saudi Arabia, model comparison again favored 2PLE, with a statistically significant LOOIC difference (ΔLOOIC = 0.482, z = 2.54). Importantly, person-level guessing estimates derived from the 2PLE model were significantly associated with established person-fit statistics (C*, U3), supporting their criterion validity. These findings suggest that the 2PLE model provides a more cognitively plausible and statistically robust representation of examinee behavior by embedding an ability-dependent guessing function.

Keywords

item-response theory 2PL 3PL models person-guessing estimation 2PLE extension model

Get full access to this article

View all access options for this article.

References

Baker

F. B.

Kim

S.-H.

(2017). The basics of item response theory using R. Springer.

Birnbaum

(1968). Some latent trait models and their use in inferring an examinee’s ability. In Lord

F. M.

Novick

M. R.

(Eds.), Statistical theories of mental test scores (pp. 397–479). Addison-Wesley.

Carpenter

Gelman

Hoffman

M. D.

Lee

Goodrich

Betancourt

Brubaker

M. A.

Guo

Riddell

(2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32.

Dienes

(2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5, 781. https://doi.org/10.3389/fpsyg.2014.00781

Drasgow

Levine

M. V.

Williams

E. A.

(1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x

Embretson

S. E.

Reise

S. P.

(2000). Item response theory for psychologists. Lawrence Erlbaum.

Han

K. T.

(2012). Fixing the c parameter in the three-parameter logistic model. Practical Assessment, Research & Evaluation, 17(1). http://pareonline.net/getvn.asp?v=17&n=1

Holland

J. L.

(1997). Making vocational choices: A theory of vocational personalities and work environments (3rd ed.). Prentice Hall.

Hulin

C. L.

Drasgow

Parsons

C. K.

(1983). Item response theory: Application to psychological measurement. Dorsey Press.

10.

Meijer

R. R.

Sijtsma

(2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135. https://doi.org/10.1177/01466210122031957

11.

Molenaar

Tuerlinckx

van der Maas

H. L. J.

(2015). Fitting diffusion item response theory models for responses and response times using the R package diffIRT. Journal of Statistical Software, 66(4), 1–31. https://doi.org/10.18637/jss.v066.i04

12.

Mullis

I. V. S.

Martin

M. O.

(2017). PIRLS 2016 assessment framework. TIMSS & PIRLS International Study Center.

13.

Mullis

I. V. S.

Martin

M. O.

Foy

Drucker

K. T.

(2012). PIRLS 2011 International Results in Reading. TIMSS & PIRLS International Study Center, Boston College. https://timssandpirls.bc.edu/pirls2011/international-results-pirls.html

14.

Raykov

Marcoulides

G. A.

(2020). A note on the presence of spurious pseudo-guessing parameters for three-parameter logistic models in heterogeneous populations. Educational and Psychological Measurement, 80(3), 604–612. https://doi.org/10.1177/0013164419850882

15.

Raykov

Marcoulides

G. A.

Huber

(2020). Inaccurate individual ability estimates with three-parameter item response models in mixture settings. Measurement: Interdisciplinary Research and Perspectives, 18(1), 15–22. https://doi.org/10.1080/15366367.2019.1641876

16.

San Martın

del Pino

de Boeck

. (2006). IRT models for ability-based guessing. Applied Psychological Measurement, 30, 183–203.

17.

Van der Flier

. (1982). Deviant response patterns and comparability of test scores. Journal of Cross-Cultural Psychology, 13(3), 267–298. https://doi.org/10.1177/0022002182013003001

18.

Vehtari

Gelman

Gabry

(2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4

19.

Yen

(2005). Obtaining maximum likelihood trait estimates form number-correct scores for the three-parameter logistic model. Journal of Educational Measurement, 21(2), 93–111.

20.

Zhu

Wang

Tao

(2019). A two-parameter logistic extension model: An efficient variant of the three-parameter logistic model. Applied Psychological Measurement, 43(6), 449–463. https://doi.org/10.1177/0146621618800273

Guessing During Testing is a Person Attribute Not an Instrument Parameter

Abstract

Keywords

Get full access to this article

References