
Abstract

An examinee-level (or conditional) reliability is proposed for use in both classical test theory (CTT) and item response theory (IRT). The well-known group-level reliability is shown to be the average of conditional reliabilities of examinees in a group or a population. This relationship is similar to the known relationship between the square of the conditional standard error of measurement (SEM) and the square of the group-level SEM. The proposed conditional reliability is illustrated with an empirical data set in the CTT and IRT frameworks.
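The averaging relationship can be checked numerically. The abstract does not give the formula, so the sketch below assumes one definition that is consistent with it: the conditional reliability of examinee i is 1 minus the squared conditional SEM divided by the observed-score variance of the group. The function names and all numeric values are hypothetical and are not taken from the article's data set.

```python
from statistics import fmean


def conditional_reliabilities(cond_sems, observed_variance):
    """Conditional reliability per examinee.

    ASSUMPTION (not stated in the abstract): reliability_i is defined as
    1 - SEM_i^2 / observed_variance.
    """
    return [1.0 - sem ** 2 / observed_variance for sem in cond_sems]


def group_reliability(cond_sems, observed_variance):
    """Group-level reliability computed from the group-level SEM^2.

    The squared group-level SEM is taken as the mean of the squared
    conditional SEMs, the known relationship the abstract refers to.
    """
    group_sem_squared = fmean(sem ** 2 for sem in cond_sems)
    return 1.0 - group_sem_squared / observed_variance


# Hypothetical conditional SEMs for six examinees and a hypothetical
# observed-score variance for the group.
cond_sems = [2.0, 2.5, 3.0, 3.5, 3.0, 2.5]
observed_variance = 36.0

cond_rel = conditional_reliabilities(cond_sems, observed_variance)
avg_cond_rel = fmean(cond_rel)
group_rel = group_reliability(cond_sems, observed_variance)

print("conditional reliabilities:", [round(r, 4) for r in cond_rel])
print("average of conditional reliabilities:", round(avg_cond_rel, 6))
print("group-level reliability:", round(group_rel, 6))
```

Under this assumed definition the two printed summary values agree up to floating-point error, because averaging 1 - SEM_i^2 / variance over examinees is the same as 1 - mean(SEM_i^2) / variance.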

Keywords

Index terms: conditional standard error of measurement, conditional reliability, classical test theory, item response theory

