An examinee-level (or conditional) reliability is proposed for use in both classical test theory (CTT) and item response theory (IRT). The well-known group-level reliability is shown to be the average of conditional reliabilities of examinees in a group or a population. This relationship is similar to the known relationship between the square of the conditional standard error of measurement (SEM) and the square of the group-level SEM. The proposed conditional reliability is illustrated with an empirical data set in the CTT and IRT frameworks.
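The averaging relationship can be checked numerically. The abstract does not give the formula, so the sketch below assumes one natural definition consistent with the stated SEM relationship: an examinee's conditional reliability is 1 minus the squared conditional SEM divided by the group's observed-score variance. The function name, the scores, and the conditional SEMs are all hypothetical and are not taken from the article's empirical data set.

```python
from statistics import fmean, pvariance


def conditional_reliability(csem, observed_variance):
    """Assumed definition: 1 - CSEM^2 / observed-score variance."""
    return 1.0 - csem ** 2 / observed_variance


# Hypothetical values for illustration only (not the article's data).
observed_scores = [12.0, 15.0, 18.0, 21.0, 24.0]
csems = [2.0, 1.5, 1.0, 1.5, 2.0]  # one conditional SEM per examinee

# Population variance of the observed scores in the group.
observed_variance = pvariance(observed_scores)

# One conditional reliability per examinee.
cond_rel = [conditional_reliability(s, observed_variance) for s in csems]

# Known relationship: group-level SEM^2 is the mean of the squared conditional SEMs.
group_sem_sq = fmean([s ** 2 for s in csems])
group_reliability = 1.0 - group_sem_sq / observed_variance

# Under the assumed definition, the mean conditional reliability
# equals the group-level reliability.
mean_cond_rel = fmean(cond_rel)

print(f"group-level reliability:      {group_reliability:.4f}")
print(f"mean conditional reliability: {mean_cond_rel:.4f}")
```

Under this assumed definition the identity holds by linearity: averaging 1 minus a scaled squared conditional SEM over examinees gives 1 minus the scaled average squared conditional SEM, which is the group-level reliability.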