Abstract
The bookmark standard-setting procedure is an item response theory–based method that is widely implemented in state testing programs. This study estimates standard errors for cut scores resulting from bookmark standard settings under a generalizability theory model and investigates the effects of different universes of generalization and error sources on standard errors. This study produced several notable results. First, different patterns of variance component estimates are found for different cut scores; therefore, researchers should estimate separate variance components for each cut score and use them to estimate corresponding standard errors. Second, different universes of generalization produce different standard error estimates; thus, policy makers should consider which universe is appropriate for the proposed use of cut scores. Third, participants and groups have nonnegligible effects on several error sources. To decrease the standard errors for cut scores, increasing the number of small groups seems more efficient than increasing the number of participants.
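The trade-off described in the last finding can be illustrated with a small sketch. Under a common persons-nested-in-groups (p:g) D-study design, the absolute-error standard error of a cut score is sqrt(σ²_g/n_g + σ²_p:g/(n_g·n_p)), so adding groups shrinks both terms while adding participants shrinks only the second. The variance component values below are hypothetical, chosen only for illustration, and do not come from this study:

```python
import math

def cut_score_se(var_group, var_p_within, n_groups, n_p_per_group):
    """Absolute-error SE of a cut score for a p:g (participants nested
    in groups) D-study: sqrt(sigma2_g/n_g + sigma2_p:g/(n_g * n_p))."""
    return math.sqrt(var_group / n_groups
                     + var_p_within / (n_groups * n_p_per_group))

# Hypothetical variance component estimates (not from the study):
var_g, var_pg = 1.5, 4.0

base = cut_score_se(var_g, var_pg, n_groups=3, n_p_per_group=5)
more_groups = cut_score_se(var_g, var_pg, n_groups=6, n_p_per_group=5)
more_parts = cut_score_se(var_g, var_pg, n_groups=3, n_p_per_group=10)
print(f"base={base:.3f}  2x groups={more_groups:.3f}  2x participants={more_parts:.3f}")
```

With these assumed components, doubling the number of groups reduces the standard error more than doubling the participants per group, mirroring the abstract's conclusion that adding small groups is the more efficient design change.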
