Abstract
A persistent concern in music performance assessments is the quality of the ratings that judges assign. Differences in rater judgment are most concerning when student performances are scored by different raters (i.e., disconnected rating designs), which brings into question the comparability of scores across raters and students. We used data from a formal solo music performance assessment to demonstrate and explore how different data collection designs and a statistical adjustment procedure affect the estimates and rank-ordering of student performances. Our results indicated notable discrepancies in conclusions about individual student performances across conditions in which all raters scored all students, designs with no common performances between raters, designs that included overlapping performances between raters, and a post hoc adjustment procedure applied to disconnected designs. We discuss the implications of our results for the design and interpretation of music performance assessments.
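To make the comparability problem concrete, the following is a minimal simulation sketch, not the authors' procedure or data. It assumes a simple additive rating model (score = student ability minus rater severity plus noise); all quantities, sample sizes, and distributions are hypothetical. It contrasts a fully crossed design, where rater severity averages out, with a disconnected design, where each rater scores a distinct block of students and severity is confounded with those students' scores.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
n_students, n_raters = 60, 6
ability = rng.normal(0, 1, n_students)    # true performance quality (hypothetical)
severity = rng.normal(0, 0.8, n_raters)   # rater harshness varies across judges

def score(s, r):
    # Observed rating under an assumed additive model with random error.
    return ability[s] - severity[r] + rng.normal(0, 0.3)

# Fully crossed design: every rater scores every student,
# so averaging over raters cancels out differences in severity.
crossed = np.array([[score(s, r) for r in range(n_raters)]
                    for s in range(n_students)])
crossed_means = crossed.mean(axis=1)

# Disconnected design: each rater scores a distinct block of students,
# so a rater's severity is absorbed into that block's scores.
blocks = np.array_split(np.arange(n_students), n_raters)
disconnected = np.empty(n_students)
for r, block in enumerate(blocks):
    for s in block:
        disconnected[s] = score(s, r)

rho_c, _ = spearmanr(ability, crossed_means)
rho_d, _ = spearmanr(ability, disconnected)
print("Rank agreement with true ability (Spearman rho):")
print(f"  fully crossed: {rho_c:.3f}")
print(f"  disconnected:  {rho_d:.3f}")

In this toy setup, students assigned to a harsher rater in the disconnected condition are systematically ranked lower regardless of their actual performance, which is the confound that overlapping (linking) performances and post hoc statistical adjustments are intended to address.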
