Abstract
This study uses Monte Carlo simulations to evaluate strategies for converting item-level proportion-correct standard-setting judgments into a θ-metric test cutoff score that can be used with item response theory (IRT) scoring. Simulated Angoff ratings, consisting of 1,000 independent 100-item by 15-rater matrices, were generated at each of five points along the θ continuum, ranging from −2 to +2, and at each of five levels of rater bias with regard to the item characteristic curves. In total, 37,500,000 ratings were generated as the basis of the analyses. These simulated proportion-correct ratings were converted to the IRT θ scale using test-level and item-level methods developed by Kane (1987). Overwhelmingly, Kane’s Method 3 and the weighted version of Method 1 performed best at recovering the original θ values.
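The design described above can be illustrated with a minimal sketch. It is not a reproduction of the study's simulation or of Kane's (1987) methods: the two-parameter logistic model, the item parameter ranges, the additive form of rater bias, the noise level, and the function names `icc`, `simulate_ratings`, and `theta_from_tcc` are all assumptions made for illustration. The sketch generates one 100-item by 15-rater matrix of proportion-correct ratings at a known θ and then recovers a cutoff by a generic test-level conversion, namely finding the θ at which the test characteristic curve equals the sum of the mean item ratings.

```python
import math
import random

def icc(theta, a, b):
    """Item characteristic curve under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate_ratings(theta, items, n_raters, bias=0.0, noise_sd=0.05, rng=None):
    """Return an items-by-raters matrix of simulated proportion-correct ratings.

    Each rating is the true ICC value at theta plus an additive bias and
    Gaussian rater noise (both hypothetical), clipped to [0, 1].
    """
    rng = rng or random.Random(0)
    matrix = []
    for a, b in items:
        p = icc(theta, a, b)
        row = [min(1.0, max(0.0, p + bias + rng.gauss(0.0, noise_sd)))
               for _ in range(n_raters)]
        matrix.append(row)
    return matrix

def theta_from_tcc(matrix, items, lo=-6.0, hi=6.0, tol=1e-6):
    """Test-level conversion: solve sum_i P_i(theta) = sum_i mean rating_i.

    Bisection is sufficient because the test characteristic curve is
    monotonically increasing in theta when all discriminations are positive.
    """
    target = sum(sum(row) / len(row) for row in matrix)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(icc(mid, a, b) for a, b in items) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Size of the full design stated in the abstract:
# 1,000 replications x 100 items x 15 raters x 5 theta points x 5 bias levels.
total_ratings = 1000 * 100 * 15 * 5 * 5

# One replication at theta = 1.0 with hypothetical item parameters.
rng = random.Random(42)
items = [(rng.uniform(0.5, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(100)]
ratings = simulate_ratings(1.0, items, n_raters=15, rng=rng)
recovered = theta_from_tcc(ratings, items)

# The same replication design with upward-biased raters (hypothetical bias of +0.10).
biased = simulate_ratings(1.0, items, n_raters=15, bias=0.10, rng=rng)
recovered_biased = theta_from_tcc(biased, items)

print(total_ratings, round(recovered, 2), round(recovered_biased, 2))
```

With unbiased raters the recovered cutoff should lie close to the generating θ, whereas a positive rater bias pushes the recovered cutoff upward. An item-level alternative would instead invert each item's curve separately and then combine the resulting item θ values.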