The Equivalence of Multiple Rater Kappa Statistics and Intraclass Correlation Coefficients

Abstract

Using the Gini-Light-Margolin concept of partitioning variance for qualitative data, correspondences are established between various kappa statistics and intraclass correlation coefficients under general conditions (multiple raters and polychotomous category systems). A measure of marginal symmetry for multiple ratings is also developed and is shown to have a proportion-of-variance explanation.
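
As a rough illustration of the kind of correspondence described, the sketch below checks numerically that the multiple-rater kappa of Fleiss (1971) can be rewritten as a proportion-of-variance quantity built from Gini's variation for categorical responses, partitioned into within-subject and total parts in the manner of Light and Margolin (1971). It is not taken from the paper and need not match its derivation. The function names, the toy table, and the assumption that every subject is rated by the same number of raters are all illustrative choices.

```python
import math


def gini_variation(counts):
    """Gini's variation for one set of categorical responses.

    counts[j] is the number of responses in category j. For n responses this
    equals n/2 - (1/(2n)) * sum_j counts[j]**2.
    """
    n = sum(counts)
    return n / 2 - sum(c * c for c in counts) / (2 * n)


def fleiss_kappa(table):
    """Fleiss (1971) kappa; table[i][j] = raters placing subject i in category j.

    Assumes every subject is rated by the same number of raters (n >= 2).
    """
    n_subjects = len(table)
    n_raters = sum(table[0])
    # Mean observed pairwise agreement across subjects.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_subjects
    # Chance agreement from the pooled category proportions.
    totals = [sum(col) for col in zip(*table)]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


def kappa_from_variation(table):
    """Same quantity written as 1 - [n * WSS] / [(n - 1) * TSS]."""
    n_raters = sum(table[0])
    # Within-subject variation: Gini variation of each subject's ratings, summed.
    wss = sum(gini_variation(row) for row in table)
    # Total variation: Gini variation of all ratings pooled over subjects.
    tss = gini_variation([sum(col) for col in zip(*table)])
    return 1 - (n_raters * wss) / ((n_raters - 1) * tss)


# Hypothetical toy data: 4 subjects, 4 raters each, 3 categories.
ratings = [
    [4, 0, 0],
    [2, 2, 0],
    [1, 1, 2],
    [0, 0, 4],
]

print(fleiss_kappa(ratings), kappa_from_variation(ratings))
```

The two printed values agree because, with n raters per subject, one minus the observed agreement is proportional to the within-subject variation and one minus the chance agreement is proportional to the total variation. Their ratio is therefore a ratio of variance components, which is the form an intraclass correlation takes.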

References

Collis, G. M. (1985). Kappa, measures of marginal symmetry and intraclass correlations. Educational and Psychological Measurement, 45, 55-62.

Conger, A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88, 322-328.

Fleiss, J. L. (1965). Estimating the accuracy of dichotomous judgements. Psychometrika, 30, 469-479.

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378-382.

Fleiss, J. L. and Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619.

Fleiss, J. L. and Cuzick, J. (1979). The reliability of dichotomous judgements: Unequal number of judges per subject. Applied Psychological Measurement, 3, 537-542.

Gini, C. (1912). Variabilità e mutabilità: contributo allo studio delle distribuzioni e delle relazioni statistiche. Bologna: Cuppini.

Gini, C. (1939). Variabilità e Concentrazione. Vol. 1 di: Memorie di metodologia statistica. Milano: Giuffrè.

Krippendorff, K. (1970). Bivariate agreement coefficients for reliability of data. In E. F. Borgatta and G. W. Bohrnstedt (Eds.), Sociological methodology 1970. San Francisco: Jossey-Bass.

Light, R. J. and Margolin, B. H. (1971). An analysis of variance for categorical data. Journal of the American Statistical Association, 66, 534-544.

Rae, G. (1984). On measuring agreement among several judges on the presence or absence of a trait. Educational and Psychological Measurement, 44, 247-253.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.

Winer, B. J. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill.