An Analysis of Variance Model for Assessing Reliability of Naturalistic Observations

Abstract

This paper presents a subjects × raters partially nested factorial analysis of variance model for estimating coefficients of interrater reliability. Procedures and formulae are described for computing unbiased estimates using the between-subjects and error mean squares from the model. Inclusion or exclusion of rater variance in the estimates is also discussed. Nine specific advantages of using the analysis of variance approach over existing methods are listed. A direction for future research on reliability is suggested.
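The abstract gives no formulae, so the following is only a minimal sketch of how a reliability coefficient might be computed from between-subjects and error mean squares. It assumes a simpler, fully crossed subjects × raters design with one score per cell, not the partially nested model the paper describes, and it uses the standard single-rater intraclass correlation forms for that design. The function names, the `include_rater_variance` flag, and the example scores are hypothetical and introduced only for illustration.

```python
def mean_squares(scores):
    """Return (MS_subjects, MS_raters, MS_error) for a fully crossed design.

    scores[i][j] is the score given to subject i by rater j.
    Assumption: every rater scores every subject exactly once.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)

    subject_means = [sum(row) / k for row in scores]
    rater_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return ms_subjects, ms_raters, ms_error


def reliability(scores, include_rater_variance=False):
    """Single-rater intraclass correlation from the ANOVA mean squares.

    With include_rater_variance=False, systematic differences between
    raters are left out of the denominator; with True, they are counted
    as part of the unreliability.
    """
    n, k = len(scores), len(scores[0])
    ms_s, ms_r, ms_e = mean_squares(scores)
    denominator = ms_s + (k - 1) * ms_e
    if include_rater_variance:
        denominator += k * (ms_r - ms_e) / n
    return (ms_s - ms_e) / denominator


# Hypothetical data: four subjects, two raters; rater 2 scores one point higher.
example_scores = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(reliability(example_scores))
print(reliability(example_scores, include_rater_variance=True))
```

In this made-up example the two raters order the subjects identically but differ by a constant offset, so the coefficient that excludes rater variance is higher than the one that includes it. That contrast is the kind of choice the abstract refers to when it mentions inclusion or exclusion of rater variance.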
