✦ LIBER ✦

LETTER TO THE EDITOR A CRITICAL DISCUSSION OF INTRACLASS CORRELATION COEFFICIENTS by R. Müller and P. Büttner, Statistics in Medicine, 13, 2465–2476 (1995)

✍ Scribed by P. Vargha

Publisher: John Wiley and Sons
Year: 1997
Tongue: English
Weight: 135 KB
Volume: 16
Category: Article
ISSN: 0277-6715
DOI: 10.1002/(sici)1097-0258(19970415)16:7<821::aid-sim558>3.0.co;2-b

✦ Synopsis

Müller and Büttner recently provided an interesting review of intraclass correlation coefficients. Their paper can help readers find the method best fitted to their data setting. However, there are some points that seem worth commenting on, while others may even need correction.

First, it should be emphasized that intraclass correlation is an extension of the usual correlation concept to the case of interchangeable measurements. The only fact that seems to justify its application to non-interchangeable measurements is that, by contrast with the usual correlation coefficient, it may provide a compound measure comprising both interobserver and intraobserver (or method) reproducibility. However, with observer comparison as a goal, measurements are virtually never truly interchangeable.

The authors build their decision tree on the ANOVA-based classification of Shrout and Fleiss. However, the three models, referred to by the authors as A, B and C, do not provide measures that can be regarded as mere variants of the same parameter. The first two, though different with respect to underlying sampling theory, are very close to each other both conceptually and in their actual sample values. The measure provided by model C, however sound theoretically, fails to meet the requirements mentioned above and is in fact, not surprisingly, much closer to the simple correlation coefficient (in the example, equal to the second decimal place). Shrout and Fleiss, in their paper, argue with Bartko in favour of their conceptual approach, but their reasoning does not seem very convincing. Müller and Büttner also touch on this problem, but are perhaps a little too lenient in stating that in medicine model C may be inappropriate.
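The contrast between the models can be made concrete with a small sketch. It assumes that models A, B and C correspond to the single-measure coefficients usually written ICC(1,1), ICC(2,1) and ICC(3,1) in the Shrout and Fleiss classification; that mapping, the function names and the toy ratings are illustrative assumptions, not taken from the paper under discussion. The second observer is made to read exactly two units higher than the first, so that only a systematic observer difference is present.

```python
import math


def icc_single(ratings):
    """Single-measure ICCs from an n-subjects x k-observers table (no missing cells)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way layout.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_observers = n * sum((m - grand) ** 2 for m in col_means)

    bms = ss_subjects / (n - 1)                      # between-subjects mean square
    jms = ss_observers / (k - 1)                     # between-observers mean square
    wms = (ss_total - ss_subjects) / (n * (k - 1))   # within-subjects mean square
    ems = (ss_total - ss_subjects - ss_observers) / ((n - 1) * (k - 1))  # residual

    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
    }


def pearson(x, y):
    """Ordinary product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings: observer 2 always reads 2 units above observer 1.
ratings = [[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]]
iccs = icc_single(ratings)
r = pearson([row[0] for row in ratings], [row[1] for row in ratings])

for name, value in iccs.items():
    print(f"{name}: {value:.3f}")
print(f"Pearson r: {r:.3f}")
```

On these toy data the third coefficient equals the Pearson correlation (both are 1), because a constant offset between observers is removed from its error term, while the first two coefficients are penalized by the offset (3/7 and 5/9 respectively). This is only an extreme illustration of the point made above, not a reanalysis of the authors' example.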

I could not track down Whitfield's method, but as the authors put it, he defined a coefficient for two different observers, apparently not interchangeable. Therefore it is unclear why this method is placed in a category different from that of, for example, the kappa statistic. Moreover, I cannot see why a statistic based on Spearman's correlation coefficient, which is simply the usual Pearson's correlation applied to ranks, is different in underlying sampling theory from another like Kendall's tau. What is easy to see is that, in spite of being in the same class, it is conceptually different from model C, a fact which is well demonstrated by the highly differing coefficients in their example.
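That Spearman's coefficient is simply Pearson's correlation applied to ranks can be checked directly. The following is a minimal sketch with hypothetical readings and helper names of my own choosing; Kendall's tau (in its tau-a form, which assumes no ties) is computed alongside it from the same orderings.

```python
import math


def pearson(x, y):
    """Ordinary product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# Hypothetical readings from two observers on five subjects.
obs1 = [10, 20, 35, 50, 90]
obs2 = [12, 11, 40, 33, 95]

rho = spearman(obs1, obs2)
tau = kendall_tau_a(obs1, obs2)
print(f"Spearman rho: {rho:.2f}, Kendall tau-a: {tau:.2f}")
```

Both statistics depend on the data only through the orderings of the two series, which is why it is hard to see a difference in underlying sampling theory between them.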

Consequently, if observer (or methodology) agreement is to be expressed by a single correlation-like coefficient, and each observer takes a measurement on each subject, it is only the type of variables that will limit the choice. That means, for example, that model B rather than model C is chosen even when observers are fixed (as Shrout and Fleiss themselves suggested), irrespective of the fact that the estimated errors will be more or less biased. However, it should be noted that methods capable of assessing intraobserver and interobserver variability separately, such as the two described in their paper as potential alternatives, seem more informative.

There is a point in method comparison that has very often been forgotten: the effect that a difference in the reproducibility of the methods being compared has on the estimation procedure. To account for this difference, reproducibility certainly needs to be estimated. However, in many fields, for example in the clinical laboratory, replicates can be made with ease, so variability can be assessed. In the theory of structural regression models the effect of the ratio of error variances on the estimation procedure is well known, so I would consider it as a lack of flexibility, rather than an advantage of the STRUCTREG