
On The Calibration of Probability Judgments: Some Critical Comments and Alternative Perspectives

✍ Scribed by GIDEON KEREN

Publisher: John Wiley and Sons
Year: 1997
Tongue: English
Volume: 10
Category: Article
ISSN: 0894-3257
DOI: 10.1002/(sici)1099-0771(199709)10:3<269::aid-bdm281>3.0.co;2-l


✦ Synopsis

Calibration of probability judgments has attracted in recent years an increasing number of researchers as reflected by an expanding number of articles in the literature on judgment and decision making. The underlying fundamental question that stimulated this line of research concerns the standards by which probability judgments could (or should) be assessed and evaluated. The most common (though certainly not exclusive) accepted criterion is what has been termed 'calibration', the roots of which can be traced in the well-known Brier score and subsequent modifications (e.g. . Two main criteria that evolved from this line of research are customarily referred to as calibration and resolution. Calibration (or reliability) supposedly measures the accuracy of probability judgments whereas resolution measures the diagnosticity (or discriminability) of these judgments. The two major substantive and pervasive findings (e.g. are overconfidence and the interaction between the amount of overconfidence and difficulty of the task, the so-called hard–easy effect.
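To make the distinction between calibration and resolution concrete, the following is a minimal sketch of one standard partition of the Brier score (commonly attributed to Murphy) into a calibration term, a resolution term, and an uncertainty term. The function name, the grouping of judgments by their exact stated value, and the toy data are illustrative assumptions, not taken from the article.

```python
from collections import defaultdict


def brier_partition(judgments, outcomes):
    """Split the mean Brier score into calibration, resolution, uncertainty.

    judgments: stated probabilities (grouped by exact value; an assumption)
    outcomes:  1 if the judged event occurred, 0 otherwise
    Identity:  brier = calibration - resolution + uncertainty
    """
    n = len(judgments)
    base_rate = sum(outcomes) / n

    # Collect the outcomes observed under each distinct stated probability.
    groups = defaultdict(list)
    for p, o in zip(judgments, outcomes):
        groups[p].append(o)

    calibration = 0.0  # lower is better: stated value vs. observed hit rate
    resolution = 0.0   # higher is better: hit rates spread around base rate
    for p, obs in groups.items():
        hit_rate = sum(obs) / len(obs)
        calibration += len(obs) * (p - hit_rate) ** 2
        resolution += len(obs) * (hit_rate - base_rate) ** 2
    calibration /= n
    resolution /= n

    uncertainty = base_rate * (1 - base_rate)
    brier = sum((p - o) ** 2 for p, o in zip(judgments, outcomes)) / n
    return brier, calibration, resolution, uncertainty


# Hypothetical data: ten judgments of 0.9 (7 correct), ten of 0.6 (6 correct).
judgments = [0.9] * 10 + [0.6] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 6 + [0] * 4
brier, cal, res, unc = brier_partition(judgments, outcomes)
print(brier, cal, res, unc)
```

In this toy case the 0.6 judgments are perfectly calibrated while the 0.9 judgments are overconfident, and resolution is low because the two hit rates (0.7 and 0.6) barely differ from the overall base rate.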

Several problems have been raised with regard to research on calibration, and in this commentary I would like to focus on three of them. First, calibration studies assume (implicitly or explicitly) that probabilities are subjective (e.g. yet evaluate them by a frequentistic criterion . The validity of such a procedure remains controversial.

A second problem concerns the possible tradeoff between calibration and resolution. noted that calibration and resolution are not completely independent of each other, and Keren (1991) claimed that the requirements for maximizing calibration (i.e. minimizing the discrepancies between probability judgments and the corresponding reality) and achieving high resolution may often be incompatible. A similar point has been recently made by , who studied the evaluation of interval judgments.

A third problem concerns the analysis and interpretation of calibration studies. Specifically, Erev, Wallsten, and Budescu (1994) have eloquently described the importance of regression toward the mean in interpreting calibration studies. Similar conclusions have been reached independently by Pfeifer. In a nutshell, the contribution of the papers by Erev et al. and Pfeifer is in pointing out that both overconfidence and the hard–easy effect may, at least to some degree, be an artifact due to regression toward the mean.
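The regression argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not the model of any of the cited papers: each item has a latent probability of being answered correctly, the latent probabilities are perfectly calibrated by construction, and the reported confidence is the latent value plus random error, clipped to the half-range scale [0.5, 1.0]. The function names, the noise level, and the cut-off values are hypothetical.

```python
import random


def simulate_judgments(n=20000, noise_sd=0.15, seed=0):
    """Return (reported_confidence, correct) pairs from a noisy judge."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        latent = rng.uniform(0.5, 1.0)             # calibrated by construction
        correct = 1 if rng.random() < latent else 0
        noisy = latent + rng.gauss(0.0, noise_sd)  # random response error
        reported = min(1.0, max(0.5, noisy))       # clip to the response scale
        records.append((reported, correct))
    return records


def confidence_minus_accuracy(records, low, high):
    """Mean reported confidence minus hit rate for reports in [low, high]."""
    selected = [(r, c) for r, c in records if low <= r <= high]
    mean_confidence = sum(r for r, _ in selected) / len(selected)
    hit_rate = sum(c for _, c in selected) / len(selected)
    return mean_confidence - hit_rate


records = simulate_judgments()
# Conditioning on extreme reports selects items whose error pushed them
# toward that extreme, so accuracy regresses toward the overall mean.
print("high reports:", confidence_minus_accuracy(records, 0.9, 1.0))
print("low reports: ", confidence_minus_accuracy(records, 0.5, 0.6))
```

Under these assumptions the high-confidence reports show a positive gap (apparent overconfidence) and the low-confidence reports a negative one, even though the latent judgments contain no bias at all.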

In reflecting on the articles in this special volume, I will focus on these three issues and examine how they are treated by the different authors. I will end this commentary by raising the question of what has been learned from thirty years of research on calibration of probabilities, and will offer a brief (and somewhat skeptical) answer to the question.

RANDOM ERROR MODELS

A common underlying thread of several papers to which this commentary is addressed (i.e. Budescu, Erev, and Wallsten (Parts I and II); Juslin, Olsson, and Björkman; Wallsten, Budescu, Erev, and Diederich) is the phenomenon of regression-toward-the-mean (or in the more general case, reversion to the mean). They cite, and heavily hinge on, the paper by . Notwithstanding, and certainly not undermining, the importance of the contribution by , it is important to stress two points.