𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Modern psychometric methods for detection of differential item functioning: application to cognitive assessment measures

✍ Scribed by Jeanne A. Teresi; Marjorie Kleinman; Katja Ocepek-Welikson


Publisher
John Wiley and Sons
Year
2000
Tongue
English
Weight
257 KB
Volume
19
Category
Article
ISSN
0277-6715


✦ Synopsis


Cognitive screening tests and items have been found to perform differently across groups that differ in terms of education, ethnicity and race. Despite the profound implications that such bias holds for studies in the epidemiology of dementia, little research has been conducted in this area. Using the methods of modern psychometric theory (in addition to those of classical test theory), we examined the performance of the Attention subscale of the Mattis Dementia Rating Scale. Several item response theory models, including the two- and three-parameter dichotomous response logistic models, as well as a polytomous response model, were compared. (Log-likelihood ratio tests showed that the three-parameter model was not an improvement over the two-parameter model.) Data were collected as part of the ten-study National Institute on Aging Collaborative investigation of special dementia care in institutional settings.

The subscale KR-20 estimate for this sample was 0.92. IRT model-based reliability estimates, provided at several points along the latent attribute, ranged from 0.65 to 0.97; the measure was least precise at the less disabled tail of the distribution. Most items performed in similar fashion across education groups; the item characteristic curves were almost identical, indicating little or no differential item functioning (DIF). However, four items were problematic. One item (digit span backwards) demonstrated a large error term in the confirmatory factor analysis; item-fit chi-square statistics developed using BIMAIN confirmed this result for the IRT models. Further, the discrimination parameter for that item was low for all education subgroups. Generally, persons with the highest education had a greater probability of passing the item at most levels of the latent attribute (θ). Model-based tests of DIF using MULTILOG identified three other items with significant, albeit small, DIF.
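As a reference point for the two-parameter logistic (2PL) model discussed in the synopsis, the item characteristic curve can be sketched in a few lines of Python. This is a minimal illustration of the standard 2PL form, not the authors' software; the parameter values in the comments are hypothetical, not the paper's estimates.

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: the probability that a person at
    latent trait level theta answers the item correctly, given the item's
    discrimination parameter a and difficulty parameter b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A low-discrimination item (as reported for digit span backwards, whose a
# parameter was low in every education subgroup) yields a flat curve that
# separates ability levels poorly. Illustrative values, e.g. a = 0.3 versus
# a = 1.5 at the same difficulty b = 0, show the contrast:
flat_rise  = icc_2pl(2.0, 0.3, 0.0) - icc_2pl(-2.0, 0.3, 0.0)
steep_rise = icc_2pl(2.0, 1.5, 0.0) - icc_2pl(-2.0, 1.5, 0.0)
```

At theta equal to the difficulty b, the curve passes through 0.5 regardless of a; discrimination only controls how steeply the probability rises around that point.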
One item, for example, showed non-uniform DIF: at the impaired tail of the latent distribution, persons with higher education had a higher probability of correctly responding to the item than did lower education groups, but at less impaired levels they had a lower probability of a correct response. Another method of detection identified this item as having DIF (unsigned area statistics = 3.05, p < 0.01, and 2.96, p < 0.01). On average, across the entire score range, the lower education group's probability of answering the item correctly was 0.11 higher than the higher education group's probability. A cross-validation with larger subgroups confirmed the overall result of little DIF for this measure.

The methods used for detecting differential item functioning (which may, in turn, be indicative of bias) were applied to a neuropsychological subtest. These methods have been used previously to examine bias in screening measures across education and ethnic and racial subgroups. In addition to the important epidemiological application of ensuring that screening measures and neuropsychological tests used in diagnosis are free of bias, so that more culture-fair classifications will result, these methods are also useful for examining site differences in large multi-site clinical trials. It is recommended that these methods receive wider attention in the medical statistical literature.
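The unsigned area statistic mentioned in the synopsis measures the gap between two groups' item characteristic curves across the latent trait. A minimal numeric sketch of the idea, under the 2PL model, is below; the grid integration and the example parameter values are assumptions for illustration, not the MULTILOG procedure or the paper's estimates.

```python
import math

def icc_2pl(theta, a, b):
    """2PL probability of a correct response at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def unsigned_area(ref, foc, lo=-4.0, hi=4.0, n=2001):
    """Riemann-sum approximation of the unsigned area between the
    reference group's and focal group's item characteristic curves.
    ref and foc are (a, b) parameter pairs; larger areas indicate
    larger DIF. Taking |difference| means crossing (non-uniform) DIF
    still accumulates area rather than cancelling out."""
    step = (hi - lo) / (n - 1)
    return sum(
        abs(icc_2pl(lo + i * step, *ref) - icc_2pl(lo + i * step, *foc)) * step
        for i in range(n)
    )

# Non-uniform DIF: equal difficulty but unequal discrimination makes the
# two curves cross, so one group is favored at the impaired tail and the
# other at the less impaired tail (illustrative values):
ref_params, foc_params = (1.5, 0.0), (0.8, 0.0)
area = unsigned_area(ref_params, foc_params)
```

With identical parameters the area is zero; the crossing pattern in the example mirrors the item described above, where the direction of the education-group difference reversed across the latent distribution.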