๐”– Bobbio Scriptorium

A visual exploration of the orderliness of TREC relevance judgments

Scribed by Rorvig, Mark


Publisher: John Wiley and Sons
Year: 1999
Tongue: English
Weight: 152 KB
Volume: 50
Category: Article
ISSN: 0002-8231


Synopsis


TREC topic specification statements 1-50 are converted to a similarity matrix, scaled, and plotted. Two close topics and two distant topics are selected from within the topic visual field. Subsequent scaling and visualization of documents associated with the close topics reveals a strong mixing of documents from both topic sets. Scaling and visualization of documents associated with the distant topics reveals a bifurcated distribution of documents from both topic sets. In both cases, relevant documents appear near the center of the visualization. Multidimensional scaling using a maximum likelihood estimation method is shown to accurately model token similarity relationships among topic specification statements. The implications of these findings for prior critical arguments by other scholars regarding IR test collections generally, and TREC specifically, are examined.


SIMILAR VOLUMES


Images of similarity: A visual explorati
Rorvig, Mark | Article | 1999 | John Wiley and Sons | English | 458 KB

Multiple similarity measures for five TREC topic-document sets from the LDC TREC Collection Disk 1 are derived from the full text of documents. Each measure on each set is scaled using SAS MDS under ordinal, interval, and MLE assumptions. The resulting 75 permutations are plotted. It is suggested tha

Order effects: A study of the possible i
Eisenberg, Michael; Barry, Carol | Article | 1988 | John Wiley and Sons | English | 909 KB

Studies concerned with the evaluation of information systems have typically relied on judgments of relevance as the fundamental measure in determining system performance. In most cases, subjects are asked to assign a relevance score using some category rating scale (1-4, 1-11, or simply relevant/non