Bayesian classification using a noninformative prior and mislabeled training data

✍ Scribed by Robert S. Lynch Jr; Peter K. Willett

Publisher: Elsevier Science
Year: 1999
Tongue: English
Weight: 207 KB
Volume: 336
Category: Article
ISSN: 0016-0032
DOI: 10.1016/s0016-0032(99)00006-x

✦ Synopsis

The average probability of error is used to demonstrate the performance of a Bayesian classification test, referred to as the Combined Bayes Test (CBT), when the training data of each class are mislabeled. The CBT combines the information in discrete training and test data to infer symbol probabilities, where a uniform Dirichlet prior (i.e., a noninformative prior of complete ignorance) is assumed for all classes. Using the CBT, classification performance is shown to degrade when mislabeling exists in the training data, with a severity that depends on the mislabeling probabilities. It is also shown that as the mislabeling probabilities increase, M*, the best quantization fineness related to the Hughes phenomenon of pattern recognition, also increases. Note that even when the actual mislabeling probabilities are known by the CBT, it is not possible to achieve the classification performance obtainable without mislabeling. However, the negative effect of mislabeling can be diminished, with more success for smaller mislabeling probabilities, if a data reduction method called the Bayesian Data Reduction Algorithm (BDRA) is applied to the training data.
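
To make the role of the uniform Dirichlet prior concrete, the following is a minimal sketch and not the authors' CBT or BDRA. It assumes a single discrete test observation, two classes, and a simple mislabeling model in which training data labeled as one class are a mixture of both classes. The function names (`predictive`, `classify`, `mislabeled_mixture`) and the mixing parameter `alpha` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def predictive(counts):
    """Posterior predictive symbol probabilities under a uniform Dirichlet prior.

    counts[i] is the number of training observations of symbol i.
    With M symbols and N observations the result is (counts[i] + 1) / (N + M).
    """
    m = len(counts)
    n = sum(counts)
    return [Fraction(c + 1, n + m) for c in counts]


def classify(symbol, counts_k, counts_l):
    """Assign one test symbol to the class with the larger predictive probability.

    Illustrative decision rule only; ties go to class "k".
    """
    p_k = predictive(counts_k)[symbol]
    p_l = predictive(counts_l)[symbol]
    return "k" if p_k >= p_l else "l"


def mislabeled_mixture(p_own, p_other, alpha):
    """Symbol distribution of data carrying one label when a fraction alpha
    of those data actually came from the other class (assumed model)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(p_own, p_other)]


# Training counts over M = 3 symbols for two classes (made-up numbers).
counts_k = [6, 3, 1]
counts_l = [1, 3, 6]

print(predictive(counts_k))            # smoothed estimates for class k
print(classify(0, counts_k, counts_l))  # symbol 0 is more typical of class k
print(classify(2, counts_k, counts_l))  # symbol 2 is more typical of class l
```

Under this assumed mixture model, a larger `alpha` pulls the two labeled distributions toward each other, which is one way to picture why classification degrades as the mislabeling probabilities grow.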

📜 SIMILAR VOLUMES

Classification of semiconductor defects using a small number of training data and qualitative knowledge

✍ Shohei Shimomura; Hajime Igarashi; Takashi Hiroi; Naoki Hosoya; Yasuo Nakagawa 📂 Article 📅 2008 🏛 Wiley (John Wiley & Sons) 🌐 English ⚖ 559 KB