Abstract
This paper proposes a methodology for identifying data samples that are likely to be mislabeled in a dataset for a c-class classification problem. The methodology relies on the assumption that the generalization error of a model learned from the data decreases if the label of a mislabeled sample is changed to its correct class. The classification model used in the paper is OP-ELM (Optimally Pruned Extreme Learning Machine), which also provides a fast way to estimate the generalization error through the PRESS Leave-One-Out statistic. The methodology is tested on two toy datasets as well as on real-life datasets; for one of these, expert knowledge about the identified potential mislabels was sought.
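The core idea (relabel a sample, re-estimate the Leave-One-Out error, and flag the sample if the error drops) can be illustrated with a minimal sketch. It assumes a basic ELM with a random sigmoid hidden layer (OP-ELM's neuron ranking and pruning is omitted), a small ridge term for numerical stability, one-hot targets, and the mean squared PRESS LOO error as the generalization-error estimate. With hat matrix $P = H(H^\top H + \lambda I)^{-1}H^\top$, the LOO residual of sample $i$ is $(y_i - \hat{y}_i)/(1 - P_{ii})$. Because $P$ depends only on the inputs, each candidate relabeling only changes the target matrix, so it is cheap to evaluate. All function names and parameter values are hypothetical, and the single-pass scan is a simplification that may differ from the paper's actual procedure.

```python
import numpy as np


def elm_features(X, n_hidden=30, seed=0):
    """Random sigmoid hidden layer of a basic ELM, plus a bias column.

    Assumption: OP-ELM would additionally rank and prune neurons; omitted here.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.hstack([H, np.ones((X.shape[0], 1))])


def hat_matrix(H, lam=1e-3):
    """Ridge hat matrix P = H (H^T H + lam*I)^-1 H^T; independent of the labels."""
    A = H.T @ H + lam * np.eye(H.shape[1])
    return H @ np.linalg.solve(A, H.T)


def press_loo_mse(P, Y):
    """Mean squared Leave-One-Out error via PRESS: r_i = (y_i - yhat_i) / (1 - P_ii)."""
    residuals = (Y - P @ Y) / (1.0 - np.diag(P))[:, None]
    return float(np.mean(residuals ** 2))


def find_candidate_mislabels(X, labels, n_classes, **elm_kwargs):
    """Hypothetical single-pass scan: for each sample, try every other label and
    keep the one that lowers the PRESS LOO error the most.

    Returns (index, suggested_label, error_decrease) tuples, largest decrease first.
    """
    P = hat_matrix(elm_features(X, **elm_kwargs))  # computed once for all relabelings
    eye = np.eye(n_classes)
    Y = eye[labels]                                # one-hot targets
    base = press_loo_mse(P, Y)
    candidates = []
    for i in range(len(labels)):
        best_label, best_err = None, base
        for k in range(n_classes):
            if k == labels[i]:
                continue
            Y_try = Y.copy()
            Y_try[i] = eye[k]                      # relabel sample i as class k
            err = press_loo_mse(P, Y_try)
            if err < best_err:
                best_label, best_err = k, err
        if best_label is not None:
            candidates.append((i, best_label, base - best_err))
    return sorted(candidates, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    # Toy data: three well-separated 2-D Gaussian clusters, two injected mislabels.
    rng = np.random.default_rng(1)
    centers = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
    true_labels = np.repeat(np.arange(3), 30)
    X = centers[true_labels] + 0.5 * rng.standard_normal((90, 2))
    noisy = true_labels.copy()
    noisy[0], noisy[30] = 1, 2
    for idx, new_label, gain in find_candidate_mislabels(X, noisy, 3)[:5]:
        print(f"sample {idx}: suggest class {new_label} (LOO MSE drop {gain:.5f})")
```

Computing the hat matrix once and only swapping rows of the target matrix mirrors why a PRESS-based estimate makes this kind of search fast: no model is retrained per candidate label.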
| Original language | English |
|---|---|
| Peer-reviewed scientific journal | Neurocomputing |
| Volume | 159 |
| Issue number | July |
| Pages (from-to) | 242-250 |
| Number of pages | 9 |
| ISSN | 0925-2312 |
| Publication status | Published - 14.02.2015 |
| MoE publication type | A1 Journal article - refereed |
Keywords
- 512 Business and Management
- Mislabels
- Extreme Learning Machine
- Classification