Abstract
This paper proposes a methodology for identifying data samples that are likely to be mislabeled in a c-class classification dataset. The methodology relies on the assumption that the generalization error of a model learned from the data decreases when the label of a mislabeled sample is changed to its correct class. The classification model used in the paper is OP-ELM, which also provides a fast estimate of the generalization error through PRESS Leave-One-Out. The methodology is tested on two toy datasets and on real-life datasets; for one of the real-life datasets, expert knowledge about the identified potential mislabels was sought.
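The core assumption can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a basic ELM (a random tanh hidden layer with a ridge-regularized linear readout) in place of OP-ELM, and the function names `elm_features`, `press_operator`, `loo_mse` and `suspect_mislabels`, as well as the `ridge` parameter, are hypothetical. Because the hidden-layer features do not depend on the labels, the PRESS operator is computed once and reused for every candidate label change.

```python
import numpy as np

def elm_features(X, n_hidden, rng):
    """Random tanh hidden layer plus a bias column (basic ELM, an assumption; not OP-ELM)."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    return np.hstack([np.tanh(X @ W + b), np.ones((X.shape[0], 1))])

def press_operator(H, ridge=1e-6):
    """Return M such that M @ T gives the PRESS leave-one-out residuals for targets T."""
    A = H.T @ H + ridge * np.eye(H.shape[1])
    P = H @ np.linalg.solve(A, H.T)  # hat matrix of the ridge-regularized readout
    # Row i is scaled by 1 / (1 - h_ii), the PRESS correction.
    return (np.eye(len(H)) - P) / (1.0 - np.diag(P))[:, None]

def loo_mse(M, T):
    """Mean squared leave-one-out residual over all samples and outputs."""
    return float(np.mean((M @ T) ** 2))

def suspect_mislabels(H, y, n_classes):
    """List (sample, proposed class, error decrease) for label changes that lower the LOO error."""
    M = press_operator(H)
    T = np.eye(n_classes)[y]  # one-hot targets
    base = loo_mse(M, T)
    suspects = []
    for i in range(len(y)):
        original = T[i].copy()
        for k in range(n_classes):
            if k == y[i]:
                continue
            T[i] = np.eye(n_classes)[k]  # tentatively relabel sample i as class k
            gain = base - loo_mse(M, T)
            if gain > 0:
                suspects.append((i, k, gain))
        T[i] = original  # restore the given label
    return sorted(suspects, key=lambda s: s[2], reverse=True)
```

Given a feature matrix `X` and an integer label vector `y`, a call such as `suspect_mislabels(elm_features(X, 20, np.random.default_rng(0)), y, n_classes)` returns candidate mislabels ranked by how much the estimated error drops. This sketch changes one label at a time and does not reproduce any search strategy or stopping rule from the paper.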
Original language | English |
---|---|
Peer-reviewed scientific journal | Neurocomputing |
Volume | 159 |
Issue number | July |
Pages (from-to) | 242-250 |
Number of pages | 9 |
ISSN | 0925-2312 |
Publication status | Published - 14.02.2015 |
MoE publication type | A1 Journal article - refereed |
Keywords
- 512 Business and Management
- Mislabels
- Extreme Learning Machine
- Classification