Arbitrary Category Classification of Websites Based on Image Content

Anton Akusok, Yoan Miche, Juha Karhunen, Kaj-Mikael Björk, Rui Nian, Amaury Lendasse

Research output: Contribution to journal › Article › Scientific › peer-review

23 Citations (Scopus)

Abstract

This paper presents a comprehensive methodology for general large-scale image-based classification tasks. It addresses the Big Data challenge in arbitrary image classification and, more specifically, the filtering of millions of websites with abstract target classes and high levels of label noise. Our approach uses local image features and their color descriptors to build image representations with the help of a modified k-NN algorithm. Image representations are refined into image and website class predictions by a two-stage classifier method suitable for a very large-scale real dataset. A modification of an Extreme Learning Machine (ELM) is found to be a suitable classifier technique. The methodology is robust to noise and can learn abstract target categories; website classification accuracy surpasses 97% for the most important categories considered in this study.
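The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: image representations are built as bag-of-visual-words histograms using plain nearest-codeword (1-NN) assignment rather than the paper's modified k-NN; the classifier is a standard regularized ELM rather than the paper's modification; and the website-level stage simply averages image-level scores. The names `bovw_histogram`, `ELM`, and `website_scores`, and all parameter values, are hypothetical.

```python
import numpy as np


def bovw_histogram(descriptors, codebook):
    """Image representation (assumed BoVW): assign each local descriptor to its
    nearest codeword (plain 1-NN, not the paper's modified k-NN) and return a
    normalized histogram of codeword counts."""
    # Squared Euclidean distances, shape (n_descriptors, n_codewords).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()


class ELM:
    """Standard regularized Extreme Learning Machine (not the paper's variant):
    random, untrained hidden layer plus ridge least-squares output weights."""

    def __init__(self, n_hidden=50, alpha=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.alpha = alpha  # ridge regularization strength (hypothetical value)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, T):
        # Hidden-layer weights and biases are drawn at random and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # Output weights: beta = (H^T H + alpha I)^-1 H^T T
        A = H.T @ H + self.alpha * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def website_scores(image_scores, site_ids):
    """Second stage (assumed): average image-level class scores per website."""
    return {sid: image_scores[site_ids == sid].mean(axis=0)
            for sid in np.unique(site_ids)}
```

Averaging is only one simple way to turn image predictions into a website prediction. Because the ELM's hidden layer is random and fixed, training reduces to a single regularized linear solve, which is what makes this family of classifiers attractive at large scale.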

Original language: English
Peer-reviewed scientific journal: IEEE Computational Intelligence Magazine
Volume: 10
Issue number: 2
Pages (from-to): 30-41
Number of pages: 12
Publication status: Published - 09.04.2015
MoE publication type: A1 Journal article - refereed

Keywords

  • 512 Business and Management
  • Classification
  • Large-scale systems
  • Image representation
  • Noise measurement
  • Big data
  • Image classification
  • Image color analysis
