Abstract
This paper presents a fast algorithm and an accelerated toolbox for data visualization. The visualization is stated as an assignment problem between data samples and the same number of given visualization points. The mapping function is approximated by an Extreme Learning Machine, which provides an error for the current assignment. This work presents a new mathematical formulation of the error function based on cosine similarity. It provides a closed-form equation for the change of error when exchanging assignments between two random samples (called a swap), and an extreme speed-up over the original method, even for a very large corpus like the MNIST Handwritten Digits dataset. The method starts from a random assignment and proceeds greedily by randomly swapping pairs of samples, keeping the swaps that reduce the error. In the GPU implementation, the toolbox reaches a million swaps per second, and thousands of model updates per second for successful swaps, even for a very large dataset like MNIST Handwritten Digits.
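The greedy swap procedure described in the abstract can be illustrated with a minimal sketch. It is not the toolbox's implementation: it recomputes the full error after every swap instead of using the paper's closed-form update, and it uses a plain squared reconstruction error rather than the cosine-similarity formulation. The function name `greedy_swap_assignment` and the parameters `n_hidden`, `n_swaps`, and `seed` are hypothetical, chosen only for this example.

```python
import numpy as np

def greedy_swap_assignment(X, V, n_hidden=20, n_swaps=2000, seed=0):
    """Assign data rows X[perm[k]] to visualization points V[k] by greedy swaps.

    Assumption: the ELM is a fixed random tanh hidden layer with
    least-squares output weights; the error is the squared residual.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # ELM hidden layer on the visualization points (random, never trained).
    W = rng.standard_normal((V.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(V @ W + b)

    # Residual projector I - H H^+: applying it to the targets gives the
    # least-squares residual without solving for output weights each time.
    P = np.eye(n) - H @ np.linalg.pinv(H)

    def error(perm):
        R = P @ X[perm]
        return float(np.sum(R * R))

    perm = rng.permutation(n)        # start from a random assignment
    err = error(perm)
    history = [err]

    for _ in range(n_swaps):
        i, j = rng.choice(n, size=2, replace=False)
        perm[[i, j]] = perm[[j, i]]  # tentatively swap two assignments
        new_err = error(perm)        # naive full recomputation
        if new_err < err:
            err = new_err            # keep a swap that reduces the error
        else:
            perm[[i, j]] = perm[[j, i]]  # otherwise undo it
        history.append(err)

    return perm, history
```

Calling `greedy_swap_assignment(X, V)` with `X` of shape `(N, D)` and `V` of shape `(N, 2)` returns the final permutation and the error after each attempted swap. Each attempted swap here costs a full matrix product, which is the cost the closed-form change-of-error equation is meant to avoid.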
Original language | English |
---|---|
Peer-reviewed scientific journal | Neurocomputing |
Volume | 205 |
Issue number | September |
Pages (from-to) | 247-263 |
Number of pages | 17 |
ISSN | 0925-2312 |
Publication status | Published - 2016 |
MoE publication type | A1 Journal article - refereed |
Keywords
- 512 Business and Management
- Visualization
- Nonlinear Dimensionality Reduction
- Cosine Distance
- Extreme Learning Machines
- Big Data
- Projection