TY - JOUR
T1 - Generating insights through data preparation, visualization, and analysis
T2 - Framework for combining clustering and data visualization techniques for low-cardinality sequential data
AU - Nestorov, Svetlozar
AU - Jukić, Boris
AU - Jukić, Nenad
AU - Sharma, Abhishek
AU - Rossi, Sippo
N1 - Publisher Copyright: © 2019
PY - 2019/8
Y1 - 2019/8
AB - In this paper, we introduce a novel approach for identifying and testing relationships and patterns in the types of sequential data that are broadly present in a number of different real-world scenarios and environments. The proposed two-phase framework combines data preparation, data visualization and clustering techniques in an innovative way. The first phase of the framework explores the large amount of sequential data in stages that can be undertaken iteratively. Those stages include data preparation, counting and value-based ordering, distribution visualization, and subsequence length determination, confirmation and re-visualization. The second phase of the framework explores sequence differences, based on motifs, between data cohorts that are created using descriptive attributes, and visualizes the changes over time and across different attribute values. To illustrate the analytical power of the proposed framework, we present a comprehensive example that applies the framework to a large, formally maintained research data set collected and managed by the US Census Bureau. The framework, and the presented example, utilize visualization as an analytics tool and not just a presentation accessory.
KW - 512 Business and Management
KW - clustering
KW - data preparation
KW - data visualization
KW - low-cardinality data
KW - motifs
KW - sequential data
UR - http://www.scopus.com/inward/record.url?scp=85070745669&partnerID=8YFLogxK
U2 - 10.1016/j.dss.2019.113119
DO - 10.1016/j.dss.2019.113119
M3 - Article
AN - SCOPUS:85070745669
SN - 0167-9236
VL - 125
JO - Decision Support Systems
JF - Decision Support Systems
M1 - 113119
ER -