This paper proposes a fast variable selection methodology based on a modified version of the Forward-Backward algorithm. The methodology is adapted to the specificities of the data used: a very small number of samples and a high number of variables. The data are generated from Meme phrase volume data, under assumptions about underlying dependencies and seasonality. Combining a resampling technique with the proposed variable selection scheme yields significant results and improves the test Normalized Mean Square Error (NMSE). The results indicate that, under the assumptions made on the data structure, variable selection is desirable. In addition, the information obtained on the selected variables seems to cluster the time series into two very different classes: a set of approximately 600 series, which yield good NMSE and seem to require very similar sets of variables for the prediction; and another set of 300 to 400 series, for which only the previous value of the series is of interest for the prediction. This first analysis clearly illustrates the need for a more thorough future analysis of the variables selected for each batch of series. Taking a close look at the possible dependencies between the series inside a batch should also indicate why and how they are similar and why they were grouped in the same batch.
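To make the ingredients named in the abstract concrete, the following is a minimal sketch of a generic forward-backward variable selection driven by a resampled NMSE. It is not the modified algorithm of the paper, whose details the abstract does not give. The linear least-squares model, the use of leave-one-out as the resampling technique, the definition of NMSE as the mean square error divided by the target variance, and all function names (`nmse`, `loo_nmse`, `forward_backward`) are assumptions made for illustration only.

```python
import numpy as np


def nmse(y_true, y_pred):
    """NMSE, assumed here to be the MSE divided by the variance of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)


def loo_nmse(X, y, cols):
    """Leave-one-out NMSE of a least-squares model (with intercept) on `cols`."""
    cols = np.asarray(cols, dtype=int)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # hold out sample i
        A = np.column_stack([np.ones(n - 1), X[train][:, cols]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        preds[i] = np.concatenate(([1.0], X[i, cols])) @ coef
    return nmse(y, preds)


def forward_backward(X, y, max_iter=100):
    """Greedy selection: at each step, toggle (add or remove) the single
    variable that most reduces the leave-one-out NMSE; stop when no toggle
    improves it."""
    selected = set()
    best = loo_nmse(X, y, [])
    for _ in range(max_iter):
        best_move = None
        for j in range(X.shape[1]):
            candidate = selected ^ {j}  # add j if absent, remove it if present
            score = loo_nmse(X, y, sorted(candidate))
            if score < best:
                best, best_move = score, candidate
        if best_move is None:
            break
        selected = best_move
    return sorted(selected), best


# Synthetic example (not the paper's data): few samples, two useful variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=30)
selected, score = forward_backward(X, y)
print("selected variables:", selected, "LOO NMSE:", score)
```

With very few samples and many variables, as in the setting described above, the selection criterion has to be estimated by resampling, because a training-set error would always favour adding more variables. Leave-one-out is used here only because it is simple to write down; the paper's own resampling scheme may differ.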
|Title of host publication||5th International Conference on PErvasive Technologies Related to Assistive Environments, PETRA 2012 - Conference Program|
|Publication status||Published - 01.12.2012|
|MoE publication type||A4 Article in conference proceedings|
|Event||5th International Conference on PErvasive Technologies Related to Assistive Environments, PETRA 2012 - Heraklion, Crete, Greece|
Duration: 06.06.2012 → 08.06.2012