The problem of time series classification (TSC), where we consider any real-valued ordered data a time series, presents a specific machine learning challenge, since the ordering of the variables is often crucial in finding the best discriminating features.

One of the most promising recently proposed approaches is to find shapelets within a data set. A shapelet is a time series subsequence that is identified as being representative of class membership. The original research in this field embedded the procedure of finding shapelets within a decision tree [1]. The UCR research on which our work builds is described here.

Shapelets are extracted from 1-D series which are often themselves generated from image outlines. Two shapelets that differentiate the outlines of beetles and flies are shown below.

[Figure: beetle] [Figure: fly]

The image on the left demonstrates the extraction of the outline into a one-dimensional series of distances to the centre. The shapelet discovery algorithm finds the subsequences that best differentiate between classes. The examples above show the top shapelets for discriminating between the Beetle and Fly MPEG7 images (marked in blue on the images). The data are available here.
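As a rough illustration of the outline extraction described above, the following Python sketch maps an ordered list of outline points to a series of distances from the centre. The function name outline_to_series and the sample square outline are hypothetical, and the centre is assumed here to be the centroid of the sampled points; the original data may have been generated differently.

```python
import math

def outline_to_series(points):
    """Map ordered (x, y) outline points to distances from their centroid.

    Assumption: the "centre" is taken to be the mean of the outline points.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [math.hypot(x - cx, y - cy) for x, y in points]

# Illustrative outline: a square sampled at its corners and edge midpoints.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
series = outline_to_series(square)
```

For this square the series alternates between the corner distance and the shorter edge-midpoint distance, which is the kind of local pattern a shapelet could capture.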

In [2] we propose disconnecting the process of finding shapelets from the classification algorithm by performing a shapelet transformation. We describe a means of extracting the k best shapelets from a data set in a single pass, and then use these shapelets to transform the data by calculating the distance from each series to each shapelet. We demonstrate that transformation into this new data space can improve classification accuracy whilst still retaining the explanatory power provided by shapelets. In [3] we assess alternative quality measures for shapelets, and in [4] we introduce a post-transform clustering algorithm to reduce the number of shapelets in the transformed data set and bring together all the experimental results. On this page we make all our code and 10 new data sets available.
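The transformation step can be sketched in a few lines of Python. This is a minimal illustration rather than our released code: the function names subsequence_distance and shapelet_transform are hypothetical, the shapelets are assumed to have been selected already, and the distance is a plain Euclidean distance with no normalisation of the subsequences, which a fuller implementation would likely apply.

```python
import math

def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    equal-length window of the series (no normalisation applied)."""
    m = len(shapelet)
    best = math.inf  # stays infinite if the series is shorter than the shapelet
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2
                          for i in range(m)))
        best = min(best, d)
    return best

def shapelet_transform(dataset, shapelets):
    """Represent each series as a vector of distances to the k shapelets."""
    return [[subsequence_distance(s, sh) for sh in shapelets] for s in dataset]

# Illustrative data: one series with a bump, one flat series.
dataset = [[0, 0, 1, 2, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0]]
shapelets = [[1, 2, 1], [0, 0, 0]]
transformed = shapelet_transform(dataset, shapelets)
```

Each row of transformed has one attribute per shapelet, so any standard classifier can be trained on it, while each attribute still refers back to an interpretable subsequence.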

References (full Shapelet bibliography)

[1] Ye, L and Keogh, E, Time Series Shapelets: A New Primitive for Data Mining. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.

[2] Lines, J, Davis, L, Hills, J and Bagnall, A, A Shapelet Transform for Time Series Classification. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012. (preprint PDF)

[3] Lines, J and Bagnall, A, Alternative Quality Measures for Time Series Shapelets. Lecture Notes in Computer Science, 7435, pp. 475-483, 2012. (preprint PDF)

[4] Hills, J, Lines, J, Baranauskas, E, Mapp, J and Bagnall, A, Time Series Classification with Shapelets. Data Mining and Knowledge Discovery, 2013. (online first; preprint PDF)

Research Team

Dr. Tony Bagnall, Jason Lines, Dr. Jon Hills