Clipping is the process of transforming a real-valued series into a sequence of bits, each recording whether the corresponding data point is above or below the series average. Clipping is a useful and flexible transformation for the exploratory analysis of large time-dependent data sets.
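
As a minimal illustration of the definition above, the following Python sketch clips a short series. It assumes that "average" means the arithmetic mean and that a value exactly equal to the mean maps to 0; both choices, and the function name `clip`, are assumptions made for this example rather than details taken from the publications.

```python
from statistics import fmean


def clip(series):
    """Return 1 where a value is above the series mean, otherwise 0.

    Assumption: values equal to the mean are treated as "not above" (0).
    """
    mean = fmean(series)
    return [1 if x > mean else 0 for x in series]


series = [2.0, 5.0, 3.0, 8.0, 1.0, 5.0]   # mean is 4.0
bits = clip(series)
print(bits)   # [0, 1, 0, 1, 0, 1]
```

Each real value is reduced to a single bit, which is where the compression comes from.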

In these publications we demonstrate that time series stored as bits can be compressed and manipulated very efficiently and that, under some assumptions, the discriminatory power of clipped series is asymptotically equivalent to that of the raw data. Unlike other transformations, clipped series can be compared directly to the raw data series. We show that this allows us to form a tight lower bounding metric for Euclidean and Dynamic Time Warping distance and hence to query by content efficiently. Clipped data can also be used with a host of algorithms and statistical tests that follow naturally from the binary nature of the data.

A series of experiments and theoretical results illustrates how clipped series can be used in increasingly complex ways to achieve better results than other popular techniques. The usefulness of the representation is demonstrated by results with clipped data that are consistently better than those achieved with a wavelet or discrete Fourier transform at the same compression ratio, for both clustering and query by content. The flexibility of the representation is shown by using a variable run-length encoding of clipped series to define an approximation of the Kolmogorov complexity and hence to perform Kolmogorov-based clustering.
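
To give a rough idea of why a bit representation is cheap to store and manipulate, the sketch below packs a clipped series into a single integer, counts the positions where two clipped series differ using XOR, and run-length encodes a bit sequence. The helper names `pack`, `hamming` and `run_lengths` are hypothetical. This is not the authors' implementation, and it does not reproduce the lower bounding metric or the Kolmogorov complexity approximation described in the publications.

```python
def pack(bits):
    """Pack a list of 0/1 values into one int, first bit most significant."""
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word


def hamming(a, b):
    """Count differing positions between two packed series of equal length."""
    return bin(a ^ b).count("1")


def run_lengths(bits):
    """Run-length encode a bit list as (first_bit, list_of_run_lengths)."""
    if not bits:
        return None, []
    runs = [1]
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            runs[-1] += 1      # same bit: extend the current run
        else:
            runs.append(1)     # bit flipped: start a new run
    return bits[0], runs


x = [0, 1, 0, 1, 0, 1]
y = [0, 1, 1, 1, 0, 0]
print(hamming(pack(x), pack(y)))        # 2
print(run_lengths([0, 0, 1, 1, 1, 0]))  # (0, [2, 3, 1])
```

Because the bits alternate between runs, storing the first bit and the run lengths is enough to rebuild the clipped series; series with long runs therefore encode to fewer numbers than series that cross the average frequently.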

References

  1. Bagnall, A.J. and Janacek, G.J., Clustering time series with clipped data, Machine Learning, volume 58(2), pp. 151-178, 2005.
  2. Bagnall, A.J. and Smith, G.D., A Multi-Agent Model of the UK Market in Electricity, IEEE Transactions on Evolutionary Computation, volume 9(5), 2005.
  3. Bagnall, A.J. and Toft, I.E., Zero Intelligence Plus and Gjerstad-Dickhaut Agents for, Workshop on Trading Agent Design and Analysis, part of the, New York, USA, pp. 59-64, 2004.
  4. Bagnall, A.J. and Cawley, G.C., Learning classifier systems for data mining: A comparison, Proceedings of the IEEE/INNS International Joint, volume 3, Portland, Oregon, USA, pp. 1802-1807, 2003.
  5. Bagnall, A.J. and Janacek, G.J. and Zhang, M., Clustering Time Series from Mixture Polynomial Models, 24th Sept 2003.

Research Team

Dr Tony Bagnall, Dr Gareth Janacek