This page contains data and code related to the Journal of Classification paper

A. Bagnall and G. Janacek, A run length transformation for discriminating between auto regressive time series, Journal of Classification, Online First, 2013

Abstract: We describe a simple time series transformation to detect differences in series that can be accurately modelled as stationary autoregressive (AR) processes. The transformation involves forming the histogram of above and below the mean run lengths. The run length (RL) transformation has the benefits of being very fast, compact and updatable for new data in constant time. Furthermore, it can be generated directly from data that has already been highly compressed. We first establish the theoretical asymptotic relationship between run length distributions and AR models through consideration of the zero crossing probability and the distribution of runs. We benchmark our transformation against two alternatives: the truncated Autocorrelation function (ACF) transform and the AR transformation, which involves the standard method of fitting the partial autocorrelation coefficients with the Durbin-Levinson recursions and using the Akaike Information Criterion stopping procedure. Whilst optimal in the idealised scenario, representing the data in these ways is time consuming and the representation cannot be updated online for new data. We show that for classification problems the accuracy obtained through using the run length distribution tends towards that obtained from using the full fitted models. We then propose three alternative distance measures for run length distributions based on Gower's general similarity coefficient, the likelihood ratio and dynamic time warping (DTW). Through simulated classification experiments we show that a nearest neighbour distance based on DTW converges to the optimal faster than classifiers based on Euclidean distance, Gower's coefficient and the likelihood ratio. 
We experiment with a variety of classifiers and demonstrate that although the RL transform requires more data than the best performing classifier to achieve the same accuracy as AR or ACF, this factor is at worst non-increasing with the series length, $m$, whereas the relative time taken to fit AR and ACF increases with $m$. We conclude that if the data is stationary and can be suitably modelled by an AR series, and if time is an important factor in reaching a discriminatory decision, then the run length distribution transform is a simple and effective transformation to use.

The Runlength Transform

The run length transform is very simple and can be explained with the following picture.

Our basic message is that crucial information relating to autocorrelation is retained in the run length histogram.
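As an illustration only, the idea can be sketched in a few lines of Python. This is not the released code (which consists of Java classes added to Weka). The function names, the `max_run` truncation, the pooling of above-mean and below-mean runs into one histogram, and the treatment of values equal to the mean as "below" are all assumptions made for this sketch.

```python
from typing import List, Sequence


def run_lengths(series: Sequence[float]) -> List[int]:
    """Lengths of consecutive runs above or below the series mean."""
    if len(series) == 0:
        return []
    mean = sum(series) / len(series)
    # True where an observation is above the mean. Values equal to the
    # mean are treated as "below" (an arbitrary choice for this sketch).
    above = [x > mean for x in series]
    runs = []
    current = 1
    for prev, cur in zip(above, above[1:]):
        if cur == prev:
            current += 1          # still on the same side of the mean
        else:
            runs.append(current)  # the series crossed the mean: close the run
            current = 1
    runs.append(current)          # close the final run
    return runs


def run_length_histogram(series: Sequence[float], max_run: int = 10) -> List[int]:
    """Counts of runs of length 1..max_run; longer runs go in the last bin."""
    hist = [0] * max_run
    for r in run_lengths(series):
        hist[min(r, max_run) - 1] += 1
    return hist


example = [1, 2, 3, 0, 0, 4, 5, 0]   # mean is 1.875
print(run_lengths(example))           # [1, 2, 2, 2, 1]
print(run_length_histogram(example, max_run=4))  # [2, 3, 0, 0]
```

A positively autocorrelated series tends to stay on one side of the mean for longer, so its histogram puts more weight on the longer run lengths. That is the sense in which the histogram retains information about autocorrelation.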

The Data

The case study we use is from the paper Peng et al., Exaggerated Heart Rate Oscillations During Two Meditation Techniques, International Journal of Cardiology 70 (1999), and was downloaded from PhysioNet.

It can also be found here.

The Code

The code to reproduce the experiments presented in the paper is available here.

The code is password protected. To get the password, please read the points below and, if you agree to them, email Tony.

  • Do not share the password with others.
  • The classes we have added to the Weka toolkit are released under the GNU General Public License (GPL): http://www.gnu.org/licenses/gpl.html
  • Do not modify any of our classes. If you want to alter the algorithm, please extend one of our classes or use the code as a basis for your own class. If you want to add code to the release please get in touch.
  • If you are a postgraduate student/post-doc, you must discuss this with your supervisor first and CC him/her when requesting the password.
  • You do not mind being listed as someone who has downloaded the code.
  • If you use the code, please reference:

A. Bagnall and G. Janacek, A run length transformation for discriminating between auto regressive time series, Journal of Classification, Online First, 2013