There is now widespread recognition that previously unknown knowledge can be extracted from large datasets using machine learning techniques. As the use of machine learning for exploratory data analysis has increased, so have the sizes of the datasets the algorithms must handle (gigabyte and terabyte datasets are commonplace) and the sophistication of the algorithms themselves. For this reason there is a growing body of research concerned with the use of parallel computing for data mining.

The aim of this project is to produce a super-computing data mining resource for use by the UK academic community which utilises a number of advanced machine learning and statistical algorithms for large datasets. In particular, a number of evolutionary computing-based algorithms and the ensemble machine approach will be used to exploit the large-scale parallelism possible in super-computing. This purpose is embodied in the following objectives:

  1. to develop a massively parallel approach for commonly used statistical and machine learning techniques for exploratory data analysis;
  2. to develop a massively parallel approach to the use of evolutionary computing techniques for feature creation and selection;
  3. to develop a massively parallel approach to the use of evolutionary computing techniques for data modelling;
  4. to develop a massively parallel approach to the use of ensemble machines, consisting of many well-known machine learning algorithms, for data modelling;
  5. to develop an appropriate super-computing infrastructure to support the use of such advanced machine learning techniques with large datasets.

UWE will develop the evolutionary computing-based components. UEA will develop the statistical and machine learning components and oversee the implementation of ensemble techniques. Manchester will develop the underlying infrastructure to interface the learning algorithms to the large datasets. All partners will be involved in the system evaluation phases.

References

  1. Whittley, I.M. and Bagnall, A.J. and Bull, L. and, Attribute Selection Methods for Filtered Attribute, Feature Selection for Data Mining Workshop, Part of the, 2006.
  2. Bull, L. and Studley, M. and Bagnall, A.J. and Whittley, On the use of Rule Sharing in Learning Classifier System, Proceedings of the 2005 Congress on Evolutionary, 2005.

Research Team

Dr Tony Bagnall, Dr Ian Whittley