The research, funded by the EPSRC (grant number GR/T04298/01), was concerned with developing efficient algorithms for finding classification rules in large databases. Current methods for classification rule induction work by placing constraints on the search space to keep the problem tractable.
For example, only simple conjunctive rules using nominal attributes may be constructed. Alternatively, the search space may be constrained by minimum support and confidence thresholds. These approaches may lead to loss of information and weaker class descriptions, and they can scale poorly to large databases. Additionally, the number of rules found is frequently large, and many of them may describe the same population.
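To make the kind of constraint described above concrete, the following is a minimal sketch of how a simple conjunctive rule over nominal attributes might be scored and then filtered by minimum support and confidence. It is an illustration only, not the project's code: the records, attribute names, threshold values and function names are all hypothetical, and support is taken to be the fraction of all records that satisfy both the rule and the target class.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical thresholds, chosen only for illustration.
MIN_SUPPORT = 0.3
MIN_CONFIDENCE = 0.7


def matches(record: Record, antecedent: Dict[str, str]) -> bool:
    """True if the record satisfies every attribute = value test in the rule."""
    return all(record.get(attr) == value for attr, value in antecedent.items())


def support_and_confidence(
    records: List[Record],
    antecedent: Dict[str, str],
    target_attr: str,
    target_value: str,
) -> Tuple[float, float]:
    """Score the rule 'antecedent -> target_attr = target_value'."""
    covered = [r for r in records if matches(r, antecedent)]
    correct = [r for r in covered if r.get(target_attr) == target_value]
    # Support: share of all records matching both antecedent and class.
    support = len(correct) / len(records) if records else 0.0
    # Confidence: share of covered records that belong to the class.
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence


# Toy data set (invented for this example).
records = [
    {"smoker": "yes", "age_band": "old", "risk": "high"},
    {"smoker": "yes", "age_band": "young", "risk": "high"},
    {"smoker": "no", "age_band": "old", "risk": "low"},
    {"smoker": "yes", "age_band": "old", "risk": "low"},
]

support, confidence = support_and_confidence(
    records, {"smoker": "yes"}, "risk", "high"
)
passes = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE
print(f"support={support:.2f} confidence={confidence:.2f} kept={passes}")
```

In this toy case the rule covers half of the data set but falls just short of the confidence threshold, so it would be discarded. This is one way in which hard thresholds can lose information.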
We developed multi-objective evolutionary algorithms to search for strong rules containing numerical or nominal attributes, and the results are very promising. The techniques have also been extended to find non-overlapping strong rules, providing a unique rule induction mechanism; this, coupled with an emphasis on scalability and efficiency, has contributed to the development of new state-of-the-art data mining algorithms. Adaptation to cope with missing or uncertain data, and with complex data, will provide an invaluable tool for medical data mining, where data often present those characteristics.
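As a rough illustration of what a multi-objective search is looking for, the sketch below keeps only those rules that are not dominated on two objectives and measures how much two rules overlap. It is a sketch under stated assumptions rather than the algorithm developed in the project: it assumes the two objectives are accuracy (correct records over covered records) and coverage (correct records over the size of the target class), measures overlap as the Jaccard index of the covered record sets, and uses invented rule names and record ids.

```python
from typing import FrozenSet, List, NamedTuple


class RuleStats(NamedTuple):
    name: str
    covered: FrozenSet[int]  # ids of records matched by the rule
    correct: FrozenSet[int]  # covered ids that belong to the target class


def accuracy(rule: RuleStats) -> float:
    return len(rule.correct) / len(rule.covered) if rule.covered else 0.0


def coverage(rule: RuleStats, class_size: int) -> float:
    return len(rule.correct) / class_size if class_size else 0.0


def dominates(a: RuleStats, b: RuleStats, class_size: int) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    acc_a, acc_b = accuracy(a), accuracy(b)
    cov_a, cov_b = coverage(a, class_size), coverage(b, class_size)
    no_worse = acc_a >= acc_b and cov_a >= cov_b
    better = acc_a > acc_b or cov_a > cov_b
    return no_worse and better


def pareto_front(rules: List[RuleStats], class_size: int) -> List[RuleStats]:
    """Rules that no other rule dominates, in their original order."""
    return [
        r for r in rules
        if not any(dominates(o, r, class_size) for o in rules if o is not r)
    ]


def overlap(a: RuleStats, b: RuleStats) -> float:
    """Jaccard index of the record sets covered by two rules."""
    union = a.covered | b.covered
    return len(a.covered & b.covered) / len(union) if union else 0.0


# Hypothetical example: records 0..5 belong to the target class.
CLASS_SIZE = 6
r1 = RuleStats("r1", frozenset({0, 1, 2, 3, 6}), frozenset({0, 1, 2, 3}))
r2 = RuleStats("r2", frozenset({4, 5}), frozenset({4, 5}))
r3 = RuleStats("r3", frozenset({0, 1, 6, 7}), frozenset({0, 1}))

front = pareto_front([r1, r2, r3], CLASS_SIZE)
print([r.name for r in front])        # non-dominated rules
print(overlap(r1, r2), overlap(r1, r3))
```

Here r1 and r2 trade accuracy against coverage and cover disjoint records, while r3 is dominated and also shares half of its covered records with r1. An evolutionary algorithm would search the rule space for such a front instead of enumerating rules as this toy example does.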
The objectives of the project were:
- Develop efficient algorithms for finding sets of strong rules (i.e. those of high accuracy and coverage) in large databases by using multi-objective optimisation techniques.
- Extend the techniques to find sets of non-overlapping strong rules (i.e. those that cover different records within the database).
- Extend the rule finding capabilities of the algorithm by interfacing with All Rule Algorithms (ARA) to deliver all non-overlapping strong rules in certain areas of the search space delimited by initial search with the multi-objective algorithm.
- Extend the techniques to deal with uncertain data effectively.
- Extend the techniques to deal with ontologies.
- Enhance the scalability of the algorithm by incorporation of automatic Feature Subset Selection and sampling mechanisms.
- Apply the new algorithms to a number of available case studies containing medical data.
- Write several peer-reviewed papers describing both technical aspects and clinical application.
How the objectives were met (pdf 38 KB).
- de la Iglesia, B., Richards, G., Philpott, M.S. and Rayward-Smith, V.J. The application and effectiveness of a multi-objective metaheuristic algorithm for partial classification. European Journal of Operational Research, 169:3, pp 898-917, 2006.
- Reynolds, A. and de la Iglesia, B. A Multi-Objective GRASP for Partial Classification, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 13(3), pp 227-243.
- Reynolds, A.P., Richards, G., de la Iglesia, B., Rayward-Smith, V.J. Clustering Rules: A Comparison of Partitioning and Hierarchical Clustering Algorithms. Journal of Mathematical Modelling and Algorithms, 5:4, pp 475-504, 2006.
- de la Iglesia, B., Reynolds, A., Rayward-Smith, V.J. Developments on a Multi-objective Metaheuristic (MOMH) Algorithm for Finding Interesting Sets of Classification Rules, Lecture Notes in Computer Science, Volume 3410, pp 826-840, 2005.
- Reynolds, A.P. and de la Iglesia, B., Rule Induction Using Multi-Objective Metaheuristics: Encouraging Rule Diversity (Winner of Best Session), IJCNN 2006, pp 6375-6382, 2006.
- Reynolds, A.P. and de la Iglesia, B. Rule Induction for Classification Using Multi-Objective Genetic Programming. Proceedings of the 4th International Evolutionary Multi-Criterion Optimization Conference (EMO 2007), LNCS 4403, pp. 516-530, Matsushima, Japan, 2007.
- Reynolds, A.P. and de la Iglesia, B., Managing Population Diversity Through the Use of Weighted Objectives, Proceedings of the 2007 IEEE Symposium on Computational, pp. 99-106, 2007.
- de la Iglesia, B. Application of Multi-objective Metaheuristic Algorithms in Data Mining, Proceedings of the Third UK Knowledge Discovery and Data Mining Symposium (Invited Talk), Expert Update, Autumn 2007, Vol. 9, No. 3, ISSN: 1465-4091, pp 43-48.
MOMH algorithms for partial classification - results files by Alan Reynolds on 12/11/2007
The paper "A Multi-Objective GRASP for Partial Classification" gives only a brief summary of results. Further details are provided here.
Dr Beatriz de la Iglesia, Dr Alan Reynolds