Feature construction using genetic programming (GP) is carried out to study how the inclusion of a single GP-evolved attribute affects the performance of a range of classification algorithms.
Four different fitness functions are used in the genetic program, based on information gain, the Gini index, the chi-squared test and a combination of these. The classification algorithms used are three decision tree algorithms, namely C5, CART and CHAID, and an MLP neural network. The intention of the research is to ascertain whether a decision tree algorithm benefits more from features constructed by a GP whose fitness function incorporates the same fundamental learning mechanism as the splitting criterion of that decision tree.
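To make the idea of a splitting-criterion-based fitness concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a candidate attribute is scored by the best impurity reduction achievable with a single threshold split, using either entropy (information gain) or the Gini index; the chi-squared and combined fitness functions are not shown. The function names, the toy data and the expression `x1 * x2` are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(values, labels, threshold, impurity):
    """Impurity reduction from splitting on values <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted

def best_gain(values, labels, impurity):
    """Assumed fitness: best gain over all midpoints between distinct values."""
    cuts = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
    return max((split_gain(values, labels, t, impurity) for t in midpoints),
               default=0.0)

# Hypothetical toy data: two original attributes and a two-class label.
rows = [(1, 4), (2, 1), (3, 3), (4, 2), (1, 1), (2, 4)]
labels = ["b", "b", "a", "a", "b", "a"]

x1 = [r[0] for r in rows]              # an original attribute
evolved = [r[0] * r[1] for r in rows]  # hypothetical evolved expression x1 * x2

ig_original = best_gain(x1, labels, entropy)
ig_evolved = best_gain(evolved, labels, entropy)
gini_evolved = best_gain(evolved, labels, gini)
print(ig_original, ig_evolved, gini_evolved)
```

In this toy case the constructed attribute scores higher than the original attribute under the information gain measure, which is the kind of signal a GP would use to prefer one candidate expression over another.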
Take, for example, the decision tree opposite. This tree has been induced from the Wine data set, which has 13 attributes (f1 to f13) and 3 classes.
The evolved attribute appears at the root of the tree, being the most predictive. In almost all cases, the evolved attribute separates samples that belong to Class 2 from the rest, while f7 perfectly separates samples belonging to Class 1 from those belonging to Class 3.
The induced tree opposite achieved 100% accuracy on both training and test sets.
Here we see the tree that was induced, using either C5 or CART, for the Balance data set.
Once again the evolved attribute appears at the root. In fact, it is used twice in the tree and no original attribute appears. This reflects the method that was used to generate the data in the first place.
This tree also achieves 100% accuracy on both training and test sets.
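The link between the constructed attribute and the way the data were generated can be illustrated with a short sketch. It assumes the Balance data set is the UCI Balance Scale data, in which the class is determined by comparing left weight times left distance with right weight times right distance. The text does not state which expression the GP actually evolved, so the attribute `constructed` below, the parameter names `lw`, `ld`, `rw`, `rd` and the two-test tree are illustrative assumptions only.

```python
from itertools import product

def true_class(lw, ld, rw, rd):
    """Generating rule assumed for the Balance Scale data: compare torques."""
    left, right = lw * ld, rw * rd
    if left > right:
        return "L"
    if left < right:
        return "R"
    return "B"

def constructed(lw, ld, rw, rd):
    """Hypothetical constructed attribute: difference of the two torques."""
    return lw * ld - rw * rd

def tree_predict(z):
    """A tree that tests the same constructed attribute twice."""
    if z > 0:      # first test on the constructed attribute
        return "L"
    if z < 0:      # second test on the same attribute
        return "R"
    return "B"

# Every attribute takes values 1 to 5, giving 5**4 = 625 samples.
rows = list(product(range(1, 6), repeat=4))
correct = sum(tree_predict(constructed(*r)) == true_class(*r) for r in rows)
accuracy = correct / len(rows)
print(len(rows), accuracy)
```

Under these assumptions, a tree built only from the constructed attribute needs no original attribute to separate the three classes, which is consistent with the tree described above.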
- Muharram, M.A. and Smith, G.D., Evolutionary Constructive Induction, IEEE Trans. on Knowledge and Data Engineering, volume 17(11), pp. 1518-1528, 2005, ISSN 1041-4347
- Muharram, M.A. and Smith, G.D., Evolutionary feature construction using information gain, 7th European Conference on Genetic Programming, Euro GP, volume LNCS 3003, Edited by Keijzer, M. and O'Reilly, U-M. and Lucas, S. and Costa, E., Springer, Berlin, Coimbra, Portugal, pp. 379-388, 2004
- Muharram, M.A. and Smith, G.D., The Effect of Evolved Attributes on Classification, 16th Australian Conference on AI, volume LNAI 2903, Edited by Gedeon, T.D. and Fung, L.C.C, Perth, Australia, pp. 933-941, 2003
Dr George Smith, Dr Mohammed Muharram, Prof Vic Rayward-Smith