Identifying and quantifying the relevance of input features is particularly useful in data mining when dealing with real-world problems defined by high-dimensional data.

Conventional methods, such as statistical and correlation analysis, appear to be less effective because the data in such problems usually contain a high level of noise and the actual distributions of the attributes are unknown. This research aims to develop machine-learning-based methods that identify relevant input features, quantify their general and specific relevance, and then select the relevant features for further modelling analyses, including classification, regression, prediction and clustering. We have so far developed two novel methods, neural-net clamping and decision tree path scoring, and applied them to some real-world problems, including identifying the risk factors for osteoporosis (see picture), achieving better results than the conventional methods.

Research Team

Dr. Wenjia Wang, Geoffrey R. Guile, Jamil Al Shaqsi, Richard Harrison