Sharing data for data mining, or sharing the results of data mining, raises privacy concerns. Organisations need methods that let them share data with third parties for analysis without disclosing private information. This need has motivated the study of Privacy-Preserving Data Mining (PPDM) methods.

In this project we have studied privacy preservation in the context of data mining. After examining current methods, we have established non-metric Multidimensional Scaling (MDS) as a data perturbation method that preserves the utility of the data for distance-based data mining algorithms (e.g. Nearest Neighbour, SVM, clustering) while hiding the private information. Fig. 1 shows an example of data perturbed using non-metric MDS.

Fig. 1: The effect of the non-metric MDS perturbation on the geometry of the 'Nefertiti' face at different dimensions. The top-left face is the original. The remaining faces are the perturbed faces at n-5, n-10, n-20, n-30, n-40, n-50 and n-60 dimensions, respectively.

The project has tested various attack scenarios to establish how much privacy could be compromised depending on the information that the attacker knows a priori. The project has also tested the accuracy of data mining models generated on original and perturbed data. The results show that non-metric MDS is able to deliver very good models (very close in performance to those obtained with the original data) while disclosing very little information to an attacker, even when the attacker has some knowledge about the original or perturbed data. This is because the non-metric transformation is based on the rank order of the distances between objects and not on the distance values themselves. This increases the uncertainty about the placement of the original points in the perturbed low-dimensional space, thereby hiding the detail of the original data values and their inter-point distances.
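
The rank-order argument can be illustrated with a small sketch. It is not the project's implementation and it does not run MDS itself; it only shows that two different sets of distances can share the same rank order, which is the only information a non-metric method uses. The toy points, the function names and the distortion applied to the distances are all illustrative assumptions.

```python
from itertools import combinations
from math import dist


def pairwise_distances(points):
    """Return Euclidean distances for all unordered pairs (i, j) with i < j."""
    return {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def rank_order(distances):
    """Return the pairs sorted from closest to farthest."""
    return sorted(distances, key=distances.get)


# Hypothetical toy records with two numeric attributes each.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (4.0, 5.0)]

original = pairwise_distances(points)

# An arbitrary strictly increasing distortion of the distances (assumption:
# chosen only for illustration). The values change, the ordering does not.
distorted = {pair: d ** 2 + 3.0 for pair, d in original.items()}

same_order = rank_order(original) == rank_order(distorted)
same_values = all(abs(original[p] - distorted[p]) < 1e-9 for p in original)

print("rank order preserved:", same_order)
print("distance values preserved:", same_values)
```

Because both sets of distances produce the same ordering of pairs, anyone who sees only that ordering cannot tell which set of distance values generated it. This is the source of the uncertainty described above.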

References

1. Alotaibi, K. and De La Iglesia, B. (2013). Privacy-Preserving SVM Classification using Non-metric MDS. In: SECURWARE 2013, the Seventh International Conference on Emerging Security Information, Systems and Technologies. IARIA, pp. 30-35.

2. Alotaibi, K., Rayward-Smith, V. J., Wang, W. and De La Iglesia, B. (2012). Non-linear dimensionality reduction for privacy-preserving data classification. In: Proceedings - 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust and 2012 ASE/IEEE International Conference on Social Computing, SocialCom/PASSAT 2012, pp. 694-701. ISBN 9780769548487.

3. Alotaibi, K., Rayward-Smith, V. and De La Iglesia, B. (2011). Non-metric Multidimensional Scaling for Privacy-Preserving Data Clustering. In: Intelligent Data Engineering and Automated Learning - IDEAL 2011. Springer, pp. 287-298.

4. Alotaibi, K., Rayward-Smith, V. and De La Iglesia, B. (2012). Non-metric Multidimensional Scaling: A perturbation model for privacy-preserving data clustering. Under review.

Research Team

Dr Beatriz de la Iglesia, Khaled Alotaibi