It is estimated that 150 million people have diabetes worldwide, and that this number may double by the year 2025. In developed countries, most people with diabetes will be aged 65 years or more; in developing countries, most will be in the 45-64 year age bracket and so affected in their most productive years. Diabetes increases the risk of heart disease, stroke, amputation, kidney failure, blindness and early mortality.

There is no cure for diabetes. However, the condition can be managed, and early treatment can minimise the complications described above. A key factor in providing early treatment is identifying, at an early stage, those most at risk of complications. The data mining group at UEA has been working in this area for some time on a collaborative project with St. Thomas' Hospital, London. The work was concerned with identifying those patients most at risk of early mortality. Computerised clinical records on all diabetic patients referred to St. Thomas' Hospital since 1973 are stored in the Diabeta 3 clinical information system. At the time of this study there were data on over 21,000 patients collected over 27 years. Conventional hypothesis testing methods can be used to analyse large clinical databases such as Diabeta 3, but it is likely that such databases contain a wealth of 'hidden' information that may not be found using traditional techniques. It was therefore proposed that automated, non-hypothesis-driven data mining methods be used to search for patterns in the data. We used the DataLamp KDD software (developed at UEA) for rule discovery.

In this study we wished to identify factors that were associated with early mortality, i.e. we wanted rules with the conclusion "died young". The rules extracted showed clearly that those with peripheral neuropathy were most at risk of premature death. This result was both unexpected and novel: current research and teaching on outcomes in people with diabetes identify cardiac risk factors as the most likely indicators of early mortality. The data mining study ran in parallel with an independent analysis of a cohort of 1,000 patients with diabetes who were re-examined after ten years. That analysis also identified peripheral neuropathy as the most important risk factor for premature death. The data mining study was limited to a small portion of the available data and identified only associations with early mortality. There is considerable scope for further valuable research in this area, both by widening the objectives of the analysis and by using more of the data. Further databases on diabetes are available locally, but developing an integrated approach to both data sets may require a diabetes ontology.
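As a rough illustration of what a rule with the fixed conclusion "died young" means in practice, the sketch below scores candidate rules by confidence (the fraction of patients matching the rule's condition who died young) and coverage (the fraction of patients who died young that the rule captures). This is not the DataLamp algorithm, whose internals are not described here. The record fields, the `score_rule` helper and the eight toy records are all hypothetical and invented purely for illustration; they are not taken from Diabeta 3.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Patient:
    # Hypothetical, simplified record; real clinical records hold far more fields.
    peripheral_neuropathy: bool
    other_factor: bool  # placeholder for any second candidate attribute
    died_young: bool


def score_rule(
    records: Iterable[Patient],
    condition: Callable[[Patient], bool],
    conclusion: Callable[[Patient], bool],
) -> Tuple[float, float]:
    """Return (confidence, coverage) for the rule 'condition => conclusion'."""
    records = list(records)
    matched = [r for r in records if condition(r)]
    target = [r for r in records if conclusion(r)]
    both = [r for r in matched if conclusion(r)]
    # Guard against empty denominators so the function never divides by zero.
    confidence = len(both) / len(matched) if matched else 0.0
    coverage = len(both) / len(target) if target else 0.0
    return confidence, coverage


# Invented toy data (not real patients), used only to exercise the function.
TOY_RECORDS = [
    Patient(True, False, True),
    Patient(True, True, True),
    Patient(True, False, False),
    Patient(False, True, True),
    Patient(False, True, False),
    Patient(False, False, False),
    Patient(False, True, False),
    Patient(False, False, False),
]


def died_young(p: Patient) -> bool:
    return p.died_young


if __name__ == "__main__":
    for name, cond in [
        ("peripheral_neuropathy", lambda p: p.peripheral_neuropathy),
        ("other_factor", lambda p: p.other_factor),
    ]:
        conf, cov = score_rule(TOY_RECORDS, cond, died_young)
        print(f"{name} => died_young: confidence={conf:.2f}, coverage={cov:.2f}")
```

A rule discovery tool would search over many candidate conditions and report those that score well on measures of this kind; the exact measures and search strategy used in the study are not stated in this summary.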


Richards, G., Rayward-Smith, V. J., Sönksen, P. H., Carey, S. and Weng, C. (2001) Data mining for indicators of early mortality in a database of clinical records. Artificial Intelligence in Medicine, 22 (3). pp. 215-231. ISSN 0933-3657

Research Team

Dr. Graeme Richards, Prof. Vic Rayward-Smith


Prof. Peter Sönksen, St. Thomas' Hospital