The use of routinely collected medical data for clinical studies is attracting growing interest. Millions of records collected on patients and their interactions with the Hospital Information Systems (HISs) can contain untapped information on disease progression, efficacy of different treatments or cost-effectiveness of competing treatments. We aim to make such data amenable to data mining analysis.

We have worked extensively on a range of medical data mining projects. In our latest project, in collaboration with the Norfolk and Norwich University Hospital, we developed a methodology for collecting routinely collected medical data and preparing it for analysis, and we applied the methodology to a case study on prostate cancer.

The first part of the project consisted of the development of a methodology for retrieving and collating patient-centric data from multiple HISs for the purpose of creating a research database. The methodology, depicted in fig. 1, enables the collection of data from different HISs to create meta-databases and study databases specific to a disease of interest.

The methodology was tested by retrieving data on prostate cancer patients. In this case study, data was collected on over 1,900 patients diagnosed with prostate cancer. The data was extracted from eight different HISs at a large UK hospital, validated, and complemented with information from the local cancer registry. We then created a framework for the construction, visualisation and quality assessment of anonymised clinical pathways for prostate cancer patients. The result was an individual clinical pathway for each of the 1,900+ patients, containing information on the clinical biomarker prostate-specific antigen (PSA) together with information on demographics, diagnosis, treatment and outcome. The software produced to compile the pathways enables interactive visualisation of each individual pathway, as shown in fig. 2.
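As a rough illustration of what compiling a pathway involves, the sketch below groups event records by patient and orders them by date. It is not the project's software: the record layout, the source-system names and the `build_pathways` function are assumptions made for this example, and the records are invented placeholders.

```python
from collections import defaultdict
from datetime import date

# Invented placeholder records, standing in for anonymised events pooled
# from several source systems (layout assumed for this sketch):
# (patient_id, event_date, source_system, event_type, value)
events = [
    ("P001", date(2008, 3, 1), "pathology", "PSA", 8.4),
    ("P001", date(2008, 4, 12), "histology", "diagnosis", "confirmed"),
    ("P002", date(2009, 1, 20), "pathology", "PSA", 4.1),
    ("P001", date(2008, 6, 2), "oncology", "treatment", "radiotherapy"),
    ("P001", date(2008, 9, 15), "pathology", "PSA", 1.2),
]


def build_pathways(records):
    """Group events by patient and sort each patient's events by date."""
    pathways = defaultdict(list)
    for patient_id, event_date, source, event_type, value in records:
        pathways[patient_id].append((event_date, source, event_type, value))
    for timeline in pathways.values():
        # Sort on the date only; values have mixed types and are not comparable.
        timeline.sort(key=lambda event: event[0])
    return dict(pathways)


pathways = build_pathways(events)
for event_date, source, event_type, value in pathways["P001"]:
    print(event_date.isoformat(), source, event_type, value)
```

Keeping the source system on every event means each item in a pathway can be traced back to the system it came from, which is helpful when assessing pathway quality.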

Some of our other work has demonstrated the suitability of data mining techniques for uncovering clinically usable information from routinely collected data. For example, we have applied simple rule induction techniques to extract rules for assessing the risk of orthostatic hypotension. The extracted rules can easily be applied in a clinical setting and may help in the prevention of falls and fractures in elderly patients at high risk of orthostatic hypotension.
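To give a flavour of simple rule induction, the sketch below implements the well-known 1R (One Rule) scheme, which picks the single attribute whose value-to-class rule makes the fewest errors on the training data. This is an illustrative stand-in and not necessarily the technique used in the study; the attribute names (`attr_a`, `attr_b`, `high_risk`) and the toy rows are invented and carry no clinical meaning.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Return (attribute, rule, errors) for the best single-attribute rule.

    The rule maps each value of the chosen attribute to the majority
    class observed for that value in the training rows.
    """
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attr in attributes:
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Invented toy rows with no clinical meaning.
rows = [
    {"attr_a": "high", "attr_b": "yes", "high_risk": "yes"},
    {"attr_a": "high", "attr_b": "no", "high_risk": "no"},
    {"attr_a": "low", "attr_b": "yes", "high_risk": "yes"},
    {"attr_a": "low", "attr_b": "no", "high_risk": "no"},
    {"attr_a": "high", "attr_b": "yes", "high_risk": "yes"},
    {"attr_a": "low", "attr_b": "no", "high_risk": "no"},
]

attribute, rule, errors = one_r(rows, target="high_risk")
print(attribute, rule, errors)
```

Rules of this form read as plain "if attribute has value X then class Y" statements, which is what makes them straightforward to apply in a clinical setting.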

The methodology for data collection has also been used to create a comprehensive stroke database, enabling further research in this area.

We will now embark on data mining and process mining analyses of the prostate pathway data. These may include the similarity of pathways, compliance with the pathway for prostate cancer published by NICE, survival analysis for different types of pathway, and the correlation between PSA trends and outcomes.
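One possible way to approach pathway similarity, sketched here purely as an assumption and not as a method the project has adopted, is to reduce each pathway to its sequence of event types and compute the edit (Levenshtein) distance between sequences. The event labels below are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of event labels."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x from a
                curr[j - 1] + 1,     # insert y into a
                prev[j - 1] + cost,  # substitute x with y, or match
            ))
        prev = curr
    return prev[-1]


# Invented event-type sequences for two hypothetical pathways.
pathway_1 = ["PSA", "diagnosis", "treatment"]
pathway_2 = ["PSA", "diagnosis", "PSA", "treatment"]

print(edit_distance(pathway_1, pathway_2))
```

A distance of this kind ignores the time between events and the PSA values themselves, so it would be at best a starting point for comparing pathways.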

Previous health informatics projects

We have worked on a number of projects analysing medical data. One important project was concerned with the analysis of primary care data to investigate models of cardiovascular disease risk. This project highlighted the rich data resource that is now available through anonymised primary care databases such as THIN.

Previous projects have also demonstrated the ability of data mining techniques to extract patterns from medical data. For example, a case study in diabetes found indicators of early death risk for diabetic patients that were also found simultaneously by more traditional medical research studies.

We have also applied standard data mining and text mining techniques to gastroenterology data from the local Norfolk and Norwich University Hospital. The analysis of textual sources (e.g. discharge summaries or procedural reports) is also very topical and has many applications in the medical domain.


  1. Bettencourt-Silva, J, De La Iglesia, B, Donell, S and Rayward-Smith, V (2011) On creating a patient-centric database from multiple Hospital Information Systems in a National Health Service secondary care setting. Methods of Information in Medicine. pp. 6730-6737. ISSN 0026-1270
  2. De La Iglesia, B, Ong, ACL, Potter, JF, Metcalf, AK and Myint, PK (2012) Predictors of orthostatic hypotension in patients attending a transient ischaemic attack clinic: Database study. Blood pressure, Epub ahead of print. pp. 1-8. ISSN 1651-1999
  3. Bettencourt-Silva, JH, Clark, J, Cooper, CS, Mills, R, Rayward-Smith, VJ, De la Iglesia, B (2013) Building data-driven pathways from routinely collected hospital data: a case study on prostate cancer. Manuscript in preparation.

Research team

Dr Beatriz de la Iglesia, Joao H. Bettencourt-Silva


  • Dr. Clark, Jeremy (School of Biological Sciences, University of East Anglia, Norwich)
  • Prof. Cooper, Colin S (School of Biological Sciences, University of East Anglia, Norwich)
  • Mr. Mills, Robert (Urology Department, Norfolk & Norwich University Hospital, Norwich)
  • Prof. Rayward-Smith, Victor John (School of Computing Sciences)