Since clinical gastroenterological endoscopy began to develop in the late 1960s, the number of procedures performed each year has increased greatly. It is estimated that in the UK over 1% of the adult population undergoes an upper gastrointestinal endoscopic procedure every year. Other commonly performed gastrointestinal procedures include flexible sigmoidoscopy, colonoscopy and ERCP (endoscopic retrograde cholangiopancreatography).

This project was a collaboration with the endoscopy unit at the Norfolk and Norwich University Hospital (NNUH), one of the busiest units in East Anglia. In 2003 the unit performed over 9250 procedures (6091 gastroscopies, 1693 colonoscopies, 1144 flexible sigmoidoscopies and 361 ERCPs).

National prospective audits reporting 30-day morbidity and mortality for both upper and lower GI endoscopy have suggested that over 50% of deaths and serious complications are (a) cardiopulmonary and (b) related to the dose of sedation used. The evidence also suggests that elderly patients require only a fraction of the doses of sedative and analgesic drugs that fit younger patients need, and that the effects of combining benzodiazepines and opioids are synergistic rather than additive. The most vulnerable group is sick elderly patients, in whom, much evidence suggests, sedative doses in many units are dangerously high. An initial data mining study focused on the use of analgesics and found wide variation between clinicians.
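One simple way to surface variation between clinicians is to summarise doses per clinician and patient age band. The following Python sketch is not the study's method: the record layout (clinician, patient age, dose in mg), the 70-year cut-off and the toy values are all assumptions made for illustration, not NNUH data.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical record layout: (clinician, patient age in years, dose in mg).
Record = Tuple[str, int, float]


def median_dose_by_clinician(
    records: List[Record], elderly_age: int = 70
) -> Dict[Tuple[str, str], float]:
    """Median dose for each (clinician, age band) pair.

    `elderly_age` is an assumed cut-off, not one taken from the study.
    """
    doses: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for clinician, age, dose in records:
        band = "elderly" if age >= elderly_age else "younger"
        doses[(clinician, band)].append(dose)
    return {key: median(values) for key, values in doses.items()}


# Invented toy records, for illustration only.
records = [
    ("Dr A", 45, 5.0), ("Dr A", 80, 5.0), ("Dr A", 82, 4.0),
    ("Dr B", 50, 5.0), ("Dr B", 78, 2.0), ("Dr B", 85, 1.0),
]
for (clinician, band), dose in sorted(median_dose_by_clinician(records).items()):
    print(clinician, band, dose)
```

In this toy example the hypothetical Dr B gives elderly patients a much smaller fraction of the younger-patient dose than Dr A does, which is the kind of difference such a summary is meant to expose.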
Figure 1: Decision tree describing the classification of sedation dosage
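The algorithm used to induce the tree in Figure 1 is not stated here. As a minimal sketch, assuming a CART-style learner that splits numeric attributes on Gini impurity, a small tree predicting a dose class from patient attributes could be grown as follows. The attributes (age and ASA physical status grade), the class names "standard" and "reduced" and the toy records are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

# A tree is either a class label (leaf) or (feature, threshold, left, right).
Tree = Union[str, Tuple[int, float, "Tree", "Tree"]]


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X: List[List[float]], y: List[str]) -> Optional[Tuple[int, float]]:
    """Threshold split that most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X: List[List[float]], y: List[str], depth: int = 0, max_depth: int = 3) -> Tree:
    """Recursively split until no split helps or max_depth is reached."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    go_left = [row[f] <= t for row in X]
    left_X = [r for r, g in zip(X, go_left) if g]
    left_y = [lab for lab, g in zip(y, go_left) if g]
    right_X = [r for r, g in zip(X, go_left) if not g]
    right_y = [lab for lab, g in zip(y, go_left) if not g]
    return (f, t, build_tree(left_X, left_y, depth + 1, max_depth),
            build_tree(right_X, right_y, depth + 1, max_depth))


def predict(tree: Tree, row: List[float]) -> str:
    """Follow the thresholds from the root to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Invented toy records: [age, ASA grade] -> dose class.
X = [[45, 1], [52, 2], [38, 1], [61, 2], [78, 3], [82, 2], [75, 3], [88, 3]]
y = ["standard"] * 4 + ["reduced"] * 4
tree = build_tree(X, y)
print(tree)  # (0, 68.0, 'standard', 'reduced')
print(predict(tree, [70, 1]))  # reduced
```

A tree like this is attractive for clinical audiences because each path from root to leaf reads as a plain rule (here, "age above 68 implies the reduced class"); a library implementation such as scikit-learn would normally be used on real data.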

Another key project involved text mining. A large amount of crucial clinical information is currently held as unstructured text reports attached to other patient data in a number of legacy systems. Most of this information is unused in clinical research because textual data is difficult to analyse and suitable tools are lacking. Colonoscopy reports, for example, typically include the medications and dosages used, findings (e.g. presence of polyps or diverticula), a description of any difficulties in carrying out the procedure (e.g. looping in the colon), the patient's level of comfort and the disposition (e.g. follow-up colonoscopy). Such data may contain valuable information, such as relationships between patient age and the presence of polyps, the influence of medications and dosages on procedure success and safety, and findings on follow-up colonoscopy after polyp detection.
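A first step in this kind of text mining is turning free-text reports into structured fields. The Python sketch below uses regular expressions to extract drug doses and flag findings such as polyps and diverticula. The drug list, the assumed report wording ("Midazolam 2 mg") and the sentence-level negation rule are simplifying assumptions; real reports would need a far more robust approach, for example to handle "no polyps in the rectum, one polyp in the sigmoid".

```python
import re
from typing import Dict, List, Tuple

# Hypothetical drug list (midazolam is a benzodiazepine; pethidine and
# fentanyl are opioids); a real system would use a curated formulary.
DRUGS = ("midazolam", "pethidine", "fentanyl")
# Assumes doses are written as "<drug> <amount> <unit>", e.g. "Midazolam 2 mg".
DOSE_RE = re.compile(
    r"\b(" + "|".join(DRUGS) + r")\s+(\d+(?:\.\d+)?)\s*(mg|mcg)\b", re.IGNORECASE
)


def extract_doses(report: str) -> List[Tuple[str, float, str]]:
    """Return (drug, amount, unit) for every dose mention in the report."""
    return [(d.lower(), float(a), u.lower()) for d, a, u in DOSE_RE.findall(report)]


def has_finding(report: str, stem: str) -> bool:
    """True if any sentence mentions `stem` and contains no negation word.

    Deliberately naive: a sentence containing "no" or "without" is ignored.
    """
    # Split on "." or ";" followed by whitespace or end of text, so that
    # decimals such as "2.5" are not treated as sentence boundaries.
    for sentence in re.split(r"[.;](?:\s+|$)", report):
        mentioned = re.search(rf"\b{stem}", sentence, re.IGNORECASE)
        negated = re.search(r"\b(?:no|without)\b", sentence, re.IGNORECASE)
        if mentioned and not negated:
            return True
    return False


def structure_report(report: str) -> Dict[str, object]:
    """Collect the extracted fields for one report into a dictionary."""
    return {
        "doses": extract_doses(report),
        "polyps": has_finding(report, "polyp"),
        "diverticula": has_finding(report, "diverticul"),
    }


# Invented example report, for illustration only.
report = ("Midazolam 2 mg and pethidine 50 mg given. "
          "Two small polyps removed from sigmoid colon. No diverticula seen.")
print(structure_report(report))
```

Once reports are reduced to fields like these, questions such as the relationship between patient age and polyp detection become ordinary queries over a table.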

Research on the classification of colonoscopy reports has led to a number of innovative algorithms that combine clustering, as a pre-processing step, with document classification to produce highly accurate classifiers. We have also tested this combination in other domains, where it likewise significantly increased classification accuracy.
Efforts are now underway to secure funding to continue working on this very promising area.
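The specific algorithms developed in this project are not reproduced here. One generic way to use clustering as a pre-processing step with minimal labelling effort is cluster-then-label: cluster all reports, then give every report in a cluster the majority label of the few manually labelled reports it contains; the propagated labels can then be used to train any standard classifier. This Python sketch assumes term-frequency vectors, spherical k-means seeded with caller-chosen documents, and invented toy reports.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

Vector = Dict[str, float]


def vectorise(doc: str) -> Vector:
    """Unit-length term-frequency vector over lower-cased whitespace tokens."""
    counts = Counter(doc.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; both vectors are assumed to be unit length."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def centroid(vectors: List[Vector]) -> Vector:
    """Normalised sum of the member vectors (spherical k-means centroid)."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {term: w / norm for term, w in total.items()}


def nearest(v: Vector, cents: List[Vector]) -> int:
    """Index of the most similar centroid."""
    return max(range(len(cents)), key=lambda k: cosine(v, cents[k]))


def kmeans(vectors: List[Vector], seeds: List[int], iters: int = 10) -> List[int]:
    """Spherical k-means; `seeds` are indices of the initial centroid documents."""
    cents = [vectors[i] for i in seeds]
    assign: List[int] = []
    for _ in range(iters):
        assign = [nearest(v, cents) for v in vectors]
        for k in range(len(cents)):
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:  # keep the previous centroid if a cluster empties
                cents[k] = centroid(members)
    return assign


def cluster_then_label(docs: List[str], labels: List[Optional[str]],
                       seeds: List[int]) -> List[Optional[str]]:
    """Give each document the majority known label of its cluster (None if none)."""
    assign = kmeans([vectorise(d) for d in docs], seeds)
    cluster_label: Dict[int, Optional[str]] = {}
    for k in set(assign):
        known = [lab for lab, a in zip(labels, assign) if a == k and lab is not None]
        cluster_label[k] = Counter(known).most_common(1)[0][0] if known else None
    return [cluster_label[a] for a in assign]


# Invented toy reports; only two carry a manual label.
docs = [
    "caecum reached complete examination to caecum",
    "terminal ileum intubated caecum reached",
    "procedure abandoned due to looping",
    "incomplete examination abandoned poor bowel preparation",
    "caecum reached polyp removed",
    "looping prevented progress procedure abandoned",
]
labels = ["success", None, "failure", None, None, None]
print(cluster_then_label(docs, labels, seeds=[0, 2]))
# ['success', 'success', 'failure', 'failure', 'success', 'failure']
```

The appeal of this design is that an expert labels only a handful of reports per cluster rather than the whole corpus; its weakness is that any cluster mixing classes passes its majority label to every member.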
Figure 2: Success rate for colonoscopy procedures calculated automatically using text mining, compared with a manually performed gold-standard classification.
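A comparison like the one in Figure 2 can be approximated by computing success rates from both label sets. This Python sketch assumes that each report carries a "success" or "failure" label from the text-mining classifier and from manual review, plus a grouping key (for example, the endoscopist); the grouping key and the toy labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def success_rate(labels: Sequence[str]) -> float:
    """Fraction of reports labelled "success"."""
    return sum(lab == "success" for lab in labels) / len(labels)


def compare_by_group(
    groups: Sequence[str], automatic: Sequence[str], manual: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Per group: (text-mining success rate, manual gold-standard success rate)."""
    auto_by: Dict[str, List[str]] = defaultdict(list)
    gold_by: Dict[str, List[str]] = defaultdict(list)
    for g, a, m in zip(groups, automatic, manual):
        auto_by[g].append(a)
        gold_by[g].append(m)
    return {g: (success_rate(auto_by[g]), success_rate(gold_by[g])) for g in auto_by}


def agreement(automatic: Sequence[str], manual: Sequence[str]) -> float:
    """Proportion of reports where the automatic label matches the manual one."""
    return sum(a == m for a, m in zip(automatic, manual)) / len(manual)


# Invented toy labels for two hypothetical endoscopists, "A" and "B".
groups = ["A", "A", "A", "A", "B", "B"]
automatic = ["success", "success", "failure", "success", "success", "failure"]
manual = ["success", "success", "failure", "failure", "success", "failure"]
print(compare_by_group(groups, automatic, manual))  # {'A': (0.75, 0.5), 'B': (0.5, 0.5)}
print(round(agreement(automatic, manual), 3))  # 0.833
```

Reporting per-report agreement alongside the rates is worthwhile: two label sets can give similar overall success rates while still disagreeing on individual reports.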
 

References

  1. Saad, Fathi H., Bell, G. Duncan and de la Iglesia, Beatriz (2008) Classification techniques with minimal labelling effort and application to medical reports. International Journal of Data Mining and Bioinformatics, 2 (3). pp. 268-287. ISSN 1748-5673
  2. Saad, F. H., de la Iglesia, B. and Bell, G. D. (2006) Effect of Document Representation on the Performance of Medical Document Classification. In: Proceedings of the 2006 International Conference on Data Mining (DMIN-06), Las Vegas, USA.
  3. Saad, F. H., de la Iglesia, B. and Bell, G. D. (2006) A Comparison of Two Document Clustering Approaches for Clustering Medical Documents. In: Proceedings of the 2006 International Conference on Data Mining (DMIN-06), Las Vegas, USA.
  4. Saad, F., de la Iglesia, B. and Bell, G. D. (2006) Comparison of Document Classification Techniques to Classify Medical Reports. In: Ng, W. K., Kitsuregawa, M. and Li, J. (Eds.) PAKDD 2006, Lecture Notes in Computer Science 3918, pp. 285-291.
  5. Reynolds, A. P., de la Iglesia, B., Bell, G. D., Cook, V. J. and Tighe, R. (2005) To be or not to be sedated? The effect of age and gender on an individual patient's likely decision. In: British Society of Gastroenterology Annual Meeting, 2006-03-20 - 2006-03-23, Birmingham.
  6. Reynolds, A. P., de la Iglesia, B., Bell, G. D., Sheikh, K., Cook, V. J. and Tighe, R. (2005) Monitoring colonoscopy success rates and detecting changes in sedation practice using data mining and statistical techniques: Figures from a regional training centre. In: British Society of Gastroenterology Annual Meeting, 2006-03-20 - 2006-03-23, Birmingham.
  7. Sheikh, K., Reynolds, A. P., de la Iglesia, B., Bell, G. D. and Tighe, R. (2005) Data mining techniques can be used to rapidly interrogate an endoscopy database and calculate 'adjusted' colonoscopy success or failure rates - but what criteria should be used to define such success? In: British Society of Gastroenterology Annual Meeting, 2006-03-20 - 2006-03-23, Birmingham.
  8. de la Iglesia, B., Hsu, C., Bell, G. D. and Rayward-Smith, V. J. (2004) Data mining techniques applied to an Endoscopy Database: What Additional Information Might it Generate? In: British Society of Gastroenterology Annual Meeting, 2004-03-21 - 2004-03-24, Glasgow, Scotland.

Research Team

Dr. Fathi H. Saad, Prof. G. D. Bell, Dr. Alan Reynolds, Dr. Beatriz de la Iglesia