The Health Improvement Network (THIN) is a scheme for collecting anonymised patient data from General Practices across the UK.  In this project, these data were used to assess cardiovascular disease risk models.

More than 330 GP practices have joined the scheme, providing data on over 5.5 million patients.  Of these, about 2.5 million patients are actively registered with the practices and can be followed prospectively.  Most of the contributing practices have recorded over 15 years of data on their systems.  The information recorded includes demographic data such as sex, age and family identifiers; diagnoses (including some secondary care information); prescriptions; lifestyle information (e.g. alcohol intake and smoking habits); and tests and lab results.  In addition, each patient is associated with a census evaluation area that provides information on socioeconomic conditions, including ethnicity and environmental standards.

Large open cohort databases such as this have great potential for medical research and could help answer important clinical questions for conditions such as cardiovascular disease (CVD), asthma, diabetes, obesity and osteoporosis.


Primary care databases, in which data are collected routinely for purposes other than specific clinical investigations, have characteristics that call for novel analysis techniques.  Techniques proposed within the relatively new discipline of data mining often build variable selection into the modelling process and can generate hypotheses directly from the data.  They are designed specifically to cope with large, messy and uncertain databases.

The Project

At the request of the Information Centre for Health and Social Care, we performed an initial study applying current CVD risk models to the THIN dataset.  CVD remains a major cause of illness and death worldwide.  Many of the risk factors that predispose people to CVD are known, and a number of survival models have been developed to identify those who could benefit from preventive treatment.  We assessed three CVD risk models on the THIN database: the standard Framingham model and the more recent ASSIGN and QRISK models.  QRISK, the most recent of the three, was itself developed using a primary care database, and our work has added weight to the current debate over analysis methodology in such studies.


The project was an important collaboration between a number of experts:

  • John Potter, Professor of Ageing & Stroke (MED, UEA) 
  • Dr Jane Skinner (MED, UEA)
  • Neil Poulter, Professor of Preventive Cardiovascular Medicine (International Centre for Circulatory Health, National Heart & Lung Institute, Imperial College, London)
  • The Information Centre for Health and Social Care


De La Iglesia, B, Potter, JF, Poulter, NR, Robins, MM and Skinner, J (2011) Performance of the ASSIGN cardiovascular disease risk score on a UK cohort of patients from general practice. Heart, 97 (6). pp. 491-499. ISSN 1355-6037

Research Team

Dr Beatriz de la Iglesia, Dr Margaret Robins