This is a joint research proposal that brings together expertise from the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey (UniS), the School of Computing Sciences (CMP) at the University of East Anglia (UEA) and the Home Office Scientific Development Branch (HOSDB) to tackle a pertinent and challenging programme of work.
This project will build upon the state of the art in computer vision and speech recognition to investigate and evaluate automated lip-reading from video. The goal is to develop tools and techniques for automatic, language-independent lip-reading of subjects from video streams. The project will also seek to quantify both human and machine performance over varying viewpoints and levels of discourse complexity.
Automatic lip-reading presents a number of demanding scientific challenges. This project will address three key scientific questions: What is the relationship between facial gesture and perceived speech? How is that relationship affected by the language of the speaker and the context of the discourse? And what is the effect of language, the pose of the speaker and the context of the discourse on recognition accuracy? To answer these questions, the project seeks to fulfil a number of objectives:
- Develop robust tracking and feature extraction methods which can provide consistent levels of recognition regardless of pose.
- Develop new algorithms for visual speech classification.
- Investigate the inclusion of context, expression and gesture in recognition.
- Investigate transferability across languages.
- Quantify classification performance across a range of factors, such as pose, video quality, language and context, for both human and machine.