Computing Sciences
Currently available projects
A New Approach to Audio-Visual Speech Recognition
- School:
Computing Sciences
- Primary Supervisor:
Professor Stephen Cox
Information
- Start date: October 2013
- Programme: PhD
- Mode of Study: Full Time
- Studentship Length: 3 years
How to Apply
- Deadline: 28 February 2013
- Apply online
Fees & Funding
- Funding Status: Competition Funded Project (EU Students Only)
- Funding Source: Funding is available from a number of different sources
- Funding Conditions:
Funding is available to EU students. If funding is awarded for this project it will cover tuition fees and stipend for UK students. EU students may be eligible for full funding, or tuition fees only, depending on the funding source.
- Fees: Fees Information
Entry Requirements
- Acceptable First Degree:
Computer Science, Mathematics, any Science subject that has included study of programming
- Minimum Entry Standard: 2:1
Project Description
Audio-visual speech recognition (AVSR) has the potential to improve the quality of speech recognition when speech is uttered in a noisy environment, and is timely now that cameras are ubiquitous on mobile devices. Our Lab has been researching lip-reading for several years, and we have a good understanding of techniques that are useful for extracting speech information from visual signals; so far, however, these have not been integrated into an audio-visual speech recognition system.
Research in AVSR has tended to concentrate on how to combine the audio and video feature streams, or on how to use the outputs from separate audio and video recognisers. Little attention has been paid to the fact that much of the information in the audio signal is actually missing from the visual signal, and so it makes little sense to apply traditional audio recognition techniques to a visual signal. Recent work from our Lab suggests that a better strategy for using the visual signal in speech recognition would be to make use of "islands of certainty", or "landmarks", in the signal where the lips can provide useful complementary information to the audio signal. It appears that lip readers make use of such speech cues (e.g. lip closures and lip-rounding), as well as visual patterns for high-frequency words such as "yes" and "okay". This approach has similarities to recent work in speech recognition that has also concerned itself with detecting reliable landmarks in the speech signal and using these, together with advanced machine-learning techniques, to decode the signal.
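To give a flavour of the "islands of certainty" idea, the sketch below detects frames where the mouth is effectively closed (a visually reliable cue for the bilabial consonants /p/, /b/ and /m/) and uses those landmarks to re-score word hypotheses from an audio recogniser. All names, scores and thresholds here are illustrative assumptions for this advert, not the project's actual method.

```python
# Illustrative sketch only: landmark detection + hypothesis re-scoring.
# The mouth-opening track, threshold and score bonus are invented values.

def find_closure_landmarks(mouth_opening, threshold=0.15):
    """Return frame indices where the lips are effectively closed."""
    return [i for i, v in enumerate(mouth_opening) if v < threshold]

def rescore(hypotheses, landmarks):
    """Boost hypotheses containing a bilabial when a closure landmark exists.

    hypotheses: list of (word, audio_score, has_bilabial) tuples.
    Returns the hypotheses sorted by combined score, best first.
    """
    bonus = 2.0 if landmarks else 0.0  # visual evidence of a lip closure
    return sorted(
        ((word, score + (bonus if has_bilabial else 0.0))
         for word, score, has_bilabial in hypotheses),
        key=lambda ws: ws[1], reverse=True)

# Synthetic mouth-opening track: open, a clear closure, then open again.
opening = [0.6, 0.5, 0.4, 0.1, 0.05, 0.1, 0.5, 0.6]
landmarks = find_closure_landmarks(opening)  # frames 3, 4, 5

# The noisy audio alone slightly prefers "ten"; the visual closure cue
# tips the decision towards "pen", which begins with the bilabial /p/.
hyps = [("ten", 5.1, False), ("pen", 4.9, True)]
print(rescore(hyps, landmarks)[0][0])  # -> pen
```

The point of the sketch is that the visual stream is consulted only at the frames where it is genuinely informative, rather than being fused with the audio stream at every frame.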
References
Hasegawa-Johnson, M. et al. Landmark-based speech recognition: report of the 2004 Johns Hopkins Summer Workshop. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638080/ Accessed 31 October 2012.