Thu, 10 Sep 2009
A new study by the University of East Anglia suggests computers are now better at lip-reading than humans.
The peer-reviewed findings will be presented for the first time at the eighth International Conference on Auditory-Visual Speech Processing (AVSP) 2009, held at the
A research team from the
Furthermore, they found that machines are able to exploit very simple features that represent only the shape of the face, whereas human lip-readers require full video of people speaking.
The study also showed that the dynamics and the full appearance of speech gestures are very important, in contrast to the traditional approach to lip-reading training, in which viewers are taught to spot key lip-shapes from static (often drawn) images.
Using a new video-based training system, viewers with very limited training significantly improved their ability to lip-read monosyllabic words, which in itself is a very difficult task. It is hoped this research might lead to novel methods of lip-reading training for the deaf and hard of hearing.
“This pilot study is the first time an automated lip-reading system has been benchmarked against human lip-readers and the results are perhaps surprising,” said the study’s lead author Sarah Hilder.
“With just four hours of training it helped them improve their lip-reading skills markedly. We hope this research will represent a real technological advance for the deaf community.”
Agnes Hoctor, campaigns manager at the RNID, said: “This research confirms how difficult the vital skill of lip-reading is to learn and why RNID is campaigning for people who are deaf or hard of hearing to have improved access to classes. We would welcome the development of video-based or online training resources to supplement the teaching of lip-reading. Hearing loss affects 55 per cent of people over 60 so, with the ageing population, demand to learn lip-reading is only going to increase.”
The AVSP conference is being held in the
As part of the conference, delegates will take part in a Visual Speech Synthesis Challenge in which a number of visual speech synthesizers, or ‘talking heads’, will battle it out to determine the most intelligible and visually appealing system.
AVSP runs as a satellite conference to Interspeech 2009 which will be held in
Keynote speakers will be Dr Peter Bull of the
‘Comparison of human and machine-based lip-reading’ by Sarah Hilder, Richard Harvey and Barry-John Theobald is published in the Proceedings of the International Conference on Auditory-Visual Speech Processing (AVSP) 2009 on Thursday September 10 2009.
The research will be presented on Saturday September 12 at the International Conference on Auditory-Visual Speech Processing (AVSP) 2009 at the
For more information about the conference, please visit www.avsp2009.co.uk.
Part of the lip-reading test used to compare the performance of the machine-based lip-reading system and human lip-readers can be downloaded here: http://www.jtuk.com/training/part1.html