
The Virtual Human Face as a Signal Transducer

Faculty Members: Professor J A Bangham
Dr M H Fisher
Dr R W Harvey
Dr B J Theobald
Researchers: Mr Alejandro Butron Guillen (MPhil/PhD Student)
Mr Xiaoqiang Huang (MPhil/PhD Student)
Collaborator: Dr Iain Matthews (Carnegie Mellon University, USA)

 

This project represents a long-standing collaboration with the Speech Group.

Charles Darwin drew attention to the role of the face as a primitive communication device. We perceive the human face to be a complex device, able to communicate a range of emotions and also to deliver sufficient detail to allow a trained lip-reader to follow speech. The questions are: how does it work, how complex is it, and can it be simulated? These problems can be considered from several viewpoints: as a graphics problem, as a low-bandwidth communication coding problem, and as an information transducer problem. Work at UEA started on the information transducer problem with a study of lip-reading by computer, which concluded that mouth shape alone is not sufficient. Incorporating additional information, coded from the changes in apparent skin texture and tone that accompany detailed changes in the surface as the face changes shape, doubled recognition rates.

Edwards explored the possibilities of using active appearance models (AAMs) to capture the characteristics of a talking head, and pointed to a way forward. However, for synthesis of both audio-visual speech and sign expressions, there are several significant problems associated with driving AAMs backwards. For example, what is the best way to extract, select and interpolate between visemes?

AAMs generate two-dimensional moving image sequences, so a further question arises: how should pose be handled? One option is to extend the 2D AAM. Cootes built a set of AAMs, each representing a different pose, then selected the appropriate AAM automatically. This produces a good appearance of a moving head. However, it is possible that the limited fidelity of the lip movements results from noise associated with the very large pose components of the model, which swamp the low-variance detail in small regions of the mouth and other regions of particular importance. This appears to limit the potential of handling pose within the statistical model itself.

The proposed approach combines an AAM (shape and appearance model) with a 3D mesh model. The mesh model handles pose and provides a surface on which the AAM can be displayed. This allows the face to be animated pose-free. In other words, the complex problem of understanding and re-animating facial gestures is broken down into a set of smaller, linear sub-problems. Pose is handled by a mesh model; shape by a pose-independent shape model; face surface by a texture model that is independent of pose and shape; and fine detail of the eyes and mouth by further texture models that are independent of pose, shape and face surface. Animation is achieved by updating the output from all the models for each animation frame. Synthesis is achieved by selecting and interpolating between concatenated 'visemes' in the appropriate subspace: pose, shape and texture(s).
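As a rough illustration of what interpolating between visemes in a linear subspace can look like, the following Python sketch builds frames for a single sub-model. It is not the project's implementation: the function names, the tiny two-point "shape", the single mode of variation and the use of straight linear interpolation are all assumptions made for the example.

```python
from typing import List, Sequence

Vector = List[float]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Linearly interpolate between two parameter vectors, 0 <= t <= 1."""
    if len(a) != len(b):
        raise ValueError("parameter vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def synthesise(mean: Sequence[float],
               modes: Sequence[Sequence[float]],
               params: Sequence[float]) -> Vector:
    """Linear model: output = mean + sum over k of params[k] * modes[k]."""
    out = list(mean)
    for weight, mode in zip(params, modes):
        for i, value in enumerate(mode):
            out[i] += weight * value
    return out


def animate(mean: Sequence[float],
            modes: Sequence[Sequence[float]],
            viseme_a: Sequence[float],
            viseme_b: Sequence[float],
            n_frames: int) -> List[Vector]:
    """Generate n_frames outputs moving from viseme_a to viseme_b."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for f in range(n_frames):
        t = f / (n_frames - 1)
        frames.append(synthesise(mean, modes, lerp(viseme_a, viseme_b, t)))
    return frames


# Hypothetical toy data: two 2D points stored as (x1, y1, x2, y2) and one
# mode of variation that moves the points apart vertically.
MEAN_SHAPE = [0.0, 0.0, 1.0, 0.0]
MODES = [[0.0, 1.0, 0.0, -1.0]]
VISEME_CLOSED = [0.0]  # parameter value for a hypothetical "closed" viseme
VISEME_OPEN = [2.0]    # parameter value for a hypothetical "open" viseme

frames = animate(MEAN_SHAPE, MODES, VISEME_CLOSED, VISEME_OPEN, 3)
```

In the decomposition described above, one such interpolation would run in each subspace (pose, shape and each texture model), and the outputs would be combined for every animation frame.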

A key result established that good-quality facial expressions could be replayed at 1.5 kbits/s. More recently, the work has been extended to the full face. The features needed for effective lip-reading must be preserved when generating low-bandwidth talking heads.
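To give a feel for how tight a 1.5 kbits/s budget is, the sketch below works out how many model parameters could be sent per frame. Only the 1.5 kbits/s figure comes from the text. The frame rate of 25 frames per second, the 8 bits per parameter, the reading of 1 kbit as 1000 bits and the function names are assumptions chosen purely for illustration.

```python
def bits_per_frame(bitrate_bps: float, frames_per_second: float) -> float:
    """Bits available for each frame at a given bitrate and frame rate."""
    return bitrate_bps / frames_per_second


def params_per_frame(bitrate_bps: float,
                     frames_per_second: float,
                     bits_per_param: int) -> int:
    """Whole number of fixed-size parameters that fit in one frame's budget."""
    return int(bits_per_frame(bitrate_bps, frames_per_second) // bits_per_param)


BITRATE_BPS = 1500       # 1.5 kbits/s, taking 1 kbit = 1000 bits (assumption)
FRAMES_PER_SECOND = 25   # assumed frame rate, not stated in the text
BITS_PER_PARAM = 8       # assumed quantisation of each model parameter

budget = bits_per_frame(BITRATE_BPS, FRAMES_PER_SECOND)
count = params_per_frame(BITRATE_BPS, FRAMES_PER_SECOND, BITS_PER_PARAM)
```

Under these assumed figures, each frame has room for only a handful of parameters, which is why a compact statistical model of the face suits this kind of coding.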

The video-real virtual avatar has numerous applications. It forms a part of the virtual signing project. It has applications in the film and video game industries, and in research into the understanding of expression and emotion. The new video compression standard, MPEG-4, acknowledges that further gains in compression can only take place with a computer model. However, the standard will need to be extended to incorporate the advantages of this new system.
