An audiovisual speech synthesiser (AVTTS) aims to generate both an acoustic speech signal and a facial animation of a person speaking, in order to improve the intelligibility and naturalness of the synthesised speech. Example applications for this work include computer animation for movies and games, improved accessibility of computer systems, and very low bandwidth videoconferencing.
Most traditional AVTTS systems are two-phase synthesisers. In general, a two-phase system first synthesises the acoustic speech with a text-to-speech system and then synthesises the accompanying video sequence with a visual synthesiser. The major drawback is that the resulting audiovisual speech can lack synchrony. Hence, more recent research has focused on single-phase synthesis, which aims to achieve the highest possible coherence between the audio and visual modalities.
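To make the two-phase structure concrete, the following is a minimal sketch and not the system described here. It assumes that phase 1 yields phoneme timings and that phase 2 builds a viseme track from them through a many-to-one lookup. All names (`Segment`, `PHONEME_TO_VISEME`, `synthesise_audio`, `synthesise_video`), the viseme labels, and the fixed 100 ms phoneme duration are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str     # phoneme or viseme label
    start_ms: int  # segment start time in milliseconds
    end_ms: int    # segment end time in milliseconds


# Hypothetical many-to-one phoneme-to-viseme table (illustrative only,
# not a standard viseme set).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "i": "spread", "u": "rounded",
}


def synthesise_audio(phonemes, duration_ms=100):
    """Phase 1 (stub): stands in for a text-to-speech back end.

    Returns only the phoneme timings; a real system would also
    produce the waveform.
    """
    segments = []
    t = 0
    for p in phonemes:
        segments.append(Segment(p, t, t + duration_ms))
        t += duration_ms
    return segments


def synthesise_video(audio_segments):
    """Phase 2 (stub): derive a viseme track from the phase-1 timings.

    Adjacent phonemes that map to the same viseme are merged into one
    segment. Unknown phonemes fall back to a "neutral" viseme.
    """
    track = []
    for seg in audio_segments:
        viseme = PHONEME_TO_VISEME.get(seg.label, "neutral")
        if track and track[-1].label == viseme:
            track[-1].end_ms = seg.end_ms  # extend the previous viseme
        else:
            track.append(Segment(viseme, seg.start_ms, seg.end_ms))
    return track


audio = synthesise_audio(["m", "a", "p", "b", "u"])
video = synthesise_video(audio)
for seg in video:
    print(seg.label, seg.start_ms, seg.end_ms)
```

In this sketch the visual track is computed only after the audio timings are fixed, so the second phase has no influence on the first. That one-way dependency is the structural property that distinguishes a two-phase design from a single-phase one.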
Here we show two animated speech sequences, one driven by phoneme units and the other by static-viseme units.
 Static Viseme