The last few years have witnessed a considerable change in the way speech is transmitted across telephony networks. With the rapid growth of both mobile and IP networks, much speech data is now sent over these networks.

This presents many problems to speech recognition systems designed to operate over traditional fixed networks, including additional acoustic noise, compression and distortion from low bit-rate speech codecs, and packet loss. A major advance in this area has been distributed speech recognition (as proposed by the ETSI Aurora group), where the feature extraction component of the speech recogniser resides on the terminal device and the decoding takes place at a remote recogniser. Because feature vectors rather than coded speech are transmitted, this removes the distortion resulting from low bit-rate speech codecs and can also improve the noise robustness of the system.

However, both IP and mobile networks are unreliable in their delivery of data and may suffer from packet loss, and hence lose valuable speech information. This project aims to analyse how packet loss affects the speech features and how this in turn degrades recognition performance. Schemes for improving recognition performance in the presence of packet loss are also being developed.

In order to measure the performance of speech recognition systems under packet loss, it is important to model the loss process accurately. Work in this area has used a three-state Markov chain, which allows the percentage of packet loss and the average burst length to be specified.
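As an illustration of how a Markov chain can be parameterised by loss percentage and average burst length, the sketch below uses the simpler two-state (Gilbert) model rather than the three-state chain used in this work, whose exact structure is not described here. The function names (gilbert_params, simulate_loss, burst_lengths) are hypothetical. The transition probabilities follow from the standard two-state relations: the mean burst length is 1/q and the long-run loss rate is p/(p+q).

```python
import random


def gilbert_params(loss_rate, mean_burst_len):
    """Return (p, q) for a two-state chain: p = P(received -> lost),
    q = P(lost -> received), both per packet."""
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must lie strictly between 0 and 1")
    if mean_burst_len < 1.0:
        raise ValueError("mean_burst_len must be at least 1 packet")
    q = 1.0 / mean_burst_len               # mean burst length = 1/q
    p = q * loss_rate / (1.0 - loss_rate)  # stationary loss = p/(p+q)
    if p > 1.0:
        raise ValueError("this loss rate and burst length cannot be combined")
    return p, q


def simulate_loss(n_packets, loss_rate, mean_burst_len, seed=0):
    """Return a list of booleans, True where a packet is lost."""
    p, q = gilbert_params(loss_rate, mean_burst_len)
    rng = random.Random(seed)
    lost, in_loss = [], False
    for _ in range(n_packets):
        if in_loss:
            in_loss = rng.random() >= q    # remain lost with probability 1-q
        else:
            in_loss = rng.random() < p     # enter a burst with probability p
        lost.append(in_loss)
    return lost


def burst_lengths(lost):
    """Lengths of the consecutive runs of lost packets."""
    bursts, run = [], 0
    for is_lost in lost:
        if is_lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return bursts
```

Calling simulate_loss(200000, 0.1, 4.0) and measuring the fraction of lost packets and the average of burst_lengths gives a quick check that the generated pattern matches the requested statistics.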

For short duration bursts of packet loss, successful compensation can be achieved by interpolating the speech feature vectors across periods of loss. Experiments have shown that for longer duration bursts (>50 ms) there is insufficient correlation in the feature vector stream to enable accurate estimation of the missing vectors using interpolation. However, by incorporating interleaving into the packetisation process, a burst of loss is dispersed so that sufficient target vectors remain in the received feature vector stream for interpolation to operate successfully. This combination of interleaving and interpolation gives considerable performance gains at burst lengths up to 400 ms. Further methods for packet loss compensation are also being considered, including modifications to the Viterbi decoding procedure within the HMMs to allow for the loss of speech vectors. This would remove the need to make estimates of the missing feature vectors.
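The interaction between interleaving and interpolation can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme used in the project: it assumes one feature vector per packet, a simple block interleaver that reads a rows-by-cols block column by column, and linear interpolation between the nearest received vectors. All function names are hypothetical.

```python
def interleave_order(rows, cols):
    """Transmission order for a block of rows*cols vectors: the block is
    filled row by row and read out column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def send_through_burst(frames, order, burst_start, burst_len):
    """Send frames in the given order, drop those whose transmission slot
    falls inside the burst, and return them in original time order with
    None marking each lost vector."""
    received = [None] * len(frames)
    for slot, src in enumerate(order):
        if not (burst_start <= slot < burst_start + burst_len):
            received[src] = frames[src]
    return received


def linear_interpolate(frames):
    """Fill each None entry by linear interpolation between the nearest
    received vectors; at the edges, copy the nearest received vector."""
    known = [i for i, f in enumerate(frames) if f is not None]
    if not known:
        raise ValueError("no received vectors to interpolate from")
    out = list(frames)
    for i, f in enumerate(frames):
        if f is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = list(frames[right])
        elif right is None:
            out[i] = list(frames[left])
        else:
            w = (i - left) / (right - left)
            out[i] = [(1 - w) * a + w * b
                      for a, b in zip(frames[left], frames[right])]
    return out


# Toy stream: 16 one-dimensional "feature vectors".
frames = [[float(t)] for t in range(16)]
order = interleave_order(4, 4)
received = send_through_burst(frames, order, burst_start=4, burst_len=4)
restored = linear_interpolate(received)
```

With this 4-by-4 interleaver, a burst covering four consecutive transmission slots removes vectors that are four frames apart in the original stream, so each missing vector has received neighbours on both sides. Sent in plain time order (order = list(range(16))), the same burst would remove four adjacent vectors and leave interpolation with a much longer gap to bridge.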

Research Team

Dr. Ben Milner, Mr. Alastair James