The aim of speech enhancement is to reduce or remove the effect of noise on the quality and intelligibility of speech. Conventional speech enhancement methods apply a two-stage procedure: the noise signal is first estimated from the noisy speech, and this estimate is then filtered out of the noisy speech. Such methods are effective at removing stationary noise. In non-stationary noise, however, annoying artefacts known as musical noise remain in the signal wherever the noise has been under- or over-estimated.
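
To make the two-stage procedure concrete, the sketch below applies magnitude spectral subtraction to a single frame. This is one simple conventional technique chosen for illustration; it is neither log MMSE nor the method developed in this project. The function names, the naive DFT and the `floor` parameter are all assumptions introduced here.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2)), adequate for a short frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def estimate_noise_magnitude(noise_frames):
    """Stage 1: average the magnitude spectra of frames assumed to hold noise only."""
    spectra = [[abs(c) for c in dft(frame)] for frame in noise_frames]
    n_bins = len(spectra[0])
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bins)]


def spectral_subtract(noisy_frame, noise_mag, floor=0.01):
    """Stage 2: subtract the noise magnitude estimate and keep the noisy phase.

    `floor` (a hypothetical parameter) limits how far any bin can be attenuated,
    so that over-estimated noise does not produce negative magnitudes.
    """
    cleaned = []
    for coeff, n_mag in zip(dft(noisy_frame), noise_mag):
        mag = abs(coeff)
        new_mag = max(mag - n_mag, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(coeff)))
    return idft(cleaned)
```

When the noise estimate matches the true noise in a frame, the subtraction removes it almost entirely. When the noise changes between the frames used for estimation and the frame being processed, some bins are under- or over-subtracted, which is the origin of the isolated residual peaks heard as musical noise.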

This project does not attempt to filter noisy speech to remove noise. Instead, the aim is to reconstruct a clean speech signal from a set of acoustic speech features extracted from the noisy speech. There are two challenges to such an approach. First, a suitable model of speech, driven by a set of acoustic speech features, must be developed. The speech model should only be able to reconstruct speech, not noise, and must not reduce quality or intelligibility. Next, a method of robust feature extraction is required to obtain the acoustic features of clean speech from the noisy speech.

Results show this method to be very effective on speech affected by a variety of noises, including babble (many background speakers talking at once), street noise, in-car noise and machine gun noise. The following examples compare the model-based approach with a state-of-the-art conventional method (log MMSE) on the task of removing street noise mixed with speech at 5 dB SNR.
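
For readers who want to create a comparable test condition, the sketch below mixes a noise recording with speech at a target SNR. It assumes the SNR is defined from the average power over the whole utterance; how the SNR was measured for the examples on this page is not stated, and the function names are hypothetical.

```python
import math


def signal_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio is `snr_db`, then add it.

    Assumes whole-utterance average power; other conventions (for example,
    measuring the speech level over active speech only) give different gains.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = signal_power(speech)
    p_noise = signal_power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10*log10(p_speech / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]
```

Calling `mix_at_snr(speech, noise, 5.0)` with two lists of samples returns a noisy signal whose noise component is 5 dB below the speech in average power.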

References

  1. Harding, P. and Milner, B. Enhancing Speech by Reconstruction from Robust Acoustic Features. In Thirteenth Annual Conference of the International Speech Communication Association, 2012.
  2. Harding, P. and Milner, B. On the Use of Machine Learning Methods for Speech and Voicing Classification. In Thirteenth Annual Conference of the International Speech Communication Association, 2012.
  3. Harding, P. and Milner, B. Speech Enhancement by Reconstruction from Cleaned Acoustic Features. In Twelfth Annual Conference of the International Speech Communication Association, 2011.

Downloads

examples_clean [.png / .wav] - clean speech

examples_noisy [.png / .wav] - speech mixed with street noise at 5 dB SNR

examples-logmmse [.png / .wav] - noisy speech processed by log MMSE

examples_model [.png / .wav] - noisy speech processed by the model-based enhancement method

The .png files are narrowband spectrograms, whilst the .wav files are the audio examples corresponding to the spectrograms.

Research Team

Mr. Philip Harding, Dr. Ben Milner, Prof. Stephen Cox