Stephen Cox trained first as a physicist and then as an electronic engineer, and began his career at the UK Government Communications Centre developing signal-processing algorithms. He then joined British Telecom's research laboratories to work on speech recognition, and spent two years at the speech research unit of the Royal Signals and Radar Establishment (now QinetiQ) at Malvern, where he researched the adaptation of speech recognition algorithms to new speakers. He returned to BT to lead a team of researchers developing speech recognition algorithms for use on the UK telephone network. He joined the School of Computing Sciences at UEA as a lecturer in 1991 and was appointed professor in 2003. His research interests include speech recognition, music processing, audio identification and automatic lip-reading, and he is the author or co-author of over 100 publications in these fields. He was an invited consultant at AT&T Bell Labs, New Jersey, in 1994, a visiting scientist at Nuance Communications Inc., CA, in 2000, and an invited researcher at Apple Inc., CA, in 2010. He has acted as a consultant and reviewer for several national governments as well as the European Commission, and also consults for industry. He is a Senior Member of the Institute of Electrical and Electronics Engineers and a former member of the IEEE Speech and Language Technical Committee.
For a full list of my publications, most of which are downloadable, see http://www2.cmp.uea.ac.uk/~sjc/
Visual units and confusion modelling for automatic lip-reading, Image and Vision Computing, pp. 1-12
Tennis Ball Tracking using a Two-Layered Data Association Approach, IEEE Transactions on Multimedia, pp. 145-156
Speaker-independent machine lip-reading with speaker-dependent viseme classifiers
Automatic annotation of tennis games: An integration of audio, vision, and learning, Image and Vision Computing, pp. 896-903
Supervised and Unsupervised Adaptation to Regional Accented Speech using Limited Data for Automatic Speech Recognition
A two-layered data association approach for ball tracking
Confusion modelling for automated lip-reading using weighted finite-state transducers
Native accent classification via i-vectors and speaker compensation fusion
Recent developments in automated lip-reading
Language Identification Using Visual Features, IEEE Transactions on Audio, Speech and Language Processing, pp. 1936-1947
Detection of Ball Hits in a Tennis Game using audio and visual information
Improved Audio Event Detection by use of Contextual Noise
Is automated conversion of video to text a reality? ISBN 9780819492876
Iterative Classification of Regional British Accents in I-Vector Space
An Accurate and Robust Gender Identification Algorithm
Improved Detection of Ball Hit Events in a Tennis Game Using Multimodal Information
Iterative Improvement of Speaker Segmentation using High-level Knowledge
Learning Score Structure from Spoken Language for a Tennis Game
Inferring the Structure of a Tennis Game using Audio Information, IEEE Transactions on Audio, Speech, and Language Processing, pp. 1925-1937
Using High-Level Information to Detect Key Audio Events in a Tennis Game
Key Research Interests
Stephen Cox is part of the Speech, Language and Audio Processing Group.
His principal research interest is speech processing, especially automatic speech recognition. Current research projects include speaker adaptation for speech recognition, speech synthesis, confidence measures for speech recognisers, and automatic routing of telephone enquiries. He is the author of over 60 papers in the field of speech processing.
Caballero Morales, S.O. and Cox, S.J., Modelling Confusion-Matrices to Improve Speech Recognition Accuracy, with an Application to Dysarthric Speech. Proc. 10th International Conference on Spoken Language Processing (Interspeech), Antwerp, August 2007
Read, I. and Cox, S.J., Automatic Pitch Accent Prediction for Text-To-Speech Synthesis. Proc. 10th International Conference on Spoken Language Processing (Interspeech), Antwerp, August 2007
Cox, S.J., On Estimation of Speakers’ Confusion Matrices from Sparse Data. Proc. 11th International Conference on Spoken Language Processing (Interspeech), Brisbane, September 2008
Caballero Morales, S.O. and Cox, S.J., Application of Weighted Finite-State Transducers to Improve Recognition Accuracy for Dysarthric Speech. Proc. 11th International Conference on Spoken Language Processing (Interspeech), Brisbane, September 2008
Cox, S.J., Harvey, R., Lan, Y., Newman, J.L. and Theobald, B.J., The challenge of multispeaker lip-reading. Proc. International Conference on Auditory-Visual Speech Processing 2008, Tangalooma, Australia.
Theobald, B.J., Harvey, R., Cox, S.J., Lewis, C. and Owen, G.P., Lip-reading enhancement for law enforcement. Proc. SPIE conference on Optics and Photonics for Counterterrorism and Crime Fighting, G. Owen and C. Lewis, Eds., vol. 6402, September 2006, pp. 205–9.
Newman, J.L. and Cox, S.J., Automatic Visual-Only Language Identification: A Preliminary Study. Proc. IEEE Conf. on Acoustics, Speech and Signal Processing, Taiwan, 2009.
Caballero Morales, S.O. and Cox, S.J., Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers. EURASIP Journal on Advances in Signal Processing. Volume 2009 (2009), Article ID 308340, 14 pages. doi:10.1155/2009/308340.
Caballero Morales, S.O. and Cox, S.J., On the Estimation and the Use of Confusion-Matrices for Improving ASR Accuracy. Proc. 12th International Conference on Spoken Language Processing (Interspeech), Brighton, September 2009.
Watkins, C. and Cox, S.J., Example-Based Speech Recognition using Formulaic Phrases. Proc. 12th International Conference on Spoken Language Processing (Interspeech), Brighton, September 2009.
Newman, J.L. and Cox, S.J., Speaker Independent Visual-Only Language Identification. Proc. IEEE Conf. on Acoustics, Speech and Signal Processing, Dallas, 2010.
Huang, Q. and Cox, S.J., Hierarchical Language Modeling for Audio Events Detection in a Sports Game. Proc. IEEE Conf. on Acoustics, Speech and Signal Processing, Dallas, 2010.
Read, I. and Cox, S. J., Stochastic and Syntactic Techniques for Predicting Phrase Breaks. Computer Speech and Language, Volume 21, Issue 3, Page(s) 519-542, 2007.
Huang, Q. and Cox, S. J., Task-Independent Call-Routing. Speech Communication, Volume 48, Issues 3-4, Page(s) 374-389, 2006.
Cox, S. J., Lincoln, M., Nakisa, M., Wells, M., Tutt, M. and Abbott, S., The Development and Evaluation of a Speech to Sign Translation System to Assist Transactions. Int. Journal of Human Computer Interaction, Volume 16, Issue 2, Page(s) 141-161, 2003.
Cox, S. J. and Dasmahapatra, S., High Level Approaches to Confidence Estimation in Speech Recognition. IEEE Transactions on Speech and Audio Processing, Volume 10, Issue 7, Page(s) 460-471, 2002.
External Activities and Indicators of Esteem
- Member, IEEE Speech and Language Processing Technical Committee, 2006
- Chairman, UK Institute of Acoustics Speech Group, 1998-2003
- Keynote speech, AMI Workshop on “Multimodal Interaction and Related Machine Learning Algorithms”, Martigny, Switzerland, 2004
- Director of Graphics, Vision and Speech Language Laboratory
- Head of Speech Language Group
- Director of Research