Prof Stephen Cox
| Job Title | Contact | Location |
|---|---|---|
| Professor of Computing Science | S dot J dot Cox at uea dot ac dot uk; Tel: +44 (0)1603 59 2582 | Biology 2.20 |
Career
For a full list of my publications, most downloadable, go to http://www2.cmp.uea.ac.uk/~sjc/
PhD Projects
The following opportunities exist to pursue a PhD with Prof. Cox:
Multimodal Understanding of Events
Reducing the Footprint and Increasing the Robustness of Speech Recognition Systems
Key Research Interests
Stephen Cox is part of the Speech, Language and Music Group.
His principal research interest is in speech processing, especially automatic speech recognition. Current research projects are in the use of speaker adaptation for speech recognition, speech synthesis, confidence measures for speech recognisers and automatic routing of telephone enquiries. He is the author of over 60 papers in the field of speech processing.
Publications:
Caballero Morales, S.O. and Cox, S.J., Modelling Confusion-Matrices to Improve Speech Recognition Accuracy, with an Application to Dysarthric Speech. Proc. 10th International Conference on Spoken Language Processing (Interspeech), Antwerp, August 2007
Read, I. and Cox, S.J., Automatic Pitch Accent Prediction for Text-To-Speech Synthesis. Proc. 10th International Conference on Spoken Language Processing (Interspeech), Antwerp, August 2007
Cox, S.J., On Estimation of Speakers’ Confusion Matrices from Sparse Data. Proc. 11th International Conference on Spoken Language Processing (Interspeech), Brisbane, September 2008
Caballero Morales, S.O. and Cox, S.J., Application of Weighted Finite-State Transducers to Improve Recognition Accuracy for Dysarthric Speech. Proc. 11th International Conference on Spoken Language Processing (Interspeech), Brisbane, September 2008
Cox, S.J., Harvey, R., Lan, Y., Newman, J.L. and Theobald, B.J., The challenge of multispeaker lip-reading. Proc. International Conference on Auditory-Visual Speech Processing 2008, Tangalooma, Australia.
Theobald, B.J., Harvey, R., Cox, S.J., Lewis, C. and Owen, G.P., Lip-reading enhancement for law enforcement. Proc. SPIE conference on Optics and Photonics for Counterterrorism and Crime Fighting, G. Owen and C. Lewis, Eds., vol. 6402, September 2006, pp. 205–9.
Newman, J.L. and Cox, S.J., Automatic Visual-Only Language Identification: A Preliminary Study. Proc. IEEE Conf. on Acoustics, Speech and Signal Processing, Taiwan, 2009.
Caballero Morales, S.O. and Cox, S.J., Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers. EURASIP Journal on Advances in Signal Processing. Volume 2009 (2009), Article ID 308340, 14 pages. doi:10.1155/2009/308340.
Caballero Morales, S.O. and Cox, S.J., On the Estimation and the Use of Confusion-Matrices for Improving ASR Accuracy. Proc. 12th International Conference on Spoken Language Processing (Interspeech), Brighton, September 2009.
Watkins, C. and Cox, S.J., Example-Based Speech Recognition using Formulaic Phrases. Proc. 12th International Conference on Spoken Language Processing (Interspeech), Brighton, September 2009.
Newman, J.L. and Cox, S.J., Speaker Independent Visual-Only Language Identification. Proc. IEEE Conf. on Acoustics, Speech and Signal Processing, Dallas, 2010.
Huang, Q. and Cox, S.J., Hierarchical Language Modeling for Audio Events Detection in a Sports Game. Proc. IEEE Conf. on Acoustics, Speech and Signal Processing, Dallas, 2010.
Selected Publications:
Read, I. and Cox, S. J., Stochastic and Syntactic Techniques for Predicting Phrase Breaks. Computer Speech and Language, Volume 21, Issue 3, Page(s) 519-542, 2007.
Huang, Q. and Cox, S. J., Task-Independent Call-Routing. Speech Communication, Volume 48, Issues 3-4, Page(s) 374-389, 2006.
Cox, S. J., Lincoln, M., Nakisa, M., Wells, M., Tutt, M. and Abbott, S., The Development and Evaluation of a Speech to Sign Translation System to Assist Transactions. Int. Journal of Human Computer Interaction, Volume 16, Issue 2, Page(s) 141-161, 2003.
Cox, S. J. and Dasmahapatra, S., High Level Approaches to Confidence Estimation in Speech Recognition. IEEE Transactions on Speech and Audio Processing, Volume 10, Issue 7, Page(s) 460-471, 2002.
Past Research Projects and Grants
| Project Title | Start Date | End Date | Funding Body | Project Members |
|---|---|---|---|---|
| LILiR2 Language Independent Lip Reading | 31/5/2007 | 29/9/2010 | EPSRC | Richard Harvey, Stephen Cox, Barry Theobald |
| Classification of anaesthesia breath sounds (COABS) -AU076 | 1/2/2003 | 1/10/2003 | Ipswich NHS | N/A |
| Essential sign language | 1/10/2002 | 6/9/2004 | CEC (Framework 1-5) | Ralph Elliott, Stephen Cox, John Glauert |
| Annotation of speech database (Nuance 3) | 1/4/2002 | 30/6/2002 | Nuance Communications Inc. | Stephen Cox, Ben Milner |
| VISTA - Virtual interface for a set-top box agent (LINK) | 12/3/2002 | 11/9/2003 | ESRC | Stephen Cox, Caroline Rose, Hugh Graham, Nicholas Wilkinson |
| Annotation of speech database | 14/1/2002 | 22/2/2002 | Nuance Communications Inc. | Stephen Cox, Ben Milner |
| Annotation of a large speech database | 1/6/2001 | 11/1/2002 | Nuance Communications Inc. | Stephen Cox, Ben Milner, Gavin Cawley |
| Telephone speech recognition prototype for deaf users | 31/3/2001 | 31/8/2002 | RNID | N/A |
| Virtual Signing: Capture, Animation, Storage and Transmission (VISICAST) | 1/1/2000 | 31/12/2002 | CEC (Framework 1-5) | Ralph Elliott, Stephen Cox, Andrew Bangham, John Glauert |
| Virtual signing, capture, animation storage and transmission | 1/1/2000 | 31/12/2002 | Post Office Counters Ltd | N/A |
| Confidence measures for speech recognition | 16/2/1998 | 15/8/2001 | EPSRC | Stephen Cox, Gavin Cawley |
Article
Huang, Q and Cox, S (2010) Inferring the Structure of a Tennis Game using Audio Information. IEEE Transactions on Audio, Speech, and Language Processing, 19 (7). pp. 1925-1937. ISSN 1558-7916
West, K. and Cox, S. (2010) Incorporating cultural representations of features into audio music similarity estimation. IEEE Transactions on Audio, Speech, and Language Processing, 18 (3). pp. 625-637. ISSN 1558-7916
Caballero Morales, Santiago-Omar and Cox, Stephen (2009) Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers. EURASIP Journal on Advances in Signal Processing, 2009, Article ID 308340. ISSN 1687-6172
Read, Ian and Cox, Stephen J. (2007) Stochastic and Syntactic Techniques for Predicting Phrase Breaks. Computer Speech and Language, 21 (3). pp. 519-542. ISSN 0885-2308
Glauert, J. R. W., Elliott, R., Cox, S. J., Tryggvason, J. and Sheard, M. (2006) VANESSA - A system for communication between deaf and hearing people. Technology and Disability, 18 (4). pp. 207-216. ISSN 1055-4181
Huang, Q. and Cox, S.J. (2006) Task-Independent Call-Routing. Speech Communication, 48 (3-4). pp. 374-389. ISSN 0167-6393
Cox, S. J. and Vinagre, L. (2004) Modelling of Confusions in Aircraft Call Signs. Speech Communication, 42 (3-4). pp. 289-312. ISSN 0167-6393
Wray, A., Cox, S. J., Lincoln, M. and Tryggvason, J. (2004) A formulaic approach to translation at the post office: reading the signs. Language and Communication, 24 (1). pp. 59-75. ISSN 0271-5309
Cox, S. J., Lincoln, M., Tryggvason, J., Nakisa, M., Wells, M., Tutt, M. and Abbott, S. (2003) The Development and Evaluation of a Speech to Sign Translation System to Assist Transactions. International Journal of Human-Computer Interaction, 16 (2). pp. 141-161. ISSN 1044-7318
Cox, S. J. and Dasmahapatra, S. (2002) High-Level Approaches to Confidence Estimation in Speech Recognition. IEEE Transactions on Speech and Audio Processing, 10 (7). pp. 460-471. ISSN 1063-6676
Cox, S. J., Marshall, I. and Safar, E. (2002) What are the difficulties of translating English to Sign language? The Linguist, 41 (1). pp. 6-10. ISSN 0268-5965
Matthews, I, Cootes, TF, Bangham, JA, Cox, SJ and Harvey, RW (2002) Extraction of Visual Features for Lipreading. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (2). pp. 198-213. ISSN 0162-8828
Theobald, Barry, Cox, Stephen, Cawley, Gavin and Milner, Ben (1999) Fast Method of Channel Equalisation for Speech Signals and its Implementation on a DSP. IEE Electronics Letters, 35 (16). pp. 1309-1311. ISSN 0013-5194
Cox, Stephen (1995) Predictive Speaker Adaptation in Speech recognition. Computer Speech and Language, 9 (1). pp. 1-17.
Cox, Stephen J. (1992) Speaker adaptation in Speech Recognition using Linear Regression Techniques. Electronics Letters, 28 (22). pp. 2093-2094. ISSN 0013-5194
Monograph
Linford, PW, Tattersall, GD, Cox, SJ and Rush, SA (1992) Matching Algorithms to Hardware. Research Report. UEA/BT.
Conference or Workshop Item
DeMarco, A and Cox, SJ (2011) An Accurate and Robust Gender Identification Algorithm. In: 14th International Conference on Spoken Language Processing (Interspeech), August 2011, Florence.
Huang, Q and Cox, SJ (2011) Iterative Improvement of Speaker Segmentation using High-level Knowledge. In: 14th International Conference on Spoken Language Processing (Interspeech), August 2011, Florence.
Huang, Q and Cox, SJ (2011) Learning Score Structure from Spoken Language for A Tennis Game. In: 14th International Conference on Spoken Language Processing (Interspeech), August 2011, Florence.
Huang, Q and Cox, SJ (2011) Improved Detection of Ball Hit Events in a Tennis Game Using Multimodal Information. In: International Conference on Auditory-visual Speech Processing, 2011, Volterra.
Huang, Qiang and Cox, Stephen (2010) Using High-Level Information to Detect Key Audio Events in a Tennis Game. In: 11th Annual Conference of the International Speech Communication Association (INTERSPEECH), September 26-30. 2010, Makuhari, Chiba, Japan.
Huang, Qiang and Cox, Stephen (2010) Hierarchical language modeling for audio events detection in a sports game. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, 14-19 March 2010, Dallas, TX.
Newman, Jacob and Cox, Stephen (2010) Speaker independent visual-only language identification. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, 14-19 March 2010, Dallas, TX.
Huang, Q and Cox, S (2010) Shallow Parsing of a Tennis Game from Audio Events. In: Fourth International Conference on Intelligent Information Technology Application (IITA 2010), November 5 - 7, 2010, Qinhuangdao, China. (In Press)
Newman, Jacob, Theobald, Barry and Cox, Stephen (2010) Limitations of Visual Speech Recognition. In: Proceedings of the International Conference on Auditory-Visual Speech Processing, Hakone, Kanagawa, Japan.
Caballero Morales, Omar and Cox, Stephen (2009) On the Estimation and the Use of Confusion-Matrices for Improving ASR Accuracy. In: 10th Annual Conference of the International Speech Communication Association (INTERSPEECH), September 6-10, 2009, Brighton, United Kingdom.
Watkins, Christopher and Cox, Stephen (2009) Example-Based Speech Recognition Using Formulaic Phrases. In: 10th Annual Conference of the International Speech Communication Association (INTERSPEECH), September 6-10, 2009, Brighton, United Kingdom.
Newman, Jacob and Cox, Stephen (2009) Automatic visual-only language identification: A preliminary study. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 19-24 April 2009, Taipei, Taiwan.
Caballero Morales, Omar and Cox, Stephen (2008) Application of Weighted Finite-State Transducers to Improve Recognition Accuracy for Dysarthric Speech. In: 9th Annual Conference of the International Speech Communication Association (INTERSPEECH), September 22-26, 2008, Brisbane, Australia.
Cox, Stephen (2008) On Estimation of a Speaker's Confusion Matrix from Sparse Data. In: 9th Annual Conference of the International Speech Communication Association (INTERSPEECH), September 22-26, 2008, Brisbane, Australia.
Cox, Stephen, Harvey, Richard, Lan, Yuxuan, Newman, Jacob and Theobald, Barry (2008) The Challenge of Multispeaker Lip-Reading. In: International Conference on Auditory-Visual Speech Processing, September 26-29, 2008, Queensland, Australia.
Caballero-Morales, Omar and Cox, Stephen (2007) Modelling Confusion-Matrices to Improve Speech Recognition Accuracy, with an Application to Dysarthric Speech. In: 8th Annual Conference of the International Speech Communication Association (Interspeech), August 27-31, 2007, Antwerp, Belgium.
Jenkins, Marie-Claire, Churchill, Richard, Cox, Stephen J. and Smith, Dan J. (2007) Analysis of user interaction with service oriented chatbot systems. In: Proceedings of the 12th international conference on Human-computer interaction: intelligent multimodal interaction environments, Beijing, China.
Read, Ian and Cox, Stephen J. (2007) Automatic Pitch Accent Prediction for Text-To-Speech Synthesis. In: 8th Annual Conference of the International Speech Communication Association (INTERSPEECH-2007), August 27-31, 2007, Antwerp, Belgium.
Theobald, Barry, Harvey, Richard, Cox, Stephen, Lewis, Colin and Owens, Gari (2006) Lip-reading enhancement for law enforcement. In: Proceedings of SPIE 6402 Conference on Optics and Photonics for Counterterrorism and Crime Fighting, 11 September 2006, Stockholm, Sweden.
West, K., Cox, S. J. and Lamere, P. (2006) Incorporating Machine Learning into Music Similarity Estimation. In: Proceedings of the 1st ACM workshop on Audio and music computing multimedia, October 23 - 27, 2006, Santa Barbara, California, USA.
Cox, S. J. (2005) A Discriminative Approach to Phrase Break Modelling. In: Proceedings of 9th European Conference on Speech Communication and Technology (INTERSPEECH'05), September 4-8, 2005, Lisbon, Portugal.
West, K. and Cox, S. J. (2005) Finding an Optimal Segmentation for Audio Genre Classification. In: Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), 11 - 15 September 2005, London, UK.
Read, Ian and Cox, Stephen J. (2005) Stochastic and Syntactic Techniques for Predicting Phrase Breaks. In: Proceedings 9th European Conference on Speech Communication and Technology (INTERSPEECH-2005), September 4-8, 2005, Lisbon, Portugal.
Huang, Q. and Cox, S. J. (2004) Mixture Language Models for Call Routing. In: 8th International Conference on Spoken Language Processing (Interspeech 2004), October 4-8, 2004, Jeju Island, Korea.
Read, I. and Cox, S. J. (2004) Using Part-Of-Speech Tags for Predicting Phrase Breaks. In: 8th International Conference on Spoken Language Processing (Interspeech 2004), October 4-8, 2004, Jeju Island, Korea.
Huang, Q. and Cox, S. J. (2004) Automatic Call Routing with Multiple Language Models. In: Proceedings of the HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems and Higher Level Linguistic Information for Speech Processing, 2-7 May 2004, Boston, Massachusetts, USA.
Huang, Q. and Cox, S. J. (2004) Improving Phoneme Recognition of Telephone Quality Speech. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 17-21 May 2004.
Cox, S. J. (2004) Using Context to Correct Phone Recognition Errors. In: 8th International Conference on Spoken Language Processing (Interspeech 2004), October 4-8, 2004, Jeju Island, Korea.
Tan, L., Cox, S. J., Bell, G. D. and Mansfield, M. (2004) Can we modify existing automatic speech recognition technology to reliably and safely monitor respiration in patients sedated with Propofol? In: British Society of Gastroenterology Annual Meeting, 21 to 24 March, 2004, Scottish Exhibition and Conference Centre, Glasgow.
West, K. and Cox, S. J. (2004) Features and Classifiers for the Automatic Classification of Musical Audio Signals. In: Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004), 10-15 October 2004, Barcelona, Spain.
Cox, Stephen J. and Cawley, Gavin C. (2003) The Use of Confidence Measures in Vector Based Call-Routing. In: 8th European Conference on Speech Communication and Technology (EUROSPEECH 2003), September 1-4, 2003, Geneva, Switzerland.
Huang, Qiang and Cox, Stephen J. (2003) Automatic Call-Routing Without Transcriptions. In: Proceedings of the 8th European Conference on Speech Communication and Technology (EUROSPEECH 2003), September 1-4, 2003, Geneva, Switzerland.
Lambert, T., Breen, Andrew P., Eggleton, Barry, Cox, Stephen J. and Milner, Ben P. (2003) Unit Selection in Concatenative TTS Synthesis Systems Based on Mel Filter Bank Amplitudes and Phonetic Context. In: Proceedings of the 8th European Conference on Speech Communication and Technology (EUROSPEECH 2003), September 1-4, 2003, Geneva, Switzerland.
Shao, Xu, Milner, Ben P. and Cox, Stephen J. (2003) Integrated Pitch and MFCC Extraction for Speech Reconstruction and Speech Recognition Applications. In: Eurospeech-2003 — 8th European Conference on Speech Communication and Technology, September 1-4, 2003, Geneva, Switzerland.
Cox, S. J. (2003) Discriminative Techniques in Call Routing. In: IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP '03), 6-10 April 2003, Hong Kong.
Lincoln, M. and Cox, S. J. (2003) A Comparison of Language Processing Techniques for a Constrained Speech Translation System. In: IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP '03), 6-10 April 2003, Hong Kong.
Cox, Stephen J. (2002) Speech and Language Processing for a Constrained Speech Translation System. In: 7th International Conference on Spoken Language Processing, September 16-20, 2002, Denver, Colorado, USA.
Cox, Stephen J., Lincoln, Michael, Tryggvason, Judy, Nakisa, Melanie, Wells, Mark, Tutt, Marcus and Abbott, Sanja (2002) TESSA, a system to aid communication with deaf people. In: ASSETS 2002, Fifth International ACM SIGCAPH Conference on Assistive Technologies, 8-10th July 2002, Edinburgh, Scotland.
Cox, S. J. and Shahshahani, B (2001) A Comparison of some Different Techniques for Vector Based Call-Routing. In: Proceedings of the 7th European Conference on Speech Communication and Technology (EUROSPEECH-2001), September 3-7, 2001, Aalborg, Denmark.
Cox, S.J. and Shahshahani, B (2001) Improved Techniques for automatic call-routing. In: Institute of Acoustics Workshop on Innovation in Speech Processing (WISP-2001), 2-3 April 2001, Stratford-upon-Avon, UK.
Cox, S. J. and Dasmahapatra, S. (2000) A Semantically-Based Confidence Measure for Speech Recognition. In: Sixth International Conference on Spoken Language Processing (ICSLP 2000), October 16-20, 2000, Beijing, China.
Cox, Stephen (2000) Speaker Normalisation in the MFCC Domain. In: Sixth International Conference on Spoken Language Processing (ICSLP 2000), October 16-20, 2000, Beijing, China.
Dasmahapatra, S. and Cox, S. J. (2000) Meta-Models for Confidence Estimation in Speech Recognition. In: IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP '00), 5-9 June 2000, Istanbul, Turkey.
Bangham, JA, Cox, SJ, Elliott, R, Glauert, JRW, Marshall, I, Rankov, S and Wells, M (2000) Virtual Signing: Capture, Animation, Storage and Transmission - an overview of the ViSiCAST Project. In: IEE Colloquium on Speech and Language Processing for the Disabled and Elderly, 4 April 2000, London, UK.
Bangham, JA, Cox, SJ, Lincoln, M, Marshall, I, Tutt, M and Wells, M (2000) Signing for the deaf using virtual humans. In: IEE Colloquium on Speech and Language processing for Disabled and Elderly.
External Activities and Indicators of Esteem
- Member, IEEE Speech and Language Processing Technical Committee, 2006
- Chairman, UK Institute of Acoustics Speech Group, 1998-2003
- Keynote speech, AMI Workshop on “Multimodal Interaction and Related Machine Learning Algorithms”, Martigny, Switzerland, 2004
Key Responsibilities
Head of Speech Language Group
Director of Research


