Exemplar-based noise robust automatic speech recognition using modulation spectrogram features

Abstract

We propose a novel exemplar-based feature enhancement method for automatic speech recognition (ASR) that uses coupled dictionaries: an input dictionary containing atoms sampled in the modulation (envelope) spectrogram domain and an output dictionary with atoms in the Mel or full-resolution frequency domain. The modulation representation is chosen for the input because it separates speech from noise well and relates to human auditory processing. The output representation is one that the ASR back-end can process. We evaluated the proposed method on the AURORA-2 and AURORA-4 databases and obtained lower word error rates (WER) than a system that uses Mel features in the input exemplars. We also propose a hybrid system that combines the baseline and the proposed algorithm; on AURORA-2, it improves on both individual systems.
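The coupled-dictionary idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the KL-divergence multiplicative updates, the Wiener-like gain, the function names, and the tiny toy dictionaries are all assumptions made for illustration, and the numbers are not real modulation or Mel features. Activations are estimated against the input dictionary and then reused with the output dictionary, whose columns are paired atom-for-atom with the input atoms.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def estimate_activations(y, A_in, n_iter=200, eps=1e-12):
    """Find non-negative activations x such that y is close to A_in @ x.

    Assumption: multiplicative updates for the KL divergence, a common
    choice in exemplar-based methods; the abstract does not specify one.
    """
    n_atoms = len(A_in[0])
    x = [1.0] * n_atoms
    col_sums = [sum(row[k] for row in A_in) for k in range(n_atoms)]
    for _ in range(n_iter):
        approx = matvec(A_in, x)
        ratio = [yi / (ai + eps) for yi, ai in zip(y, approx)]
        x = [
            x[k] * sum(row[k] * r for row, r in zip(A_in, ratio))
            / (col_sums[k] + eps)
            for k in range(n_atoms)
        ]
    return x


def enhance(y_in, y_out, A_in, A_out, speech_atoms, eps=1e-12):
    """Estimate activations in the input domain, apply them in the output domain.

    Assumption: the enhanced features come from a Wiener-like gain,
    speech estimate divided by (speech + noise) estimate.
    """
    x = estimate_activations(y_in, A_in)
    speech_x = [x[k] if k in speech_atoms else 0.0 for k in range(len(x))]
    speech = matvec(A_out, speech_x)   # speech part in the output domain
    total = matvec(A_out, x)           # speech + noise in the output domain
    gain = [s / (t + eps) for s, t in zip(speech, total)]
    return [g * v for g, v in zip(gain, y_out)], x


# Hypothetical toy data: column 0 is a speech atom, column 1 a noise atom.
A_in = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # input-domain dictionary
A_out = [[1.0, 0.1], [0.2, 1.0]]              # coupled output-domain dictionary
y_in = [2.0, 1.0, 1.5]                        # noisy input-domain observation
y_out = [2.1, 1.4]                            # same frame in the output domain

enhanced, activations = enhance(y_in, y_out, A_in, A_out, speech_atoms={0})
print(activations, enhanced)
```

In this toy case the observation is an exact mix of two units of the speech atom and one unit of the noise atom, so the activations settle close to 2 and 1, and the enhanced output is close to the speech-only part of the output-domain frame.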

Publication
IEEE Spoken Language Technology Workshop (SLT), South Lake Tahoe, NV, USA
Deepak Baby
Applied Scientist

My research interests include speech recognition, enhancement and deep learning.

Tuomas Virtanen
Professor

Professor at Tampere University of Technology

Tech Lead, Autonomous Systems at Apple Inc., Cupertino, USA


Hugo Van hamme
Professor

Professor at KU Leuven, Belgium

