Joint Denoising and Dereverberation Using Exemplar-Based Sparse Representations and Decaying Norm Constraint


Exemplar-based nonnegative models, in which noisy speech is decomposed as a sparse nonnegative linear combination of speech and noise exemplars stored in a dictionary, have been used successfully for speech denoising. This paper extends the technique to single-channel speech enhancement in noisy reverberant environments, using a novel approximation of the noisy reverberant speech in the frequency domain and nonnegative matrix deconvolution. In the proposed model, the room impulse response (RIR) in the magnitude short-time Fourier transform domain is defined such that its decaying structure can also be estimated from the test data itself, whereas existing models use a suboptimal binwise clamping procedure to impose a decaying structure that does not hold in a typical RIR. The paper presents multiplicative updates for estimating the RIR, its decay, and the underlying anechoic speech and noise. The proposed model is evaluated on a synthetic dataset created by convolving TIMIT recordings with RIRs measured in different rooms at varying speaker and microphone locations, and adding background noise taken from the CHiME corpus. Simulation results show that the proposed model yields a better RIR estimate than the existing model and improves various instrumental speech quality measures.
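The exemplar-based denoising idea that the abstract starts from can be illustrated with a minimal NumPy sketch. It covers only the plain denoising baseline (a fixed dictionary with sparse nonnegative activations), not the paper's deconvolution model or its RIR and decay updates. The function names, the generalized Kullback-Leibler cost, the L1 sparsity weight, and the Wiener-like mask are all assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def sparse_activations(Y, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate nonnegative activations X with Y ~= A @ X, keeping A fixed.

    Assumed cost: generalized KL divergence plus an L1 penalty on X,
    minimized with the standard multiplicative update.
    Y: (n_freq, n_frames) magnitude spectrogram.
    A: (n_freq, n_atoms) nonnegative dictionary of exemplars.
    """
    rng = np.random.default_rng(0)
    X = rng.random((A.shape[1], Y.shape[1])) + eps
    col_sums = A.sum(axis=0)[:, None]           # A^T 1, one value per atom
    for _ in range(n_iter):
        ratio = Y / (A @ X + eps)               # elementwise Y / (A X)
        X *= (A.T @ ratio) / (col_sums + sparsity)
    return X

def enhance(Y, A_speech, A_noise, **kwargs):
    """Denoise Y using hypothetical speech and noise exemplar dictionaries."""
    A = np.hstack([A_speech, A_noise])          # joint dictionary
    X = sparse_activations(Y, A, **kwargs)
    k = A_speech.shape[1]
    S_hat = A_speech @ X[:k]                    # speech part of the model
    N_hat = A_noise @ X[k:]                     # noise part of the model
    mask = S_hat / (S_hat + N_hat + 1e-12)      # Wiener-like gain in [0, 1]
    return mask * Y
```

Here `Y` would be the magnitude spectrogram of the noisy recording, and `A_speech` and `A_noise` would hold exemplars drawn from training speech and noise. In the reverberant setting described above, the speech term would additionally be convolved over time with a nonnegative RIR representation, which this sketch leaves out.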

IEEE/ACM Transactions on Audio, Speech and Language Processing
Deepak Baby
Applied Scientist

My research interests include speech recognition, enhancement and deep learning.

Hugo Van hamme

Professor at KU Leuven, Belgium
