Speech Dereverberation using Variational Autoencoders

June 2021

Abstract

This paper presents a statistical method for single-channel speech dereverberation that uses a variational autoencoder (VAE) to model the speech spectra. A popular approach to modelling speech spectra is non-negative matrix factorization (NMF), in which learned clean-speech spectral bases serve as a linear generative model. This work replaces that linear model with a nonlinear deep generative model based on a VAE. Further, the paper formulates a unified probabilistic generative model of reverberant speech based on Gaussian and Poisson distributions. We develop a Monte Carlo expectation-maximization algorithm that infers the latent variables of the VAE and estimates the room impulse response under both probabilistic models. Evaluation results show that the proposed VAE-based models outperform their NMF-based counterparts.
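To make the "linear generative model" in the abstract concrete, the following is a minimal sketch of plain NMF on a non-negative spectrogram-like matrix, using the standard multiplicative updates for the generalized Kullback-Leibler divergence (the cost that corresponds to a Poisson observation model). It is illustrative only and is not the paper's algorithm: it contains no reverberation model, no VAE, and no Monte Carlo EM. The function names `kl_nmf` and `kl_divergence`, the rank, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-10):
    """Generalized KL divergence D(V || V_hat) between non-negative matrices."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def kl_nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorize V (freq x frames) as W @ H with non-negative W and H.

    W holds spectral bases (one per column); H holds their activations
    over time. Each frame is modelled as a non-negative linear
    combination of the bases, which is the linear model the abstract
    refers to.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Update activations: H <- H * (W^T (V / WH)) / (W^T 1)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update bases: W <- W * ((V / WH) H^T) / (1 H^T)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


if __name__ == "__main__":
    # Synthetic non-negative "spectrogram" built from 4 hypothetical bases.
    rng = np.random.default_rng(1)
    V = rng.random((64, 4)) @ rng.random((4, 100))
    W, H = kl_nmf(V, rank=4)
    print("KL divergence after fitting:", kl_divergence(V, W @ H))
```

In a VAE-based model of the kind the abstract describes, the product `W @ H` would be replaced by the output of a neural decoder driven by latent variables, so the per-frame spectrum is no longer restricted to a linear combination of fixed bases.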

Type

Conference paper

Publication

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada

Deepak Baby

Applied Scientist