Abstract:
In this thesis, a speech enhancement method based on a generative adversarial network operating in the wavelet domain is presented. A deep neural network based generator model is designed to estimate the clean speech coefficients from the noisy speech coefficients. A discriminator network is also designed that assesses the outputs of the generator and provides feedback on how close these clean estimates are to the real data distribution. The generator learns from this feedback and updates its mapping to provide better estimates of the clean speech coefficients, which are then used to produce an enhanced speech frame. The complete network is trained using speech signals from a publicly available dataset. The proposed method outperforms some recent speech enhancement methods under different noise conditions and at different signal-to-noise ratio (SNR) levels in terms of objective performance metrics, spectrogram analysis and subjective evaluation.
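The frame-level data flow described above (noisy frame, wavelet coefficients, generator estimate, reconstructed frame) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration only: the abstract does not name the wavelet or the decomposition depth, so a one-level Haar transform is used here, and `identity_generator` is a hypothetical stand-in for the trained generator network, not the model proposed in the thesis. The discriminator and the adversarial training loop are not shown.

```python
import math

def haar_analysis(frame):
    """One-level Haar DWT (assumed wavelet): returns (approx, detail) coefficients."""
    if len(frame) % 2 != 0:
        raise ValueError("frame length must be even")
    s = math.sqrt(2.0)
    approx = [(frame[i] + frame[i + 1]) / s for i in range(0, len(frame), 2)]
    detail = [(frame[i] - frame[i + 1]) / s for i in range(0, len(frame), 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse of haar_analysis: rebuilds the time-domain frame."""
    s = math.sqrt(2.0)
    frame = []
    for a, d in zip(approx, detail):
        frame.append((a + d) / s)
        frame.append((a - d) / s)
    return frame

def identity_generator(approx, detail):
    # Hypothetical placeholder: a trained generator would map noisy
    # coefficients to clean-coefficient estimates here.
    return approx, detail

def enhance_frame(noisy_frame, generator):
    """Noisy frame -> wavelet coefficients -> generator estimate -> enhanced frame."""
    approx, detail = haar_analysis(noisy_frame)
    clean_approx, clean_detail = generator(approx, detail)
    return haar_synthesis(clean_approx, clean_detail)
```

With the placeholder generator, `enhance_frame` returns the input frame up to floating-point rounding, which only confirms that the analysis and synthesis steps are inverses of each other; any enhancement would come from replacing the placeholder with a trained model.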