Abstract:
To handle the practical conditions of real-life applications, a speech enhancement
method must produce optimum results, improving overall speech quality and
maximizing intelligibility, particularly at low signal-to-noise ratios (SNRs).
To address this open problem, this thesis presents a speech enhancement
approach, where an adaptive threshold is statistically determined using the Tea-
ger energy (TE) operated perceptual wavelet packet (PWP) coefficients of noisy
speech. A frame of the noisy speech signal is first analyzed in the PWP transform
domain to obtain a set of PWP coefficients. The TE operation is then performed on the
PWP coefficients to increase the separability between clean speech and noise
coefficients. The
TE operated PWP coefficients with better time and frequency resolution are then
used to determine an appropriate adaptive threshold based on different statistical
models, namely Gaussian, Laplace, Rayleigh, Poisson, and Student's t distributions.
The threshold thus obtained is applied to the PWP coefficients by employing a
custom thresholding function, which is designed according to the presence of noise in
the noisy speech signal. The custom thresholding functions designed in this
thesis can be viewed as linear combinations of the modified hard or μ-law
thresholding function and the semisoft thresholding function. The enhanced speech frame
is synthesized by performing the inverse PWP transform on the thresholded PWP
coefficients obtained using the statistically determined threshold and the designed
custom thresholding function. The final enhanced speech signal is reconstructed by
using the standard overlap-and-add method. Extensive simulations using the NOIZEUS
database are carried out in the presence of car and multi-talker babble
noise to evaluate the performance of the proposed method in terms of standard
objective metrics and subjective listening tests. It is shown that the proposed method
outperforms the reported state-of-the-art methods at both high and low SNRs.
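
As a rough illustration of two building blocks named in this abstract, the sketch below implements the standard discrete Teager energy operator and the standard semisoft thresholding function in plain Python. It is not the thesis's method: the statistically determined threshold, the PWP transform, and the custom thresholding functions are not reproduced here. The function names, the edge handling (repeating the end samples), and the threshold values `lam1` and `lam2` are assumptions chosen only for this example.

```python
def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Assumption for this sketch: the first and last samples are
    repeated so that the output has the same length as the input.
    """
    n = len(x)
    if n == 0:
        return []
    padded = [x[0]] + list(x) + [x[-1]]
    return [padded[i] ** 2 - padded[i - 1] * padded[i + 1]
            for i in range(1, n + 1)]


def semisoft_threshold(x, lam1, lam2):
    """Semisoft thresholding with lower/upper thresholds lam1 < lam2.

    |v| <= lam1         -> 0 (treated as noise)
    lam1 < |v| <= lam2  -> shrunk linearly between 0 and lam2
    |v| > lam2          -> kept unchanged
    """
    if not 0 <= lam1 < lam2:
        raise ValueError("require 0 <= lam1 < lam2")
    out = []
    for v in x:
        a = abs(v)
        if a <= lam1:
            out.append(0.0)
        elif a <= lam2:
            sign = 1.0 if v > 0 else -1.0
            out.append(sign * lam2 * (a - lam1) / (lam2 - lam1))
        else:
            out.append(float(v))
    return out


if __name__ == "__main__":
    # Hypothetical coefficient values and thresholds, for illustration only.
    coeffs = [0.5, -1.5, 3.0]
    print(teager_energy(coeffs))
    print(semisoft_threshold(coeffs, lam1=1.0, lam2=2.0))
```

In the approach described above, the threshold would be derived from the statistics of the TE operated PWP coefficients rather than fixed by hand as in this example.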