Abstract:
Formants are the distinguishing frequency components of human speech, which play an
important role in characterizing different voiced sounds. Formant based speech synthesis
and coding are widely used in several real-life applications, such as voice-operated
controls and telecommunication. In almost all practical applications, speech signals are
affected by different kinds of background noise, and estimating formants under severe
background noise is a difficult task. In this thesis, efficient formant estimation is
investigated, and methods for formant estimation are devised with a view to improving the
estimation performance under severely noisy conditions. In order to extract the formant
frequencies, first a strongly voiced portion of the given speech utterance is extracted based
on the energy measure. Instead of considering the whole duration of a voiced sound at
a time, frame by frame analysis is performed. Within a frame of the voiced speech signal,
formants can be estimated by using different time or frequency domain approaches.
Correlation based methods are the most common time domain approaches to estimate
formants from speech signals. In linear predictive coding (LPC) based methods, the
Yule-Walker equations are constructed from the autocorrelation function (ACF) of the
given speech utterance, and formants are obtained from their solutions. Spectral peak picking
is another extremely popular method of formant estimation, where both parametric
and nonparametric spectral estimation techniques are used. Recently, cepstrum domain
methods have also been used in formant estimation. In the presence of heavy background
noise, spurious peaks appear in the speech spectrum, making accurate formant
estimation very difficult. The estimation performance of both time and frequency domain
methods deteriorates drastically under heavily noisy conditions. The main goal here is to
develop a formant estimation scheme which provides satisfactory performance even at low
levels of signal to noise ratio (SNR). In order to reduce the effect of noise, the strength
of the dominant pole pairs in the spectrum of noisy speech needs to be enhanced. With a
view to achieving this objective, a spectral domain ramp cepstrum model of the autocorrelation
function of the speech signal is developed. The model exploits the fact that
the ACF offers better noise immunity than the noisy signal itself.
Transforming from the time domain to the cepstral domain offers the advantage of homomorphic
deconvolution, which can reduce the effect of pitch in speech analysis. In order to avoid
the rapid decay of the cepstrum, the ramp cepstrum is used instead. Since the pole
preserving property of the ramp cepstrum (RC) is better exploited via spectral peaks,
the spectrum of the RC of the ACF of speech is proposed as the desired model. In order to
extract the formants from the observed noisy speech signal utilizing the derived model,
a model matching scheme is introduced. In the model matching technique, instead of relying
on peak picking, the fitting error is minimized over a wider peak zone, resulting in more
accurate formant frequency estimation. Finally, the estimated formants are used in a vowel
recognition scheme as potential features. A linear discriminant based algorithm is used
for recognition. Extensive experimentation is carried out considering different
male and female vowel utterances from a standard speech database under different
noise conditions. It is found that the proposed methods provide higher formant
estimation accuracy than some state-of-the-art methods,
especially at very low levels of SNR.
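
As an illustration of the conventional LPC-based approach mentioned above (not the proposed ramp cepstrum model), the following minimal Python sketch computes the ACF of a frame, solves the Yule-Walker equations, and reads formant candidates from the roots of the prediction polynomial. The function names, the bandwidth and minimum-frequency thresholds, and the synthetic test signal are assumptions made for illustration only; they are not taken from the thesis.

```python
import numpy as np


def autocorr(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of a frame."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])


def lpc_from_acf(r, order):
    """Solve the Yule-Walker equations R a = r for the predictor coefficients.

    Returns the prediction polynomial A(z) = 1 - sum_k a_k z^-k as
    [1, -a_1, ..., -a_order].
    """
    # Toeplitz autocorrelation matrix built from lags 0..order-1.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))


def formants_from_lpc(A, fs, max_bw=400.0, min_freq=90.0):
    """Convert roots of A(z) into formant candidates in Hz.

    max_bw and min_freq are illustrative thresholds (assumptions) used to
    discard broad or near-DC poles.
    """
    roots = np.roots(A)
    roots = roots[np.imag(roots) > 0]             # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)  # pole angle -> frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi     # pole radius -> bandwidth
    keep = (freqs > min_freq) & (bws < max_bw)
    return np.sort(freqs[keep])


def synth_all_pole_impulse_response(freqs, bws, fs, n):
    """Hypothetical helper: impulse response of an all-pole filter whose
    poles sit at the given formant frequencies and bandwidths (Hz)."""
    radii = np.exp(-np.pi * np.asarray(bws, dtype=float) / fs)
    angles = 2.0 * np.pi * np.asarray(freqs, dtype=float) / fs
    poles = radii * np.exp(1j * angles)
    A = np.real(np.poly(np.concatenate((poles, np.conj(poles)))))
    h = np.zeros(n)
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0              # unit impulse input
        for k in range(1, len(A)):
            if i - k >= 0:
                acc -= A[k] * h[i - k]
        h[i] = acc
    return h


if __name__ == "__main__":
    fs = 8000.0
    h = synth_all_pole_impulse_response([700.0, 1200.0, 2600.0],
                                        [80.0, 80.0, 80.0], fs, 2000)
    A = lpc_from_acf(autocorr(h, 6), 6)
    print(formants_from_lpc(A, fs))
```

This sketch uses a clean synthetic signal; with noisy speech the ACF and the pole locations are perturbed, which is the degradation that motivates the model matching approach described in the abstract.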