Abstract:
Formant frequencies of a voiced utterance represent the free resonances of the human
vocal tract system. They are one of the fundamental properties of human voiced speech,
and for the purpose of speech analysis or speech recognition, formant frequencies play
a dominant role. In this thesis, effective methods for formant estimation are developed,
which work well even in the presence of significant background noise. In real-life
applications, human speech is very often affected by environmental noise from different
sources. Hence, noise robustness of formant estimation methods is a key factor. Accurate
estimation of formants from noise-corrupted speech is a very difficult task. The major
objective of this research is to develop an algorithm that can successfully estimate the
formants in the presence of noise, overcoming the limitations of conventional methods.
The autocorrelation operation on the speech signal can be viewed as a means to overcome
the adverse effects of noise, since it offers the advantageous property of strengthening the
dominant formant peaks, leading to better formant estimation accuracy in noise. One
major idea in this research, in contrast to conventional spectral-domain peak picking, is to
develop a spectral model of the autocorrelated speech signal and thereby introduce a model
fitting scheme to find the model parameters, which are directly related to the formants.
Because the autocorrelation operation strengthens spectral peaks by introducing new
poles at the formant locations, the idea of repeated autocorrelation is presented. The
effects of repeated autocorrelation in the time and frequency domains are investigated in
detail, especially in noisy environments. It is observed that, in comparison to single
autocorrelation, the double autocorrelation function of a signal exhibits more noise
immunity. A spectral model is further developed to incorporate the effects of double
autocorrelation. Finally, the effect of spectral band limiting of the speech signal
before performing the autocorrelation operation is investigated. It is shown that formant
estimation from each band further improves the estimation performance. In order to
utilize this property, a band limiting approach is developed that can adaptively filter the
frequency zones where a formant frequency is most likely to be present. A spectral model
for the double autocorrelation function of the band-limited signal is proposed and
employed in a model-matching approach for estimating the formants. Several vowel sounds
taken from naturally spoken continuous speech are tested in the presence of
noise. Vowel sounds from synthetic speech as well as naturally spoken isolated words are
also considered. The experimental results demonstrate superior performance obtained
by the proposed scheme in comparison to some of the existing methods at low levels
of signal-to-noise ratio. The estimated formants are used in a basic vowel recognition
scheme utilizing a linear discriminant analysis based classifier along with Mel-frequency
cepstral coefficients (MFCC), and the results demonstrate a good degree of noise robustness
compared with methods that use formant values obtained from traditional formant
estimation schemes.
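
The repeated autocorrelation idea summarized above can be illustrated with a minimal Python sketch. This is not the model-fitting scheme of the thesis: it only applies a one-sided autocorrelation once and twice to a noisy signal and then uses naive spectral peak picking to locate the resonance. The helper names (autocorr, repeated_autocorr, peak_frequency), the single 700 Hz test resonance, its 60 Hz bandwidth, the 8 kHz sampling rate, and the 5 dB SNR are all assumptions chosen for illustration.

```python
import numpy as np

def autocorr(x):
    """Biased linear autocorrelation of x for lags 0..len(x)-1, computed via FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Zero-pad to 2n so the circular correlation equals the linear one.
    spectrum = np.fft.rfft(x, 2 * n)
    r = np.fft.irfft(np.abs(spectrum) ** 2, 2 * n)[:n]
    return r / n

def repeated_autocorr(x, times):
    """Apply the one-sided autocorrelation `times` times (times=2: double ACF)."""
    for _ in range(times):
        x = autocorr(x)
    return x

def peak_frequency(x, fs, nfft=4096):
    """Frequency in Hz of the largest magnitude-spectrum bin (naive peak picking)."""
    mag = np.abs(np.fft.rfft(x, nfft))
    return np.argmax(mag) * fs / nfft

# Hypothetical test signal: one damped resonance standing in for a formant.
fs = 8000.0            # sampling rate in Hz (assumed)
f0, bw = 700.0, 60.0   # resonance frequency and bandwidth in Hz (assumed)
n = np.arange(512)
clean = np.exp(-np.pi * bw * n / fs) * np.sin(2 * np.pi * f0 * n / fs)

# Add white Gaussian noise scaled to an overall SNR of 5 dB.
rng = np.random.default_rng(0)
noise = rng.standard_normal(n.size)
noise *= np.sqrt(np.sum(clean ** 2) / (np.sum(noise ** 2) * 10 ** (5 / 10)))
noisy = clean + noise

single = repeated_autocorr(noisy, 1)
double = repeated_autocorr(noisy, 2)
for name, sig in [("noisy", noisy), ("single ACF", single), ("double ACF", double)]:
    print(f"{name:>10}: peak near {peak_frequency(sig, fs):.0f} Hz")
```

In this sketch the white-noise contribution concentrates around lag zero of the autocorrelation while the resonance persists across lags, which is one way to picture the noise immunity described above. Real speech has several formants and a pitch structure, so a practical estimator would need the band limiting and spectral model fitting that the abstract describes.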