Abstract:
In the age of information technology, human-computer interaction has gained
importance. Speech is the primary mode of communication among human beings, and
people expect to exchange natural dialogue with computers. Recent developments in
speech technology make this expectation achievable. Speech recognition is the
process of extracting the necessary information from an input speech signal in order to
make a correct decision. It can be applied to the automation of operator-assisted
services, dictation, interactive voice response, medical transcription, pronunciation in
computer-aided language learning applications, data entry, etc. To achieve this, words
must first be separated from continuous speech. An isolated speech recognition system
requires the speaker to provide clear boundaries between words, whereas continuous
speech consists of uninterrupted utterances and is therefore representative of real speech.
The thesis develops a word separation algorithm, named the Prosody-based Word
Separation Algorithm (PWSA), for the recognition of continuous Bangla speech based on
prosodic features. Bangla is a bound-stress language, i.e., stress is high on the initial
word of a sentence and becomes low toward the end of the sentence. PWSA uses relative
fundamental frequency estimation to separate words from continuous Bangla speech.
Mel Frequency Cepstral Coefficients (MFCCs) are used to extract features from each
separated word. Vector Quantization is used to build a codebook for each word. The
codebooks of all words form the database of Bangla speech. To recognize unknown
speech, PWSA is first applied to separate the words. MFCC features are then extracted
from the unknown words and compared with the database. Experimental results show
that the proposed word separation algorithm, which uses stress information together
with energy, performs very well.
The experiment was performed on 1755 words and achieved 98% accuracy, which is 32%
better than the existing algorithm. For recognition, the system obtained an accuracy of 82%.
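
The codebook-based recognition step described above can be illustrated with a minimal
sketch. It is not the implementation used in the thesis. It assumes a k-means style
codebook training, a squared Euclidean distance, a codebook size of 4, and toy
two-dimensional vectors standing in for real MFCC features. All function names
(train_codebook, distortion, recognize, toy_features) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iterations=20, seed=0):
    """Build a VQ codebook with a k-means style loop (assumed training method)."""
    rng = random.Random(seed)
    # Start from `size` randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda k: sq_dist(v, codebook[k]))
            clusters[nearest].append(v)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def distortion(vectors, codebook):
    """Average distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def recognize(vectors, database):
    """Return the word whose codebook gives the lowest average distortion."""
    return min(database, key=lambda word: distortion(vectors, database[word]))


def toy_features(center, n, seed):
    """Toy stand-in for the MFCC vectors of one separated word (assumption)."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0, 0.1) for c in center] for _ in range(n)]


# One codebook per word; together they form the database.
database = {
    "word_a": train_codebook(toy_features([0.0, 0.0], 40, 1), size=4),
    "word_b": train_codebook(toy_features([3.0, 3.0], 40, 2), size=4),
}

# An unknown word is assigned to the closest codebook.
print(recognize(toy_features([3.0, 3.0], 10, 3), database))
```

In a real system, the toy vectors would be replaced by the MFCC frames of each word
produced by the word separation stage, and the codebook size would be tuned
experimentally.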