Abstract:
This thesis deals with the development of a Bengali character recognition system. Very
few investigations have been conducted in this direction so far. The main philosophy of the
technique is to obtain a correspondence between mathematical prediction and actual
recognition. The recognition of handwritten characters is a particularly complex
mathematical problem, because the distortions present in handwritten characters
make it difficult to identify each character unambiguously. The present dissertation
addresses several such difficulties: recognizing characters of different physical
sizes and different fonts; identifying irregular patterns that do not correspond to any
acceptable character; overcoming the irregularities and discontinuities introduced by rough
paper surfaces; and handling the noise caused by sharp edges and irregular
thickness in the pattern. To offset these distortions, detailed preprocessing was performed on
the raw image files to smooth and filter the images.
In this investigation a hybrid approach is adopted. The method applies a combination of
structural analysis and template matching techniques. The characters are classified into
different subgroups depending on some distinguishable structural features. Image compression
techniques have also been applied to transform these images into a standard form.
Structural information extracted from the characters forms the basis for their
recognition. This structural information, the 'structural signatures', is stored in coded
form to generate the match dictionary. When a new character is to be identified, its structural
signature is generated and compared with the stored signatures. A similarity measure has
been devised to compare the signatures of the test and prototype characters. The present
technique attains a recognition rate of about 66%, which is quite good for a
first-ever attempt in this particular discipline. Modifications to the original similarity
measure have been proposed and implemented; these raised the
recognition rate from an initial 50% to the present 66%. Statistical analysis
has also been carried out on the individual character recognition rates, and comments on the
results of this analysis are included. The thesis also gives detailed guidelines for future
research in this direction.
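
The dictionary lookup described above can be illustrated with a minimal sketch. It is not the thesis's actual encoding or similarity measure, which the abstract does not specify. The sketch assumes that a structural signature is a short string of feature codes, that similarity is the fraction of positions at which two signatures agree, and that a pattern scoring below a threshold is rejected as not being an acceptable character. The names `similarity`, `recognize`, `MATCH_DICTIONARY`, the feature codes, and the threshold value are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Hypothetical measure: fraction of positions where two signatures agree.

    Dividing by the longer length penalizes signatures of different lengths.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty signatures are trivially identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def recognize(
    signature: str,
    dictionary: Dict[str, str],
    threshold: float = 0.6,  # assumed rejection threshold, not from the thesis
) -> Tuple[Optional[str], float]:
    """Return the best-matching prototype label and its score.

    The label is None when no prototype reaches the threshold, i.e. the
    pattern is treated as not being any acceptable character.
    """
    best_label: Optional[str] = None
    best_score = 0.0
    for label, prototype in dictionary.items():
        score = similarity(signature, prototype)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical match dictionary: character label -> coded structural signature.
MATCH_DICTIONARY = {
    "ka": "VLHC",
    "kha": "VLCC",
    "ga": "HHVC",
}

if __name__ == "__main__":
    print(recognize("VLHC", MATCH_DICTIONARY))  # exact match with "ka"
    print(recognize("VLHX", MATCH_DICTIONARY))  # distorted, still closest to "ka"
    print(recognize("XXXX", MATCH_DICTIONARY))  # rejected as irregular
```

Under these assumptions, a change to the similarity measure, such as the modifications reported to raise the recognition rate, would be confined to the `similarity` function, leaving the dictionary and the lookup unchanged.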