dc.description.abstract |
Retrieving textual information from natural scene images is an active research area in the field of computer vision with numerous practical applications. Detecting text regions and extracting text from signboards is a challenging problem due to special characteristics such as reflecting lights, uneven illumination, or shadows found in real-life natural scene images. With the advent of deep learning-based methods, different sophisticated techniques have been proposed for text detection and text recognition in natural scenes. Though a significant amount of effort has been devoted to extracting natural scene text for resourceful languages like English, little has been done for low-resource languages like Bangla. In this research work, we have proposed an end-to-end system with deep learning-based models for efficiently detecting, recognizing, correcting, and parsing address information from Bangla signboards. We have created manually annotated datasets and synthetic datasets to train signboard detection, address text detection, address text recognition, address text correction, and address text parser models. We have conducted a comparative study among different CTC-based and encoder-decoder model architectures for Bangla address text recognition. Moreover, we have designed a novel address text correction model using a sequence-to-sequence transformer-based network to improve the performance of the Bangla address text recognition model through post-correction. Finally, we have developed a Bangla address text parser using a state-of-the-art transformer-based pre-trained language model. |
en_US |