DSpace Repository

Neural language modeling for context based word suggestion, sentence completion and spelling correction in Bangla

dc.contributor.advisor Ali, Dr. Mohammed Eunus
dc.contributor.author Rahman, Chowdhury Rafeed
dc.date.accessioned 2022-05-10T09:52:59Z
dc.date.available 2022-05-10T09:52:59Z
dc.date.issued 2021-08-05
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5991
dc.description.abstract Although a large body of recent work addresses language modeling for high-resource languages such as English and Chinese, the area remains unexplored for low-resource languages such as Bangla and Hindi. We propose CoCNN, an end-to-end trainable, memory-efficient convolutional neural network (CNN) architecture designed to handle specific characteristics of Bangla and Hindi: high inflection, morphological richness, flexible word order, and phonetic spelling errors. In particular, we introduce two learnable convolutional sub-models, one at the word level and one at the sentence level. We show that state-of-the-art Transformer models do not necessarily yield the best performance for Bangla and Hindi. CoCNN outperforms pretrained BERT with 16X fewer parameters and 10X less training time, and it achieves much better performance than state-of-the-art long short-term memory (LSTM) models on multiple real-world datasets. SemanticNet, the word-level CNN sub-model of CoCNN, has shown potential as an effective Bangla spell checker. We explore this potential and develop a state-of-the-art Bangla spell checker. Bangla is mostly typed on an English keyboard, and the typing can be highly erroneous because of compound letters and similarly pronounced letters. Correcting a misspelled word requires understanding both the typing pattern of the word and the context in which it is used. We propose BSpell, a specialized BERT model targeted at word-for-word correction at the sentence level. BSpell incorporates the CNN sub-model SemanticNet, motivated by CoCNN, along with a specialized auxiliary loss. This allows BSpell to specialize in the highly inflected Bangla vocabulary in the presence of spelling errors. We further propose a hybrid pretraining scheme for BSpell that combines word-level and character-level masking. Using this pretraining scheme, BSpell achieves 91.5% accuracy on a real-life Bangla spelling correction validation set. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering en_US
dc.subject Spelling correction en_US
dc.title Neural language modeling for context based word suggestion, sentence completion and spelling correction in Bangla en_US
dc.type Thesis-MSc en_US
dc.contributor.id 1018052013 en_US
dc.identifier.accessionNumber 118519
dc.contributor.callno 005.52/RAF/2021 en_US

