Abstract:
Though there has been a large body of recent work on language modeling for high-resource languages such as English and Chinese, the area remains largely unexplored for low-resource languages like Bangla and Hindi. We propose an end-to-end trainable, memory-efficient convolutional neural network (CNN) architecture, CoCNN, to handle specific characteristics of Bangla and Hindi such as high inflection, morphological richness, flexible word order, and phonetic spelling errors. In particular, we introduce two learnable convolutional sub-models at the word and sentence levels. We show that state-of-the-art Transformer models do not necessarily yield the best performance for Bangla and Hindi. CoCNN outperforms pretrained BERT with 16X fewer parameters and 10X less training time, while achieving much better performance than state-of-the-art long short-term memory (LSTM) models on multiple real-world datasets. The word-level CNN sub-model of CoCNN, SemanticNet, has shown its potential as an effective Bangla spell checker. We explore this potential and develop a state-of-the-art Bangla spell checker. Bangla typing is mostly performed using an English keyboard and can be highly erroneous due to the presence of compound and similarly pronounced letters. Correcting a misspelled word requires understanding both the word's typing pattern and the context in which the word is used. We propose a specialized BERT model, BSpell, targeted at word-for-word correction at the sentence level. BSpell incorporates the SemanticNet CNN sub-model, motivated by CoCNN, along with a specialized auxiliary loss. This allows BSpell to specialize in the highly inflected Bangla vocabulary in the presence of spelling errors. We further propose a hybrid pretraining scheme for BSpell that combines word-level and character-level masking. Utilizing this pretraining scheme, BSpell achieves 91.5% accuracy on a real-life Bangla spelling-correction validation set.
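The hybrid pretraining scheme combines word-level masking (hiding an entire word) with character-level masking (corrupting a single character, mimicking a typing error). A minimal sketch of such a masking step is shown below; the probabilities, mask tokens, and the `hybrid_mask` helper are illustrative assumptions, not details taken from the paper.

```python
import random

def hybrid_mask(words, word_mask_p=0.10, char_mask_p=0.10,
                word_mask_token="[MASK]", char_mask_token="#", rng=None):
    """Apply hybrid word/character masking to a tokenized sentence.

    Each word is independently either replaced wholesale by word_mask_token
    (word-level masking), has one random character replaced by
    char_mask_token (character-level masking), or left unchanged.
    """
    rng = rng or random.Random()
    masked = []
    for w in words:
        r = rng.random()
        if r < word_mask_p:
            # Word-level mask: the model must predict the word from context.
            masked.append(word_mask_token)
        elif r < word_mask_p + char_mask_p and w:
            # Character-level mask: the model must recover the word from a
            # corrupted surface form, as in spelling correction.
            i = rng.randrange(len(w))
            masked.append(w[:i] + char_mask_token + w[i + 1:])
        else:
            masked.append(w)
    return masked
```

The pretraining target in both cases would be the original word, so character-level masking trains the model on the same noisy-input-to-clean-word mapping that spelling correction requires.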