DSpace Repository

Bangla text sentiment analysis based on extended lexicon dictionary using supervised machine learning and deep learning algorithms

dc.contributor.advisor Mondal, Dr. Md. Rubaiyat Hossain
dc.contributor.author Bhowmik, Nitish Ranjan
dc.date.accessioned 2024-01-20T08:33:52Z
dc.date.available 2024-01-20T08:33:52Z
dc.date.issued 2022-02-19
dc.identifier.uri http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6561
dc.description.abstract With the proliferation of social digital content on the Internet, sentiment analysis (SA) has gained wide research interest in natural language processing (NLP). Little significant research has been done in the Bangla language domain because of the intricate grammatical structures in the text. This paper focuses on SA in the context of the Bangla language. Firstly, a specific domain-based categorical weighted lexicon data dictionary (LDD) is developed to analyze Bangla text sentiments. This LDD is developed by applying the concepts of normalization, tokenization, and stemming to two Bangla datasets available in the GitHub repository. Secondly, a novel rule-based algorithm termed Bangla Text Sentiment Score (BTSC) is developed to detect sentence polarity. This algorithm considers part-of-speech-tagged words and special characters to generate a word score and extract polarity from a sentence and a blog. The BTSC algorithm, with the help of the LDD, is applied to extract sentiments by generating scores for the two Bangla datasets. Thirdly, two feature matrices are developed by applying term frequency-inverse document frequency (tf-idf) to the two datasets and the corresponding BTSC scores. Next, supervised machine learning classifiers are applied to the feature matrices. In the deep learning (DL) part, these polarities, together with the preprocessed text, are fed into the hybrid neural network as training samples. The preprocessed texts are formatted as vectors of unique word numbers from pre-trained word embedding models. A Word2Vec matrix with the highest-probability words is applied to the embedding layer as a weight matrix to fit the DL models.
This paper also presents a detailed analysis of selected DL models with fine-tuning. The fine-tuning includes the use of dropout, optimizer regularization, learning rate, multiple layers, filters, attention mechanism, capsule layers, and a transformer with progressive training, along with validation and testing accuracy, precision, recall and F1-score. Experimental results indicate that the proposed new long short-term memory (LSTM) models are highly accurate in performing SA tasks. Experimental results corroborate our theoretical claim and show the efficiency of our proposed approach in both the machine learning and deep learning settings. Results show that, for the BiGram feature, the support vector machine (SVM) achieves the best classification accuracy of 82.21%. For our proposed hierarchical attention-based LSTM (HAN-LSTM), dynamic routing based capsule neural network with Bi-LSTM (D-CAPSNET-Bi-LSTM) and bidirectional encoder representations from Transformers (BERT) with LSTM (BERT-LSTM) models, we achieved accuracy values of 78.52%, 80.82% and 84.18%, respectively. en_US
dc.language.iso en en_US
dc.publisher Institute of Information and Communication Technology (IICT) en_US
dc.subject Natural language processing (Computer science) en_US
dc.title Bangla text sentiment analysis based on extended lexicon dictionary using supervised machine learning and deep learning algorithms en_US
dc.type Thesis-MSc en_US
dc.contributor.id 1017312022 en_US
dc.identifier.accessionNumber 119243
dc.contributor.callno 005.45/BHO/2022 en_US

