Bangla text sentiment analysis based on extended lexicon dictionary using supervised machine learning and deep learning algorithms

Bhowmik, Nitish Ranjan

URI: http://lib.buet.ac.bd:8080/xmlui/handle/123456789/6561

Date: 2022-02-19

Abstract:

With the proliferation of social digital content on the Internet, sentiment analysis (SA) has gained wide research interest in natural language processing (NLP). Little significant research has been done in the Bangla language domain because of the intricate grammatical structures in its text. This paper focuses on SA in the context of the Bangla language. Firstly, a domain-specific, categorical, weighted lexicon data dictionary (LDD) is developed to analyze Bangla text sentiments. This LDD is developed by applying the concepts of normalization, tokenization, and stemming to two Bangla datasets available in a GitHub repository. Secondly, a novel rule-based algorithm termed Bangla Text Sentiment Score (BTSC) is developed to detect sentence polarity. This algorithm considers part-of-speech-tagged words and special characters to generate a word score and extract polarity from a sentence and a blog. The BTSC algorithm, with the help of the LDD, is applied to extract sentiments by generating scores for the two Bangla datasets. Thirdly, two feature matrices are developed by applying term frequency-inverse document frequency (tf-idf) to the two datasets and the corresponding BTSC scores. Next, supervised machine learning classifiers are applied to the feature matrices. In the deep learning (DL) part, these polarities, together with the preprocessed text, are fed into the hybrid neural networks as training samples. The preprocessed texts are vectorized as sequences of unique word indices drawn from pre-trained word embedding models. A Word2Vec matrix with the top highest-probability words is applied to the embedding layer as a weight matrix to fit the DL models. This paper also presents a detailed analysis of selected DL models with fine-tuning. The fine-tuning covers dropout, optimizer regularization, learning rate, multiple layers, filters, attention mechanism, capsule layers, and a transformer with progressive training, evaluated by validation and testing accuracy, precision, recall, and F1-score.
Experimental results indicate that the proposed new long short-term memory (LSTM) models are highly accurate in performing SA tasks. The results corroborate our theoretical claim and show the efficiency of the proposed approach in both the machine learning and deep learning settings. With the bigram feature, the support vector machine (SVM) achieves the best classification accuracy of 82.21%. The proposed hierarchical attention-based LSTM (HAN-LSTM), dynamic routing-based capsule neural network with Bi-LSTM (D-CAPSNET-Bi-LSTM), and bidirectional encoder representations from Transformers (BERT) with LSTM (BERT-LSTM) models achieve accuracies of 78.52%, 80.82%, and 84.18%, respectively.
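
The abstract describes a rule-based score computed from a weighted lexicon, but this record does not give the BTSC rules themselves. The following is only a minimal sketch of the general idea of lexicon-weighted sentence scoring, not the thesis's algorithm. The lexicon entries, weights, the intensifier and negation rules, and the names `LEXICON`, `sentence_score`, and `polarity` are all assumptions made for illustration.

```python
# Hypothetical toy lexicon; the real LDD and its weights are not given here.
GOOD, BAD, VERY, NOT = "ভালো", "খারাপ", "খুব", "না"  # good, bad, very, not

LEXICON = {GOOD: 1.0, BAD: -1.0}   # token -> assumed sentiment weight
INTENSIFIERS = {VERY: 1.5}         # token -> assumed multiplier for the next word
NEGATIONS = {NOT}                  # assumed to flip the preceding sentiment word
PUNCTUATION = "।!?,"               # stripped before tokenizing


def sentence_score(sentence: str) -> float:
    """Sum lexicon weights, applying simple intensifier and negation rules."""
    for mark in PUNCTUATION:
        sentence = sentence.replace(mark, " ")
    score = 0.0
    boost = 1.0   # multiplier set by the most recent intensifier
    last = 0.0    # contribution of the most recent sentiment word
    for token in sentence.split():
        if token in INTENSIFIERS:
            boost = INTENSIFIERS[token]
        elif token in NEGATIONS:
            score -= 2 * last      # undo the last contribution and add its opposite
            last = -last
        elif token in LEXICON:
            last = LEXICON[token] * boost
            score += last
            boost = 1.0
        # tokens outside the lexicon are ignored in this sketch
    return score


def polarity(sentence: str) -> str:
    """Map a sentence score to a coarse polarity label."""
    score = sentence_score(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in (f"{VERY} {GOOD}", f"{GOOD} {NOT}", f"{BAD}!"):
        print(text, sentence_score(text), polarity(text))
```

In a pipeline like the one the abstract outlines, a score of this kind would be one extra column alongside the tf-idf features; how the thesis actually combines them is not specified in this record.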

Files in this item

Name: Full Thesis.pdf

Size: 1.919Mb

Format: PDF

This item appears in the following Collection(s)

Dissertations/Theses - Institute of Information and Communication Technology
Post graduate dissertations (Theses) of Institute of Information and Communication Technology (IICT)
