Ensemble approach with insightful features for spoiler detection

dc.contributor.advisor	Islam, Dr. Md. Monirul
dc.contributor.author	Noor, Sabah Binte
dc.date.accessioned	2019-02-17T04:46:12Z
dc.date.available	2019-02-17T04:46:12Z
dc.date.issued	2018-03-21
dc.identifier.uri	http://lib.buet.ac.bd:8080/xmlui/handle/123456789/5119
dc.description.abstract	Suspense is an important element in absorbing an audience into a story. Revealing plot twists, the climax, or the ending too early can eliminate that suspense and impair the audience's enjoyment. Any content that reveals such critical information about a work of fiction is considered a spoiler. Given the heavy use of the internet and smartphones, it has become impossible to shield oneself from spoilers posted on popular social networks. The aim of this study is to develop an effective machine learning model to detect spoilers in text. One of the major challenges of this problem is extracting relevant features that efficiently represent the concept of a text. Therefore, we employ syntactically related word pairs, along with the traditional bag-of-words, in our feature extraction technique. Naturally, spoilers are far fewer in datasets than spoiler-free texts. To tackle this imbalance in the data distribution, we propose a novel distribution-based amalgam minority oversampling technique (DAMOT). It oversamples the dataset with a combination of original and synthetic minority instances based on the distribution over their classes. We also employ the AdaBoost algorithm to enhance the performance of our model. Our proposed models have been tested extensively on IMDb (Internet Movie Database) reviews, and DAMOT combined with our feature extraction technique outperformed the baseline methods by a significant margin while bringing balance across different performance metrics.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering	en_US
dc.subject	Machine learning	en_US
dc.title	Ensemble approach with insightful features for spoiler detection	en_US
dc.type	Thesis-MSc	en_US
dc.contributor.id	0413052049	en_US
dc.identifier.accessionNumber	116816
dc.contributor.callno	006.31/SAB/2018	en_US
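
The abstract describes oversampling the minority (spoiler) class with a mix of original and synthetic instances, but this record does not give the details of DAMOT. The sketch below is therefore not DAMOT. It is only a generic illustration of that idea, assuming numeric feature vectors, a binary label, and a fixed split between copied and interpolated rows. The function name `oversample_minority` and the parameter `synthetic_fraction` are hypothetical and do not come from the thesis.

```python
import random
from collections import Counter


def oversample_minority(X, y, minority_label, synthetic_fraction=0.5, seed=0):
    """Balance a binary dataset by appending minority-class rows.

    Assumption (not from the thesis): a fixed share of the added rows is
    synthetic, built by linear interpolation between two randomly chosen
    minority rows; the remaining added rows are exact copies of originals.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    if not minority:
        raise ValueError("no instances of the minority label found")

    n_majority = len(y) - len(minority)
    n_needed = max(0, n_majority - len(minority))  # rows required to balance
    n_synthetic = int(round(n_needed * synthetic_fraction))

    new_rows = []
    for i in range(n_needed):
        a = rng.choice(minority)
        if i < n_synthetic:
            # Synthetic row: a random point on the segment between a and b.
            b = rng.choice(minority)
            t = rng.random()
            new_rows.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        else:
            # Original row: duplicate an existing minority instance.
            new_rows.append(list(a))

    X_out = [list(x) for x in X] + new_rows
    y_out = list(y) + [minority_label] * n_needed
    return X_out, y_out


# Toy data: four majority rows (label 0) and two minority rows (label 1).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [0.5, 0.5], [1.0, 3.0]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = oversample_minority(X, y, minority_label=1)
print(Counter(y_bal))
```

In this toy run two rows are appended, one interpolated and one copied, so both classes end up with four instances. A distribution-based method such as the one the abstract names would presumably choose the split and the synthetic points from the data rather than from a fixed fraction.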