Abstract:
Suspense is an important element for absorbing an audience into a story. Revealing plot twists, the climax, or the ending too early may eliminate that suspense and therefore impair the audience's enjoyment. Any content that contains such critical information about a work of fiction is considered a spoiler. Due to the heavy use of the internet and smartphones, it has become nearly impossible to avoid spoilers posted on popular social networks. The aim of this study is to develop an effective machine learning model to detect spoilers in text. Extracting relevant features that efficiently represent the meaning of a text is one of the major challenges of this problem. Therefore, we employ syntactically related word pairs, along with the traditional bag-of-words, in our feature extraction technique. Naturally, spoilers are far fewer than spoiler-free texts in datasets. To tackle this imbalance in the data distribution, we propose a novel distribution-based amalgam minority oversampling technique (DAMOT). It oversamples the dataset with a combination of original and synthetic minority instances based on the distribution over their classes. We also employ the AdaBoost algorithm to enhance the performance of our model. Our proposed models have been tested extensively on IMDb (Internet Movie Database) reviews, and DAMOT, combined with our feature extraction technique, outperformed the baseline methods by a significant margin, yielding more balanced results across different performance metrics.
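
The idea of oversampling a minority class with a mix of original and synthetic instances can be illustrated with a minimal sketch. This is not the DAMOT algorithm, whose details are not given in the abstract. It is a generic illustration in which the function name `oversample_minority`, the alternating split between duplicated and synthetic rows, and the linear interpolation between two random minority rows (a simplified SMOTE-like step) are all assumptions made for the example.

```python
import random


def oversample_minority(X, y, minority_label, seed=0):
    """Balance a binary dataset by appending minority-class rows.

    Hypothetical helper, NOT the paper's DAMOT procedure.
    Added rows alternate between:
      - an exact copy of a random original minority row, and
      - a synthetic row interpolated between two random minority rows.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    if not minority:
        raise ValueError("no rows carry the minority label")

    n_majority = len(y) - len(minority)
    n_needed = max(0, n_majority - len(minority))

    new_rows = []
    for i in range(n_needed):
        a = rng.choice(minority)
        if i % 2 == 0:
            # Original instance: duplicate an existing minority row.
            new_rows.append(list(a))
        else:
            # Synthetic instance: a point on the segment between two rows.
            b = rng.choice(minority)
            t = rng.random()
            new_rows.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])

    # Return new lists so the caller's data is left unmodified.
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy feature vectors: four spoiler-free rows (0) and two spoiler rows (1).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = oversample_minority(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))
```

On this toy input the two classes end up with the same number of rows. In practice, oversampling of this kind would be applied only to the training split, so that synthetic rows never leak into evaluation data.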