Abstract:
Social media enables ample opportunities for communication, but it also increases the risk of threatening situations online, especially for young people all over the world. Automatic detection of potentially harmful messages is a key tool for successful prevention. Although a considerable amount of research has been carried out on English content, non-English content, and Bangla content in particular, has largely been set aside. However, Bangla is the 7th most spoken language, and with the popularity of Unicode and the growing use of the Internet, the use of Bangla on social media is increasing. Very little work has been done on Bangla text for social media activity monitoring, owing to the lack of large annotated corpora, named dictionaries, and morphological analyzers; this calls for in-depth analysis from Bangladesh’s perspective. To combat such issues in online posts or conversations, the use of machine learning algorithms together with user-specific data analysis has been shown to give better accuracy. Various machine learning based techniques have been proposed in the literature for the English language. However, applying the available techniques is very content specific: false detections can occur if the content changes from formal English to verbal abuse or sarcasm. Performance may also vary because of linguistic differences between English and non-English content and the socio-emotional behavior of the study population. This thesis project explores the performance and accuracy on Bangla text of some machine learning approaches that are widely used for English text. In addition, the impact of user-specific information, i.e., location, age, gender, number of likes, number of comments, etc., is analyzed for Bangla cyberbullying detection. Experimental results show that, when only posts or comments are used for classification, Support Vector Machine (SVM) is the best performing algorithm for Bangla bullying detection, with 9795.40% accuracy.
On the other hand, when user-specific data is combined with the user’s posts, KNN (3-nearest neighbors) achieves the best accuracy at 97.73%, while SVM achieves 97.27% in the same setting, which is very close to the best result. Because SVM performs well in both cases, it is chosen for implementing the model on social media. As the project outcome, a Java web-based solution has been developed and validated; the results generated by the system are close to those of a human observer and were found to be accurate in most cases.