Abstract:
Social media platforms have become a valuable source of insight into users’ mental well-being, including within the Bangla-speaking community. Against the backdrop of a global mental health crisis in which roughly 21% of adults face a mental disorder and over half remain untreated, this thesis introduces a technique for detecting mental health disorder risk from Bengali social media content. The research uses a dataset of 7,131 Bengali expressions collected from social media platforms including Facebook, YouTube, Twitter, and Reddit. These expressions are strongly associated with mental health and were validated by clinical experts. The research applies traditional machine learning methods (Naïve Bayes, Decision Trees, Random Forests, SVM, and Logistic Regression) alongside deep learning techniques (LSTM, BiLSTM, and BERT). It proposes a weighted ensemble of transformer models (XLM-R, Bangla-BERT, and m-BERT) as the key classifier for identifying mental health disorders in Bengali text. The ensemble weights each classifier’s softmax probabilities according to its initial outputs. With this weighting scheme, the model achieves a weighted F1-score of 97% in detecting mental health disorders, outperforming the other ML and DL baselines evaluated. Several feature extraction techniques, including bag-of-words (BoW), TF-IDF, word embeddings, and transformer-based contextual word embeddings, are used to broaden the analysis. This research opens a path for detecting mental health issues in Bengali social media content and can support timely interventions.
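
The weighted ensemble described above can be illustrated with a minimal sketch. The abstract does not specify how the weights are derived, so the example assumes one scalar weight per model, normalized to sum to one, and averages the models’ softmax probabilities with those weights. The function names, logits, and weight values are hypothetical placeholders, not values from the thesis.

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores to probabilities along the last axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def weighted_ensemble(prob_list, weights):
    """Combine per-model class probabilities with a weighted average.

    prob_list: one array per model, each of shape (n_samples, n_classes).
    weights:   one non-negative weight per model (assumed here to reflect
               each model's earlier performance; an illustrative choice).
    Returns the combined probabilities and the predicted class indices.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                   # normalize to sum to 1
    stacked = np.stack(prob_list, axis=0)               # (n_models, n_samples, n_classes)
    combined = np.tensordot(weights, stacked, axes=1)   # (n_samples, n_classes)
    return combined, combined.argmax(axis=-1)

# Made-up logits standing in for three fine-tuned transformer classifiers
# (two samples, two classes); these are not outputs from the thesis.
logits_model_a = [[2.0, 0.1], [0.3, 1.2]]
logits_model_b = [[1.1, 0.4], [0.2, 0.9]]
logits_model_c = [[0.2, 0.8], [1.0, 0.5]]

probs = [softmax(l) for l in (logits_model_a, logits_model_b, logits_model_c)]
combined_probs, predicted = weighted_ensemble(probs, weights=[0.5, 0.3, 0.2])
print(combined_probs)
print(predicted)
```

Averaging probabilities rather than hard votes lets a confident, higher-weighted model outvote two uncertain ones, which is one plausible motivation for weighting softmax outputs in this way.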