dc.description.abstract |
Anger detection from conversations has many real-life applications, including improving interpersonal communication, customer service, and workplace performance. Despite these applications across a variety of domains, anger remains one of the least studied basic human emotions. Existing work on anger detection mostly deals with audio-only data, even though text transcriptions can be obtained directly from spoken conversations. In this thesis, we propose novel deep learning-based approaches for offline and online anger detection from audio-textual data obtained from real-life conversations. Offline anger detection identifies anger in a pre-collected audio-textual conversation, while online anger detection predicts anger in the subsequent utterances of a conversation from the previous utterances.
For offline anger detection, we introduce an ensemble approach that combines handcrafted acoustic features, SincNet-based raw waveform features, and BERT-based textual features in a mid-level fusion scheme within an attention-based CNN architecture. The model also includes a gender classifier to incorporate gender information into offline anger detection. For online anger detection, we propose a transformer-based technique that combines audio and textual features in a mid-level fusion scheme and uses an ensemble-based downstream classifier. We demonstrate the efficacy of our proposed approaches on two data sets: the Bengali call-center data set and the IEMOCAP data set. Experimental results show that our approaches outperform state-of-the-art baselines by a significant margin. For offline anger detection, our model achieves an F1 score of 85.5% on the Bengali call-center data set and 91.4% on the IEMOCAP data set. For online anger detection, our model yields an F1 score of 66.9% on the Bengali call-center data set and 67.7% on the IEMOCAP data set. Additionally, we vary utterance parameters, such as the number of input and output utterances, and observe their effect on anger detection performance. |
en_US |