Automated Deep Learning Approaches for Multimodal Emotion Recognition: A Review of Fusion Strategies, Modalities and Architectures

Authors

  • Raja Abdulrahman University of Sahiwal, Sahiwal, 57000, Pakistan
  • Aleena Jamil University of Sahiwal, Sahiwal, 57000, Pakistan
  • Adeen Amjad University of Sahiwal, Sahiwal, 57000, Pakistan
  • Shafiq Hussain University of Sahiwal, Sahiwal, 57000, Pakistan
  • Muhammad Azhar Hong Kong Shue Yan University, Hong Kong SAR, China
  • Zunaira Aslam University of Sahiwal, Sahiwal, 57000, Pakistan
  • Ifra Shabbir COMSATS University Islamabad, Islamabad, 44000, Pakistan
  • Waqar Ahmad University of Sahiwal, Sahiwal, 57000, Pakistan
  • Arslan Ali Mansab University of Sahiwal, Sahiwal, 57000, Pakistan
  • Muhammad Hamza Akbar University of Sahiwal, Sahiwal, 57000, Pakistan
  • Muhammad Waqas University of Sahiwal, Sahiwal, 57000, Pakistan

DOI:

https://doi.org/10.66108/mna.v4i3.103

Keywords:

Multimodal Emotion Recognition, Deep Learning, Transformers, Fusion Strategies, Affective Computing

Abstract

Emotion recognition has garnered significant attention as one of the fastest-moving branches of artificial intelligence, driven by the growing demand for emotionally intelligent systems that improve Human-Computer Interaction (HCI). Early studies in this field relied mainly on unimodal models and hand-crafted features, which limited their ability to account for the expressiveness of human emotions and their contextual variability. Deep learning has radically changed emotion recognition by enabling automatic feature learning and robust modeling of complex affective behaviors. This paper presents a thorough review of the development of Multimodal Emotion Recognition (MER), focusing on the combination of speech, textual, and facial modalities. We critically synthesize models for each individual modality and trace how deep learning architectures have evolved from Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to state-of-the-art Transformer-based models capable of capturing long-range dependencies and cross-modal interactions. We further examine multimodal fusion techniques, from early and late fusion to advanced hybrid and attention-based schemes that dynamically integrate complementary information across modalities. Particular attention is given to recent work on low-resource and multilingual settings, where data scarcity and linguistic variation remain significant impediments. By surveying the latest architectural and fusion developments, this paper highlights current trends, performance improvements, and open gaps in MER, offering insights for the construction of robust, scalable, and inclusive emotion-aware systems.
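The early- versus late-fusion distinction mentioned in the abstract can be illustrated with a minimal sketch. All dimensions, the four-class label set, and the random linear classifiers below are hypothetical stand-ins for trained models, not part of the reviewed work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-modality feature vectors (hypothetical dimensions).
audio = rng.standard_normal(32)   # speech features
text = rng.standard_normal(16)    # textual features
face = rng.standard_normal(24)    # facial features

n_classes = 4  # e.g. happy, sad, angry, neutral (illustrative labels)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Early (feature-level) fusion: concatenate modalities,
# then feed one joint classifier.
fused = np.concatenate([audio, text, face])           # shape (72,)
W_early = rng.standard_normal((n_classes, fused.size))
early_probs = softmax(W_early @ fused)

# Late (decision-level) fusion: one classifier per modality,
# then average the per-modality class probabilities.
per_modality = []
for feats in (audio, text, face):
    W = rng.standard_normal((n_classes, feats.size))
    per_modality.append(softmax(W @ feats))
late_probs = np.mean(per_modality, axis=0)

print(early_probs.shape, late_probs.shape)  # both (4,), each summing to 1
```

Hybrid and attention-based fusion, as surveyed in the paper, sit between these two extremes: instead of a fixed concatenation or a uniform average, learned weights decide per input how much each modality contributes.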

Published

2025-12-21

How to Cite

Raja Abdulrahman, Aleena Jamil, Adeen Amjad, Shafiq Hussain, Muhammad Azhar, Zunaira Aslam, Ifra Shabbir, Waqar Ahmad, Arslan Ali Mansab, Muhammad Hamza Akbar, & Muhammad Waqas. (2025). Automated Deep Learning Approaches for Multimodal Emotion Recognition: A Review of Fusion Strategies, Modalities and Architectures. Machines and Algorithms, 4(3), 198–214. https://doi.org/10.66108/mna.v4i3.103

Section

Reviews