Automated Deep Learning Approaches for Multimodal Emotion Recognition: A Review of Fusion Strategies, Modalities and Architectures
DOI:
https://doi.org/10.66108/mna.v4i3.103
Keywords:
Multimodal Emotion Recognition, Deep Learning, Transformers, Fusion Strategies, Affective Computing
Abstract
Emotion recognition is a fast-moving area of artificial intelligence that has drawn significant attention, driven by the growing demand for emotionally intelligent systems that improve Human-Computer Interaction (HCI). Early studies in this field relied mainly on unimodal models and handcrafted features, which limited their ability to account for the richness of human emotional expression and its contextual variability. Deep learning has radically changed emotion recognition by enabling automatic feature learning and robust modeling of complex affective behavior. This paper presents a thorough review of the development of Multimodal Emotion Recognition (MER), focusing on the combination of speech, text, and facial modalities. We critically synthesize the models developed for each modality and trace how deep learning architectures have evolved from Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to state-of-the-art Transformer-based models that capture long-range dependencies and cross-modal interactions. We also examine multimodal fusion techniques, including early and late fusion as well as advanced hybrid or attention-based fusion schemes that dynamically integrate complementary information across modalities. Particular attention is given to recent work on low-resource and multilingual settings, where data scarcity and linguistic variation are major obstacles. The review surveys recent advances in architectures and fusion methods and identifies current trends, performance improvements, and open gaps in MER, offering insights for building robust, scalable, and inclusive emotion-aware systems.
License
© This work is published by Machines and Algorithms and licensed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
