Low-rank Multimodal Fusion Algorithm Based on Context Modeling

Zongwen Bai,
Xiaohuan Chen,
Meili Zhou,
Tingting Yi,
Wei-Che Chien

Abstract


As an integral part of daily human life, video carries rich emotional information, so finding efficient approaches to video emotion analysis is an active research direction. Building on tensor fusion, we propose a low-rank multimodal fusion model based on context modeling. First, the features of each modality are preprocessed by a GRU (Gated Recurrent Unit), a recurrent neural network variant, and semantic dependencies are constructed between utterances so that contextual information propagates across the video; this context modeling improves emotion-classification performance. In addition, LMF (Low-rank Multimodal Fusion), which supports end-to-end learning, is adopted as the fusion mechanism to improve classification efficiency. We conduct experiments on CMU-MOSI, POM, and IEMOCAP for multimodal sentiment analysis, speaker-trait recognition, and emotion recognition, respectively. The results show that our method outperforms TFN (Tensor Fusion Network) by margins of 2.9%, 1.3%, and 12.2% on the three datasets.
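To make the pipeline in the abstract concrete, the following is a minimal PyTorch sketch of per-modality GRU context encoding followed by LMF-style low-rank fusion. It is an illustration under assumed names and dimensions, not the authors' implementation: the class ContextLMF, the single-layer GRUs, the use of the final utterance's hidden state, and the feature sizes in the usage lines are all hypothetical choices.

import torch
import torch.nn as nn


class ContextLMF(nn.Module):
    """Sketch: GRU context modeling per modality + low-rank fusion.

    All names and sizes are illustrative assumptions, not the
    authors' released code.
    """

    def __init__(self, in_dims, hid, out_dim, rank, n_classes):
        super().__init__()
        # One GRU per modality captures inter-utterance context.
        self.grus = nn.ModuleList(
            [nn.GRU(d, hid, batch_first=True) for d in in_dims]
        )
        # Rank-R factor matrices replace the full fusion tensor of
        # TFN; the extra input unit carries the constant 1 appended
        # in forward(), preserving unimodal/bimodal interactions.
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, hid + 1, out_dim) * 0.1)
             for _ in in_dims]
        )
        self.rank_weights = nn.Parameter(torch.randn(1, rank) * 0.1)
        self.bias = nn.Parameter(torch.zeros(1, out_dim))
        self.classifier = nn.Linear(out_dim, n_classes)

    def forward(self, *modalities):
        # Each input: (batch, n_utterances, d_m) feature sequence.
        fused = None
        for x, gru, w in zip(modalities, self.grus, self.factors):
            h, _ = gru(x)                     # contextual states
            z = h[:, -1, :]                   # state for target utterance
            z1 = torch.cat([z.new_ones(z.size(0), 1), z], dim=1)
            proj = torch.einsum('bd,rdo->bro', z1, w)
            # Elementwise product fuses modalities rank by rank.
            fused = proj if fused is None else fused * proj
        out = torch.einsum('br,bro->bo',
                           self.rank_weights.expand(fused.size(0), -1),
                           fused) + self.bias
        return self.classifier(out)


if __name__ == '__main__':
    # Hypothetical audio/visual/text feature sizes.
    model = ContextLMF(in_dims=(74, 35, 300), hid=64,
                       out_dim=32, rank=4, n_classes=2)
    a, v, t = (torch.randn(8, 20, d) for d in (74, 35, 300))
    print(model(a, v, t).shape)  # torch.Size([8, 2])

Because each factor matrix is applied to a single modality before the elementwise product, the full outer-product tensor that TFN materializes is never formed, which is the source of the efficiency claim in the abstract.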


Citation Format:
Zongwen Bai, Xiaohuan Chen, Meili Zhou, Tingting Yi, Wei-Che Chien, "Low-rank Multimodal Fusion Algorithm Based on Context Modeling," Journal of Internet Technology, vol. 22, no. 4, pp. 913-921, Jul. 2021.





