Dongni HU
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Chengxin CHEN
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Pengyuan ZHANG
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Junfeng LI
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Yonghong YAN
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Qingwei ZHAO
Chinese Academy of Sciences
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Dongni HU, Chengxin CHEN, Pengyuan ZHANG, Junfeng LI, Yonghong YAN, Qingwei ZHAO, "A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition" in IEICE TRANSACTIONS on Information and Systems, vol. E104-D, no. 8, pp. 1391-1394, August 2021, doi: 10.1587/transinf.2021EDL8002.
Abstract: Recently, automated recognition and analysis of human emotion has attracted increasing attention from multidisciplinary communities. However, it is challenging to utilize the emotional information simultaneously from multiple modalities. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late fusion methods, and achieves even better performance when the number of stacked MAF blocks increases.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDL8002/_p
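
The abstract describes the modality attention flow (MAF) block only at a high level. As an illustration of the general two-stage pattern it refers to (intra-modality self-attention followed by inter-modality cross-attention), a minimal PyTorch sketch is given below; the module layout, dimensions, pooling, and the use of nn.MultiheadAttention are assumptions made for demonstration and are not taken from the paper.

# Illustrative sketch only: a generic two-stage attention fusion block.
# The actual MAF block is defined in the paper; everything below is a
# demonstrative assumption, not the authors' architecture.
import torch
import torch.nn as nn


class TwoStageFusionBlock(nn.Module):
    """Stage 1: intra-modality self-attention; stage 2: inter-modality cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Intra-modality attention: each modality attends over its own time steps.
        self.self_attn_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Inter-modality attention: each modality attends over the other modality.
        self.cross_attn_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_audio = nn.LayerNorm(dim)
        self.norm_text = nn.LayerNorm(dim)

    def forward(self, audio, text):
        # Stage 1: model intra-modality interactions with self-attention plus residuals.
        a, _ = self.self_attn_audio(audio, audio, audio)
        t, _ = self.self_attn_text(text, text, text)
        a = self.norm_audio(audio + a)
        t = self.norm_text(text + t)
        # Stage 2: model inter-modality interactions with cross-attention
        # (audio queries attend to text keys/values, and vice versa).
        a2, _ = self.cross_attn_audio(a, t, t)
        t2, _ = self.cross_attn_text(t, a, a)
        return a + a2, t + t2


if __name__ == "__main__":
    # Dummy frame-level audio and token-level text embeddings: (batch, time, dim).
    audio = torch.randn(2, 100, 256)
    text = torch.randn(2, 30, 256)
    # Blocks can be stacked, echoing the stacked MAF blocks mentioned in the abstract.
    blocks = nn.ModuleList([TwoStageFusionBlock() for _ in range(2)])
    for block in blocks:
        audio, text = block(audio, text)
    # Pool each modality over time and concatenate for an utterance-level representation.
    utterance = torch.cat([audio.mean(dim=1), text.mean(dim=1)], dim=-1)
    print(utterance.shape)  # torch.Size([2, 512])

Stacking several such blocks mirrors the observation in the abstract that performance improves as the number of stacked MAF blocks increases, although the paper's exact layer composition may differ from this sketch.
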
@ARTICLE{e104-d_8_1391,
author={Dongni HU and Chengxin CHEN and Pengyuan ZHANG and Junfeng LI and Yonghong YAN and Qingwei ZHAO},
journal={IEICE TRANSACTIONS on Information and Systems},
title={A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition},
year={2021},
volume={E104-D},
number={8},
pages={1391-1394},
abstract={Recently, automated recognition and analysis of human emotion has attracted increasing attention from multidisciplinary communities. However, it is challenging to utilize the emotional information simultaneously from multiple modalities. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late fusion methods, and achieves even better performance when the number of stacked MAF blocks increases.},
keywords={},
doi={10.1587/transinf.2021EDL8002},
ISSN={1745-1361},
month={August},}
TY - JOUR
TI - A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1391
EP - 1394
AU - Dongni HU
AU - Chengxin CHEN
AU - Pengyuan ZHANG
AU - Junfeng LI
AU - Yonghong YAN
AU - Qingwei ZHAO
PY - 2021
DO - 10.1587/transinf.2021EDL8002
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E104-D
IS - 8
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - August 2021
AB - Recently, automated recognition and analysis of human emotion has attracted increasing attention from multidisciplinary communities. However, it is challenging to utilize the emotional information simultaneously from multiple modalities. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late fusion methods, and achieves even better performance when the number of stacked MAF blocks increases.
ER -