
Record Details

Multimodal emotion recognition based on feature selection and extreme learning machine in video clips (Indexed in SCI-EXPANDED)

Document Type: Journal Article

English Title: Multimodal emotion recognition based on feature selection and extreme learning machine in video clips

Authors: Pan, Bei[1]; Hirota, Kaoru[1]; Jia, Zhiyang[1]; Zhao, Linhui[2,3]; Jin, Xiaoming[2,3]; Dai, Yaping[1]

First Author: Pan, Bei

Corresponding Authors: Jia, ZY[1]; Zhao, LH[2,3]

Affiliations: [1]Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China; [2]Beijing Union Univ, Coll Robot, Beijing 100020, Peoples R China; [3]Beijing Engn Res Ctr Smart Mech Innovat Design Se, Beijing 100020, Peoples R China

First Affiliation: Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China

Corresponding Affiliations: [1]Beijing Inst Technol, Sch Automat, Beijing 100081, Peoples R China; [2]Beijing Union Univ, Coll Robot, Beijing 100020, Peoples R China; [3]Beijing Engn Res Ctr Smart Mech Innovat Design Se, Beijing 100020, Peoples R China

Year: 2021

Journal: JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING

Indexed In: WOS SCI-EXPANDED (Accession No. WOS:000677939400001)

Funding: This work was supported by the Open Foundation of Beijing Engineering Research Center of Smart Mechanical Innovation Design Service under Grant No. KF2019302, the General Projects of Science and Technology Plan of Beijing Municipal Commission of Education under Grant No. KM202011417005, and the National Talents Foundation under Grant No. WQ20141100198.

Language: English

Keywords: Emotion recognition; Multimodal fusion; Evolutionary optimization; Feature selection; Extreme learning machine

Abstract: Multimodal fusion-based emotion recognition has attracted increasing attention in affective computing because different modalities can complement each other's information. One of the main challenges in designing a reliable and effective model is to define and extract appropriate emotional features from the different modalities. In this paper, we present a novel multimodal emotion recognition framework that estimates categorical emotions, using visual and audio signals as multimodal input. The model learns the neutral appearance and key emotion frames using a statistical geometric method, which acts as a preprocessor to save computation. Discriminative emotion features are extracted from the visual and audio modalities through evolutionary optimization and then fed to optimized extreme learning machine (ELM) classifiers for unimodal emotion recognition. Finally, a decision-level fusion strategy integrates the emotions predicted by the different classifiers to enhance overall performance. The effectiveness of the proposed method is demonstrated on three public datasets: the acted CK+ dataset, the acted Enterface05 dataset, and the spontaneous BAUM-1s dataset. Average recognition rates of 93.53% on CK+, 91.62% on Enterface05, and 60.77% on BAUM-1s are obtained. The emotion recognition results acquired by fusing the visual and audio predictions are superior to both unimodal recognition and the concatenation of individual features.
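The pipeline described in the abstract rests on two generic building blocks: an ELM classifier per modality and a decision-level fusion of their outputs. The sketch below is a minimal Python/NumPy illustration of those two techniques only, not the authors' optimized implementation (their evolutionary feature selection, ELM optimization, and exact fusion rule are not detailed in the abstract); the names ELM and fuse_decisions and the weighted-averaging rule are illustrative assumptions.

import numpy as np

class ELM:
    # Textbook extreme learning machine: random, fixed input-to-hidden
    # weights; hidden-to-output weights solved in closed form via the
    # Moore-Penrose pseudo-inverse (no iterative training).
    def __init__(self, n_hidden=128, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, n_classes):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]            # one-hot target matrix
        self.beta = np.linalg.pinv(H) @ T   # closed-form output weights
        return self

    def scores(self, X):
        # Per-class scores; argmax gives the predicted emotion label.
        return self._hidden(X) @ self.beta

def fuse_decisions(visual_scores, audio_scores, w=0.5):
    # Decision-level fusion by weighted averaging of the unimodal
    # per-class scores (one simple fusion rule among many).
    return np.argmax(w * visual_scores + (1 - w) * audio_scores, axis=1)

For example, given visual and audio feature matrices Xv and Xa with shared integer labels y, one would fit one ELM per modality and call fuse_decisions(elm_v.scores(Xv), elm_a.scores(Xa)) to obtain the fused emotion predictions.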

