
Details

Human Motion Recognition Method Based on Attention Mechanism of 3D DenseNet

Document Type: Journal Article


Authors: Zhang Congcong [1]; He Ning [2]; Sun Qixiang [1]; Yin Xiaojie [2]

First Author: Zhang Congcong

Affiliations: [1] Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China; [2] Smart City College, Beijing Union University, Beijing 100101, China

First Affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University

Year: 2021

Volume: 47

Issue: 11

Pages: 313-320

Journal: 计算机工程 (Computer Engineering)

Indexed in: CSTPCD; Peking University Core Journals (PKU Core 2020); CSCD (CSCD_E 2021-2022)

Funding: National Natural Science Foundation of China (61872042, 61572077); Beijing Municipal Education Commission Science and Technology Key Project (KZ201911417048); Beijing Union University Talent Strengthening Program (BPHR2020AZ01, BPHR2020EZ01); Beijing Union University Graduate Research and Innovation Project (YZ2020K001); Support Plan for Building High-Level Teaching Staff in Beijing Municipal Universities during the 13th Five-Year Plan Period (CIT&TCD 201704069).

Language: Chinese

Keywords: motion recognition; attention mechanism; 3D DenseNet; two-stream network; feature fusion

Abstract: Traditional human motion recognition algorithms cannot fully exploit the spatio-temporal information of human motions in videos and suffer from low recognition accuracy. To address this problem, a new three-dimensional dense convolutional network method for human motion recognition is proposed. The model takes the two-stream network as its basic framework: the spatial stream uses a 3D dense network with an attention mechanism to extract appearance features of the motions in the video, while the temporal stream extracts motion features from the optical flow of the continuous video sequence. The final recognition result is obtained by fusing the spatio-temporal features and the classification layers. To extract features more accurately and to model the interaction between the spatial and temporal networks, cross-stream connections are added between the two streams to fuse features at the convolutional layers. Experimental results on the UCF101 and HMDB51 datasets show that the model achieves recognition accuracies of 94.52% and 69.64% respectively, demonstrating that it makes full use of the spatio-temporal information in videos and extracts the key information of motions.
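The two ideas the abstract leans on, channel attention inside the 3D network and late fusion of the two streams' class scores, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the projection weights, the reduction ratio, and the equal-weight averaging in `late_fuse` are assumptions for illustration; a trained model would learn the attention weights and may weight the streams differently.

```python
import numpy as np

def channel_attention(feat, reduction=2):
    """Squeeze-and-excitation style channel attention on a 3D feature map.

    feat: array of shape (C, T, H, W) -- channels, time, height, width.
    The random projection weights below are placeholders standing in for
    learned parameters.
    """
    C = feat.shape[0]
    # Squeeze: global average pool over the spatio-temporal dimensions.
    z = feat.mean(axis=(1, 2, 3))                                 # shape (C,)
    # Excite: bottleneck projection, ReLU, expansion, sigmoid gate.
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((C // reduction, C)) * 0.1
    w2 = rng.standard_normal((C, C // reduction)) * 0.1
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))     # shape (C,)
    # Reweight each channel by its attention score.
    return feat * s[:, None, None, None]

def late_fuse(spatial_scores, temporal_scores):
    """Fuse the two streams' class scores by simple averaging."""
    return (spatial_scores + temporal_scores) / 2.0
```

The cross-stream connections described in the abstract would additionally exchange convolutional-layer features between the two streams before this final fusion step.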

