
Details


Human skeleton-based action recognition algorithm based on spatiotemporal attention graph convolutional network model

Document type: Journal article

Chinese title: 基于时空注意力图卷积网络模型的人体骨架动作识别算法

English title: Human skeleton-based action recognition algorithm based on spatiotemporal attention graph convolutional network model

Authors: 李扬志 [1]; 袁家政 [2]; 刘宏哲 [1]

First author: 李扬志

Affiliations: [1] Beijing Key Laboratory of Information Service Engineering (Beijing Union University), Beijing 100101, China; [2] Scientific Research and Foreign Affairs Office, Beijing Open University, Beijing 100081, China

First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University

Year: 2021

Volume: 41

Issue: 7

Pages: 1915-1921

Chinese journal title: 计算机应用

English journal title: Journal of Computer Applications

Indexed in: CSTPCD; Peking University Core Journals (2020 edition); CSCD (Extended Edition, 2021-2022)

Funding: National Natural Science Foundation of China (61871028, 61871039, 61906017, 61802019); Beijing Union University Leading Talents Project (BPHR2019AZ01); Beijing Municipal Education Commission projects (KM202111417001, KM201911417001); Beijing Union University Postgraduate Research and Innovation Project (YZ2020K001).

Language: Chinese

Keywords: Graph Convolutional Network (GCN); human skeleton-based action recognition; attention mechanism; human joints; video behavior understanding

Abstract: Aiming at the problem that existing human skeleton-based action recognition algorithms cannot fully exploit the spatial and temporal characteristics of motion, a human skeleton-based action recognition algorithm based on the Spatiotemporal Attention Graph Convolutional Network (STA-GCN) model was proposed. The model consisted of a spatial attention mechanism and a temporal attention mechanism. The spatial attention mechanism used the instantaneous motion information in optical flow features to locate spatial regions with significant motion, and introduced global average pooling and an auxiliary classification loss during training so that the model could also attend to discriminative non-motion regions; the temporal attention mechanism automatically mined discriminative temporal segments from long, complex videos. Both mechanisms were integrated into a unified Graph Convolutional Network (GCN) framework, enabling end-to-end training. Experimental results on the public Kinetics and NTU RGB+D datasets show that the STA-GCN-based algorithm has strong robustness and stability. Compared with the baseline algorithm based on the Spatial Temporal Graph Convolutional Network (ST-GCN) model, it improves Top-1 and Top-5 accuracy on Kinetics by 5.0 and 4.5 percentage points respectively, and Top-1 accuracy on the cross-subject (CS) and cross-view (CV) benchmarks of NTU RGB+D by 6.2 and 6.7 percentage points respectively. It also outperforms current state-of-the-art (SOA) action recognition methods such as Res-TCN (Residual Temporal Convolutional Network), STA-LSTM, and the Actional-Structural Graph Convolutional Network (AS-GCN). These results indicate that the proposed algorithm can better meet the practical application requirements of human action recognition.
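The abstract describes two gating mechanisms layered on a graph convolution over the skeleton: a spatial attention that weights joints and a temporal attention that weights frames, with global average pooling feeding the classifier. Below is a minimal sketch of such a block, assuming PyTorch; the module structure, sigmoid gating, pooling choices, and all hyperparameters are illustrative assumptions, and the paper's actual design (notably its optical-flow-driven spatial attention and the auxiliary classification loss) is not reproduced here.

# Minimal sketch (PyTorch assumed) of a spatiotemporal-attention GCN block.
# Everything here is illustrative, not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STAttentionGCNBlock(nn.Module):
    """Graph convolution over joints with spatial (per-joint) and
    temporal (per-frame) attention gates, then a temporal convolution.

    x has shape (N, C, T, V): batch, channels, frames, joints.
    A has shape (V, V): a normalized joint adjacency matrix (assumed given).
    """
    def __init__(self, in_ch, out_ch, t_kernel=9):
        super().__init__()
        self.gcn = nn.Conv2d(in_ch, out_ch, kernel_size=1)   # channel mixing after graph aggregation
        self.tcn = nn.Conv2d(out_ch, out_ch, kernel_size=(t_kernel, 1),
                             padding=(t_kernel // 2, 0))      # convolution along the frame axis
        self.bn = nn.BatchNorm2d(out_ch)
        self.s_att = nn.Conv1d(out_ch, 1, kernel_size=1)      # scores each joint
        self.t_att = nn.Conv1d(out_ch, 1, kernel_size=1)      # scores each frame
        self.residual = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, kernel_size=1))

    def forward(self, x, A):
        res = self.residual(x)
        # Graph aggregation: every joint gathers features from its neighbors in A.
        x = self.gcn(torch.einsum('nctv,vw->nctw', x, A))
        # Spatial attention: pool over frames, gate each joint with a sigmoid weight.
        s = torch.sigmoid(self.s_att(x.mean(dim=2)))          # (N, 1, V)
        x = x * s.unsqueeze(2)
        # Temporal attention: pool over joints, gate each frame with a sigmoid weight.
        t = torch.sigmoid(self.t_att(x.mean(dim=3)))          # (N, 1, T)
        x = x * t.unsqueeze(3)
        return F.relu(self.bn(self.tcn(x)) + res)

# Usage sketch with NTU RGB+D-like sizes (25 joints, 60 classes).
N, C, T, V, num_classes = 8, 3, 64, 25, 60
A = torch.eye(V)                         # placeholder; a real model uses the skeleton graph
x = torch.randn(N, C, T, V)
feats = STAttentionGCNBlock(64, 128)(STAttentionGCNBlock(C, 64)(x, A), A)
head = nn.Linear(128, num_classes)
logits = head(feats.mean(dim=(2, 3)))    # global average pooling over frames and joints
print(logits.shape)                      # torch.Size([8, 60])

The mean over frames and joints at the end corresponds to the global average pooling the abstract mentions; in the paper, an auxiliary classification loss on such pooled features during training is what pushes the model to also attend to discriminative non-motion regions.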

