Details
Skeleton Action Recognition by Integrating Intrinsic Topology and Multi-Scale Time Features (EI indexed)
Document type: Journal article
Chinese title: 融合内在拓扑与多尺度时间特征的骨架动作识别
English title: Skeleton Action Recognition by Integrating Intrinsic Topology and Multi-Scale Time Features
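Authors: WANG Qi (王琪)[1]; HE Ning (何宁)[2]
Affiliations: [1] Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101; [2] College of Smart City, Beijing Union University, Beijing 100101
First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University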
Year: 2025
Volume: 61
Issue: 4
Pages: 150-157
Journal name (Chinese): 计算机工程与应用
Journal name (English): Computer Engineering and Applications
Indexed in: EI (accession number: 20251017993824); Peking University Core Journals (2023 edition)
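Funding: National Natural Science Foundation of China (62272049, 62236006, 62172045); Science and Technology Project of the Beijing Municipal Education Commission (KM202111417009); National Key Research and Development Program of China (2018AAA0100804).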
Language: Chinese
Chinese keywords: 骨架动作识别; 图卷积; 内在拓扑; 多尺度; 信息融合
English keywords: skeleton action recognition; graph convolution; intrinsic topology; multi-scale; information fusion
Abstract: Graph convolutional networks play a crucial role in skeleton-based human action recognition. Existing graph convolutional networks ignore intrinsic relationships, have limited temporal convolution capability, and do not fully explore the potential functional correlations between joints and bones. To address these problems, a skeleton action recognition method integrating intrinsic topology and multi-scale temporal features is proposed. To infer context-dependent intrinsic topological relationships, the model uses a multi-head self-attention mechanism and a shared topology to construct an intrinsic-topology spatial graph convolution module. A multi-scale temporal convolution module is built from an analysis of complex action sequences, extending the temporal convolution structure to capture multi-scale temporal features. The model also builds a bridge for interaction between joint and bone information, so that information is transferred and fused effectively between the two and their functional correlation can be explored in more depth. The proposed method achieves recognition accuracies of 91.5% on the CS benchmark and 96.9% on the CV benchmark of the NTU-RGB+D 60 dataset, and 89.0% on the C-Sub benchmark and 90.8% on the C-Set benchmark of the NTU-RGB+D 120 dataset. The experimental results show that the proposed method extracts skeleton spatio-temporal features more effectively and thereby improves recognition accuracy.