Detailed Information
Self-attention enhanced dynamic semantic multi-scale graph convolutional network for skeleton-based action recognition (indexed in SCI-EXPANDED and EI)
Document type: Journal article
English title: Self-attention enhanced dynamic semantic multi-scale graph convolutional network for skeleton-based action recognition
Authors: Liu, Shihao[1]; Xu, Cheng[1]; Dai, Songyin[1]; Li, Nuoya[1]; Pan, Weiguo[1]; Xu, Bingxin[1]; Liu, Hongzhe[1]
Corresponding author: Liu, HZ[1]
Affiliation: [1] Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China
First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University
Corresponding affiliation: [1] (corresponding author) Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China
Year: 2025
Volume: 162
Journal: IMAGE AND VISION COMPUTING
Indexed in: EI (Accession No. 20253719128481); Scopus (Accession No. 2-s2.0-105015137799); SCI-EXPANDED (Accession No. WOS:001568924600001)
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 62171042, U24A20331), the Key Project of the National Language Commission, China (No. ZDI145-110), the R&D Program of Beijing Municipal Education Commission, China (Grant No. KZ202211417048), the Project of Construction and Support for High-Level Innovative Teams of Beijing Municipal Institutions, China (Grant No. BPHR20220121), the Beijing Natural Science Foundation, China (Grant Nos. 4232026, 4242020), and the Academic Research Projects of Beijing Union University, China (No. ZK20202514).
Language: English
Keywords: Skeleton-based action recognition; Graph convolutional network; Self-attention; Multi-scale modeling
Abstract: Skeleton-based action recognition has attracted increasing attention due to its efficiency and robustness in modeling human motion. However, existing graph convolutional approaches often rely on predefined topologies and struggle to capture high-level semantic relations and long-range dependencies. Meanwhile, transformer-based methods, despite their effectiveness in modeling global dependencies, typically overlook local continuity and impose high computational costs. Moreover, current multi-stream fusion strategies commonly ignore low-level complementary cues across modalities. To address these limitations, we propose SAD-MSNet, a Self-Attention enhanced Multi-Scale dynamic semantic graph convolutional network. SAD-MSNet integrates a region-aware multi-scale skeleton simplification strategy to represent actions at different levels of abstraction. It employs a semantic-aware spatial modeling module that constructs dynamic graphs based on node types, edge types, and topological priors, further refined by channel-wise attention and adaptive fusion. For temporal modeling, the network utilizes a six-branch structure that combines standard causal convolution, dilated joint-guided temporal convolutions with varying dilation rates, and a global pooling branch, enabling it to effectively capture both short-term dynamics and long-range temporal semantics. Extensive experiments on NTU RGB+D, NTU RGB+D 120, and N-UCLA demonstrate that SAD-MSNet achieves superior performance compared to state-of-the-art methods, while maintaining a compact and interpretable architecture.
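Note: To make the temporal design described in the abstract concrete, the following is a minimal PyTorch-style sketch of a multi-branch temporal block in that spirit: one causal branch, several dilated temporal branches with varying dilation rates, and a global pooling branch. This is not the authors' implementation; the tensor layout (N, C, T, V), branch widths, kernel size, and dilation rates are assumptions for illustration only.

```python
# Hypothetical sketch (not the paper's code) of a multi-branch temporal block
# over skeleton features shaped (N, C, T, V): batch, channels, frames, joints.
import torch
import torch.nn as nn


class MultiBranchTemporalConv(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 3, 4), kernel_size: int = 5):
        super().__init__()
        branch_channels = max(channels // 6, 1)  # six branches of equal width (assumption)

        def dilated_branch(dilation: int) -> nn.Sequential:
            # Dilated conv along the frame axis only (kernel (k, 1)), padded to keep T.
            pad = (kernel_size - 1) // 2 * dilation
            return nn.Sequential(
                nn.Conv2d(channels, branch_channels, kernel_size=1),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(branch_channels, branch_channels, kernel_size=(kernel_size, 1),
                          padding=(pad, 0), dilation=(dilation, 1)),
                nn.BatchNorm2d(branch_channels),
            )

        # Branch 1: causal temporal conv (pad only the past side, so output length stays T).
        self.causal_pad = nn.ConstantPad2d((0, 0, kernel_size - 1, 0), 0.0)
        self.causal = nn.Sequential(
            nn.Conv2d(channels, branch_channels, kernel_size=1),
            nn.BatchNorm2d(branch_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(branch_channels, branch_channels, kernel_size=(kernel_size, 1)),
            nn.BatchNorm2d(branch_channels),
        )
        # Branches 2-5: dilated temporal convs with different dilation rates.
        self.dilated = nn.ModuleList(dilated_branch(d) for d in dilations)
        # Branch 6: global temporal pooling to inject long-range context.
        self.pool = nn.Sequential(
            nn.Conv2d(channels, branch_channels, kernel_size=1),
            nn.BatchNorm2d(branch_channels),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, None)),  # pool over frames, keep joints
        )
        # Fuse the concatenated branches back to the input width.
        self.fuse = nn.Sequential(
            nn.Conv2d(branch_channels * (len(dilations) + 2), channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, t, v = x.shape
        causal = self.causal(self.causal_pad(x))          # short-term, causal dynamics
        dilated = [branch(x) for branch in self.dilated]  # multi-rate temporal context
        pooled = self.pool(x).expand(-1, -1, t, -1)       # broadcast global context over frames
        return self.fuse(torch.cat([causal, *dilated, pooled], dim=1))


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 25)                 # 2 clips, 64 channels, 32 frames, 25 joints
    print(MultiBranchTemporalConv(64)(x).shape)    # torch.Size([2, 64, 32, 25])
```

The sketch preserves the temporal length in every branch so the six outputs can be concatenated channel-wise and fused with a 1x1 convolution; how SAD-MSNet actually fuses its branches is not specified in this record.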
