
Record Details

Multi-view Isolated sign language recognition based on cross-view and multi-level transformer  (indexed in SCI-EXPANDED and EI)

Document type: Journal article

English title: Multi-view Isolated sign language recognition based on cross-view and multi-level transformer

Authors: Guan, Zhong[1,2]; Hu, Yongli[1]; Jiang, Huajie[1]; Sun, Yanfeng[1]; Yin, Baocai[1]

First author: Guan, Zhong (关忠)

Corresponding author: Hu, YL[1]

Affiliations: [1] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, 100 Pingleyuan, Beijing 100124, Peoples R China; [2] Beijing Union Univ, Special Educ Coll, 97 Beisihuan East Rd, Beijing 100101, Peoples R China

First affiliation: Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, 100 Pingleyuan, Beijing 100124, Peoples R China

Corresponding affiliation: [1] (corresponding author) Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, 100 Pingleyuan, Beijing 100124, Peoples R China

Year: 2025

Volume: 31

Issue: 3

Journal: MULTIMEDIA SYSTEMS

Indexed in: EI (accession no. 20251918366700); Scopus (accession no. 2-s2.0-105004224266); SCI-EXPANDED (accession no. WOS:001479674600001)

Language: English

Keywords: Isolated sign language recognition; Multi-view recognition; Multi-level transformer; Multi-view sign language dataset

Abstract: Sign language serves as a critical communication medium for the deaf community, yet existing single-view recognition systems are limited in interpreting complex three-dimensional manual movements from monocular video sequences. Although multi-view analysis holds potential for improved spatial understanding, current methods lack effective mechanisms for cross-view feature correlation and adaptive multi-stream fusion. To address these challenges, we propose the Cross-view and Multi-level Transformer (CMTformer), a novel framework for isolated sign language recognition that hierarchically models spatiotemporal dependencies across viewpoints. The architecture integrates transformer-based modules to simultaneously capture dense cross-view correlations and distill high-level semantic relationships through multi-scale feature abstraction. Complementing this methodological advancement, we establish the Multi-View Chinese Sign Language (MVCSL) dataset under real-world conditions, addressing the critical shortage of multi-view benchmarking resources. Experimental evaluations demonstrate that CMTformer significantly outperforms conventional approaches in recognition robustness, particularly in processing intricate gesture dynamics through coordinated multi-view analysis. This study advances sign language recognition via interpretable cross-view modeling while providing an essential dataset for developing viewpoint-agnostic gesture understanding systems.
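The record gives only the abstract, not the CMTformer implementation, so the following is purely an illustrative sketch of the general idea of cross-view attention that the abstract describes: per-frame features from one camera view attend over the features of another view, and the resulting streams are fused. All names (`cross_view_attention`, the averaging fusion, the toy feature shapes) are assumptions for illustration, not the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(q_view, kv_view, d_k):
    # Scaled dot-product attention where queries come from one view and
    # keys/values from another, so frames in view A can borrow spatial
    # cues that are only visible from view B.
    scores = q_view @ kv_view.T / np.sqrt(d_k)   # (T_a, T_b) frame affinities
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ kv_view                     # (T_a, d_k) fused features

# Two synthetic views of the same sign: T frames of d-dim frame features each.
rng = np.random.default_rng(0)
T, d = 16, 64
front = rng.normal(size=(T, d))
side = rng.normal(size=(T, d))

fused_front = cross_view_attention(front, side, d)
fused_side = cross_view_attention(side, front, d)
# A simple symmetric fusion of the two cross-attended streams; a learned
# adaptive weighting would replace this fixed average in a real model.
fused = 0.5 * (fused_front + fused_side)
print(fused.shape)  # → (16, 64)
```

A multi-level variant would stack such blocks and pool over time between levels, so higher levels attend over progressively coarser temporal abstractions.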

