Record Details
Multi-view Isolated sign language recognition based on cross-view and multi-level transformer (indexed in SCI-EXPANDED and EI)
Document Type: Journal Article
Title (English): Multi-view Isolated sign language recognition based on cross-view and multi-level transformer
Authors: Guan, Zhong[1,2]; Hu, Yongli[1]; Jiang, Huajie[1]; Sun, Yanfeng[1]; Yin, Baocai[1]
First Author: Guan, Zhong (关忠)
Corresponding Author: Hu, YL[1]
Affiliations: [1] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, 100 Pingleyuan, Beijing 100124, Peoples R China; [2] Beijing Union Univ, Special Educ Coll, 97 Beisihuan East Rd, Beijing 100101, Peoples R China
First Affiliation: Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, 100 Pingleyuan, Beijing 100124, Peoples R China
Corresponding Affiliation: [1] (corresponding author) Beijing Univ Technol, Beijing Inst Artificial Intelligence, Sch Informat Sci & Technol, Beijing Key Lab Multimedia & Intelligent Software, 100 Pingleyuan, Beijing 100124, Peoples R China
Year: 2025
Volume: 31
Issue: 3
Journal: MULTIMEDIA SYSTEMS
Indexed in: EI (Accession No. 20251918366700); Scopus (Accession No. 2-s2.0-105004224266); SCI-EXPANDED (Accession No. WOS:001479674600001)
Language: English
Keywords: Isolated sign language recognition; Multi-view recognition; Multi-level transformer; Multi-view sign language dataset
Abstract: Sign language serves as a critical communication medium for the deaf community, yet existing single-view recognition systems are limited in interpreting complex three-dimensional manual movements from monocular video sequences. Although multi-view analysis holds potential for improved spatial understanding, current methods lack effective mechanisms for cross-view feature correlation and adaptive multi-stream fusion. To address these challenges, we propose the Cross-view and Multi-level Transformer (CMTformer), a novel framework for isolated sign language recognition that hierarchically models spatiotemporal dependencies across viewpoints. The architecture integrates transformer-based modules to simultaneously capture dense cross-view correlations and distill high-level semantic relationships through multi-scale feature abstraction. Complementing this methodological advancement, we establish the Multi-View Chinese Sign Language (MVCSL) dataset under real-world conditions, addressing the critical shortage of multi-view benchmarking resources. Experimental evaluations demonstrate that CMTformer significantly outperforms conventional approaches in recognition robustness, particularly in processing intricate gesture dynamics through coordinated multi-view analysis. This study advances sign language recognition via interpretable cross-view modeling while providing an essential dataset for developing viewpoint-agnostic gesture understanding systems.
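Illustration: the record does not include the paper's code, so the sketch below is only a minimal, generic example of the cross-view attention idea the abstract describes (one view's frame features attending to another view's), not the authors' CMTformer implementation. The class name CrossViewAttention and the parameters dim and heads are hypothetical choices for this sketch.

import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    # Hypothetical sketch: enrich one view's per-frame features with cues
    # from a second view via multi-head cross-attention plus a residual.
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, view_a, view_b):
        # view_a, view_b: (batch, frames, dim) frame-level features;
        # queries come from view_a, keys/values from view_b.
        fused, _ = self.attn(query=view_a, key=view_b, value=view_b)
        return self.norm(view_a + fused)

# Example: two synchronized 16-frame clips with 512-d features per frame.
a, b = torch.randn(2, 16, 512), torch.randn(2, 16, 512)
out = CrossViewAttention()(a, b)  # -> shape (2, 16, 512)

Running the module in both directions (a attending to b, and b attending to a) and fusing the results would give a symmetric two-stream variant; how CMTformer actually correlates and fuses views is specified in the paper itself.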