
Record details

W-ART: ACTION RELATION TRANSFORMER FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION (indexed by EI)

Document type: Journal literature

English title: W-ART: ACTION RELATION TRANSFORMER FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION

Authors: Li, Mengzhu[1]; Wu, Hongjun[1]; Liu, Yongcheng[2]; Liu, Hongzhe[1]; Xu, Cheng[1]; Li, Xuewei[1]

First author: Li, Mengzhu

Affiliations: [1] Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing, China; [2] Institute of Automation, Chinese Academy of Sciences, Beijing, China

First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University

Year: 2022

Volume: 2022-May

Pages: 2195-2199

Source: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Indexed by: EI (accession number: 20222312199221)

Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61871039, 62171042) and the Academic Research Projects of Beijing Union University (Nos. ZB10202003, ZK40202101, ZK120202104). *Corresponding author: liuhongzhe@buu.edu.cn

Language: English

Keywords: Arts computing - Computer vision

Abstract: Weakly-supervised temporal action localization (WTAL) is a long-standing and challenging research problem in video signal analysis. It aims to localize the action segments in a video given only video-level labels. The key to this task is understanding how the diverse actions interact. In this paper, we propose W-ART, a relation Transformer that explicitly captures the relationships between action segments. We devise a new, effective Transformer architecture and construct new training loss functions for WTAL. Further, we propose a dedicated query mechanism to satisfy the different feature preferences of classification and localization. Thanks to these designs, our W-ART can accurately localize diverse actions even in the weakly-supervised setting. Extensive evaluation and empirical analysis show that our method outperforms the state of the art on two challenging benchmarks, Charades and THUMOS14. © 2022 IEEE
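The "dedicated query mechanism" mentioned in the abstract can be illustrated with a toy sketch: two separate query sets attend over the same video segment features, so classification and localization each pool the features they prefer. This is only an illustrative single-head attention in plain Python under assumed shapes, not the paper's actual W-ART implementation; all names (`cls_queries`, `loc_queries`, the dummy feature values) are hypothetical.

```python
import math

def attention(queries, keys, values):
    """Toy single-head scaled dot-product attention over segment features."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every segment, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted average of segment features for this query.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Dummy features for T=3 video segments, d=2 dims (placeholder values).
segments = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]

# Hypothetical learnable queries: one set for classification, one for
# localization, reflecting their different feature preferences.
cls_queries = [[1.0, 0.0]]
loc_queries = [[0.0, 1.0]]

cls_feat = attention(cls_queries, segments, segments)
loc_feat = attention(loc_queries, segments, segments)
```

Because the softmax weights form a convex combination, each pooled feature stays within the per-dimension range of the segment features; the two query sets simply weight the same segments differently.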

