Details
W-ART: ACTION RELATION TRANSFORMER FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION (EI-indexed)
Document type: Journal article
English title: W-ART: ACTION RELATION TRANSFORMER FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION
Authors: Li, Mengzhu[1]; Wu, Hongjun[1]; Liu, Yongcheng[2]; Liu, Hongzhe[1]; Xu, Cheng[1]; Li, Xuewei[1]
First author: Li, Mengzhu
Affiliations: [1] Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing, China; [2] Institute of Automation, Chinese Academy of Sciences, Beijing, China
First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University
Year: 2022
Volume: 2022-May
Pages: 2195-2199
Publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Indexed by: EI (Accession No. 20222312199221)
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61871039 and 62171042) and the Academic Research Projects of Beijing Union University (Nos. ZB10202003, ZK40202101, ZK120202104). *Corresponding author: liuhongzhe@buu.edu.cn
Language: English
Keywords: Arts computing - Computer vision
Abstract: Weakly-supervised temporal action localization (WTAL) is a long-standing and challenging research problem in video signal analysis: localizing action segments in a video given only video-level labels. The key to this task is understanding how the diverse actions interact. In this paper, we propose W-ART, a relation Transformer that explicitly captures the relationships between action segments. We devise a new, effective Transformer architecture and construct new training loss functions for WTAL. Further, we propose a dedicated query mechanism to satisfy the differing feature preferences of classification and localization. Thanks to these designs, our W-ART can accurately localize diverse actions even in a weakly-supervised setting. Extensive evaluation and empirical analysis show that our method outperforms the state of the art on two challenging benchmarks, Charades and THUMOS14. © 2022 IEEE
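The record does not include the paper's implementation. As a rough, purely illustrative sketch of what a "dedicated query mechanism" with task-specific queries for classification and localization could look like, the snippet below pools segment features with two separate learned query vectors via scaled dot-product attention. All names, shapes, and the pooling scheme are assumptions for illustration, not the authors' actual W-ART architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_query_attention(features, q_cls, q_loc):
    """Pool T segment features with two task-specific queries.

    features: (T, d) array of per-segment features (hypothetical shape)
    q_cls, q_loc: (d,) learned query vectors for classification / localization
    Returns one pooled (d,) feature per task.
    """
    def attend(q):
        # Scaled dot-product attention scores over the T segments.
        scores = features @ q / np.sqrt(features.shape[1])  # (T,)
        weights = softmax(scores)                            # (T,)
        return weights @ features                            # (d,)
    return attend(q_cls), attend(q_loc)

# Toy usage with random features and queries.
rng = np.random.default_rng(0)
T, d = 8, 16
feats = rng.standard_normal((T, d))
f_cls, f_loc = dual_query_attention(feats,
                                    rng.standard_normal(d),
                                    rng.standard_normal(d))
```

Because each query induces its own attention distribution, the two heads can weight the same segments differently, which is one plausible way to serve the differing feature preferences of classification and localization mentioned in the abstract.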
