
Details

TRANSTL: SPATIAL-TEMPORAL LOCALIZATION TRANSFORMER FOR MULTI-LABEL VIDEO CLASSIFICATION (EI indexed)

Document type: Journal article

English title: TRANSTL: SPATIAL-TEMPORAL LOCALIZATION TRANSFORMER FOR MULTI-LABEL VIDEO CLASSIFICATION

Authors: Wu, Hongjun[1]; Li, Mengzhu[1]; Liu, Yongcheng[2]; Liu, Hongzhe[1]; Xu, Cheng[1]; Li, Xuewei[1]

First author: Wu, Hongjun

Corresponding author: Liu, H.[1]

Affiliations: [1] Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing, China; [2] Institute of Automation, Chinese Academy of Sciences, Beijing, China

First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University

Year: 2022

Volume: 2022-May

Pages: 1965-1969

Source title: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Indexing: EI (Accession No. 20222912369200)

Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61871039 and 62171042) and the Academic Research Projects of Beijing Union University (Nos. ZB10202003, ZK40202101, ZK120202104). *Corresponding author: liuhongzhe@buu.edu.cn

Language: English

Keywords: Classification (of information); Computer vision

Abstract: Multi-label video classification (MLVC) is a long-standing and challenging research problem in video signal analysis. Generally, real-world videos contain many complex action labels, and these actions have inherent dependencies in both the spatial and temporal domains. Motivated by this observation, we propose TranSTL, a spatial-temporal localization Transformer framework for the MLVC task. In addition to leveraging global action label co-occurrence, we also propose a novel plug-and-play Spatial Temporal Label Dependency (STLD) layer in TranSTL. STLD not only dynamically models the label co-occurrence in a video via a self-attention mechanism, but also fully captures spatial-temporal label dependencies using a cross-attention strategy. As a result, our TranSTL is able to explicitly and accurately grasp the diverse action labels in both the spatial and temporal domains. Extensive evaluation and empirical analysis show that TranSTL achieves superior performance over the state of the art on two challenging benchmarks, Charades and MultiTHUMOS. © 2022 IEEE
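The abstract describes a two-step attention pattern inside the STLD layer: self-attention among label embeddings to model label co-occurrence, followed by cross-attention with labels as queries and spatial-temporal features as keys/values to localize each label. The following is a minimal NumPy sketch of that pattern only, not the authors' implementation; the dimensions, the names `label_emb` and `st_tokens`, and the final logit readout are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: (Q K^T / sqrt(d)) softmaxed, applied to V
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

# Hypothetical sizes: C action labels, N = T*H*W spatial-temporal tokens, d channels
C, N, d = 5, 12, 16
rng = np.random.default_rng(0)
label_emb = rng.normal(size=(C, d))   # learnable label embeddings (assumed)
st_tokens = rng.normal(size=(N, d))   # backbone spatial-temporal features (assumed)

# Step 1: self-attention among labels models global label co-occurrence
labels = attention(label_emb, label_emb, label_emb)

# Step 2: cross-attention (labels as queries, tokens as keys/values)
# gathers, for each label, the spatial-temporal evidence it attends to
labels = attention(labels, st_tokens, st_tokens)

# One possible per-label readout: inner product with the label embedding
logits = (labels * label_emb).sum(-1)   # shape (C,), one score per label
```

In a real model these projections would be learned (separate Q/K/V weight matrices, multiple heads, residual connections), and the logits would feed a per-label sigmoid for multi-label prediction; the sketch keeps only the query/key/value flow that the abstract attributes to STLD.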

