Details
Multi-label video classification via coupling attentional multiple instance learning with label relation graph * (Indexed in SCI-EXPANDED and EI)
Document type: Journal article
English title: Multi-label video classification via coupling attentional multiple instance learning with label relation graph *
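Authors: Li, Xuewei[1]; Wu, Hongjun[1]; Li, Mengzhu[1]; Liu, Hongzhe[1]
Corresponding author: Liu, HZ[1]
Affiliation: [1] Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China
First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University
Corresponding affiliation: [1] (corresponding author), Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China. | [11417103] Beijing Key Laboratory of Information Service Engineering, Beijing Union University; [11417] Beijing Union University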
Year: 2022
Volume: 156
Pages: 53-59
Journal: PATTERN RECOGNITION LETTERS
Indexed in: EI (accession no. 20221211805596); Scopus (accession no. 2-s2.0-85126378924); WOS: SCI-EXPANDED (accession no. WOS:000789226600008)
Funding: This work was supported by the major project of the Social Science Fund of Beijing, "Research on Urban Development in the era of Big Data for the Refined Governance of Beijing" (Grant no. 19ZDA05).
Language: English
Keywords: Multi-label video classification; Multiple instance learning; Attentional feature learning; Label relation graph
Abstract: Multi-label video classification is a challenging problem in the field of pattern recognition, as it is difficult to determine where in a video each of a huge number of labels occurs. To solve this problem, we propose a general framework named MALL-CNN (Multi-Attention Label Relation Learning Convolutional Neural Network). MALL-CNN not only builds correspondences between labels and videos through an attention mechanism but also captures label co-occurrence through a graph learning approach. Specifically, we introduce multiple instance learning to aggregate a set of frame-level features into a video-level feature. Then, the video-level feature is mapped into content-aware category representations in an improved attentional manner. Further, these representations are enhanced by a series of label relation graphs, which transform global label relationships into the label relationships of the current video. Through these three processes, MALL-CNN achieves frame feature aggregation, video feature mapping, and label relationship construction for multi-label video classification. Extensive experiments on the real-world benchmark YouTube-8M verify that MALL-CNN, using only frame features, surpasses state-of-the-art methods that use multimodal features as well as ensemble models. (c) 2022 Elsevier B.V. All rights reserved.
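The abstract names three generic building blocks: multiple instance learning (MIL) pooling of frame features into a video feature, per-label category representations, and propagation over a label relation graph. The following minimal sketch wires together textbook versions of these pieces in plain Python so the data flow is easy to follow. It is not the MALL-CNN architecture. Several choices here are assumptions and do not come from the paper: the tanh-attention MIL pooling, a hypothetical per-label sigmoid gate (`label_emb`) that stands in for the paper's "improved attentional" mapping, a single row-normalized propagation step with self-loops over a co-occurrence matrix (`adj`), and one shared linear classifier (`clf`). In a real model all weights would be learned. Here they are simply passed in as inputs.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attention_mil_pool(frames: Matrix, v: Matrix, w: Vector) -> Vector:
    """Attention-based MIL pooling (assumed form): each frame x_t gets the
    score w . tanh(V x_t). The scores are softmax-normalized across frames,
    and the video (bag) feature is the attention-weighted sum of the frames."""
    scores = [sum(wi * math.tanh(h) for wi, h in zip(w, matvec(v, x))) for x in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    return [sum(a * x[j] for a, x in zip(alphas, frames)) for j in range(dim)]


def category_representations(video: Vector, label_emb: Matrix) -> Matrix:
    """Hypothetical per-label gate: r_c = video * sigmoid(e_c), elementwise."""
    return [[z * sigmoid(e) for z, e in zip(video, emb)] for emb in label_emb]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops to a non-negative label co-occurrence matrix, then row-normalize it."""
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        s = sum(row)
        out.append([x / s for x in row])
    return out


def propagate(reps: Matrix, adj_norm: Matrix) -> Matrix:
    """One graph propagation step: each label's new representation is the
    adjacency-weighted mean of its neighbours' representations (itself included)."""
    n, dim = len(reps), len(reps[0])
    return [[sum(adj_norm[i][k] * reps[k][j] for k in range(n)) for j in range(dim)]
            for i in range(n)]


def classify(frames: Matrix, v: Matrix, w: Vector, label_emb: Matrix,
             adj: Matrix, clf: Vector) -> Vector:
    """Return one independent probability per label (sigmoid, not softmax)."""
    video = attention_mil_pool(frames, v, w)
    reps = propagate(category_representations(video, label_emb), normalize_adjacency(adj))
    return [sigmoid(sum(c * r for c, r in zip(clf, rep))) for rep in reps]
```

Each label gets its own sigmoid output because labels are not mutually exclusive. Every probability is thresholded independently, so one video can receive several labels. When two labels co-occur often, the propagation step pulls their representations toward each other. This is the intuition behind using a label relation graph to capture co-occurrence.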