Details
SMART: Semantic-Aware Masked Attention Relational Transformer for Multi-label Image Recognition (indexed in SCI-EXPANDED and EI)
Document type: Journal article
English title: SMART: Semantic-Aware Masked Attention Relational Transformer for Multi-label Image Recognition
Authors: Wu, Hongjun[1]; Xu, Cheng[1]; Liu, Hongzhe[1]
First author: Wu, Hongjun
Corresponding author: Liu, HZ[1]
Affiliation: [1] Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China
First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University
Corresponding affiliation: [1] (corresponding author) Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China | [11417103] Beijing Key Laboratory of Information Service Engineering, Beijing Union University; [11417] Beijing Union University
Year: 2022
Volume: 29
Pages: 2158-2162
Journal: IEEE SIGNAL PROCESSING LETTERS
Indexed in: EI (Accession No. 20224413035387); Scopus (Accession No. 2-s2.0-85140769869); WOS: SCI-EXPANDED (Accession No. WOS:000878159300001)
Funding: This work was supported in part by the National Natural Science Foundation of China under Grants 62171042, 62102033, 61871039, 62006020, and 61906017, in part by the Beijing Key Science and Technology Project under Grant KZ202211417048, in part by the Beijing Municipal Commission of Education Project under Grants KM202111417001 and KM201911417001, in part by the Collaborative Innovation Center for Visual Intelligence under Grant CYXC2011, and in part by the Academic Research Projects of Beijing Union University under Grants BPHR2020DZ02, ZB10202003, ZK40202101, and ZK120202104. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Saurabh Prasad.
Language: English
Keywords: Transformers; Task analysis; Image recognition; Semantics; Correlation; Convolution; Visualization; Multi-label image recognition; transformer; label correlation; masked attention
Abstract: As objects usually co-exist in an image, learning label co-occurrence is a compelling way to improve the performance of multi-label image recognition. However, dependencies among categories that are absent from an image cannot be effectively evaluated; these redundant label dependencies may introduce noise and further degrade classification performance. Therefore, we propose SMART, a Semantic-Aware Masked Attention Relational Transformer, for multi-label image recognition tasks. In addition to leveraging a Transformer to model inter-class dependencies, the proposed masked attention filters out the redundant dependencies among absent categories. SMART is able to explicitly and accurately capture label dependencies without extra word embeddings. Moreover, our method achieves new state-of-the-art results on two multi-label image recognition benchmarks, MS-COCO 2014 and NUS-WIDE. In addition, extensive ablation studies and empirical analyses are provided to demonstrate the effectiveness of the essential components of our method under different factors.
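To make the abstract's core idea concrete, below is a minimal, hypothetical sketch of masked attention over per-category embeddings: attention toward categories predicted to be absent is suppressed so that their noisy co-occurrence signals are filtered out. This is not the paper's implementation; the function name masked_label_attention, the tensor shapes, and the use of a binary presence mask derived from preliminary class scores are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def masked_label_attention(label_embed, presence_mask):
    """Self-attention over per-category embeddings; keys belonging to
    categories deemed absent are masked so their dependencies are ignored.

    label_embed:   (B, C, D) per-category feature vectors for each image
    presence_mask: (B, C) binary mask, 1 = likely present, 0 = likely absent
    """
    d = label_embed.size(-1)
    # Scaled dot-product scores between all category pairs: (B, C, C)
    scores = torch.matmul(label_embed, label_embed.transpose(1, 2)) / d ** 0.5
    # Block attention *to* absent categories (assumption: at least one
    # category per image is marked present, otherwise softmax would be NaN).
    key_mask = presence_mask.unsqueeze(1).bool()           # (B, 1, C)
    scores = scores.masked_fill(~key_mask, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, label_embed)                 # (B, C, D)

# Toy usage: 2 images, 5 categories, 8-dimensional category features.
feats = torch.randn(2, 5, 8)
mask = (torch.rand(2, 5) > 0.5).float()
mask[:, 0] = 1.0  # guarantee one unmasked category per image
print(masked_label_attention(feats, mask).shape)  # torch.Size([2, 5, 8])
```

Masking is applied to keys rather than queries in this sketch, so categories judged present still attend among themselves while contributions from absent categories are removed, which mirrors the abstract's stated goal of filtering redundant label dependencies.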