Details
S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification (indexed in SCI-EXPANDED and EI)
Document type: Journal article
English title: S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification
Authors: Wu, Hongjun[1,2];Xu, Cheng[1,2];Liu, Hongzhe[1,2]
First author: Wu, Hongjun
Corresponding author: Liu, HZ[1];Liu, HZ[2]
Affiliations: [1]Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China;[2]Beijing Union Univ, Inst Brain & Cognit Sci, Beijing 100101, Peoples R China
First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University
Corresponding affiliations: [1]Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China;[2]Beijing Union Univ, Inst Brain & Cognit Sci, Beijing 100101, Peoples R China.|[11417]Beijing Union University;[11417103]Beijing Key Laboratory of Information Service Engineering, Beijing Union University
Year: 2022
Volume: 22
Issue: 14
Journal: SENSORS
Indexed in: EI (accession no. 20223712707194); Scopus (accession no. 2-s2.0-85135116141); WOS: SCI-EXPANDED (accession no. WOS:000833214400001)
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 62171042, 62102033, 61871039, 62006020, 61906017), the R&D Program of Beijing Municipal Education Commission (KZ202211417048), the Beijing Municipal Commission of Education Project (Nos. KM202111417001, KM201911417001), the Collaborative Innovation Center of Chaoyang (Grant No. CYXC2203), and the Academic Research Projects of Beijing Union University (Nos. BPHR2020DZ02, ZB10202003, ZK40202101, ZK120202104).
Language: English
Keywords: aerial scene classification; multi-label learning; redundancy removing; label correlation; semantic disentanglement
Abstract: Multi-label aerial scene image classification is a long-standing and challenging research problem in the remote sensing field. As land cover objects usually co-exist in an aerial scene image, modeling label dependencies is a compelling approach to improving performance. Previous methods generally model the label dependencies among all the categories in the target dataset directly. However, most of the semantic features extracted from an image are relevant to the objects that are present, so the dependencies among nonexistent categories cannot be evaluated effectively. These redundant label dependencies may introduce noise and further degrade classification performance. To solve this problem, we propose S-MAT, a Semantic-driven Masked Attention Transformer for multi-label aerial scene image classification. S-MAT adopts a Masked Attention Transformer (MAT) to capture the correlations among the label embeddings constructed by a Semantic Disentanglement Module (SDM). Moreover, the proposed masked attention in MAT can filter out the redundant dependencies and enhance the robustness of the model. As a result, the proposed method can explicitly and accurately capture the label dependencies. Our method achieves CF1 scores of 89.21%, 90.90%, and 88.31% on three multi-label aerial scene image classification benchmark datasets: UC-Merced Multi-label, AID Multi-label, and MLRSNet, respectively. In addition, extensive ablation studies and empirical analysis are provided to demonstrate the effectiveness of the essential components of our method under different factors.
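The idea of masking attention so that label embeddings do not attend to categories absent from the image can be illustrated with a small NumPy sketch. This is not the S-MAT implementation: the function name `masked_label_attention`, the boolean `present` vector, the single attention head, and the choice to always keep the diagonal unmasked are all assumptions made for illustration, and the abstract does not say how S-MAT derives its mask.

```python
import numpy as np

def masked_label_attention(label_emb, present, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over label embeddings.

    label_emb: (C, d) array, one embedding per category.
    present:   (C,) boolean array; True marks categories treated as present
               (hypothetical input, e.g. from thresholded preliminary scores).
    w_q, w_k, w_v: (d, d) projection matrices.
    """
    q = label_emb @ w_q
    k = label_emb @ w_k
    v = label_emb @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (C, C) pairwise label scores

    # Allow attention only to categories marked present. The diagonal is kept
    # so every row has at least one valid key (an assumption of this sketch).
    allowed = present[None, :] | np.eye(len(present), dtype=bool)
    scores = np.where(allowed, scores, -np.inf)

    # Numerically stable softmax; masked entries become exactly zero.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy usage with random embeddings for 5 categories, 2 of them marked present.
rng = np.random.default_rng(0)
C, d = 5, 8
emb = rng.normal(size=(C, d))
present = np.array([True, False, True, False, False])
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = masked_label_attention(emb, present, w_q, w_k, w_v)
```

In this toy setup, each row of `attn` places weight only on the columns of categories marked present (plus its own diagonal entry), so dependencies on absent categories contribute nothing to the updated label representations.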