Details
Channel selection and local attention transformer model for semantic segmentation on UAV remote sensing scene (indexed in SCI-EXPANDED and EI)
Document type: Journal article
English title: Channel selection and local attention transformer model for semantic segmentation on UAV remote sensing scene
Authors: Liu, Da[1,2]; Long, Hao[1,2]; Liu, Zhenbao[3]
Corresponding author: Long, H[1]
Affiliations: [1] Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing, Peoples R China; [2] Beijing Union Univ, Coll Robot, Beijing, Peoples R China; [3] Northwestern Polytech Univ, Sch Civil Aviat, Xian, Peoples R China
First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University
Corresponding affiliation: [1] (corresponding author), Beijing Union Univ, Coll Robot, Beijing, Peoples R China. | [1141739] College of Robotics, Beijing Union University; [11417] Beijing Union University
Year: 2024
Journal: IET IMAGE PROCESSING
Indexed in: EI (accession no. 20245017510927); Scopus (accession no. 2-s2.0-85211371806); WOS: SCI-EXPANDED (accession no. WOS:001374010400001)
Funding: This work was supported by the National Key Research and Development Program (2022YFB4601104), the Academic Research Projects of Beijing Union University (Nos. ZK20202304, ZKZD202302), and the Premium Funding Project for Academic Human Resources Development in Beijing Union University (BPHR2020CZ03).
Language: English
Keywords: aircraft; computer vision; convolutional neural nets; feedforward neural nets; image segmentation
Abstract: Compared with common urban landscape semantic segmentation, unmanned aerial vehicle (UAV) image semantic segmentation is more challenging because small targets have very low pixel percentages and multi-scale features due to the influence of flight altitude. Yet, the commonly used successive grid downsampling strategy in the current transformer-based methods omits some important features of small targets. Furthermore, due to the complex background interference, it can lead to even worse results. In reaction to this, existing strategies aim to maintain superior resolution. Nevertheless, the application of this method incurs considerable computational costs, which brings challenges for the practical applications of UAVs. So it is significant to design a novel framework to balance retaining more pixels representing small objects during downsampling and reducing computational costs. For this, the Channel Selection and the Local Attention Transformer Model (CSLFormer) are proposed. During the overlap patch embedding process of feature maps, the model allocates half of the important channels to global attention and local attention. These two types of attention focus on different aspects: one learns the relationships and importance among various patches, while the other emphasizes the features of individual patches. The method shows superior performance on two public datasets: AeroScapes and Vaihingen, achieving mean intersection over union (mIoU) of 75.57% and 78.93%, respectively. The proposed CSLFormer has been released on GitHub.