Details
QTMS: A quadratic time complexity topology-aware process mapping method for large-scale parallel applications on shared HPC system (indexed in SCI-EXPANDED and EI)
Document type: Journal article
English title: QTMS: A quadratic time complexity topology-aware process mapping method for large-scale parallel applications on shared HPC system
Authors: Yan, Baicheng[1]; Xiao, Limin[1]; Qin, Guangjun[2]; Yang, Zhang[3]; Dong, Bin[4]; Yu, Haonan[1]; Wu, Hongyu[1]
First author: Yan, Baicheng
Corresponding authors: Xiao, LM[1]; Qin, GJ[2]
Affiliations: [1] Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China; [2] Beijing Union Univ, Smart City Coll, Beijing 100101, Peoples R China; [3] Inst Appl Phys & Computat Math, 2 East Fenghao Rd, Beijing 100094, Peoples R China; [4] Lawrence Berkeley Natl Lab, One Cyclotron Rd, Berkeley, CA 94720 USA
First affiliation: Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Corresponding affiliations: [1] (corresponding author), Beihang Univ, Sch Comp Sci & Engn, Beijing 100191, Peoples R China; [2] (corresponding author), Beijing Union Univ, Smart City Coll, Beijing 100101, Peoples R China. | [1141733] Beijing Union University, College of Continuing Education; [11417] Beijing Union University
Year: 2020
Volume: 94-95
Journal: PARALLEL COMPUTING
Indexed in: EI (accession number: 20202008643968); Scopus (accession number: 2-s2.0-85084333222); WOS: SCI-EXPANDED (accession number: WOS:000532790500003)
Funding: The work presented in this study was supported by the National Key Research and Development Program of China under Grant No. 2017YFB1010000, the National Natural Science Foundation of China under Grant No. 61772053, Science Challenge Project, No. TZ2016002. The final version has benefited greatly from the many detailed comments and suggestions from the selfless reviewers. The authors heartily acknowledge these comments and suggestions.
Language: English
Keywords: Topology-aware process mapping; Communication optimization; Shared HPC system
Abstract: Communication exacerbates the performance for parallel applications with thousands of CPU cores and quantities of data to exchange. The high communication cost is usually attributed to the mismatch between the communication patterns of parallel applications and the physical topology graphs of the computing resources (or the underlying network topologies). The topology-aware process mapping method can usually obtain a better embedding scheme with the aim to improve communication performance. Many existing heuristic-search based mapping methods have high execution time for large-scale applications. Some low-cost graph-partitioning based mapping methods depend on that the allocated resources form a regular structure, which is usually impractical in most high performance computing systems shared by multiple users and applications. This weakens their performance. Other graph-partitioning based mapping methods come at a high cost or require users to provide the network structure information. To address these issues, a quadratic time complexity topology-aware process mapping method is presented in this paper. The experimental results show that the proposed method often achieves a better application communication performance than several state-of-the-art mapping methods on a shared HPC system, while maintaining a significantly lower execution cost. Moreover, the real-world scientific application proxies gain an execution time reduction as large as 14.60% in the 512 process-scale compared to the system default process placement on the TianHe-2 HPC systems. (C) 2020 Elsevier B.V. All rights reserved.