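Multi-GPU parallel compressible flow solver and its performance analysis
Cite this article:LAI Jianqi, LI Hua, ZHANG Ran, CHANG Qing. Multi-GPU parallel compressible flow solver and its performance analysis[J]. Acta Aeronautica et Astronautica Sinica, 2018, 39(9): 121944-121953.
Authors:LAI Jianqi  LI Hua  ZHANG Ran  CHANG Qing
Affiliation:College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073
Funding:National Natural Science Foundation of China (11472004)
Abstract:To enable efficient large-scale numerical solution of compressible flow problems, parallel computing based on graphics processing units (GPUs) is investigated. A multi-GPU parallel compressible flow solver based on Message Passing Interface + Compute Unified Device Architecture (MPI+CUDA) is built on the NVIDIA GTX 1070. The solver uses a finite volume method on structured grids, with the AUSM+UP scheme for spatial discretization. The computational grid is partitioned by one-dimensional domain decomposition so that the load is balanced across the GPUs. Using a supersonic inlet test case, the single-GPU parallel performance and the multi-GPU scalability of the algorithm are analyzed. The numerical results show that single-GPU parallel computing achieves a speedup of 37 to 46 times, greatly improving computational efficiency; with 4 GPUs the speedup increases from 47 to 143 times and the parallel efficiency stays above 70%, indicating that the parallel algorithm scales well.

Keywords:graphics processing unit (GPU)  Compute Unified Device Architecture (CUDA)  parallel computing  speedup  parallel efficiency
Received:2017-12-19
Revised:2018-02-08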
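
The abstract reports speedup and parallel efficiency but does not state how they are defined. The sketch below assumes the commonly used definitions: speedup is reference run time divided by parallel run time, and multi-GPU efficiency is the n-GPU speedup divided by n times the single-GPU speedup. The function names and all timings are hypothetical and are not taken from the paper.

```python
def speedup(t_reference: float, t_parallel: float) -> float:
    """Speedup of a parallel run relative to a reference (e.g. CPU) run."""
    if t_reference <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_reference / t_parallel


def parallel_efficiency(speedup_n: float, speedup_1: float, n_devices: int) -> float:
    """Efficiency of n devices relative to ideal linear scaling from one device.

    Assumed definition: E = S_n / (n * S_1).
    """
    if n_devices <= 0 or speedup_1 <= 0:
        raise ValueError("n_devices and speedup_1 must be positive")
    return speedup_n / (n_devices * speedup_1)


# Hypothetical wall-clock times in seconds (illustrative only, not measured data).
t_cpu, t_1gpu, t_4gpu = 1200.0, 30.0, 10.0

s1 = speedup(t_cpu, t_1gpu)          # single-GPU speedup over the CPU run
s4 = speedup(t_cpu, t_4gpu)          # four-GPU speedup over the CPU run
e4 = parallel_efficiency(s4, s1, 4)  # fraction of ideal 4x scaling achieved

print(f"S1 = {s1:.1f}, S4 = {s4:.1f}, E4 = {e4:.0%}")
```

With these made-up timings, four GPUs reach three quarters of ideal linear scaling, which is the kind of figure the abstract's "above 70%" refers to under the assumed definition.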

Multi-GPU parallel compressible flow solver and its performance analysis
LAI Jianqi, LI Hua, ZHANG Ran, CHANG Qing. Multi-GPU parallel compressible flow solver and its performance analysis[J]. Acta Aeronautica et Astronautica Sinica, 2018, 39(9): 121944-121953.
Authors:LAI Jianqi  LI Hua  ZHANG Ran  CHANG Qing
Institution:College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
Abstract:To achieve efficient numerical solutions of large-scale compressible flow problems, parallel computing based on graphics processing units (GPUs) is studied. A multi-GPU parallel compressible flow solver based on Message Passing Interface + Compute Unified Device Architecture (MPI+CUDA) is built on the NVIDIA GTX 1070. The solver uses a finite volume method on structured meshes, with the upwind AUSM+UP scheme for spatial discretization. A one-dimensional domain decomposition method divides the computational grid into equal-sized subdomains so that the load is balanced among the GPUs. Using a supersonic inlet test case, the single-GPU parallel performance and the multi-GPU scalability of the solver are analyzed. The numerical results show that single-GPU parallel computing achieves a speedup of 37 to 46 times, greatly improving computational efficiency. With four GPUs, the speedup increases from 47 to 143 times and the parallel efficiency remains above 70%, demonstrating good scalability of the solver.
Keywords:Graphics Processing Units (GPU)  Compute Unified Device Architecture (CUDA)  parallel computing  speedup ratio  parallel efficiency  
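
The one-dimensional domain decomposition mentioned in the abstract can be illustrated with a minimal sketch. It splits the cell indices along one grid direction into contiguous, near-equal ranges, one per GPU. The function names, the grid sizes, the handling of a non-divisible remainder, and the left/right neighbour lookup (as would be needed for boundary exchange between adjacent subdomains) are assumptions made for illustration; the paper's actual implementation is not given in the abstract.

```python
from typing import List, Optional, Tuple


def decompose_1d(n_cells: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split range(n_cells) into n_parts contiguous half-open [start, stop) ranges.

    Range sizes differ by at most one cell, so the load per part is balanced.
    """
    if n_parts <= 0 or n_cells < n_parts:
        raise ValueError("need at least one cell per part")
    base, extra = divmod(n_cells, n_parts)
    ranges = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` parts take one additional cell (assumed policy).
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def neighbours(rank: int, n_parts: int) -> Tuple[Optional[int], Optional[int]]:
    """Left and right neighbour ranks in a 1D decomposition (None at the ends)."""
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < n_parts - 1 else None
    return left, right


# Hypothetical example: 1024 cells along the split direction, 4 GPUs.
parts = decompose_1d(1024, 4)
for rank, (lo, hi) in enumerate(parts):
    print(rank, (lo, hi), hi - lo, neighbours(rank, 4))
```

In a one-dimensional split each subdomain has at most two neighbours, which keeps the communication pattern simple as the GPU count grows.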