Multi-GPU parallel compressible flow solver and its performance analysis



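Cite this article:	赖剑奇, 李桦, 张冉, 常青. Multi-GPU parallel compressible flow solver and its performance analysis [J]. 航空学报 (Acta Aeronautica et Astronautica Sinica), 2018, 39(9): 121944-121953.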

Authors:	赖剑奇, 李桦, 张冉, 常青

Affiliation:	College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073

Funding:	National Natural Science Foundation of China (11472004)

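Abstract:	To enable efficient, large-scale numerical solution of compressible flow problems, this work studies parallel computing on graphics processing units (GPUs). A multi-GPU parallel compressible flow solver based on the Message Passing Interface and the Compute Unified Device Architecture (MPI+CUDA) is built on the NVIDIA GTX 1070. The solver uses a finite volume method on structured grids, with the AUSM⁺UP scheme for spatial discretization. The computational grid is partitioned by one-dimensional domain decomposition so that the load is balanced across the GPUs. Using a supersonic inlet test case, the single-GPU parallel performance and the multi-GPU scalability of the algorithm are analyzed. The numerical results show that single-GPU parallel computing achieves a speedup of 37 to 46 times, greatly improving computational efficiency; with four GPUs the speedup increases from 47 to 143 times while parallel efficiency stays above 70%, indicating that the parallel algorithm scales well.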
Keywords:	graphics processing unit (GPU); Compute Unified Device Architecture (CUDA); parallel computing; speedup ratio; parallel efficiency
Received:	2017-12-19
Revised:	2018-02-08
Multi-GPU parallel compressible flow solver and its performance analysis

LAI Jianqi, LI Hua, ZHANG Ran, CHANG Qing. Multi-GPU parallel compressible flow solver and its performance analysis [J]. Acta Aeronautica et Astronautica Sinica, 2018, 39(9): 121944-121953.

Authors:	LAI Jianqi, LI Hua, ZHANG Ran, CHANG Qing

Institution:	College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China

Abstract:	To achieve efficient numerical solution of large-scale compressible flow problems, parallel computing based on Graphics Processing Units (GPUs) is studied. A multi-GPU parallel compressible flow solver based on the Message Passing Interface and the Compute Unified Device Architecture (MPI+CUDA) is built on the NVIDIA GTX 1070. The solver works on structured meshes and uses the upwind finite volume scheme AUSM⁺UP for spatial discretization. A one-dimensional domain decomposition method divides the computational grid into equal-sized subdomains to balance the load among the GPUs. Using a supersonic inlet test case, the single-GPU parallel performance and the multi-GPU scalability of the solver are analyzed. The numerical results show that single-GPU parallel computing achieves a speedup of 37 to 46 times, greatly improving computational efficiency. With four GPUs, the speedup increases from 47 to 143 times and parallel efficiency stays above 70%, demonstrating good scalability of the solver.
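
The one-dimensional domain decomposition mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `decompose_1d`, the choice of splitting along a single index direction, and the handling of a remainder are assumptions made here for illustration. The sketch only computes which contiguous range of grid cells each GPU (one MPI rank per GPU is assumed) would own; halo exchange and the flow solver itself are omitted.

```python
def decompose_1d(n_cells, n_parts):
    """Split n_cells along one grid direction into n_parts contiguous ranges.

    Returns a list of (start, stop) half-open index ranges, one per part.
    Sizes differ by at most one cell, so the load is as balanced as possible.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if n_parts <= 0 or n_cells < n_parts:
        raise ValueError("need at least one cell per part")
    base, extra = divmod(n_cells, n_parts)
    ranges = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` parts take one additional cell each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Assumed example: 1024 cells in the split direction, 4 GPUs.
    for rank, (lo, hi) in enumerate(decompose_1d(1024, 4)):
        print(f"rank {rank}: cells [{lo}, {hi})")
```

When the cell count divides evenly by the number of GPUs, every rank receives the same number of cells, which matches the equal-size partitioning the abstract describes.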

Keywords:	Graphics Processing Units (GPU); Compute Unified Device Architecture (CUDA); parallel computing; speedup ratio; parallel efficiency
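
The two metrics named in the keywords, speedup ratio and parallel efficiency, can be made concrete with a short Python sketch. The abstract does not state the exact definitions or baselines used, so the formulas below are the common textbook ones and are an assumption: speedup is a reference wall time divided by the parallel wall time, and multi-GPU parallel efficiency is the single-GPU time divided by the product of GPU count and multi-GPU time. The timings are hypothetical and are not results from the paper.

```python
def speedup(t_reference, t_parallel):
    """Speedup ratio: reference wall time divided by parallel wall time."""
    return t_reference / t_parallel


def parallel_efficiency(t_single_gpu, t_multi_gpu, n_gpus):
    """Multi-GPU efficiency relative to one GPU (1.0 means ideal scaling)."""
    return t_single_gpu / (n_gpus * t_multi_gpu)


if __name__ == "__main__":
    # Hypothetical wall times in seconds, chosen only for illustration.
    t_cpu, t_one_gpu, t_four_gpus = 1000.0, 25.0, 8.0
    print(speedup(t_cpu, t_one_gpu))                       # one GPU vs CPU
    print(speedup(t_cpu, t_four_gpus))                     # four GPUs vs CPU
    print(parallel_efficiency(t_one_gpu, t_four_gpus, 4))  # scaling on four GPUs
```

Under these definitions, the efficiency equals the four-GPU speedup divided by four times the single-GPU speedup, provided both speedups are measured against the same reference on the same grid.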
This article is indexed in CNKI and other databases.