Hybrid MPI + OpenMP parallelization of an unstructured CFD solver and its application to ultra-large-scale unsteady parallel simulations
Cite this article: WANG Nianhua, CHANG Xinghua, ZHAO Zhong, ZHANG Laiping. Hybrid MPI + OpenMP parallelization of an unstructured CFD solver and its application to ultra-large-scale unsteady parallel simulations[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(10): 123859-123859.
Authors: WANG Nianhua  CHANG Xinghua  ZHAO Zhong  ZHANG Laiping
Institution: 1. State Key Laboratory of Aerodynamics, China Aerodynamics Research and Development Center, Mianyang 621000, China; 2. Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
Foundation items: National Natural Science Foundation of China; National Key Research and Development Program of China
Abstract: In conventional engineering applications, the computational cost of unsteady simulations such as multi-body separation is enormous, and refining the grid or adopting higher-order methods to reach higher accuracy increases it further, making unsteady simulation a time-consuming and expensive part of CFD engineering practice. Improving the scalability and efficiency of unsteady simulations is therefore necessary. To exploit the performance of multi-core processors that combine distributed and shared memory, the authors' second-order finite-volume unstructured-grid CFD solver HyperFLOW was converted to a hybrid MPI + OpenMP parallel scheme, with MPI message passing between compute nodes and OpenMP shared memory within each node. Two granularities of hybrid parallelization, coarse-grain and fine-grain, were first implemented and compared on a domestic in-house cluster using a steady turbulent-flow case on the CRM standard model (about 40 million grid cells). The results show that the coarse-grain mode is more efficient for small-scale runs with few processes and partitions, achieving its best efficiency with 16 threads, while the fine-grain mode is advantageous for large-scale parallel computation, achieving its best efficiency with 8 threads. Next, the scalability of the hybrid parallelization for unsteady computations was verified with the wing/store separation standard model: unstructured overset grids of 0.36 billion and 2.88 billion cells were generated, and with a peer-to-peer (P2P) grid reading mode and an optimized implicit overset-grid assembly strategy, grid reading and overset assembly take only tens of seconds. With the 0.36-billion-cell grid, unsteady-state efficiency tests and turbulent flow-field simulations of the unsteady separation process were completed; the parallel efficiency on 12 288 cores reaches 90% on the in-house cluster (relative to 768 cores) and 70% on the Tianhe-2 supercomputer (relative to 384 cores), and the numerical results agree well with the experimental data. Finally, a parallel efficiency test on 49 000 cores was carried out on the in-house cluster with the 2.88-billion-cell grid, yielding a parallel efficiency of 55.3% (relative to 4 096 cores).
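As a purely illustrative complement to the granularity discussion above, the following is a minimal sketch of the fine-grain MPI + OpenMP pattern (each MPI rank owns one grid partition and OpenMP threads share its cell loops); the kernel and variable names are hypothetical and are not taken from HyperFLOW, and the coarse-grain mode would instead give each OpenMP thread its own sub-partition.

    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        // Request FUNNELED threading: only the master thread makes MPI calls,
        // which suffices when OpenMP is confined to the cell loops.
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank = 0, nRanks = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nRanks);

        // Hypothetical local partition owned by this rank.
        const int nCells = 100000;
        std::vector<double> residual(nCells, 0.0), flux(nCells, 1.0);

        // Fine-grain hybrid: the cell loop of one partition is split
        // among the OpenMP threads of the rank.
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < nCells; ++i) {
            residual[i] += flux[i];   // stand-in for the real flux/residual kernel
        }

        // Inter-node communication stays in MPI, outside the threaded region;
        // this global reduction is a placeholder for halo exchange and
        // convergence checks.
        double localSum = 0.0, globalSum = 0.0;
        for (int i = 0; i < nCells; ++i) localSum += residual[i];
        MPI_Allreduce(&localSum, &globalSum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("ranks=%d, threads per rank=%d, residual sum=%g\n",
                        nRanks, omp_get_max_threads(), globalSum);

        MPI_Finalize();
        return 0;
    }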

Keywords: MPI+OpenMP hybrid parallelization  parallel efficiency  computational fluid dynamics  overset grids  unsteady simulation
Received: 2020-02-02
Revised: 2020-03-10

Implementation of hybrid MPI + OpenMP parallelization on unstructured CFD solver and its applications in massive unsteady simulations
WANG Nianhua, CHANG Xinghua, ZHAO Zhong, ZHANG Laiping. Implementation of hybrid MPI + OpenMP parallelization on unstructured CFD solver and its applications in massive unsteady simulations[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(10): 123859-123859.
Authors:WANG Nianhua  CHANG Xinghua  ZHAO Zhong  ZHANG Laiping
Institution:1. State Key Laboratory of Aerodynamics, China Aerodynamics Research and Development Center, Mianyang 621000, China;2. Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
Abstract: In conventional engineering applications, the computational cost of unsteady flow simulations such as store separation is massive, and becomes even larger if higher accuracy is sought by refining the grid or adopting higher-order methods. Consequently, unsteady flow simulation is both time-consuming and expensive in CFD engineering applications, and it is necessary to improve its scalability and efficiency. To exploit the potential of multi-core processors with both distributed and shared memory, Message Passing Interface (MPI) and OpenMP are adopted for inter-node communication and intra-node shared-memory parallelism, respectively. This paper first implements the MPI+OpenMP hybrid parallelization, both coarse-grain and fine-grain, in our in-house code HyperFLOW. The Common Research Model (CRM) with about 40 million unstructured grid cells is employed to test the implementation on an in-house cluster. The results show that coarse-grain hybrid parallelization is superior at small scales and reaches its highest efficiency at 16 threads, whereas fine-grain parallelization is more suitable for large-scale computation and reaches its highest efficiency at 8 threads. In addition, unstructured overset grids with 0.36 billion and 2.88 billion cells are generated for the wing/store separation standard model. It takes only tens of seconds to read these massive grids and complete the overset grid assembly by adopting the peer-to-peer (P2P) grid reading mode and the optimized implicit overset assembly method. The unsteady store separation process is simulated and the parallel efficiency is measured: with 0.36 billion cells, the parallel efficiency on 12 288 cores is 90% (based on 768 cores) on the in-house cluster and 70% (based on 384 cores) on the Tianhe-2 supercomputer, and the numerical six-degree-of-freedom (6-DOF) results agree well with the experimental data. Finally, for the grid with 2.88 billion cells, parallel efficiency tests are conducted with 4.9×10^4 CPU cores on the in-house cluster, and the results show that the parallel efficiency reaches 55.3% (based on 4 096 cores).
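The parallel efficiencies quoted here follow the usual strong-scaling definition relative to a baseline run rather than to a single core; this is the standard convention and is stated as an assumption, since the record itself does not spell out the formula:

    E(N) = [N0 × T(N0)] / [N × T(N)]

where N0 is the baseline core count, N the core count under test, and T the wall-clock time of the same computation. For example, 90% efficiency on 12 288 cores relative to a 768-core baseline corresponds to a run about 0.9 × (12 288 / 768) = 14.4 times faster than the baseline.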
Keywords:MPI+OpenMP hybrid parallelization  parallel efficiency  computational fluid dynamics  overset grids  unsteady simulation  