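High-throughput GPU-based LDPC decoder architecture for space communication

Cite this article:	HOU Yi, LIU Rongke, PENG Hao, ZHAO Ling, XIONG Qingxu. High-throughput GPU-based LDPC decoder architecture for space communication[J]. Acta Aeronautica et Astronautica Sinica, 2017, 38(1).

Authors:	HOU Yi, LIU Rongke, PENG Hao, ZHAO Ling, XIONG Qingxu

Affiliation:	School of Electronic and Information Engineering, Beihang University, Beijing 100083, China

Funding:	National Natural Science Foundation of China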

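Abstract:	Space communication currently calls for high-speed, reconfigurable channel decoders. To address this, a high-throughput software decoding architecture for low-density parity-check (LDPC) codes is proposed that exploits the parallel computing capability of graphics processing units (GPUs). The execution efficiency of the decoding kernel is raised in four ways: optimizing the intra-block and inter-block thread parallelism of the node-update operations in the turbo-decoding message passing (TDMP) algorithm, reducing the thread divergence caused by irregular row weights, lowering the latency of thread accesses to the storage that holds node-update information, and suitably quantizing the information the decoder stores. On this basis, the asynchronous stream mechanism of the compute unified device architecture (CUDA) is introduced. The execution schedule between the decoder's input/output data transfers and its kernel functions, and the allocation of decoding thread resources across CUDA streams, are designed to maximize decoding throughput while reducing decoding latency. TDMP decoding experiments on the Consultative Committee for Space Data Systems (CCSDS) telemetry-standard LDPC codes, run on Nvidia's latest Tesla K20 and GTX980 platforms, show that with 10 decoding iterations the architecture reaches a peak throughput of about 500 Mbps at an average decoding latency of about 2 ms. Compared with existing results, the architecture balances throughput and latency more effectively while retaining the configuration flexibility of a software architecture.
Keywords:	low-density parity-check codes; graphics processing units; software decoding architecture; turbo-decoding message passing algorithm; high throughput; low latency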
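
The abstract names the TDMP (layered) schedule but this record gives no kernel details, so the following is only a minimal serial sketch of a layered min-sum decoder, not the authors' GPU implementation. The function name `tdmp_min_sum_decode`, the toy (7,4) Hamming parity-check rows in `HAMMING_ROWS`, the plain (unscaled) min-sum check update, and the convention that a positive log-likelihood ratio (LLR) means bit 0 are all assumptions made for illustration. The point it shows is the defining TDMP property: each check row updates the posterior LLRs immediately, so later rows in the same iteration already see the refreshed values.

```python
from typing import List

# Hypothetical toy code: parity-check rows of a (7,4) Hamming code,
# each row listing the variable-node indices it checks.
HAMMING_ROWS = [[0, 2, 4, 6], [1, 2, 5, 6], [3, 4, 5, 6]]


def tdmp_min_sum_decode(rows: List[List[int]], llr: List[float],
                        max_iters: int = 10) -> List[int]:
    """Layered min-sum decoding; positive LLR is taken to mean bit 0."""
    posterior = list(llr)
    # One stored check-to-variable message per edge, initially zero.
    c2v = [[0.0] * len(row) for row in rows]
    hard = [1 if p < 0 else 0 for p in posterior]
    for _ in range(max_iters):
        for r, row in enumerate(rows):
            # Subtract this row's old message to get variable-to-check values.
            v2c = [posterior[v] - c2v[r][k] for k, v in enumerate(row)]
            # Track the overall sign and the two smallest magnitudes.
            sign, min1, min2, min_pos = 1, float("inf"), float("inf"), -1
            for k, m in enumerate(v2c):
                if m < 0:
                    sign = -sign
                a = abs(m)
                if a < min1:
                    min1, min2, min_pos = a, min1, k
                elif a < min2:
                    min2 = a
            # Write new messages and refresh posteriors right away (layering).
            for k, v in enumerate(row):
                mag = min2 if k == min_pos else min1
                own_sign = -1 if v2c[k] < 0 else 1
                c2v[r][k] = sign * own_sign * mag
                posterior[v] = v2c[k] + c2v[r][k]
        hard = [1 if p < 0 else 0 for p in posterior]
        # Stop early once every parity check is satisfied.
        if all(sum(hard[v] for v in row) % 2 == 0 for row in rows):
            break
    return hard
```

For example, calling `tdmp_min_sum_decode(HAMMING_ROWS, [-0.5, 2, 2, 2, 2, 2, 2])` feeds in the all-zero codeword with one weakly wrong LLR on bit 0. A GPU version would map rows or sub-blocks of rows to threads, which is where the parallelism and divergence issues mentioned in the abstract arise.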
High-throughput GPU-based LDPC decoder architecture for space communication

HOU Yi, LIU Rongke, PENG Hao, ZHAO Ling, XIONG Qingxu. High-throughput GPU-based LDPC decoder architecture for space communication[J]. Acta Aeronautica et Astronautica Sinica, 2017, 38(1).

Authors:	HOU Yi, LIU Rongke, PENG Hao, ZHAO Ling, XIONG Qingxu

Abstract:	To meet the current demand for high-speed, reconfigurable channel decoders in space communications, a high-throughput low-density parity-check (LDPC) software decoding architecture is proposed that exploits the parallel operating characteristics of graphics processing units (GPUs). The efficiency of the decoding kernel functions is improved by optimizing the inter-block and intra-block thread parallelism of the node-updating operations in the turbo-decoding message passing (TDMP) algorithm, reducing the thread branching induced by irregular row weights, lowering the memory access latency of threads reading and writing the updating information, and reasonably quantizing the information stored by the decoder. The asynchronous compute unified device architecture (CUDA) stream processing mechanism is also introduced, with an optimized execution schedule between the decoder's input/output data transfers and kernel functions and a thread resource allocation method on CUDA streams, to maximize decoding throughput while reducing decoding latency. Experimental results from decoding the Consultative Committee for Space Data Systems (CCSDS) telemetry standard's LDPC codes on Nvidia's latest Tesla K20 and GTX980 platforms show that the proposed architecture achieves a maximum throughput of about 500 Mbps and an average latency of about 2 ms using the TDMP algorithm with 10 iterations. Compared with existing results, the proposed architecture improves both decoding throughput and latency while maintaining the configuration flexibility of a software architecture.

Keywords:	low-density parity-check codes; graphics processing units; software decoding architecture; turbo-decoding message passing algorithm; high throughput; low latency
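
The scheduling idea in the abstract, overlapping input transfers, kernel execution, and output transfers across CUDA streams, can be illustrated with an idealized timing model. This is a sketch under stated assumptions, not a measurement and not the authors' scheduler: it assumes the three stages run on independent engines that each process one batch at a time, that batches keep their order, and that per-batch stage times are constant. The function names and the example times are hypothetical.

```python
def serial_makespan(n_batches: int, t_h2d: float, t_kernel: float,
                    t_d2h: float) -> float:
    """Total time when copy-in, decode, and copy-out never overlap."""
    return n_batches * (t_h2d + t_kernel + t_d2h)


def pipelined_makespan(n_batches: int, t_h2d: float, t_kernel: float,
                       t_d2h: float) -> float:
    """Total time when the three stages overlap across batches.

    Idealized model: one host-to-device copy engine, one compute engine,
    and one device-to-host copy engine, each busy with one batch at a time.
    """
    h2d_done = kernel_done = d2h_done = 0.0
    for _ in range(n_batches):
        # The input copy starts as soon as the copy engine is free.
        h2d_done += t_h2d
        # The kernel needs its input and a free compute engine.
        kernel_done = max(kernel_done, h2d_done) + t_kernel
        # The output copy needs the kernel result and a free copy engine.
        d2h_done = max(d2h_done, kernel_done) + t_d2h
    return d2h_done
```

With four batches and hypothetical stage times of 1, 2, and 1 time units, the serial schedule takes 16 units while the overlapped one takes 10, because after the pipeline fills only the slowest stage limits the rate. Real hardware adds effects this model ignores, such as the number of copy engines and kernel launch overhead.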
This article is indexed in CNKI, Wanfang Data, and other databases.
