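Efficient Acceleration Technology for On-board Object Detection
Cite this article: HUYAN Lang, LI Ying, JIANG Dongmei, ZHANG Yanning, ZHOU Quan, WEI Jiayuan, LIU Juanni. Efficient Acceleration Technology for On-board Object Detection[J]. Journal of Astronautics (宇航学报), 2022, 43(11): 1544-1556.
Authors: HUYAN Lang (呼延烺); LI Ying (李映); JIANG Dongmei (蒋冬梅); ZHANG Yanning (张艳宁); ZHOU Quan (周诠); WEI Jiayuan (魏佳圆); LIU Juanni (刘娟妮)
Affiliation: 1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China; 2. National Key Laboratory of Science and Technology on Space Microwave, Xi'an 710100, China
Funding: National Key R&D Program of China (2020YFB1808003); China Aerospace Science and Technology Corporation Innovation Fund (Y20 JTKJCX 02); National Key Laboratory Fund (6142411432107); China Academy of Space Technology Talent Fund (Y19 RCWYZ 04)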
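Abstract (translated from the Chinese): Deep convolutional network object detection algorithms have large parameter counts and heavy computational loads, and they are constrained by on-board computing resources, storage resources, and power consumption, which makes in-orbit deployment difficult. To address this problem, an efficient acceleration framework and implementation method for on-board object detection algorithms is proposed. First, a computing engine compatible with three convolution operators is designed, which improves resource utilization. Second, the object detection model is unrolled along two dimensions, channel and convolution kernel, which makes the accelerator highly parallel and scalable. Finally, the accelerator is implemented on several FPGA platforms and its performance is evaluated. Experimental results show that the proposed accelerator reaches a computing performance of 1843.2 GFLOPS (giga floating-point operations per second) with an inference time of 0.22 ms. Compared with similar accelerator designs, the proposed framework has substantial advantages in performance, power consumption, energy efficiency, and inference time. It is suited to deployment in resource-constrained environments and has good prospects and value for on-board applications.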

Keywords: Object detection; Model quantization; Model acceleration; Computational intensity; Convolutional neural networks
Received: 2022-03-23

Efficient Acceleration Technology for On-board Object Detection
HUYAN Lang, LI Ying, JIANG Dongmei, ZHANG Yanning, ZHOU Quan, WEI Jiayuan, LIU Juanni. Efficient Acceleration Technology for On-board Object Detection[J]. Journal of Astronautics, 2022, 43(11): 1544-1556.
Authors: HUYAN Lang; LI Ying; JIANG Dongmei; ZHANG Yanning; ZHOU Quan; WEI Jiayuan; LIU Juanni
Affiliation: 1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China; 2. National Key Laboratory of Science and Technology on Space Microwave, Xi'an 710100, China
Abstract: Object detection algorithms based on deep convolutional neural networks have large parameter counts and heavy computational loads, while on-board computing resources, storage resources, and power are limited, which makes these algorithms difficult to deploy in orbit. To address this problem, an efficient acceleration framework and implementation method for on-board object detection are proposed. First, a computing engine compatible with three convolution operators is designed, which improves resource utilization. Second, the object detection model is unrolled along two dimensions, channel and convolution kernel, which makes the accelerator highly parallel and scalable. Finally, the accelerator is implemented on multiple FPGA platforms and its performance is evaluated. Experimental results show that the proposed FPGA-based accelerator achieves a throughput of up to 1843.2 GFLOPS with an inference time of 0.22 ms. Compared with accelerators proposed in related literature, the proposed accelerator has substantial advantages in performance, power consumption, energy efficiency, and inference time. It is suitable for deployment in resource-constrained environments and has good prospects and value for satellite applications.
Keywords: Object detection; Model quantization; Model acceleration; Computational intensity; Convolutional neural networks
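
Illustrative note: the abstract states that the model is unrolled along the channel and convolution-kernel dimensions, but it does not describe the accelerator's internals. The sketch below is therefore only a generic software model of that idea, not the authors' design. The function names, the tile sizes `tc` and `tk`, the layout of `x` and `w`, and the "one tile step per cycle" counting are all assumptions made for illustration. It checks that a tiled loop nest gives the same result as a direct convolution, and it counts how many tile steps a hypothetical `tc` x `tk` multiply-accumulate array would need.

```python
def conv2d_reference(x, w):
    """Direct convolution, stride 1, no padding.
    x: input as nested lists [C][H][W]; w: weights as [K][C][R][S]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    Ho, Wo = H - R + 1, W - S + 1
    y = [[[0] * Wo for _ in range(Ho)] for _ in range(K)]
    for k in range(K):
        for c in range(C):
            for i in range(Ho):
                for j in range(Wo):
                    for r in range(R):
                        for s in range(S):
                            y[k][i][j] += w[k][c][r][s] * x[c][i + r][j + s]
    return y


def conv2d_unrolled(x, w, tc, tk):
    """Same convolution, iterated in tiles of tc input channels by tk kernels.
    Returns (y, steps). 'steps' counts tile steps; in this hypothetical model
    each step is the work a tc*tk MAC array would finish in one cycle."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    Ho, Wo = H - R + 1, W - S + 1
    y = [[[0] * Wo for _ in range(Ho)] for _ in range(K)]
    steps = 0
    for k0 in range(0, K, tk):          # kernel (output-channel) tiles
        for c0 in range(0, C, tc):      # input-channel tiles
            for i in range(Ho):
                for j in range(Wo):
                    for r in range(R):
                        for s in range(S):
                            steps += 1
                            # These two loops stand in for MACs that hardware
                            # would evaluate in parallel (assumption).
                            for k in range(k0, min(k0 + tk, K)):
                                for c in range(c0, min(c0 + tc, C)):
                                    y[k][i][j] += w[k][c][r][s] * x[c][i + r][j + s]
    return y, steps
```

In this model, larger `tc` and `tk` reduce the step count roughly in proportion to `tc * tk`, which is the sense in which unrolling along these two dimensions scales with available hardware; real throughput also depends on memory bandwidth and clock rate, which the sketch ignores.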