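Design and FPGA implementation of a fast convolution algorithm based on 3D-Winograd
Cite this article: LIN Keyu, JIANG Hongxu, ZHANG Yonghua, CONG Rongzi. Design and FPGA implementation of a fast convolution algorithm based on 3D-Winograd[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(9): 1900-1907.
Authors: LIN Keyu  JIANG Hongxu  ZHANG Yonghua  CONG Rongzi
Affiliation: Beijing Key Laboratory of Digital Media, Beihang University, Beijing 100083, China
Funding: Aerospace Science and Technology Fund (190109); National Natural Science Foundation of China (61872017)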
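Abstract: In recent years, convolutional neural networks (CNNs) have been widely adopted for computer vision tasks. Because of their high performance, energy efficiency, and reconfigurability, FPGAs are regarded as the most promising hardware accelerators for CNNs. However, FPGA solutions that compute three-dimensional convolution with the traditional Winograd algorithm are constrained by FPGA computing power and storage resources, so their performance still has room to improve. This work first studies the one-dimensional expansion of the Winograd algorithm for three-dimensional operations. It then improves CNN performance on FPGA by increasing the dimensions of the input feature map and convolution block processed at one time and by quantizing the weights and input data to low bit widths. The optimization has four parts: replacing some divisions with shifts, a tiling scheme, the extension from two to three dimensions, and low-bit quantization. Relative to the traditional two-dimensional Winograd algorithm, the optimized algorithm reduces the number of clock cycles per convolutional layer by a factor of about 7; relative to the traditional sliding-window convolution algorithm, the average reduction per convolutional layer is about 7 times. The study shows that the 3D-Winograd algorithm based on one-dimensional expansion can greatly reduce computational complexity and improve the performance of CNNs running on FPGAs.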
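The abstract lists replacing some divisions with shifts as one of the optimizations but does not say where the divisions occur. A minimal sketch of one way this can work, assuming the common one-dimensional F(2,3) Winograd tile (an assumption, not stated in the abstract): the filter transform contains factors of 1/2, which can be removed by scaling the transformed filter by 2 and undoing the scale with a single right shift on each output. The function names below are hypothetical.

```python
def winograd_f23_int(d, g):
    """F(2,3): two outputs of a 3-tap filter from a 4-sample integer tile."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform scaled by 2, so the 1/2 factors disappear.
    u = (g0 << 1, g0 + g1 + g2, g0 - g1 + g2, g2 << 1)
    # Input transform: additions and subtractions only.
    v = (d0 - d2, d1 + d2, d2 - d1, d1 - d3)
    # Four multiplications instead of the six a sliding window needs.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output transform, then a right shift undoes the scale of 2.
    # The sums are always even for integer inputs, so the shift is exact.
    y0 = (m[0] + m[1] + m[2]) >> 1
    y1 = (m[1] - m[2] - m[3]) >> 1
    return y0, y1


def direct_f23(d, g):
    """Reference sliding-window result for the same tile."""
    return (d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2])


print(winograd_f23_int((1, 2, 3, 4), (1, 0, -1)))  # (-2, -2)
```

In hardware the filter transform is typically computed once per layer, so the shift on the output path is the only per-tile cost of removing the division in this sketch.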

Keywords: convolutional neural network (CNN); FPGA; Winograd; convolution algorithm; fast algorithm
Received: 2020-07-03

Design and FPGA implementation of a fast convolution algorithm based on 3D-Winograd
LIN Keyu, JIANG Hongxu, ZHANG Yonghua, CONG Rongzi. Design and FPGA implementation of a fast convolution algorithm based on 3D-Winograd[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(9): 1900-1907.
Authors: LIN Keyu  JIANG Hongxu  ZHANG Yonghua  CONG Rongzi
Institution: Beijing Key Laboratory of Digital Media, Beihang University, Beijing 100083, China
Abstract: In recent years, convolutional neural networks (CNNs) have been widely adopted for computer vision tasks. Because of their high performance, energy efficiency, and reconfigurability, FPGAs are regarded as the most promising hardware accelerators for CNNs. However, existing FPGA solutions that compute three-dimensional convolution with the traditional Winograd algorithm are limited by FPGA computing power and storage resources, leaving room for performance improvement. This paper first studies the one-dimensional expansion of the Winograd algorithm for three-dimensional operations. It then improves CNN performance on FPGA by increasing the dimensions of the input feature map and convolution block processed at one time and by quantizing the weights and input data to low bit widths. The optimization has four parts: replacing some divisions with shifts, a tiling scheme, the extension from two to three dimensions, and low-bit quantization. Compared with the traditional two-dimensional Winograd algorithm, the optimized algorithm reduces the number of clock cycles per convolutional layer by a factor of about 7; compared with the traditional sliding-window convolution algorithm, the average reduction per convolutional layer is about 7 times. The results show that the 3D-Winograd algorithm based on one-dimensional expansion can greatly reduce computational complexity and improve the performance of CNNs running on FPGAs.
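To illustrate how a three-dimensional Winograd convolution can be built from a one-dimensional one, the sketch below applies the standard F(2,3) transform matrices along each of the three axes of a tile in turn. The tile size (a 4x4x4 input, a 3x3x3 kernel, a 2x2x2 output) and all function names are assumptions chosen for illustration; the abstract does not specify the authors' tile size or their exact expansion.

```python
import numpy as np

# Standard 1D F(2,3) Winograd transform matrices.
BT = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=float)


def apply_along_all_axes(mat, x):
    """Multiply `mat` into every axis of `x`, one axis at a time."""
    for axis in range(x.ndim):
        # Contract mat's columns with the chosen axis, then put the
        # resulting axis back where the contracted one was.
        x = np.moveaxis(np.tensordot(mat, x, axes=(1, axis)), 0, axis)
    return x


def winograd_3d(d, g):
    """2x2x2 outputs from a 4x4x4 input tile and a 3x3x3 kernel."""
    u = apply_along_all_axes(G, g)    # transformed kernel, 4x4x4
    v = apply_along_all_axes(BT, d)   # transformed input, 4x4x4
    m = u * v                         # 64 elementwise multiplications
    return apply_along_all_axes(AT, m)


def direct_3d(d, g):
    """Reference sliding-window result: 8 outputs x 27 multiplications."""
    out = np.empty((2, 2, 2))
    for i in range(2):
        for j in range(2):
            for k in range(2):
                out[i, j, k] = np.sum(d[i:i + 3, j:j + 3, k:k + 3] * g)
    return out


rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4, 4))
g = rng.standard_normal((3, 3, 3))
print(np.allclose(winograd_3d(d, g), direct_3d(d, g)))
```

For this tile size the elementwise stage uses 64 multiplications where the sliding window uses 216. That count covers only this sketch and is not the clock-cycle figure reported in the abstract.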
Keywords: convolutional neural network (CNN); FPGA; Winograd; convolution algorithm; fast algorithm
This article is indexed in CNKI, Wanfang Data, and other databases.