-
Abstract: Smoke is semi-transparent, irregularly shaped, and has blurry boundaries, which makes image smoke segmentation a challenging task. To address these problems, we propose an attention-based method that models long-range information, capturing the long-distance dependencies between pixels and the continuity of regions; this resolves the misclassification of small isolated regions and reduces false detections in discontinuous smoke regions. To avoid the heavy memory consumption and computational complexity caused by the large matrix multiplications of attention networks, we redesign both the spatial and channel attention structures as a bi-directional spatial attention (BDA) module and a multi-scale channel attention (MSCA) fusion module, which compensate for the spatial information lost to global pooling in existing attention methods. Combining these attention modules with a residual deep network, we build a global smoke attention network for image smoke segmentation that reduces memory consumption and computational cost while preserving global correlation information as much as possible. Experimental results show that the proposed network achieves mean intersection over union (mIoU) scores of 73.13%, 73.81% and 74.25% on the DS01, DS02 and DS03 synthetic smoke test sets, respectively, outperforming the compared state-of-the-art methods overall.
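The two attention ideas described in the abstract can be illustrated with a minimal NumPy sketch: pooling along each spatial direction separately instead of one global pool, and deriving channel weights from several pooling scales. All names, shapes, and the sigmoid gating below are illustrative assumptions, not the paper's exact modules.

```python
# Illustrative sketch of directional spatial attention and multi-scale
# channel attention on a feature map of shape (channels, height, width).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bda_like(x):
    """Bi-directional spatial attention sketch: a single global pool collapses
    all positions to one value per channel, so instead pool along height and
    width separately and recombine, giving each position (i, j) a weight that
    still depends on its row and column."""
    pool_h = x.mean(axis=2, keepdims=True)    # (c, h, 1): one value per row
    pool_w = x.mean(axis=1, keepdims=True)    # (c, 1, w): one value per column
    attn = sigmoid(pool_h) * sigmoid(pool_w)  # broadcasts to (c, h, w)
    return x * attn

def msca_like(x, scales=(1, 2, 4)):
    """Multi-scale channel attention sketch: average-pool the map to several
    grid sizes, derive one weight per channel from each scale, and fuse."""
    c, h, w = x.shape
    weights = np.zeros(c)
    for s in scales:
        # crop so the map divides evenly into an s x s grid of blocks
        hs, ws = (h // s) * s, (w // s) * s
        pooled = x[:, :hs, :ws].reshape(c, s, hs // s, s, ws // s).mean(axis=(2, 4))
        weights += pooled.reshape(c, -1).mean(axis=1)
    weights = sigmoid(weights / len(scales))  # (c,): one weight per channel
    return x * weights[:, None, None]

x = np.random.rand(8, 16, 16)
# parallel fusion of the two branches over the same backbone features
y = bda_like(x) + msca_like(x)
print(y.shape)  # (8, 16, 16)
```

Note that both branches keep memory linear in the feature-map size, unlike a full non-local attention map, whose pairwise-affinity matrix grows quadratically with the number of positions.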
-
Table 1. Comparison of different algorithms (mIoU/%)

Algorithm        DS01    DS02    DS03
FCN-8S[27]       64.03   63.28   64.38
SegNet[28]       56.94   56.77   57.18
SMD[29]          62.88   61.50   62.09
TBFCN[7]         66.67   65.85   66.20
DeepLab v1[30]   68.41   68.97   68.71
ESPNet[31]       61.85   61.90   62.77
LRN[32]          66.43   67.71   67.46
DSS[4]           71.04   70.01   69.81
HG-Net2[33]      63.58   62.40   63.61
HG-Net8[33]      63.85   63.27   64.46
W-Net[5]         73.06   73.97   73.36
Ours             73.13   73.81   74.25

Table 2. Ablation experiment results (mIoU/%)

Network variant                            DS01    DS02    DS03
ResNet + BDA                               71.61   72.45   72.89
ResNet + MSCA                              70.12   71.79   72.11
ResNet + MSCA in series with BDA           72.49   73.26   73.98
ResNet + MSCA in parallel with BDA (ours)  73.13   73.81   74.25
-
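The last two ablation variants differ only in how the two modules are wired. A schematic sketch of that difference, using hypothetical stand-in branches rather than the paper's code:

```python
import numpy as np

def branch_bda(x):   # stand-in for the BDA branch (illustrative only)
    return x * 0.5

def branch_msca(x):  # stand-in for the MSCA branch (illustrative only)
    return x * 0.25

def serial(x):
    # "MSCA in series with BDA": the second module refines the first's output,
    # so spatial attention only sees already channel-reweighted features
    return branch_bda(branch_msca(x))

def parallel(x):
    # "MSCA in parallel with BDA" (the better-scoring variant): both modules
    # see the same backbone features and their outputs are fused by summation
    return branch_bda(x) + branch_msca(x)

x = np.ones((2, 4, 4))
print(serial(x)[0, 0, 0], parallel(x)[0, 0, 0])  # 0.125 0.75
```

With stand-in scalings the two wirings obviously produce different outputs; in the ablation, the parallel form scores higher on all three test sets, consistent with each branch attending to the unmodified backbone features.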
[1] XIA X, YUAN F N, ZHANG L, et al. From traditional methods to deep ones: Review of visual smoke recognition, detection, and segmentation[J]. Journal of Image and Graphics, 2019, 24(10): 1627-1647 (in Chinese). doi: 10.11834/jig.190230
[2] JIN B. Forest fire prevention: National forest fires by month (2017)[M]//State Forestry and Grassland Administration. China forestry yearbook (2018). Beijing: China Forestry Publishing House, 2018: 138 (in Chinese).
[3] YUAN F N, ZHANG L, WAN B Y, et al. Convolutional neural networks based on multi-scale additive merging layers for visual smoke recognition[J]. Machine Vision and Applications, 2019, 30(2): 345-358. doi: 10.1007/s00138-018-0990-3
[4] YUAN F N, ZHANG L, XIA X, et al. Deep smoke segmentation[J]. Neurocomputing, 2019, 357: 248-260. doi: 10.1016/j.neucom.2019.05.011
[5] YUAN F N, ZHANG L, XIA X, et al. A wave shaped deep neural network for smoke density estimation[J]. IEEE Transactions on Image Processing, 2020, 29: 2301-2313. doi: 10.1109/TIP.2019.2946126
[6] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. (2017-12-05)[2021-09-01]. https://arxiv.org/abs/1706.05587
[7] ZHANG Z, ZHANG C, SHEN W, et al. Multi-oriented text detection with fully convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4159-4167.
[8] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[9] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. New York: ACM, 2014: 2204-2212.
[10] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
[11] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[12] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 3-19.
[13] PARK J, WOO S, LEE J Y, et al. BAM: Bottleneck attention module[EB/OL]. (2018-07-18)[2021-09-01]. https://arxiv.org/abs/1807.06514v2
[14] FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3146-3154.
[15] YUAN Y, HUANG L, GUO J, et al. OCNet: Object context network for scene parsing[EB/OL]. (2021-03-15)[2021-09-01]. https://arxiv.org/abs/1809.00916v4
[16] HUANG Z, WANG X, HUANG L, et al. CCNet: Criss-cross attention for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 603-612.
[17] HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design[EB/OL]. (2021-03-04)[2021-09-01]. https://arxiv.org/abs/2103.02907v1
[18] ZHANG N, WANG H Q, HU Y. Smoke image segmentation algorithm based on rough set and region growing[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(8): 1296-1304 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-KXTS201708012.htm
[19] LIN G H, ZHANG Y M, ZHANG Q X, et al. Smoke detection in video sequences based on dynamic texture using volume local binary patterns[J]. KSII Transactions on Internet and Information Systems, 2017, 11(11): 5522-5536.
[20] FILONENKO A, HERNÁNDEZ D C, JO K H. Fast smoke detection for video surveillance using CUDA[J]. IEEE Transactions on Industrial Informatics, 2018, 14(2): 725-733. doi: 10.1109/TII.2017.2757457
[21] TAO C Y, ZHANG J, WANG P. Smoke detection based on deep convolutional neural networks[C]//Proceedings of 2016 International Conference on Industrial Informatics—Computing Technology, Intelligent Technology, Industrial Information Integration. Piscataway: IEEE Press, 2016: 150-153.
[22] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. New York: ACM, 2012: 1097-1105.
[23] YIN Z J, WAN B Y, YUAN F N, et al. A deep normalization and convolutional neural network for image smoke detection[J]. IEEE Access, 2017, 5: 18429-18438. doi: 10.1109/ACCESS.2017.2747399
[24] YUAN F, ZHANG L, XIA X, et al. A gated recurrent network with dual classification assistance for smoke semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 4409-4422. doi: 10.1109/TIP.2021.3069318
[25] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[26] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. doi: 10.1109/TPAMI.2015.2389824
[27] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
[28] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
[29] WANG W, SHEN J, SHAO L. Video salient object detection via fully convolutional networks[J]. IEEE Transactions on Image Processing, 2018, 27(1): 38-49. doi: 10.1109/TIP.2017.2754941
[30] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
[31] MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 552-568.
[32] ISLAM M A, NAHA S, ROCHAN M, et al. Label refinement network for coarse-to-fine semantic segmentation[EB/OL]. (2017-03-01)[2021-09-01]. https://arxiv.org/abs/1703.00551
[33] NEWELL A, YANG K, DENG J. Stacked hourglass networks for human pose estimation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2016: 483-499.