-
Abstract: Smoke is semi-transparent, irregularly shaped, and has blurry boundaries, which makes image smoke segmentation a challenging task. To address these problems, we propose an attention-based method that models long-range information, capturing the long-distance dependencies between pixels and the continuity of regions; this resolves the misclassification of small isolated regions and reduces false detections in discontinuous smoke regions. To avoid the heavy memory consumption and computational complexity caused by the large matrix multiplications of attention networks, we redesign both the spatial and channel attention structures as a bi-directional spatial attention (BDA) module and a multi-scale channel attention (MSCA) fusion module, which compensate for the spatial information lost to global pooling in existing attention methods. Combining these attention modules with a residual deep network, we build a global smoke attention network for image smoke segmentation that reduces memory consumption and computational cost while preserving global correlation information as much as possible. Experimental results show that the proposed network achieves mean intersection over union (mIoU) scores of 73.13%, 73.81% and 74.25% on the DS01, DS02 and DS03 synthetic smoke test sets, respectively, outperforming the compared state-of-the-art methods overall.
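The two attention ideas described in the abstract can be illustrated with a minimal NumPy sketch: pooling along each spatial direction separately instead of one global pool, and deriving channel weights from several pooling scales. All names, shapes, and the sigmoid gating below are illustrative assumptions, not the paper's exact modules.

```python
# Illustrative sketch of directional spatial attention and multi-scale
# channel attention on a feature map of shape (channels, height, width).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bda_like(x):
    """Bi-directional spatial attention sketch: a single global pool collapses
    all positions to one value per channel, so instead pool along height and
    width separately and recombine, giving each position (i, j) a weight that
    still depends on its row and column."""
    pool_h = x.mean(axis=2, keepdims=True)    # (c, h, 1): one value per row
    pool_w = x.mean(axis=1, keepdims=True)    # (c, 1, w): one value per column
    attn = sigmoid(pool_h) * sigmoid(pool_w)  # broadcasts to (c, h, w)
    return x * attn

def msca_like(x, scales=(1, 2, 4)):
    """Multi-scale channel attention sketch: average-pool the map to several
    grid sizes, derive one weight per channel from each scale, and fuse."""
    c, h, w = x.shape
    weights = np.zeros(c)
    for s in scales:
        # crop so the map divides evenly into an s x s grid of blocks
        hs, ws = (h // s) * s, (w // s) * s
        pooled = x[:, :hs, :ws].reshape(c, s, hs // s, s, ws // s).mean(axis=(2, 4))
        weights += pooled.reshape(c, -1).mean(axis=1)
    weights = sigmoid(weights / len(scales))  # (c,): one weight per channel
    return x * weights[:, None, None]

x = np.random.rand(8, 16, 16)
# parallel fusion of the two branches over the same backbone features
y = bda_like(x) + msca_like(x)
print(y.shape)  # (8, 16, 16)
```

Note that both branches keep memory linear in the feature-map size, unlike a full non-local attention map, whose pairwise-affinity matrix grows quadratically with the number of positions.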
-
Table 1. Comparison of different algorithms (mIoU/%)

Algorithm        DS01    DS02    DS03
FCN-8S[27]       64.03   63.28   64.38
SegNet[28]       56.94   56.77   57.18
SMD[29]          62.88   61.50   62.09
TBFCN[7]         66.67   65.85   66.20
DeepLab v1[30]   68.41   68.97   68.71
ESPNet[31]       61.85   61.90   62.77
LRN[32]          66.43   67.71   67.46
DSS[4]           71.04   70.01   69.81
HG-Net2[33]      63.58   62.40   63.61
HG-Net8[33]      63.85   63.27   64.46
W-Net[5]         73.06   73.97   73.36
Ours             73.13   73.81   74.25

Table 2. Ablation experiment results (mIoU/%)

Network variant                            DS01    DS02    DS03
ResNet + BDA                               71.61   72.45   72.89
ResNet + MSCA                              70.12   71.79   72.11
ResNet + MSCA in series with BDA           72.49   73.26   73.98
ResNet + MSCA in parallel with BDA (ours)  73.13   73.81   74.25
-
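The last two ablation variants differ only in how the two modules are wired. A schematic sketch of that difference, using hypothetical stand-in branches rather than the paper's code:

```python
import numpy as np

def branch_bda(x):   # stand-in for the BDA branch (illustrative only)
    return x * 0.5

def branch_msca(x):  # stand-in for the MSCA branch (illustrative only)
    return x * 0.25

def serial(x):
    # "MSCA in series with BDA": the second module refines the first's output,
    # so spatial attention only sees already channel-reweighted features
    return branch_bda(branch_msca(x))

def parallel(x):
    # "MSCA in parallel with BDA" (the better-scoring variant): both modules
    # see the same backbone features and their outputs are fused by summation
    return branch_bda(x) + branch_msca(x)

x = np.ones((2, 4, 4))
print(serial(x)[0, 0, 0], parallel(x)[0, 0, 0])  # 0.125 0.75
```

With stand-in scalings the two wirings obviously produce different outputs; in the ablation, the parallel form scores higher on all three test sets, consistent with each branch attending to the unmodified backbone features.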
[1] XIA X, YUAN F N, ZHANG L, et al. From traditional methods to deep ones: Review of visual smoke recognition, detection, and segmentation[J]. Journal of Image and Graphics, 2019, 24(10): 1627-1647 (in Chinese). doi: 10.11834/jig.190230
[2] JIN B. Forest fire prevention: National forest fires by month (2017)[M]//State Forestry and Grassland Administration. China forestry yearbook (2018). Beijing: China Forestry Publishing House, 2018: 138 (in Chinese).
[3] YUAN F N, ZHANG L, WAN B Y, et al. Convolutional neural networks based on multi-scale additive merging layers for visual smoke recognition[J]. Machine Vision and Applications, 2019, 30(2): 345-358. doi: 10.1007/s00138-018-0990-3
[4] YUAN F N, ZHANG L, XIA X, et al. Deep smoke segmentation[J]. Neurocomputing, 2019, 357: 248-260. doi: 10.1016/j.neucom.2019.05.011
[5] YUAN F N, ZHANG L, XIA X, et al. A wave shaped deep neural network for smoke density estimation[J]. IEEE Transactions on Image Processing, 2020, 29: 2301-2313. doi: 10.1109/TIP.2019.2946126
[6] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. (2017-12-05)[2021-09-01]. https://arxiv.org/abs/1706.05587
[7] ZHANG Z, ZHANG C, SHEN W, et al. Multi-oriented text detection with fully convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4159-4167.
[8] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[9] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. New York: ACM, 2014: 2204-2212.
[10] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7794-7803.
[11] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[12] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 3-19.
[13] PARK J, WOO S, LEE J Y, et al. BAM: Bottleneck attention module[EB/OL]. (2018-07-18)[2021-09-01]. https://arxiv.org/abs/1807.06514v2
[14] FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3146-3154.
[15] YUAN Y, HUANG L, GUO J, et al. OCNet: Object context network for scene parsing[EB/OL]. (2021-03-15)[2021-09-01]. https://arxiv.org/abs/1809.00916v4
[16] HUANG Z, WANG X, HUANG L, et al. CCNet: Criss-cross attention for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 603-612.
[17] HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design[EB/OL]. (2021-03-04)[2021-09-01]. https://arxiv.org/abs/2103.02907v1
[18] ZHANG N, WANG H Q, HU Y. Smoke image segmentation algorithm based on rough set and region growing[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(8): 1296-1304 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-KXTS201708012.htm
[19] LIN G H, ZHANG Y M, ZHANG Q X, et al. Smoke detection in video sequences based on dynamic texture using volume local binary patterns[J]. KSII Transactions on Internet and Information Systems, 2017, 11(11): 5522-5536.
[20] FILONENKO A, HERNÁNDEZ D C, JO K H. Fast smoke detection for video surveillance using CUDA[J]. IEEE Transactions on Industrial Informatics, 2018, 14(2): 725-733. doi: 10.1109/TII.2017.2757457
[21] TAO C Y, ZHANG J, WANG P. Smoke detection based on deep convolutional neural networks[C]//Proceedings of 2016 International Conference on Industrial Informatics—Computing Technology, Intelligent Technology, Industrial Information Integration. Piscataway: IEEE Press, 2016: 150-153.
[22] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. New York: ACM, 2012: 1097-1105.
[23] YIN Z J, WAN B Y, YUAN F N, et al. A deep normalization and convolutional neural network for image smoke detection[J]. IEEE Access, 2017, 5: 18429-18438. doi: 10.1109/ACCESS.2017.2747399
[24] YUAN F, ZHANG L, XIA X, et al. A gated recurrent network with dual classification assistance for smoke semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 4409-4422. doi: 10.1109/TIP.2021.3069318
[25] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[26] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. doi: 10.1109/TPAMI.2015.2389824
[27] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
[28] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
[29] WANG W, SHEN J, SHAO L. Video salient object detection via fully convolutional networks[J]. IEEE Transactions on Image Processing, 2018, 27(1): 38-49. doi: 10.1109/TIP.2017.2754941
[30] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
[31] MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 552-568.
[32] ISLAM M A, NAHA S, ROCHAN M, et al. Label refinement network for coarse-to-fine semantic segmentation[EB/OL]. (2017-03-01)[2021-09-01]. https://arxiv.org/abs/1703.00551
[33] NEWELL A, YANG K, DENG J. Stacked hourglass networks for human pose estimation[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2016: 483-499.