基于启发式遗传算法的模糊测试样本集优化方案

王志华; 王浩帆; 程漫漫

doi:10.13700/j.bh.1001-5965.2020.0422

基于启发式遗传算法的模糊测试样本集优化方案

doi: 10.13700/j.bh.1001-5965.2020.0422

郑州大学网络空间安全学院, 郑州 450002

基金项目:

河南省科技攻关项目 212102210408

河南省高等学校重点科研项目计划 22A520041

详细信息

通讯作者:
王志华, E-mail: zhwang@zzu.edu.cn

中图分类号: TP399
计量
- 文章访问数: 435
- HTML全文浏览量: 49
- PDF下载量: 72
- 被引次数: 0
出版历程
- 收稿日期: 2020-08-12
- 录用日期: 2020-09-25
- 网络出版日期: 2022-02-20

Fuzzing testing sample set optimization scheme based on heuristic genetic algorithm

School of Cyberspace Security, Zhengzhou University, Zhengzhou 450002, China

Funds:

Henan Provincial Department of Science and Technology Project 212102210408

Henan Provincial Key Scientific Research Project 22A520041

More Information

Corresponding author: WANG Zhihua, E-mail: zhwang@zzu.edu.cn

摘要

摘要:
模糊测试作为当前最有效的漏洞挖掘方法，不仅比其他漏洞挖掘技术更能应对复杂的程序，而且可扩展性很强。在数据量相对较大的测试中，模糊测试输入样本集存在质量低、冗余性高和可用性弱等问题。因此，对模糊测试输入样本集进行研究，提出了启发式遗传算法，借助0-1矩阵，通过启发式遗传算法对样本的执行路径进行选取和压缩，从而获得优化后兼顾样本质量的样本集最小样本集合，进而加快模糊测试的效率。实验结果表明：在没有损失的情况下，样本集精简后模糊测试的时间比精简前降低了22%，压缩率相比传统方案提升约40%。
- 漏洞挖掘 /
- 模糊测试 /
- 集合覆盖 /
- 遗传算法 /
- 样本集
Abstract:
As the most effective method of vulnerability mining at present, fuzzy testing not only is more capable of dealing with complex programs than other vulnerability mining techniques, but also has strong scalability. In the fuzzy testing with a large number of data, the input sample set has the problems of low quality, high redundancy and weak availability. Therefore, we study the input sample set of fuzzy testing, and propose a heuristic genetic algorithm. With the help of the 0-1 matrix, the execution path of the sample is selected and compressed through the heuristic genetic algorithm, so as to obtain the smallest sample set that takes into account the sample quality after optimization, thereby speeding up the efficiency of fuzzy testing. The experimental results show that, without loss, the fuzzy testing time after the sample set is simplified is reduced by 22% compared with that before the sample set is simplified, and the compression rate is increased by about 40% compared with the traditional scheme.
- vulnerability mining /
- fuzzy testing /
- set coverage /
- genetic algorithm /
- sample set

HTML全文

图 1 不同变异率选取染色体时间折线图

Figure 1. Broken line graph of chromosome selection time with different mutation rates

下载: 全尺寸图片幻灯片

图 2 测试用例个数与样本数量

Figure 2. Number of test cases and number of samples

下载: 全尺寸图片幻灯片

图 3 模糊测试时间与样本数量

Figure 3. Fuzzy testing time and sample size

下载: 全尺寸图片幻灯片

图 4 不同算法产生相同精简样本集所需时间对比

Figure 4. Comparison of time required for different algorithms to produce the same simplified sample set

下载: 全尺寸图片幻灯片

表 1 样本集数量

Table 1. Number of sample sets

样本集合名	初始样本集数量	精简样本集数量	压缩率/%
jpg1	1 000	409	59
jpg2	3 000	409	86
jpg3	2 000	706	65
jpg4	4 000	1 332	67

下载: 导出CSV

表 2 模糊测试时间

Table 2. Fuzzy testing time

初始样本集数	初始样本集测试用例个数	初始样本集模糊测试时间/min	精简样本集数	精简样本集测试用例个数	精简样本集模糊测试时间/min
1 000	150 538	2 145	409	61 574	1 984
3 000	469 614	6 568	409	64 217	5 656
2 000	323 846	4 371	706	114 312	3 944
4 000	626 024	8 423	1 332	206 645	6 443

下载: 导出CSV

表 3 测试用例数据

Table 3. Test case data

样本集合名	未精简样本集产生的测试用例个数	未精简样本集产生的标记测试用例个数	未精简代码覆盖率/%	精简样本集产生的测试用例个数	精简样本集产生的标记测试用例个数	精简代码覆盖率/%
Test1	3 075	2 356	37	2 954	2 879	37
Test2	5 974	4 517	32	5 804	5 691	32

下载: 导出CSV

表 4 多种方案测试样本数量和覆盖率

Table 4. Test sample size and coverage rate for multiple schemes

软件名称	样本数量				代码覆盖率，压缩率/%
软件名称	初始样本	文献[3]方案	文献[15]方案	本文优化方案	初始样本	文献[3]方案	文献[15]方案	本文优化方案
mspaint	10 000	4 198	6 158	3 684	37, 0	37, 58	37, 38	37, 63
pdfium	10 000	4 517	7 141	5 804	25, 0	25, 54	25, 28	25, 41
VLC Player	10 000	3 759	4 106	3 406	29, 0	29, 62	29, 58	29, 65

下载: 导出CSV

参考文献(15)

[1]	邹权臣, 张涛, 吴润浦, 等. 从自动化到智能化: 软件漏洞挖掘技术进展[J]. 清华大学学报(自然科学版), 2018, 58(12): 1079-1094. ZOU Q C, ZHANG T, WU R P, et al. From automation to intelligence: Progress in software vulnerability mining technology[J]. Journal of Tsinghua University (Natural Science Edition), 2015, 58(12): 1079-1094(in Chinese).
[2]	张雄, 李舟军. 模糊测试技术研究综述[J]. 计算机科学, 2016, 43(5): 1-8. ZHANG X, LI Z J. Research review of fuzzy testing technology[J]. Computer Science, 2016, 43(5): 1-8(in Chinese).
[3]	马金鑫, 张涛, 李舟军, 等. Fuzzing过程中的若干优化方法[J]. 清华大学学报(自然科学版), 2016, 56(5): 478-483. MA J X, ZHANG T, LI Z J, et al. Optimization methods in the Fuzzing process[J]. Journal of Tsinghua University (Natural Science Edition), 2016, 56(5): 478-483(in Chinese).
[4]	BHME M, PHAM V T, ROYCHOUDHURY A. Coverage-based greybox fuzzing as Markov chain[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2016: 1032-1043.
[5]	BHME M, PHAM V T, NGUYEN M D, et al. Directed greybox fuzzing[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2017: 2329-2344.
[6]	WANG S, NAM J, TAN L. QTEP: Quality-aware test case prioritization[C]//Proceedings of the 201711th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2017: 523-534.
[7]	唐枭. 基于动态污点分析的反馈式模糊测试改进方法[J]. 信息安全研究, 2019, 5(2): 145-151. doi: 10.3969/j.issn.2096-1057.2019.02.006 TANG X. Improved method of feedback fuzzy test based on dynamic stain analysis[J]. Information Security Research, 2019, 5(2): 145-151(in Chinese). doi: 10.3969/j.issn.2096-1057.2019.02.006
[8]	冯智莉, 易国洪, 李普山, 等. 并行化遗传算法研究综述[J]. 计算机应用与软件, 2018, 35(11): 1-7. FENG Z L, YI G H, LI P S, et al. Research review of parallel genetic algorithm[J]. Computer Applications and Software, 2018, 35(11): 1-7(in Chinese).
[9]	林耿, 关健. 自适应memetic算法求解集合覆盖问题[J]. 浙江大学学报(理学版), 2016, 43(2): 168-174. LIN G, GUAN J. Adaptive memetic algorithm for solving set coverage problem[J]. Journal of Zhejiang University (Science Edition), 2016, 43(2): 168-174(in Chinese).
[10]	ALYAHYA K, ROWE J E. Landscape analysis of a class of NP-hard binary packing problems[J]. Evolutionary Computation, 2019, 27(1): 47-73. doi: 10.1162/evco_a_00237
[11]	赵颖闻, 王峻, 郭茂祖, 等. 基于0-1矩阵分解的蛋白质功能预测[J]. 中国科学: 信息科学, 2019, 49(9): 1159-1174. ZHAO Y W, WANG J, GUO M Z, et al. Prediction of protein function based on 0-1 matrix decomposition[J]. Scientia Sinica: Information Science, 2019, 49(9): 1159-1174(in Chinese).
[12]	张瑜, 娄卉芳, 文良浩, 等. 一种改进的遗传算法交叉策略[J]. 湖南科技大学学报(自然科学版), 2012, 27(1): 94-97. ZHANG Y, LOU H F, WEN L H, et al. An improved genetic algorithm crossover strategy[J]. Journal of Hunan University of Science and Technology (Natural Science Edition), 2012, 27(1): 94-97(in Chinese).
[13]	王春阳, 赵玉庆, 谢金兴, 等. 遗传算法变异算子的改进[J]. 山东农业大学学报(自然科学版), 2019, 50(5): 898-901. doi: 10.3969/j.issn.1000-2324.2019.05.036 WANG C Y, ZHAO Y Q, XIE J X, et al. Improvement of genetic algorithm mutation operator[J]. Journal of Shandong Agricultural University (Natural Science Edition), 2019, 50(5): 898-901(in Chinese). doi: 10.3969/j.issn.1000-2324.2019.05.036
[14]	伊胜伟, 张翀斌, 谢丰, 等. 基于Peach的工业控制网络协议安全分析[J]. 清华大学学报(自然科学版), 2017, 57(1): 50-54. YI S W, ZHANG C B, XIE F, et al. Security analysis of industrial control network protocol based on Peach[J]. Journal of Tsinghua University (Natural Science Edition), 2017, 57(1): 50-54(in Chinese).
[15]	李谦, 刘嘉勇. 基于Fuzzing技术的样本集精简优化研究[J]. 网络安全技术与应用, 2017(1): 62-64. doi: 10.3969/j.issn.1009-6833.2017.01.039 LI Q, LIU J Y. Research on simplification and optimization of sample set based on Fuzzing technology[J]. Network Security Technology and Application, 2017(1): 62-64(in Chinese). doi: 10.3969/j.issn.1009-6833.2017.01.039