Fuzzing testing sample set optimization scheme based on heuristic genetic algorithm
-
摘要:
模糊测试作为当前最有效的漏洞挖掘方法,不仅比其他漏洞挖掘技术更能应对复杂的程序,而且可扩展性很强。在数据量相对较大的测试中,模糊测试输入样本集存在质量低、冗余性高和可用性弱等问题。因此,对模糊测试输入样本集进行研究,提出了启发式遗传算法,借助0-1矩阵,通过启发式遗传算法对样本的执行路径进行选取和压缩,从而获得优化后兼顾样本质量的样本集最小样本集合,进而加快模糊测试的效率。实验结果表明:在没有损失的情况下,样本集精简后模糊测试的时间比精简前降低了22%,压缩率相比传统方案提升约40%。
Abstract:As the most effective method of vulnerability mining at present, fuzzy testing not only is more capable of dealing with complex programs than other vulnerability mining techniques, but also has strong scalability. In the fuzzy testing with a large number of data, the input sample set has the problems of low quality, high redundancy and weak availability. Therefore, we study the input sample set of fuzzy testing, and propose a heuristic genetic algorithm. With the help of the 0-1 matrix, the execution path of the sample is selected and compressed through the heuristic genetic algorithm, so as to obtain the smallest sample set that takes into account the sample quality after optimization, thereby speeding up the efficiency of fuzzy testing. The experimental results show that, without loss, the fuzzy testing time after the sample set is simplified is reduced by 22% compared with that before the sample set is simplified, and the compression rate is increased by about 40% compared with the traditional scheme.
-
Key words:
- vulnerability mining /
- fuzzy testing /
- set coverage /
- genetic algorithm /
- sample set
-
表 1 样本集数量
Table 1. Number of sample sets
样本集合名 初始样本集数量 精简样本集数量 压缩率/% jpg1 1 000 409 59 jpg2 3 000 409 86 jpg3 2 000 706 65 jpg4 4 000 1 332 67 表 2 模糊测试时间
Table 2. Fuzzy testing time
初始样本集数 初始样本集测试用例个数 初始样本集模糊测试时间/min 精简样本集数 精简样本集测试用例个数 精简样本集模糊测试时间/min 1 000 150 538 2 145 409 61 574 1 984 3 000 469 614 6 568 409 64 217 5 656 2 000 323 846 4 371 706 114 312 3 944 4 000 626 024 8 423 1 332 206 645 6 443 表 3 测试用例数据
Table 3. Test case data
样本集合名 未精简样本集产生的测试用例个数 未精简样本集产生的标记测试用例个数 未精简代码覆盖率/% 精简样本集产生的测试用例个数 精简样本集产生的标记测试用例个数 精简代码覆盖率/% Test1 3 075 2 356 37 2 954 2 879 37 Test2 5 974 4 517 32 5 804 5 691 32 表 4 多种方案测试样本数量和覆盖率
Table 4. Test sample size and coverage rate for multiple schemes
-
[1] 邹权臣, 张涛, 吴润浦, 等. 从自动化到智能化: 软件漏洞挖掘技术进展[J]. 清华大学学报(自然科学版), 2018, 58(12): 1079-1094.ZOU Q C, ZHANG T, WU R P, et al. From automation to intelligence: Progress in software vulnerability mining technology[J]. Journal of Tsinghua University (Natural Science Edition), 2015, 58(12): 1079-1094(in Chinese). [2] 张雄, 李舟军. 模糊测试技术研究综述[J]. 计算机科学, 2016, 43(5): 1-8.ZHANG X, LI Z J. Research review of fuzzy testing technology[J]. Computer Science, 2016, 43(5): 1-8(in Chinese). [3] 马金鑫, 张涛, 李舟军, 等. Fuzzing过程中的若干优化方法[J]. 清华大学学报(自然科学版), 2016, 56(5): 478-483.MA J X, ZHANG T, LI Z J, et al. Optimization methods in the Fuzzing process[J]. Journal of Tsinghua University (Natural Science Edition), 2016, 56(5): 478-483(in Chinese). [4] BHME M, PHAM V T, ROYCHOUDHURY A. Coverage-based greybox fuzzing as Markov chain[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2016: 1032-1043. [5] BHME M, PHAM V T, NGUYEN M D, et al. Directed greybox fuzzing[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2017: 2329-2344. [6] WANG S, NAM J, TAN L. QTEP: Quality-aware test case prioritization[C]//Proceedings of the 201711th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2017: 523-534. [7] 唐枭. 基于动态污点分析的反馈式模糊测试改进方法[J]. 信息安全研究, 2019, 5(2): 145-151. doi: 10.3969/j.issn.2096-1057.2019.02.006TANG X. Improved method of feedback fuzzy test based on dynamic stain analysis[J]. Information Security Research, 2019, 5(2): 145-151(in Chinese). doi: 10.3969/j.issn.2096-1057.2019.02.006 [8] 冯智莉, 易国洪, 李普山, 等. 并行化遗传算法研究综述[J]. 计算机应用与软件, 2018, 35(11): 1-7.FENG Z L, YI G H, LI P S, et al. Research review of parallel genetic algorithm[J]. Computer Applications and Software, 2018, 35(11): 1-7(in Chinese). [9] 林耿, 关健. 自适应memetic算法求解集合覆盖问题[J]. 浙江大学学报(理学版), 2016, 43(2): 168-174.LIN G, GUAN J. Adaptive memetic algorithm for solving set coverage problem[J]. Journal of Zhejiang University (Science Edition), 2016, 43(2): 168-174(in Chinese). [10] ALYAHYA K, ROWE J E. Landscape analysis of a class of NP-hard binary packing problems[J]. Evolutionary Computation, 2019, 27(1): 47-73. doi: 10.1162/evco_a_00237 [11] 赵颖闻, 王峻, 郭茂祖, 等. 基于0-1矩阵分解的蛋白质功能预测[J]. 中国科学: 信息科学, 2019, 49(9): 1159-1174.ZHAO Y W, WANG J, GUO M Z, et al. Prediction of protein function based on 0-1 matrix decomposition[J]. Scientia Sinica: Information Science, 2019, 49(9): 1159-1174(in Chinese). [12] 张瑜, 娄卉芳, 文良浩, 等. 一种改进的遗传算法交叉策略[J]. 湖南科技大学学报(自然科学版), 2012, 27(1): 94-97.ZHANG Y, LOU H F, WEN L H, et al. An improved genetic algorithm crossover strategy[J]. Journal of Hunan University of Science and Technology (Natural Science Edition), 2012, 27(1): 94-97(in Chinese). [13] 王春阳, 赵玉庆, 谢金兴, 等. 遗传算法变异算子的改进[J]. 山东农业大学学报(自然科学版), 2019, 50(5): 898-901. doi: 10.3969/j.issn.1000-2324.2019.05.036WANG C Y, ZHAO Y Q, XIE J X, et al. Improvement of genetic algorithm mutation operator[J]. Journal of Shandong Agricultural University (Natural Science Edition), 2019, 50(5): 898-901(in Chinese). doi: 10.3969/j.issn.1000-2324.2019.05.036 [14] 伊胜伟, 张翀斌, 谢丰, 等. 基于Peach的工业控制网络协议安全分析[J]. 清华大学学报(自然科学版), 2017, 57(1): 50-54.YI S W, ZHANG C B, XIE F, et al. Security analysis of industrial control network protocol based on Peach[J]. Journal of Tsinghua University (Natural Science Edition), 2017, 57(1): 50-54(in Chinese). [15] 李谦, 刘嘉勇. 基于Fuzzing技术的样本集精简优化研究[J]. 网络安全技术与应用, 2017(1): 62-64. doi: 10.3969/j.issn.1009-6833.2017.01.039LI Q, LIU J Y. Research on simplification and optimization of sample set based on Fuzzing technology[J]. Network Security Technology and Application, 2017(1): 62-64(in Chinese). doi: 10.3969/j.issn.1009-6833.2017.01.039