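Data masking model for heterogeneous big data environment
Cite this article: TONG Lingling, LI Pengxiao, DUAN Dongsheng, REN Boya, LI Yangxi. Data masking model for heterogeneous big data environment[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 249-257.
Authors: TONG Lingling  LI Pengxiao  DUAN Dongsheng  REN Boya  LI Yangxi
Institution: National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Funding: National Natural Science Foundation of China (U1936110, U1836111)
Abstract: Data types and masking requirements differ across scenarios, so traditional data masking methods struggle to meet user privacy protection needs in the big data setting. Precisely locating and efficiently masking sensitive information in heterogeneous big data, so that the data remain secure, trustworthy, and usable, is a key research difficulty in this field. This paper proposes a masking model for heterogeneous data such as text, images, audio, and databases in heterogeneous big data environments, and describes its four key modules. Masking data preprocessing provides automatic labeling and sensitivity grading of sensitive data for different application scenarios. A data pre-masking method is applied, and the masking effect is evaluated along five dimensions (data availability, data relevance, degree of privacy protection, time complexity, and space complexity) to produce a customized masking strategy. Masking task scheduling then allocates and executes the masking tasks and supports user recovery of part of the masked data. Based on the proposed model, two typical data masking application scenarios are verified and analyzed, showing that the model achieves efficient masking of heterogeneous sensitive data in different application scenarios.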

Keywords: heterogeneous big data; automatic sensitive data labeling; data masking; masking effect evaluation; machine learning
Received: 2020-08-09

Data masking model for heterogeneous big data environment
TONG Lingling, LI Pengxiao, DUAN Dongsheng, REN Boya, LI Yangxi. Data masking model for heterogeneous big data environment[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 249-257.
Authors: TONG Lingling  LI Pengxiao  DUAN Dongsheng  REN Boya  LI Yangxi
Institution: National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Abstract: Because data types and desensitization demands vary across scenarios, traditional data masking methods cannot meet user privacy protection requirements in big data environments. How to accurately locate and efficiently desensitize sensitive information in heterogeneous big data, while keeping the data secure, trustworthy, and available, has become a key challenge in this area. In this paper, we propose a data masking model for heterogeneous big data such as texts, images, audio, and databases, and present its four key modules. First, desensitization data preprocessing automatically identifies and classifies sensitive data in different application scenarios. Second, with a data pre-masking method, masking is evaluated in five dimensions (data availability, data relevance, degree of privacy protection, time complexity, and space complexity) to construct a customized desensitization strategy. Finally, task scheduling allocates and executes the data masking tasks, and partial recovery of masked data is also supported. Two typical data masking applications are verified and analyzed with the proposed model, indicating that effective desensitization can be achieved in different application scenarios.
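Keywords: heterogeneous big data; automatic sensitive data labeling; data masking; masking effect evaluation; machine learning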
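The abstract says candidate strategies are compared after pre-masking along five dimensions, but it does not say how the dimensions are combined. The sketch below shows one possible reading, a weighted sum over normalized scores. The weights, the candidate names (`partial_mask`, `hash`), the score values, and the function names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical weights for the five evaluation dimensions named in the abstract.
# The values are illustrative assumptions; the paper's weighting is not given here.
WEIGHTS = {
    "availability": 0.3,  # data availability after masking
    "relevance": 0.2,     # preservation of relations between records
    "privacy": 0.3,       # degree of privacy protection
    "time": 0.1,          # time complexity, scored so that cheaper is higher
    "space": 0.1,         # space complexity, scored so that cheaper is higher
}


@dataclass
class PreMaskingResult:
    """Scores of one candidate strategy on a pre-masked data sample."""
    strategy: str
    scores: dict  # dimension name -> score in [0, 1], higher is better


def overall_score(result, weights=WEIGHTS):
    """Weighted sum of the per-dimension scores (an assumed aggregation rule)."""
    return sum(weights[dim] * result.scores[dim] for dim in weights)


def pick_strategy(results, weights=WEIGHTS):
    """Return the candidate with the highest overall score."""
    return max(results, key=lambda r: overall_score(r, weights))


# Made-up scores for two hypothetical candidate strategies.
candidates = [
    PreMaskingResult("partial_mask", {"availability": 0.8, "relevance": 0.7,
                                      "privacy": 0.6, "time": 0.9, "space": 0.9}),
    PreMaskingResult("hash", {"availability": 0.3, "relevance": 0.9,
                              "privacy": 0.9, "time": 0.8, "space": 0.8}),
]

best = pick_strategy(candidates)
print(best.strategy, round(overall_score(best), 2))
```

Changing the weights changes the outcome: a scenario that values privacy protection far above availability would favor the `hash` candidate, which is one way a customized strategy per application scenario could be expressed.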
This article is indexed in Wanfang Data and other databases.