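Adaptive short text keyword generation model
Cite this article: WANG Yongjian, SUN Yaru, YANG Ying. Adaptive short text keyword generation model[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 199-208.
Authors: WANG Yongjian, SUN Yaru, YANG Ying
Institution: The Third Research Institute of Ministry of Public Security, Shanghai 201204, China
Abstract: Keyword extraction strongly affects downstream text processing, and the accuracy and fluency of the identified keywords are central to the task. To mitigate inaccurate word segmentation, mismatch between keywords and text topics, and mixed-language content when extracting keywords from short texts, we propose ADGCN, an adaptive short text keyword generation model based on a graph-to-sequence learning model. The model combines a graph neural network with an attention mechanism as the encoding framework for text feature extraction. It encodes the positional and contextual features of words, which addresses the irregular structure of short texts and the complex relations between words. A linear decoding scheme is used to generate interpretable keywords. In the course of this work, a tag dataset comprising social media posts and their topic tags was collected from a social media platform and released. In the experiments, the relevance, informativeness, and coherence of the model output are evaluated and analyzed from the perspective of user needs. The proposed model not only generates keywords that match the topic of the short text but also effectively reduces the impact of data perturbation on the model. It also performs well on the public dataset KP20k, indicating good portability.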

Keywords: keyword extraction; keyword generation; graph neural network; attention mechanism; topic model
Received: 2020-10-23

Adaptive short text keyword generation model
WANG Yongjian, SUN Yaru, YANG Ying. Adaptive short text keyword generation model[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 199-208.
Authors: WANG Yongjian, SUN Yaru, YANG Ying
Institution: The Third Research Institute of Ministry of Public Security, Shanghai 201204, China
Abstract: Keyword extraction has a strong impact on text processing, and the accuracy and fluency of the extracted keywords are central to the task. To mitigate problems such as inaccurate word segmentation, mismatch between keywords and text topics, and mixed-language content when extracting keywords from short text, we propose an adaptive short text keyword generation model based on a graph convolutional neural network (ADGCN). First, the model uses a graph neural network as the encoding framework for text feature extraction, which addresses the irregular structure of short text and the complex relations between words. Then, based on the positional and contextual features of words, a self-attention mechanism is incorporated to capture rich context-dependent information. Finally, a linear decoding scheme is used to generate interpretable keywords. We collect and publish a tag dataset, TH, from a social media platform; it includes post text and topic tags. We evaluate and analyze the relevance, informativeness, and coherence of the model results from the perspective of user needs. The model not only generates keywords that match the topic of the short text but also effectively reduces the impact of data perturbation on the model. The model also performs well on the public dataset KP20k, indicating good portability.
Keywords: keyword extraction; keyword generation; graph neural network; attention mechanism; topic model
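
The abstract describes the encoder only at a high level: a graph neural network over the words of a short text, combined with self-attention. As a rough illustration of that general idea, the sketch below builds a word graph, applies one graph convolution step, and then applies scaled dot-product self-attention. It is not the ADGCN implementation. The sliding-window co-occurrence graph, the one-hot input features, the identity attention projections, the weight values, and all function names are assumptions made for illustration only.

```python
import math

def build_adjacency(tokens, window=2):
    """Symmetric co-occurrence adjacency (with self-loops) over unique tokens.
    The sliding-window graph construction is an assumption, not from the paper."""
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(vocab)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for pos, tok in enumerate(tokens):
        for other in tokens[pos + 1 : pos + 1 + window]:
            i, j = index[tok], index[other]
            adj[i][j] = adj[j][i] = 1.0
    return vocab, adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; self-loops keep degrees >= 1."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj_norm, feats, weight):
    """One graph convolution step: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(feats):
    """Scaled dot-product self-attention with identity Q/K/V projections
    (a simplification; a trained model would learn these projections)."""
    d = len(feats[0])
    transposed = [list(col) for col in zip(*feats)]
    scores = matmul(feats, transposed)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return weights, matmul(weights, feats)

# Toy usage: one short "text", one-hot node features, fixed illustrative weights.
tokens = "graph neural network attention keyword generation graph".split()
vocab, adj = build_adjacency(tokens)
n = len(vocab)
feats = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
weight = [[((i + j) % 3) / 3 for j in range(4)] for i in range(n)]
hidden = gcn_layer(normalize(adj), feats, weight)
attn_weights, context = self_attention(hidden)
print(len(vocab), len(hidden[0]), len(context))
```

In a trained model the weight matrix and attention projections would be learned, and a decoder would consume the resulting node representations to generate keywords; none of that is shown here.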
This article is indexed in Wanfang Data and other databases.