
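Industry classification technology based on the fastText algorithm
Cite this article: WU Zhen, RAN Xiaoyan, MIAO Quan, LIU Chunyan, ZHANG Dong, WEI Na. Industry classification technology based on fastText algorithm[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 193-198.
Authors: WU Zhen, RAN Xiaoyan, MIAO Quan, LIU Chunyan, ZHANG Dong, WEI Na
Affiliation: 1. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029
Abstract: With the rapid growth of China's economy and the steady improvement of its capacity for technological innovation, organizing and classifying information efficiently is the foundation for personalized industry management and tracking analysis. Based on the characteristics and development patterns of industry information, an industry classification model based on the fastText algorithm is proposed. First, a keyword lexicon for industry classification is built, and this feature lexicon is used for word segmentation and weight calculation. Then, a classifier model is built to classify Chinese industry texts automatically. Finally, 80 000 test documents covering business scope, enterprise information, and public opinion information were selected for the experiment. The results show that the proposed model outperforms classification algorithms such as Bayes, decision tree, and KNN, and performs well in application.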

Keywords: natural language processing; industry classification; fastText algorithm; keywords; grammar model
Received: 2020-08-09

Industry classification technology based on fastText algorithm
WU Zhen, RAN Xiaoyan, MIAO Quan, LIU Chunyan, ZHANG Dong, WEI Na. Industry classification technology based on fastText algorithm[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 193-198.
Authors: WU Zhen, RAN Xiaoyan, MIAO Quan, LIU Chunyan, ZHANG Dong, WEI Na
Institution: 1. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China; 2. Beijing Branch of National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100055, China; 3. Great Wall Computer Software & System Inc., Beijing 100190, China
Abstract: With the rapid development of China's economy and the continuous improvement of its capacity for technological innovation, organizing and classifying information efficiently is the basis for personalized industry management and tracking analysis. Based on the characteristics and development patterns of industry information, a Chinese industry classification model based on fastText is proposed. First, a keyword lexicon for industry classification is constructed. Then, word segmentation and weight calculation are carried out using this feature lexicon. Finally, a classifier model is built to classify industries automatically. In the experiment, 80 000 test documents covering business scope, enterprise information, and public opinion information were selected. The results show that the classification accuracy of the proposed model is higher than that of Bayes, decision tree, KNN, and other classification algorithms, and that the model performs well in application.
Keywords: natural language processing; industry classification; fastText algorithm; keywords; grammar model
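
The abstract does not give the model's configuration, so the following is only a toy illustration of the fastText-style classification idea: unigram and hashed-bigram embeddings are averaged and fed to a softmax layer. It is written in plain Python rather than with the fastText library, and it is not the authors' implementation. The sample business-scope strings, the label names, `TinyFastText`, `BUCKETS`, and all hyperparameters are hypothetical. The paper's keyword lexicon and weight calculation are not reproduced, and the text is assumed to be already segmented into space-separated tokens.

```python
import math
import random
import zlib
from typing import Dict, List, Sequence, Tuple

BUCKETS = 2000  # hypothetical size of the hashed bigram table

# Hypothetical, already-segmented business-scope snippets with made-up labels.
SAMPLES: List[Tuple[str, str]] = [
    ("software development and information technology services", "IT"),
    ("computer network system integration", "IT"),
    ("retail of fresh fruit and vegetables", "retail"),
    ("wholesale and retail of clothing", "retail"),
    ("construction of residential buildings", "construction"),
    ("building decoration and civil engineering construction", "construction"),
]


def featurize(tokens: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Return feature ids: known unigrams plus hashed word bigrams."""
    ids = [vocab[t] for t in tokens if t in vocab]
    for a, b in zip(tokens, tokens[1:]):
        bucket = zlib.crc32(f"{a} {b}".encode("utf-8")) % BUCKETS
        ids.append(len(vocab) + bucket)
    return ids


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyFastText:
    """Averaged feature embeddings followed by a linear softmax layer."""

    def __init__(self, docs: List[Tuple[str, str]], dim: int = 10, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.labels = sorted({label for _, label in docs})
        words = sorted({t for text, _ in docs for t in text.split()})
        self.vocab = {w: i for i, w in enumerate(words)}
        n_feat = len(self.vocab) + BUCKETS
        self.emb = [[rng.uniform(-1 / dim, 1 / dim) for _ in range(dim)]
                    for _ in range(n_feat)]
        self.out = [[0.0] * dim for _ in self.labels]

    def _hidden(self, ids: List[int]) -> List[float]:
        h = [0.0] * self.dim
        for i in ids:
            for k in range(self.dim):
                h[k] += self.emb[i][k]
        return [v / len(ids) for v in h]

    def _probs(self, h: List[float]) -> List[float]:
        return softmax([sum(w * x for w, x in zip(row, h)) for row in self.out])

    def fit(self, docs: List[Tuple[str, str]], epochs: int = 300, lr: float = 0.3) -> None:
        for _ in range(epochs):
            for text, label in docs:
                ids = featurize(text.split(), self.vocab)
                h = self._hidden(ids)
                target = self.labels.index(label)
                # Cross-entropy gradient with respect to the class scores.
                grad = [p - (1.0 if c == target else 0.0)
                        for c, p in enumerate(self._probs(h))]
                # Gradient for the hidden vector, taken before `out` is updated.
                grad_h = [sum(g * self.out[c][k] for c, g in enumerate(grad))
                          for k in range(self.dim)]
                for c, g in enumerate(grad):
                    for k in range(self.dim):
                        self.out[c][k] -= lr * g * h[k]
                for i in ids:
                    for k in range(self.dim):
                        self.emb[i][k] -= lr * grad_h[k] / len(ids)

    def predict_proba(self, text: str) -> Dict[str, float]:
        ids = featurize(text.split(), self.vocab)
        if not ids:  # no known feature: fall back to a uniform guess
            return {label: 1.0 / len(self.labels) for label in self.labels}
        return dict(zip(self.labels, self._probs(self._hidden(ids))))

    def predict(self, text: str) -> str:
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


def train(docs: List[Tuple[str, str]]) -> TinyFastText:
    model = TinyFastText(docs)
    model.fit(docs)
    return model
```

Calling `train(SAMPLES).predict("computer network system integration")` returns one of the toy labels. In a real setting, the Chinese text would first be segmented (for example, with the keyword lexicon the abstract describes), and a maintained fastText implementation would replace this hand-written training loop.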
This article is indexed in Wanfang Data and other databases.