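An iteratively adjusted centroid text classification algorithm based on support vectors
Cite this article: Wang Deqing, Zhang Hui. Support-vector-based iteratively adjusted centroid classifier for text categorization[J]. Journal of Beijing University of Aeronautics and Astronautics, 2013, 39(2): 269-274.
Authors: Wang Deqing, Zhang Hui
Affiliation (both authors): State Key Laboratory of Software Development Environment, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
Funding: National Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (核高基) (2010ZX01042-002)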
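Abstract: Centroid-based classifiers are prone to inductive bias or model misfit. To address this weakness, a support-vector-based iteratively adjusted centroid classifier is proposed. The method constructs the centroid vectors using only the support vectors selected by support vector machines (SVMs), and then iteratively adjusts the initial centroid vectors using the misclassified samples of the training set. Compared with other classification algorithms, it achieves better macro-averaged F1 and micro-averaged F1. Experiments on 8 commonly used text classification datasets confirm its effectiveness, especially on imbalanced text corpora.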
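The two steps named in the abstract (centroids built from support vectors, then adjusted using misclassified training samples) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the update rule, so the sketch assumes a simple rule that pulls the true-class centroid toward a misclassified sample and pushes the wrongly predicted centroid away from it. The names `initial_centroids`, `adjust_centroids`, `learning_rate`, and `max_iter` are hypothetical, cosine similarity is an assumed choice, and `sv_indices` is hand-picked here so that the initial centroids are imperfect; in a real run the indices would come from a trained linear SVM.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def initial_centroids(X, y, sv_indices):
    """Average the support vectors of each class into one centroid per class.

    Assumes every class contributes at least one index in sv_indices.
    """
    sums, counts = {}, {}
    for i in sv_indices:
        label = y[i]
        if label not in sums:
            sums[label] = [0.0] * len(X[i])
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], X[i])]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, x):
    """Pick the class whose centroid has the highest cosine similarity to x."""
    xn = normalize(x)
    # sorted() makes tie-breaking deterministic
    return max(sorted(centroids), key=lambda c: dot(normalize(centroids[c]), xn))

def adjust_centroids(X, y, centroids, learning_rate=0.2, max_iter=20):
    """Assumed update rule: for each misclassified training sample, move the
    true-class centroid toward it and the predicted-class centroid away."""
    centroids = {c: list(v) for c, v in centroids.items()}  # leave input intact
    for _ in range(max_iter):
        errors = 0
        for x, label in zip(X, y):
            guess = predict(centroids, x)
            if guess != label:
                errors += 1
                xn = normalize(x)
                centroids[label] = [c + learning_rate * v
                                    for c, v in zip(centroids[label], xn)]
                centroids[guess] = [c - learning_rate * v
                                    for c, v in zip(centroids[guess], xn)]
        if errors == 0:  # stop once the training set is classified correctly
            break
    return centroids

# Toy data: three samples of class "a", two of class "b".
X = [[1.0, 0.0], [0.9, 0.2], [0.5, 0.7], [0.3, 0.9], [0.0, 1.0]]
y = ["a", "a", "a", "b", "b"]
sv_indices = [1, 4]  # hypothetical, hand-picked; not produced by an SVM here

init = initial_centroids(X, y, sv_indices)
final = adjust_centroids(X, y, init)
print("before:", [predict(init, x) for x in X])
print("after: ", [predict(final, x) for x in X])
```

On this toy data the third sample starts out closer to the "b" centroid, and the adjustment step corrects it. The learning rate and the stopping criterion are illustrative choices only.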

Keywords: text classification  centroid vector  support vector  iterative adjustment  support vector machine
Received: 2012-01-11

Support-vector-based iteratively adjusted centroid classifier for text categorization
Wang Deqing, Zhang Hui. Support-vector-based iteratively adjusted centroid classifier for text categorization[J]. Journal of Beijing University of Aeronautics and Astronautics, 2013, 39(2): 269-274.
Authors: Wang Deqing, Zhang Hui
Institution: State Key Laboratory of Software Development Environment, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
Abstract: Centroid-based classifiers (CC) are prone to inductive bias or model misfit. To address this weakness, a support-vector-based iteratively adjusted centroid classifier (IACC_SV) is proposed. It constructs the centroid vectors for CC from support vectors found by some routine, e.g., linear support vector machines (SVMs), and then iteratively adjusts the initial centroid vectors according to the misclassified training samples. Compared with traditional classification algorithms, IACC_SV achieves better macro-F1 and micro-F1. Extensive experiments on 8 real-world text corpora demonstrate the effectiveness of the proposed algorithm, especially on corpora with highly imbalanced classes.
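Because the abstract reports both macro-F1 and micro-F1 and stresses imbalanced corpora, a small worked example may help show why the two can diverge. The sketch below uses the common definitions (macro-F1 as the unweighted mean of per-class F1, micro-F1 computed from counts pooled over all classes); the function name `f1_scores` and the toy labels are assumptions made for illustration, not data from the paper.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: every class counts equally, however rare it is.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

# Imbalanced toy case: a classifier that always predicts the majority class.
y_true = ["big"] * 8 + ["small"] * 2
y_pred = ["big"] * 10
macro, micro = f1_scores(y_true, y_pred)
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
```

In this toy case the minority class is never predicted, so its per-class F1 is 0 and macro-F1 falls to 4/9, while micro-F1 stays at 0.8. This is why macro-F1 is the more sensitive of the two on imbalanced corpora.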
Keywords: text categorization  centroid vector  support vector  iterative adjustment  support vector machines
This article is indexed in CNKI, Wanfang Data, and other databases.