
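Feature Selection Framework for Term Definition Extraction
Cite this article:Pan Xu.Feature Selection Framework for Term Definition Extraction[J].Journal of Nanjing University of Aeronautics & Astronautics,2012,44(3):399-404.
Author:Pan Xu
Affiliation:College of Civil Aviation, Nanjing University of Aeronautics & Astronautics, Nanjing 210016
Funding:Civil Aviation Applied Research Fund of the Civil Aviation Administration of China
Abstract:To further improve the precision and efficiency of term definition extraction in the aviation domain, a feature selection framework that does not depend on existing feature selection methods is proposed. The framework combines the between-class and within-class distribution differences of classification features, and so better expresses how differences in feature distribution among the sub-concepts inside term definitions contribute to separating the classes. After analyzing how the framework and traditional filter feature selection methods affect feature distribution, experimental results are compared on an aviation-domain term definition corpus. With the balanced random forest method, the best scores obtained by the proposed method are F1-measure=0.652 and F2-measure=0.761, and the proportion of features required falls from 30%~40% to 20%~30%. With direct classification, the F1-measure is increased by 2.57 times and the F2-measure by 3.11 times. Both results are better than those of the filter method and the Fisher Score method.

Keywords:feature selection  unbalanced corpus  definition extraction  text categorization  small disjunct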
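
The abstract reports both F1-measure and F2-measure. As a reminder of how the two relate, the following is a minimal sketch of the standard F-beta measure, the weighted harmonic mean of precision and recall. The function name `f_beta` and the precision and recall values in the usage lines are illustrative assumptions, not figures from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta measure: weighted harmonic mean of precision and recall.

    beta = 1 weights precision and recall equally (F1-measure);
    beta = 2 weights recall more heavily than precision (F2-measure).
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0 and beta must be > 0")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Illustrative values only (not taken from the paper).
example_precision, example_recall = 0.5, 1.0
f1 = f_beta(example_precision, example_recall, beta=1.0)
f2 = f_beta(example_precision, example_recall, beta=2.0)
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

Because F2 gives recall more weight, an F2 score above the F1 score for the same classifier indicates that recall exceeds precision, which is a common target when definitions are rare in an unbalanced corpus.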

Feature Selection Framework Research in Extracting Term Definition
Pan Xu, Gu Hongbin, Zhao Zhiqing. Feature Selection Framework Research in Extracting Term Definition[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2012, 44(3): 399-404.
Authors:Pan Xu  Gu Hongbin  Zhao Zhiqing
Institution:(College of Civil Aviation,Nanjing University of Aeronautics & Astronautics,Nanjing,210016,China)
Abstract:A feature selection framework that does not rely on existing feature selection methods is proposed for extracting term definitions from an aviation professional corpus. The framework combines the between-class and within-class distribution differences of features to express the contribution of small disjuncts. After analyzing how the traditional filter method and the framework influence feature distribution, experimental results are compared on an aviation term definition corpus. In balanced random forest (BRF) classification, the proportion of features required to obtain the best scores, F1-measure=0.652 and F2-measure=0.761, decreases from 30%-40% to 20%-30% when the proposed framework is used. In SVM classification, the F1-measure of the classifier using the framework is increased by 2.57 times and the F2-measure by 3.11 times. The results are superior to those of the filter method and the Fisher Score method.
Keywords:feature selection  unbalanced corpus  definition extraction  text categorization  small disjunct
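
The abstract names the Fisher Score method as one of the baselines. The sketch below shows the commonly used form of that baseline, which scores each feature by the ratio of between-class scatter to within-class scatter. It is not the framework proposed in the paper, whose formula the abstract does not give. The function name `fisher_scores`, the `eps` smoothing term, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def fisher_scores(rows, labels, eps=1e-12):
    """Per-feature Fisher Score: between-class scatter / within-class scatter.

    rows   -- list of equal-length numeric feature vectors
    labels -- class label for each row
    eps    -- small constant (an assumption) that avoids division by zero
    """
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for members in by_class.values():
            values = [row[j] for row in members]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        scores.append(between / (within + eps))
    return scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 barely does.
toy_rows = [[0, 1], [2, 3], [10, 2], [12, 4]]
toy_labels = [0, 0, 1, 1]
toy_scores = fisher_scores(toy_rows, toy_labels)
ranking = sorted(range(len(toy_scores)), key=lambda j: toy_scores[j], reverse=True)
print(toy_scores, ranking)
```

A filter-style selector would keep the top-ranked share of features, for example the 20% to 30% range mentioned in the abstract, before training the classifier.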