
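A DOM-Tree-Based Algorithm for Mining Frequent Patterns from XML Data
Cite this article: Ji Genlin, Wei Suyun, Bao Peiming. A DOM-Tree-Based Algorithm for Mining Frequent Patterns from XML Data [J]. Journal of Nanjing University of Aeronautics & Astronautics, 2006, 38(2): 206-211.
Authors: Ji Genlin  Wei Suyun  Bao Peiming
Affiliation: Department of Computer Science, Nanjing Normal University, Nanjing 210097
Funding: Supported by the Natural Science Foundation of Jiangsu Higher Education Institutions (04KJB520075 and 03KJD520117)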
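Abstract: Because XML data is semi-structured, data mining over XML data differs from data mining over relational databases: XML data has a more complex hierarchical structure. This paper studies DOM-tree-based frequent pattern mining for XML data and proposes FreqtTree, an incremental algorithm for mining frequent patterns from XML data. The algorithm first converts the XML data into a DOM tree and then mines all frequent patterns from the DOM tree. FreqtTree uses the rightmost expansion technique, generating a new tree by adding a new node only on the rightmost branch of an existing tree. It also makes full use of the frequent patterns already generated, so relatively few candidate patterns are produced. FreqtTree computes the support count of each candidate k-pattern from the support counts of the frequent (k-1)-patterns, so the algorithm traverses the DOM tree only once and is therefore efficient. The algorithm's performance was tested on several data sets and compared experimentally with other algorithms; the results show that the algorithm is efficient and feasible.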

Keywords: XML  DOM tree  frequent patterns  incremental mining  data mining
Article ID: 1005-2615(2006)02-0206-06
Received: 2005-07-28
Revised: 2005-11-07
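
The first step named in the abstract, converting XML data into a DOM tree, can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `preorder_encoding`, the sample document, and the choice to flatten the tree into preorder `(depth, tag)` pairs are assumptions made for illustration. Only element nodes are kept; text and attribute nodes are ignored.

```python
from xml.dom import minidom


def preorder_encoding(xml_text):
    """Parse xml_text into a DOM tree and return its element nodes
    as a preorder list of (depth, tag) pairs (hypothetical encoding)."""
    doc = minidom.parseString(xml_text)
    encoded = []

    def visit(node, depth):
        encoded.append((depth, node.tagName))
        for child in node.childNodes:
            # Skip text, comment, and other non-element nodes.
            if child.nodeType == minidom.Node.ELEMENT_NODE:
                visit(child, depth + 1)

    visit(doc.documentElement, 0)
    return encoded


# Hypothetical sample document, used only for illustration.
SAMPLE = "<lib><book><title/><author/></book><book><title/></book></lib>"
ENCODED = preorder_encoding(SAMPLE)
```

A flat preorder encoding is convenient here because the last entry of the list is always the rightmost leaf of the tree, which is the node that rightmost expansion works from.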

DOM-Based Algorithm of Mining Frequent Patterns from XML Data
Ji Genlin, Wei Suyun, Bao Peiming. DOM-Based Algorithm of Mining Frequent Patterns from XML Data [J]. Journal of Nanjing University of Aeronautics & Astronautics, 2006, 38(2): 206-211.
Authors:Ji Genlin  Wei Suyun  Bao Peiming
Institution: Department of Computer Science, Nanjing Normal University, Nanjing, 210097, China
Abstract: Because XML data is semi-structured, it has a more complicated hierarchical structure than relational data, and mining it differs considerably from mining relational databases. This paper presents FreqtTree, an efficient algorithm for discovering all frequent patterns in XML data. The algorithm first transforms the XML data into a DOM tree and then uses an incremental method to mine all frequent patterns from the DOM tree. The key idea of FreqtTree is rightmost expansion: a tree is grown by attaching new nodes only on its rightmost branch. The number of candidate patterns is small because the algorithm uses the information in the frequent patterns discovered in the previous iteration. In addition, FreqtTree uses the support of the frequent (k-1)-patterns to compute the support of each candidate k-pattern. By combining these techniques, the algorithm traverses the DOM tree only once. Finally, a group of XML data sets is used to test the performance of the algorithm, and the experimental results are compared with those of other algorithms. The results show that the algorithm is effective.
Keywords:XML  DOM tree  frequent patterns  incremental mining  data mining  
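
The rightmost-expansion idea described in the abstract can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the authors' FreqtTree implementation. The names `build_index`, `support`, and `mine` are hypothetical. Trees and patterns are assumed to be preorder tuples of `(depth, label)` pairs, patterns are assumed to be ordered subtrees that preserve parent-child edges, and support is assumed to be the number of distinct data nodes at which a pattern's root can occur. Each pattern carries a set of `(root, rightmost leaf)` occurrences, so candidate k-patterns are counted from the occurrences of frequent (k-1)-patterns instead of by rescanning the tree.

```python
from collections import defaultdict


def build_index(tree):
    """tree: preorder list of (depth, label). Returns parent and children by node index."""
    parent, children, stack = [], [[] for _ in tree], []
    for i, (depth, _) in enumerate(tree):
        del stack[depth:]                      # keep only the ancestors of node i
        parent.append(stack[-1] if stack else None)
        if stack:
            children[stack[-1]].append(i)
        stack.append(i)
    return parent, children


def support(occurrences):
    """Assumed support measure: number of distinct root positions."""
    return len({root for root, _ in occurrences})


def mine(tree, min_support):
    """Return {pattern: support} for all patterns meeting min_support."""
    parent, children = build_index(tree)
    labels = [label for _, label in tree]
    frequent = {}

    # 1-patterns: every node is an occurrence whose root is also its rightmost leaf.
    level = defaultdict(set)
    for i, label in enumerate(labels):
        level[((0, label),)].add((i, i))
    level = {p: occ for p, occ in level.items() if support(occ) >= min_support}

    while level:
        frequent.update({p: support(occ) for p, occ in level.items()})
        next_level = defaultdict(set)
        for pattern, occs in level.items():
            last_depth = pattern[-1][0]
            for root, leaf in occs:
                # Case 1: the new node becomes a child of the rightmost leaf.
                for z in children[leaf]:
                    next_level[pattern + ((last_depth + 1, labels[z]),)].add((root, z))
                # Case 2: the new node becomes a later sibling of a node
                # on the rightmost branch, walking up towards the root.
                y = leaf
                for d in range(last_depth, 0, -1):
                    p = parent[y]
                    for z in children[p]:
                        if z > y:
                            next_level[pattern + ((d, labels[z]),)].add((root, z))
                    y = p
        # Only frequent patterns are expanded in the next iteration.
        level = {p: occ for p, occ in next_level.items() if support(occ) >= min_support}
    return frequent


# Hypothetical data tree: lib(book(title, author), book(title)).
SAMPLE_TREE = [(0, "lib"), (1, "book"), (2, "title"), (2, "author"), (1, "book"), (2, "title")]
FREQUENT = mine(SAMPLE_TREE, 2)
```

Because every candidate is produced by adding one node on the rightmost branch of a frequent pattern, each pattern is generated in exactly one way, and infrequent patterns are never expanded further.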
This article is indexed by CNKI, VIP, Wanfang Data, and other databases.