Comparison between EM algorithm and dynamical clustering algorithm for Dirichlet mixture samples
Cite this article: XIA Bang, EMILION Richard, WANG Huiwen. Comparison between EM algorithm and dynamical clustering algorithm for Dirichlet mixture samples[J]. Journal of Beijing University of Aeronautics and Astronautics (北京航空航天大学学报), 2019, 45(9): 1805-1811. DOI: 10.13700/j.bh.1001-5965.2018.0752
Authors: XIA Bang; EMILION Richard; WANG Huiwen
Affiliations: 1. Postdoctoral Workstation, Industrial and Commercial Bank of China, Beijing 100032, China; 2. MAPMO, University of Orleans, Orleans 45000, France; 3. School of Economics and Management, Beihang University, Beijing 100083, China
Funding: National Natural Science Foundation of China (71420107025)
Abstract: The Dirichlet distribution is a family of continuous multivariate probability distributions parameterized by a vector of positive parameters, and it is widely used in problems involving proportional structure. For the problem of clustering Dirichlet mixture samples, the expectation maximization (EM) algorithm and the dynamical clustering algorithm are studied. First, the mathematical derivation of each algorithm is given, together with its iteration steps. Then, numerical simulation experiments are used to compare the clustering performance of the two machine learning algorithms on Dirichlet mixture samples. Finally, six evaluation metrics are calculated: log-likelihood value, program running time, number of iterations to convergence, clustering accuracy, true positive rate (TPR), and false positive rate (FPR). The simulation results show that the EM algorithm achieves higher clustering accuracy but lower computational efficiency, whereas the dynamical clustering algorithm is more computationally efficient at the cost of some clustering accuracy. In practice, it is therefore recommended to weigh the relative requirements for clustering accuracy and computational efficiency before selecting an algorithm for clustering Dirichlet mixture samples.

Keywords: Dirichlet distribution; mixture samples; expectation maximization (EM) algorithm; dynamical clustering; machine learning
Received: 2018-12-25
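
The abstract contrasts the two algorithms without giving their update formulas, so the following Python sketch only illustrates the general idea and is not the authors' implementation. It assumes that the EM E-step computes soft posterior memberships under a Dirichlet mixture, and that dynamical clustering replaces them with a hard assignment to the best-fitting component (one plausible reading that the abstract does not confirm). The parameter re-estimation step is omitted because the abstract does not describe it. All function names are hypothetical, and `binary_metrics` covers only the two-class case with predicted labels already matched to the true classes.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at x, where x_i > 0 and sum(x) == 1."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def responsibilities(x, weights, alphas):
    """Soft membership of one sample (EM-style E-step, assumed form).

    weights: mixing proportions, one per component (positive, sum to 1).
    alphas:  one Dirichlet parameter vector per component.
    """
    logs = [math.log(w) + dirichlet_logpdf(x, a) for w, a in zip(weights, alphas)]
    top = max(logs)  # subtract the maximum before exponentiating for stability
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def hard_assign(x, weights, alphas):
    """Hard membership: index of the most probable component (assumed
    stand-in for the assignment step of dynamical clustering)."""
    r = responsibilities(x, weights, alphas)
    return max(range(len(r)), key=r.__getitem__)


def binary_metrics(true_labels, pred_labels, positive=1):
    """Accuracy, TPR and FPR for a two-class problem.

    Assumes both classes occur in true_labels; otherwise TPR or FPR is undefined.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, pred_labels):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return accuracy, tpr, fpr


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    weights = [0.5, 0.5]
    alphas = [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]]
    sample = [0.8, 0.1, 0.1]
    print(responsibilities(sample, weights, alphas))
    print(hard_assign(sample, weights, alphas))
```

Soft memberships keep the uncertainty of samples that lie between components, while a hard assignment discards it and needs less bookkeeping per iteration. This is in line with the accuracy versus efficiency trade-off the abstract reports, although the sketch itself does not reproduce those results.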
This article is indexed in Wanfang Data (万方数据) and other databases.