
Supervised clustering of variables based on Gram-Schmidt transformation
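Cite this article: LIU Ruiping, WANG Huiwen, WANG Shanshan. Supervised clustering of variables based on Gram-Schmidt transformation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 2003-2010.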
Authors: LIU Ruiping, WANG Huiwen, WANG Shanshan
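Affiliations: School of Economics and Management, Beihang University, Beijing 100083, China; Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100083, China; Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, Beijing 100083, China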
Funding: National Natural Science Foundation of China (71420107025, 11701023)
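Abstract: To further study dimension reduction for high-dimensional data in regression models, this paper proposes a new method of supervised clustering of variables based on the Gram-Schmidt transformation (SCV-GS). The method does not use hierarchical clustering with latent variables as cluster centers. Instead, it borrows the idea of variable screening: it selects, one after another, the key variables that contribute substantially to the response variable and uses them as cluster centers. Using the Gram-Schmidt transformation, SCV-GS handles the high correlation among variables in batches and produces the clustering result. Drawing on the idea of partial least squares, a new homogeneity measure is also proposed and used to select the optimal aggregation parameter. SCV-GS obtains the variable clustering quickly. It also identifies the groups of variables that play a key role in explaining and predicting the response variable. Simulations show that the method runs significantly faster. They also show that the estimated regression coefficients of the resulting latent variables agree with those of the comparison method. A real-data analysis shows that the method offers better interpretability and predictive performance.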
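The abstract describes the screening-plus-orthogonalization idea only at a high level. The Python sketch below shows one way such a loop could work. It is not the authors' SCV-GS algorithm, and it rests on several assumptions:

- At each step, the center is the remaining variable whose residual has the largest absolute covariance with the residual response per unit norm.
- A fixed threshold `tau` on absolute residual correlation stands in for the paper's aggregation parameter.
- Only the center, not a cluster latent variable, is projected out by the Gram-Schmidt step.
- The PLS-based homogeneity measure used to tune the parameter is not reproduced.

The function `screening_gs_clusters` and its parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
EPS = 1e-12  # tolerance below which a residual norm is treated as zero


def _centered(v: Sequence[float]) -> Vector:
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(_dot(v, v))


def _project_out(v: Sequence[float], u: Sequence[float]) -> Vector:
    """One Gram-Schmidt step: remove from v its component along u."""
    uu = _dot(u, u)
    if uu <= EPS:
        return list(v)
    coef = _dot(v, u) / uu
    return [a - coef * b for a, b in zip(v, u)]


def screening_gs_clusters(columns: Sequence[Sequence[float]],
                          y: Sequence[float],
                          tau: float = 0.8) -> List[List[int]]:
    """Hypothetical screening + Gram-Schmidt clustering of variables.

    columns: p predictor columns of length n; y: response of length n.
    tau: assumed threshold on |residual correlation| with the current
    center (illustrative stand-in, not the paper's aggregation parameter).
    Returns clusters as lists of column indices, center first.
    """
    resid: Dict[int, Vector] = {j: _centered(c) for j, c in enumerate(columns)}
    r = _centered(y)
    clusters: List[List[int]] = []

    def score(j: int) -> float:
        # |<x_j, r>| / ||x_j||: how strongly x_j explains the residual response
        nj = _norm(resid[j])
        return abs(_dot(resid[j], r)) / nj if nj > EPS else 0.0

    while resid:
        # 1. Screening: the most relevant remaining variable becomes a center.
        k = max(resid, key=score)
        center = resid.pop(k)
        cn = _norm(center)

        # 2. Batch assignment: variables highly correlated with the center join it.
        members = []
        if cn > EPS:
            for j, v in resid.items():
                nj = _norm(v)
                if nj > EPS and abs(_dot(v, center)) / (nj * cn) >= tau:
                    members.append(j)
        for j in members:
            del resid[j]
        clusters.append([k] + members)

        # 3. Gram-Schmidt: orthogonalize response and unassigned variables to the center.
        r = _project_out(r, center)
        for j in resid:
            resid[j] = _project_out(resid[j], center)
    return clusters


if __name__ == "__main__":
    # x0 and x1 are nearly collinear; x2 is a separate signal.
    X = [[1, 2, 3, 4, 5, 6],
         [1.1, 1.9, 3.0, 4.1, 4.9, 6.0],
         [1, -1, 1, -1, 1, -1]]
    y = [2 * a + 3 * b for a, b in zip(X[0], X[2])]
    print(screening_gs_clusters(X, y))  # [[0, 1], [2]]
```

Later clusters are formed from residuals after earlier centers are removed, so their membership reflects partial rather than raw correlation. In a fuller treatment, `tau` would be chosen by a criterion such as the paper's homogeneity measure rather than fixed.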

Keywords: dimension reduction; clustering of variables; regression; high correlation; Gram-Schmidt transformation
Received: 2019-02-16

This article is indexed in Wanfang Data and other databases.