
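Protein Contact Map Prediction Based on Weighted Naïve Bayes Classifier and Extremely Randomized Trees
Citation: JIN Kangrong, YU Dongjun. Protein contact map prediction based on weighted naïve Bayes classifier and extremely randomized trees[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2018, 50(5): 619-628.
Authors: JIN Kangrong, YU Dongjun
Affiliation: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094
Funding: Supported by the National Natural Science Foundation of China (61373062, 61772273).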
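Abstract: A new ensemble-learning-based predictor, TargetPCM, is proposed for high-accuracy prediction of protein contact maps, especially medium- and long-range contacts. First, TargetPCM uses a weighted naïve Bayes classifier (WNBC) to fuse the outputs of three contact map predictors, with the WNBC weight parameters optimized by particle swarm optimization (PSO). Second, the fused WNBC outputs are combined with sequence-based features to obtain more discriminative features. On this basis, extremely randomized trees (ERT) are trained to produce the final protein contact map prediction model. To validate TargetPCM, it was tested on a dataset of 98 non-redundant proteins. The results show that, for short-, medium- and long-range contacts, the Top L/5 accuracy of TargetPCM is 8.2%, 16.1% and 5.3% higher, respectively, than that of the best existing ensemble predictor (NeBcon). Further validation on CASP11 shows that, for short-, medium- and long-range contacts, the Top L/5 accuracy of TargetPCM is 7.4%, 9.1% and 7.5% higher, respectively, than that of the best existing co-evolution-based ensemble predictor (MetaPSICOV). These experimental results confirm the effectiveness of the proposed protein contact map prediction method.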
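The WNBC fusion step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the equal-width binning of base-predictor scores, the Laplace smoothing, and the weighted form P(c | x) ∝ P(c) ∏ P(x_k | c)^(w_k) are assumptions, and the class name `WeightedNaiveBayes` is hypothetical. The weights are fixed by hand here, whereas the paper tunes them with PSO.

```python
import math

# Hypothetical sketch of weighted naive Bayes (WNBC) fusion of base contact
# predictors. Assumptions, not taken from the paper: each base score in [0, 1]
# is discretized into equal-width bins, likelihoods use Laplace smoothing, and
# predictor k's log-likelihood is scaled by its weight w_k.


class WeightedNaiveBayes:
    def __init__(self, weights, n_bins=10, alpha=1.0):
        self.weights = list(weights)  # one weight per base predictor
        self.n_bins = n_bins
        self.alpha = alpha  # Laplace smoothing constant

    def _bin(self, score):
        # Map a score in [0, 1] to a bin index 0..n_bins-1.
        return min(int(score * self.n_bins), self.n_bins - 1)

    def fit(self, X, y):
        # X: list of score tuples (one score per base predictor); y: 0/1 labels.
        self.class_counts = {0: 0, 1: 0}
        # counts[c][k][b]: times predictor k's score fell in bin b for class c.
        self.counts = {c: [[0] * self.n_bins for _ in self.weights] for c in (0, 1)}
        for scores, label in zip(X, y):
            self.class_counts[label] += 1
            for k, s in enumerate(scores):
                self.counts[label][k][self._bin(s)] += 1
        return self

    def predict_proba(self, scores):
        # Weighted log posterior: log P(c) + sum_k w_k * log P(bin_k | c).
        total = self.class_counts[0] + self.class_counts[1]
        log_post = {}
        for c in (0, 1):
            lp = math.log((self.class_counts[c] + self.alpha) / (total + 2 * self.alpha))
            for k, s in enumerate(scores):
                count = self.counts[c][k][self._bin(s)]
                lik = (count + self.alpha) / (self.class_counts[c] + self.n_bins * self.alpha)
                lp += self.weights[k] * math.log(lik)
            log_post[c] = lp
        # Normalize in log space for numerical stability; return P(contact).
        m = max(log_post.values())
        p1 = math.exp(log_post[1] - m)
        p0 = math.exp(log_post[0] - m)
        return p1 / (p0 + p1)


# Toy usage: three base-predictor scores per residue pair, 1 = contact.
X = [(0.9, 0.8, 0.7), (0.85, 0.9, 0.6), (0.1, 0.2, 0.3), (0.2, 0.1, 0.05)]
y = [1, 1, 0, 0]
wnbc = WeightedNaiveBayes(weights=[0.5, 0.3, 0.2]).fit(X, y)
print(wnbc.predict_proba((0.88, 0.85, 0.65)))  # above 0.5: leans "contact"
print(wnbc.predict_proba((0.15, 0.15, 0.10)))  # below 0.5: leans "no contact"
```

Scaling each log-likelihood by w_k lets a more reliable base predictor dominate the fused probability; in the paper this fused output is then concatenated with sequence-based features as input to the ERT model.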

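Keywords: pattern recognition and intelligent systems; protein contact map; feature extraction; weighted naïve Bayes classifier; particle swarm optimization; extremely randomized trees
Received: 2017/11/1
Revised: 2017/12/28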

Improved Contact Map Prediction Using Weighted Naïve Bayes Classifier and Extremely Randomized Trees
JIN Kangrong,YU Dongjun.Improved Contact Map Prediction Using Weighted Naïve Bayes Classifier and Extremely Randomized Trees[J].Journal of Nanjing University of Aeronautics & Astronautics,2018,50(5):619-628.
Authors: JIN Kangrong, YU Dongjun
Institution:School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
Abstract: Accurate prediction of residue-residue contacts provides crucial help for ab initio protein folding and 3D structure modeling, because accurately predicted contacts impose useful constraints on structure assembly. Recent CASP experiments have witnessed rapid progress on this topic, and a number of promising protein contact map predictors have emerged over the past decades. Although much progress has been made, challenges remain (e.g., low prediction accuracy for long-range contacts). Here we develop a new meta-based predictor, called TargetPCM, which achieves high accuracy in protein contact map prediction. TargetPCM combines the outputs of three existing powerful contact map predictors by using a weighted naïve Bayes classifier (WNBC), whose weight parameters are optimized with the particle swarm optimization (PSO) algorithm. The outputs of the WNBC are then combined with intrinsic sequence-based features and fed to the final prediction model, which is trained with extremely randomized trees (ERT), to perform contact map prediction. Tested on 98 non-redundant proteins, TargetPCM improves the Top L/5 accuracy over the best meta-based predictor (NeBcon) by 8.2%, 16.1% and 5.3% for short-, medium- and long-range contacts, respectively. Further tests on CASP11 show that TargetPCM improves the Top L/5 accuracy over the best co-evolution-based meta-server predictor (MetaPSICOV) by 7.4%, 9.1% and 7.5% for short-, medium- and long-range contacts, respectively. Detailed analysis of the experimental results shows that both the effective use of complementary information from the base predictors and the strong learning capability of ERT account for the performance improvement of TargetPCM over existing contact map predictors.
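The Top L/5 figures quoted in the abstract are precision-style scores: of the highest-scoring predicted pairs in a given sequence-separation range, what fraction are true contacts. The Python sketch below shows one way to compute that for a single protein. The range boundaries (short 6-11, medium 12-23, long 24 or more residues apart) follow the common CASP convention and are an assumption, since the abstract does not state them; the function names `in_range` and `top_l_precision` are hypothetical.

```python
# Hypothetical sketch of Top-L/k precision (k = 5 gives Top L/5) for one protein.
# Assumptions, not stated in the abstract: sequence-separation ranges follow the
# common CASP convention below, and true contacts are given as (i, j) pairs, i < j.

RANGES = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),  # None means no upper bound
}


def in_range(i, j, lo, hi):
    sep = abs(j - i)
    return sep >= lo and (hi is None or sep <= hi)


def top_l_precision(predictions, true_contacts, length, range_name, k=5):
    """predictions: iterable of (i, j, score). Returns the fraction of the
    top max(1, L // k) in-range predictions that are true contacts."""
    lo, hi = RANGES[range_name]
    # Keep only predicted pairs whose separation falls in the requested range.
    candidates = [(score, i, j) for i, j, score in predictions if in_range(i, j, lo, hi)]
    candidates.sort(reverse=True)  # highest score first
    top = candidates[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


# Toy usage with made-up scores for a protein of length L = 60.
preds = [(1, 30, 0.9), (2, 40, 0.8), (5, 33, 0.1), (1, 5, 0.99)]
truth = {(1, 30), (5, 33)}
print(top_l_precision(preds, truth, 60, "long"))        # 12 pairs requested, 3 available
print(top_l_precision(preds, truth, 60, "long", k=30))  # only the top 2 pairs
```

When fewer than L/k in-range pairs are predicted, this sketch divides by the number actually evaluated; some evaluations divide by L/k instead, which gives lower scores for sparse predictions.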
Keywords: pattern recognition and intelligent systems; protein contact map; feature extraction; weighted naïve Bayes classifier; particle swarm optimization; extremely randomized trees
This article is indexed in CNKI and other databases.