Q-Learning Algorithm Based on Exploration Region Expansion Policy
Citation: HU Dan-dan, HE Zhen-dong, LIU Jie, GAO Qing-ji. Q-Learning Algorithm Based on Exploration Region Expansion Policy [J]. Journal of Civil Aviation University of China, 2006, 24(1): 32-35.
Authors: HU Dan-dan  HE Zhen-dong  LIU Jie  GAO Qing-ji
Institution: 1. Robotics Research Institute, Civil Aviation University of China, Tianjin 300300, China; 2. Department of Automation Control Engineering, Northeast China Institute of Electric Power Engineering, Jilin 132012, China; 3. Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract: Balancing exploration and exploitation is a key problem in Q-learning. Building on Metropolis-criterion-based Q-learning, an improved Q-learning algorithm that uses an exploration region expansion policy is proposed. The policy eliminates the blindness of exploring the whole environment from the outset and thereby improves learning efficiency. In addition, a self-terminating condition stops the algorithm once the optimal path has been found, avoiding redundant learning and saving learning time. Simulation experiments demonstrate the effectiveness of the algorithm.
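For readers who want the flavor of the approach, the Python sketch below combines the three ingredients named in the abstract: Metropolis-criterion action selection (the simulated-annealing idea), an exploration region that grows over episodes, and a self-terminating condition. It is a minimal illustration only; the grid world, reward values, annealing and expansion schedules, and the stability test are assumptions of this sketch, not details taken from the paper.

```python
import math
import random

# Illustrative sketch: Q-learning with Metropolis-criterion exploration
# and an expanding exploration region on a small grid world. All
# constants and rules below are assumptions for illustration.

SIZE = 10                                        # grid is SIZE x SIZE
START, GOAL = (0, 0), (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def metropolis_action(q_values, temperature):
    """Metropolis criterion: draw a random candidate action; if it is
    worse than the greedy action, accept it only with probability
    exp(-(Q_greedy - Q_candidate) / T). High T -> more exploration."""
    greedy = max(range(4), key=lambda a: q_values[a])
    candidate = random.randrange(4)
    delta = q_values[greedy] - q_values[candidate]
    if delta <= 0 or random.random() < math.exp(-delta / temperature):
        return candidate
    return greedy

def step(state, action, radius):
    """Move, but only inside the current exploration region (cells
    within `radius` Manhattan distance of START); reward 1 at the goal,
    -1 for bumping into a wall or the region boundary."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        return state, -1.0
    if abs(r - START[0]) + abs(c - START[1]) > radius:
        return state, -1.0
    return (r, c), (1.0 if (r, c) == GOAL else 0.0)

q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
alpha, gamma, temperature, radius = 0.5, 0.95, 1.0, 2
stable_episodes, last_len = 0, None

for episode in range(2000):
    state, steps = START, 0
    while state != GOAL and steps < 400:
        a = metropolis_action(q[state], temperature)
        nxt, reward = step(state, a, radius)
        q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
        state, steps = nxt, steps + 1
    temperature = max(0.05, temperature * 0.995)  # annealing schedule
    radius = min(2 * SIZE, radius + 1)            # expand the region
    # Self-terminating condition (assumed form): stop once the path
    # length has stayed unchanged for 50 consecutive successful episodes.
    if state == GOAL:
        stable_episodes = stable_episodes + 1 if steps == last_len else 0
        last_len = steps
        if stable_episodes >= 50:
            break
```

The design point the sketch tries to convey is that early episodes only ever visit states near the start, so the annealed Metropolis exploration is not wasted on distant, irrelevant parts of the environment; the region widens as learning stabilizes, and training halts by itself instead of running a fixed episode budget.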

Keywords: Q-learning  Metropolis criterion  exploration region expansion  simulated annealing
Article ID: 1001-5000(2006)01-0032-04
Received: 2005-08-25
Revised: 2005-10-12

Indexed by: CNKI, VIP (Weipu), Wanfang Data, and other databases.