Q-Learning Algorithm Based on Exploration Region Expansion Policy
Citation: HU Dan-dan, HE Zhen-dong, LIU Jie, GAO Qing-ji. Q-Learning Algorithm Based on Exploration Region Expansion Policy [J]. Journal of Civil Aviation University of China, 2006, 24(1): 32-35.
Authors: HU Dan-dan  HE Zhen-dong  LIU Jie  GAO Qing-ji
Institution: 1. Robotics Research Institute, Civil Aviation University of China, Tianjin 300300, China; 2. Department of Automation Control Engineering, Northeast China Institute of Electric Power Engineering, Jilin 132012, China; 3. Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract: Balancing exploration and exploitation is a key problem in Q-learning. Building on Metropolis-criterion-based Q-learning, an improved Q-learning algorithm that uses an exploration region expansion policy is proposed. The policy eliminates the blindness of exploring the whole environment from the outset and thereby improves learning efficiency. In addition, a self-terminating condition stops the algorithm once the optimal path has been found, avoiding redundant learning and saving learning time. Simulation experiments demonstrate the effectiveness of the algorithm.
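For readers who want the flavor of the approach, the Python sketch below combines the three ingredients named in the abstract: Metropolis-criterion action selection (the simulated-annealing idea), an exploration region that grows over episodes, and a self-terminating condition. It is a minimal illustration only; the grid world, reward values, annealing and expansion schedules, and the stability test are assumptions of this sketch, not details taken from the paper.

```python
import math
import random

# Illustrative sketch: Q-learning with Metropolis-criterion exploration
# and an expanding exploration region on a small grid world. All
# constants and rules below are assumptions for illustration.

SIZE = 10                                        # grid is SIZE x SIZE
START, GOAL = (0, 0), (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def metropolis_action(q_values, temperature):
    """Metropolis criterion: draw a random candidate action; if it is
    worse than the greedy action, accept it only with probability
    exp(-(Q_greedy - Q_candidate) / T). High T -> more exploration."""
    greedy = max(range(4), key=lambda a: q_values[a])
    candidate = random.randrange(4)
    delta = q_values[greedy] - q_values[candidate]
    if delta <= 0 or random.random() < math.exp(-delta / temperature):
        return candidate
    return greedy

def step(state, action, radius):
    """Move, but only inside the current exploration region (cells
    within `radius` Manhattan distance of START); reward 1 at the goal,
    -1 for bumping into a wall or the region boundary."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        return state, -1.0
    if abs(r - START[0]) + abs(c - START[1]) > radius:
        return state, -1.0
    return (r, c), (1.0 if (r, c) == GOAL else 0.0)

q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
alpha, gamma, temperature, radius = 0.5, 0.95, 1.0, 2
stable_episodes, last_len = 0, None

for episode in range(2000):
    state, steps = START, 0
    while state != GOAL and steps < 400:
        a = metropolis_action(q[state], temperature)
        nxt, reward = step(state, a, radius)
        q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
        state, steps = nxt, steps + 1
    temperature = max(0.05, temperature * 0.995)  # annealing schedule
    radius = min(2 * SIZE, radius + 1)            # expand the region
    # Self-terminating condition (assumed form): stop once the path
    # length has stayed unchanged for 50 consecutive successful episodes.
    if state == GOAL:
        stable_episodes = stable_episodes + 1 if steps == last_len else 0
        last_len = steps
        if stable_episodes >= 50:
            break
```

The design point the sketch tries to convey is that early episodes only ever visit states near the start, so the annealed Metropolis exploration is not wasted on distant, irrelevant parts of the environment; the region widens as learning stabilizes, and training halts by itself instead of running a fixed episode budget.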

Keywords: Q-learning  Metropolis criterion  exploration region expansion  simulated annealing
Article ID: 1001-5000(2006)01-0032-04
Received: 2005-08-25
Revised: 2005-10-12

Indexed by: CNKI, VIP (Weipu), Wanfang Data, and other databases.