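A Multi-satellite Cooperative Dynamic Task Planning Algorithm Based on Multi-agent Hybrid Learning (in English)
Cite this article:Chong WANG,Jun LI,Ning JING,Jun WANG,Hao CHEN.A Multi-satellite Cooperative Dynamic Task Planning Algorithm Based on Multi-agent Hybrid Learning (in English)[J].Chinese Journal of Aeronautics,2011,24(4):493-505.
Authors:Chong WANG  Jun LI  Ning JING  Jun WANG  Hao CHEN
Institution:College of Electronic Science and Engineering, National University of Defense Technology
Funding:National High-tech Research and Development Program of China (2007AA120203)
Abstract:Heuristic re-planning algorithms have traditionally been used for cooperative dynamic task planning across multiple satellites, but because heuristic strategies depend on the specific tasks, the optimality of the result suffers. Noting that historical information from cooperative planning influences subsequent cooperative planning, this paper proposes a hybrid learning algorithm that combines policy-iteration-based multi-agent reinforcement learning with transfer learning to find a near-optimal policy for the problem. The multi-agent reinforcement learning method represents each satellite's reinforcement learning policy with a neural network and uses co-evolution to search iteratively for the policy network individuals with the best topology and connection weights. Because randomly arriving observation task requests invalidate historically learned policies, transfer learning converts the historical policy into the current initial policy, which speeds up multi-satellite cooperative task planning while preserving planning quality. Simulation experiments and analysis show that the algorithm adapts well to dynamically and randomly arriving task requests.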

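Keywords:multi-satellite dynamic task planning problem  multi-agent reinforcement learning  neuroevolution of augmenting topologies  transfer learning
Received:13 December 2010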

A Distributed Cooperative Dynamic Task Planning Algorithm for Multiple Satellites Based on Multi-agent Hybrid Learning
Chong WANG,Jun LI,Ning JING,Jun WANG,Hao CHEN.A Distributed Cooperative Dynamic Task Planning Algorithm for Multiple Satellites Based on Multi-agent Hybrid Learning[J].Chinese Journal of Aeronautics,2011,24(4):493-505.
Authors:Chong WANG  Jun LI  Ning JING  Jun WANG  Hao CHEN
Institution:College of Electronic Science and Engineering, National University of Defense Technology, Changsha 410073, China
Abstract:Heuristic re-planning algorithms are traditionally used for dynamic task planning across multiple satellites. However, these heuristic strategies depend on the concrete tasks, which often compromises the optimality of the result. Noting that the historical information of cooperative task planning affects subsequent planning results, we propose a hybrid learning algorithm for dynamic multi-satellite task planning that combines policy-iteration-based multi-agent reinforcement learning with transfer learning. Each satellite's reinforcement learning policy is represented by a neural network, and the policy network individuals with the best topology and weights are found by iterative co-evolutionary search. Because randomly occurring observation requests can invalidate previously learned policies, transfer learning is used to convert the historical policy into the initial policy for the current problem, so as to balance the quality and efficiency of task planning. Simulations and analysis show that the proposed approach is feasible and adapts well to randomly occurring observation requests.
Keywords:multiple satellites dynamic task planning problem  multi-agent systems  reinforcement learning  neuroevolution of augmenting topologies  transfer learning
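
The transfer step described in the abstract (reusing a historically learned policy as the starting point for a new search) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a fixed-length weight vector in place of an evolving network topology, a single policy rather than co-evolving per-satellite policies, and a toy quadratic fitness in place of planning quality. All names (`fitness`, `evolve`, `transferred_population`, `old_task`, `new_task`) are hypothetical.

```python
import random

def fitness(weights, target):
    """Toy stand-in for planning quality (assumption): higher is better, 0 is optimal."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def evolve(population, target, generations, rng, sigma=0.1):
    """Elitist evolutionary search over fixed-length weight vectors."""
    for _ in range(generations):
        # Keep the better half unchanged, refill with mutated copies of it.
        population.sort(key=lambda w: fitness(w, target), reverse=True)
        parents = population[: len(population) // 2]
        children = [[w + rng.gauss(0.0, sigma) for w in p] for p in parents]
        population = parents + children
    return max(population, key=lambda w: fitness(w, target))

def random_population(size, dim, rng):
    """Cold start: policies drawn uniformly at random."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(size)]

def transferred_population(size, source_best, rng, sigma=0.1):
    """Warm start: the historical policy plus perturbed copies of it."""
    seeded = [list(source_best)]
    seeded += [[w + rng.gauss(0.0, sigma) for w in source_best]
               for _ in range(size - 1)]
    return seeded

rng = random.Random(0)
old_task = [0.5, -0.3, 0.8, 0.1]   # hypothetical optimum for the historical requests
new_task = [0.6, -0.2, 0.7, 0.1]   # hypothetical optimum after new requests arrive

historical = evolve(random_population(20, 4, rng), old_task, 50, rng)
warm = transferred_population(20, historical, rng)
cold = random_population(20, 4, rng)

# Best starting fitness on the new task for each initialisation.
warm_start = max(fitness(w, new_task) for w in warm)
cold_start = max(fitness(w, new_task) for w in cold)
print("warm start:", warm_start, "cold start:", cold_start)
```

When the new task is close to the old one, the warm-started population typically begins with a higher fitness than the random one, which is the intuition behind converting the historical policy into the current initial policy. How much is gained depends on how similar the two tasks are.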
This article is indexed in CNKI, ScienceDirect, and other databases.