
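Multi-Agent Evasion Algorithm Design Based on Deep Q-Network (indexed in CSCD)
Cite this article: YAN Bo-wei, DU Run-le, BAN Xiao-jun, ZHOU Di. Multi-Agent Evasion Algorithm Design Based on Deep Q-Network[J]. Navigation Positioning & Timing, 2022(6): 40-47.
Authors: YAN Bo-wei; DU Run-le; BAN Xiao-jun; ZHOU Di
Affiliations: School of Astronautics, Harbin Institute of Technology, Harbin 150000, China; National Key Laboratory of Science and Technology on Test Physics and Numerical Mathematics, Beijing 100076, China
Abstract: Current research on multi-agent pursuit-evasion games is usually carried out in a two-dimensional plane with the evader agents' motion unconstrained, and traditional methods have difficulty designing a control strategy when an accurate model is lacking. For the case in which the evader agents' motion is constrained in three-dimensional space, a multi-agent evasion algorithm based on a deep Q-network (DQN) is proposed. The algorithm uses distributed learning: the evader agents learn an evasion strategy that meets expectations by exploring the environment. To improve learning efficiency, the agents' strategy learning is divided into two stages according to task difficulty, and corresponding reward functions are designed to guide the agents toward an evasion strategy that meets expectations. Simulation results show that the evasion strategy obtained by the algorithm performs stably and generalizes: after the initial position conditions are changed to a certain extent, the evader agents can still escape successfully.

Keywords: evasion algorithm; deep reinforcement learning; multi-agent; deep Q-network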

Multi-Agent Evasion Algorithm Design Based on Deep Q-Network
YAN Bo-wei, DU Run-le, BAN Xiao-jun, ZHOU Di. Multi-Agent Evasion Algorithm Design Based on Deep Q-Network[J]. Navigation Positioning & Timing, 2022(6): 40-47.
Authors: YAN Bo-wei; DU Run-le; BAN Xiao-jun; ZHOU Di
Institutions: School of Astronautics, Harbin Institute of Technology, Harbin 150000, China; National Key Laboratory of Science and Technology on Test Physics and Numerical Mathematics, Beijing 100076, China
Abstract: Multi-agent pursuit-evasion games are usually studied in a two-dimensional plane with no constraints on the evader's motion, and traditional methods have difficulty designing a control strategy when no accurate model is available. This paper therefore proposes a multi-agent evasion algorithm based on a deep Q-network (DQN) for the case in which the evader's motion is constrained in three-dimensional space. The proposed algorithm is decentralized: the evader obtains the desired evasion strategy by exploring the environment and learning from it. To improve learning efficiency, strategy learning is divided into two stages according to the difficulty of the task, and corresponding reward functions are designed to guide the agent toward the desired evasion strategy. Simulation results show that the learned evasion strategy performs stably and generalizes: the evader can still escape successfully after the initial position conditions are changed to a certain extent.
Keywords: evasion algorithm; deep reinforcement learning; multi-agent; deep Q-network
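
The abstract names three ingredients without giving their details: exploration of the environment, DQN-style value learning, and a reward that changes between two learning stages. The Python sketch below illustrates generic versions of each so the terms are concrete. It is not the authors' implementation. The epsilon-greedy rule and the one-step target are the standard DQN forms; the function `staged_reward`, its `scale` parameter, and the choice of a dense reward in stage 1 and a sparse reward in stage 2 are hypothetical, since the record does not state the paper's actual reward functions.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_values: Sequence[float], done: bool,
              gamma: float = 0.99) -> float:
    """Standard one-step DQN target: r + gamma * max_a' Q_target(s', a').

    Terminal transitions do not bootstrap from the next state.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def staged_reward(stage: int, miss_distance: float, escaped: bool,
                  scale: float = 100.0) -> float:
    """HYPOTHETICAL two-stage reward (an assumption, not from the paper).

    Stage 1 (easier task): dense shaping that grows with the distance
    to the nearest pursuer, capped at 1.
    Stage 2 (harder task): sparse outcome reward for escaping or not.
    """
    if stage == 1:
        return min(miss_distance / scale, 1.0)
    return 1.0 if escaped else -1.0


if __name__ == "__main__":
    rng = random.Random(0)
    action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng)
    target = td_target(staged_reward(1, 50.0, False), [0.5, 2.0], done=False)
    print(action, target)
```

In a decentralized setup of the kind the abstract describes, each evader would hold its own Q-function and apply these steps to its own observations; how the agents share or separate experience is not specified in this record.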
This article is indexed in VIP (维普) and other databases.