Strategy Solution of Non-cooperative Target Pursuit-Evasion Game Based on Branching Deep Reinforcement Learning

Citation: LIU Bingyan, YE Xiongbing, GAO Yong, WANG Xinbo, NI Lei. Strategy solution of non-cooperative target pursuit-evasion game based on branching deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(10): 324040-324040.
Authors: LIU Bingyan  YE Xiongbing  GAO Yong  WANG Xinbo  NI Lei
Institution: 1. Academy of Military Sciences, Beijing 100091, China; 2. Unit 32032 of the PLA, Beijing 100094, China; 3. Space Engineering University, Beijing 101416, China

Abstract: To solve the space rendezvous problem between a spacecraft and a non-cooperative target, and to ease the limitations of applying deep reinforcement learning in continuous spaces, this paper proposes a pursuit-evasion game algorithm based on branching deep reinforcement learning to obtain a rendezvous strategy against the non-cooperative target. The optimal rendezvous control with respect to the non-cooperative target is formulated, via differential game theory, as a pursuit-evasion game under continuous thrust. To avoid the curse of dimensionality that traditional deep reinforcement learning suffers in continuous spaces, a fuzzy inference model is constructed to represent the continuous space, and a branching deep reinforcement learning architecture with multiple parallel neural networks and a shared decision module is proposed. The approach combines optimal control with game theory, effectively overcoming the difficulty that the highly nonlinear differential game model can hardly be solved by classical optimal control theory, and further improves the ability of deep reinforcement learning to learn discrete behaviors. A simulation example verifies the effectiveness of the algorithm.

Keywords: non-cooperative target  space rendezvous  spacecraft pursuit-evasion problem  continuous space  differential game  deep reinforcement learning  branching architecture

Received: 2020-03-31
Revised: 2020-10-25
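The fuzzy inference model used to represent the continuous state space is not detailed in the abstract. As an illustration of the general idea only (covering a continuous coordinate with overlapping fuzzy sets so a learner can operate over a small discrete vocabulary), here is a minimal sketch with hypothetical triangular membership functions over the relative range; the set names and breakpoints are assumptions, not the paper's values:

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peaking at 1 when x == b."""
    return float(np.clip(min((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0))

def fuzzify(r_km):
    """Map a continuous relative range (km) onto overlapping fuzzy sets.

    The partition into near / mid / far is illustrative, not from the paper.
    """
    return {
        "near": tri(r_km, -1.0, 0.0, 5.0),
        "mid":  tri(r_km, 0.0, 5.0, 10.0),
        "far":  tri(r_km, 5.0, 10.0, 20.0),
    }

memberships = fuzzify(3.0)  # a 3 km range is partly "near", mostly "mid"
```

Because adjacent sets overlap, each continuous value activates a few fuzzy labels with graded strength rather than falling into a single hard bin.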

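The branching architecture is described only at a high level ("multiple parallel neural networks and a shared decision module"). A minimal NumPy sketch of one plausible reading, with a shared trunk feeding independent Q-heads, one per thrust axis; the layer sizes and the per-axis discretization into 5 thrust levels are assumptions, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class BranchingQNetwork:
    """Shared trunk ("shared decision module") plus one Q-head per action branch."""

    def __init__(self, state_dim, n_branches, n_bins, hidden=32):
        self.W_shared = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.b_shared = np.zeros(hidden)
        # Parallel heads: each scores n_bins discretized actions for its branch
        self.heads = [(rng.normal(0.0, 0.1, (hidden, n_bins)), np.zeros(n_bins))
                      for _ in range(n_branches)]

    def q_values(self, state):
        h = relu(state @ self.W_shared + self.b_shared)  # shared representation
        return [h @ W + b for W, b in self.heads]        # per-branch Q-values

    def act(self, state):
        # Greedy joint action: argmax independently in each branch, so the
        # output size grows linearly (n_branches * n_bins) instead of
        # exponentially (n_bins ** n_branches) in the action dimensions.
        return [int(np.argmax(q)) for q in self.q_values(state)]

net = BranchingQNetwork(state_dim=6, n_branches=3, n_bins=5)
state = rng.normal(size=6)      # e.g., relative position and velocity
thrust_levels = net.act(state)  # one discrete thrust level per axis
```

This per-branch factorization is what lets a discrete-action learner scale to a multi-dimensional thrust command; the paper's exact loss and update rule are not given in the abstract.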
This article is indexed by Wanfang Data and other databases.