
Strategy solution of non-cooperative target pursuit-evasion game based on branching deep reinforcement learning
Citation:LIU Bingyan, YE Xiongbing, GAO Yong, WANG Xinbo, NI Lei. Strategy solution of non-cooperative target pursuit-evasion game based on branching deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica (航空学报), 2020, 41(10): 324040-324040. DOI: 10.7527/S1000-6893.2020.24040
Authors:LIU Bingyan (刘冰雁)  YE Xiongbing (叶雄兵)  GAO Yong (高勇)  WANG Xinbo (王新波)  NI Lei (倪蕾)
Affiliation:1. Academy of Military Sciences, Beijing 100091, China;2. PLA 32032 Troops, Beijing 100094, China;3. Space Engineering University, Beijing 101416, China
Abstract:
To solve the space rendezvous problem between a spacecraft and a non-cooperative target, and to ease the limitations of deep reinforcement learning in continuous spaces, this paper proposes a pursuit-evasion game algorithm based on branching deep reinforcement learning that yields a rendezvous strategy for non-cooperative targets. Using differential game theory, the optimal control problem of rendezvous with a non-cooperative target is formulated as a pursuit-evasion game under continuous thrust. To avoid the curse of dimensionality that conventional deep reinforcement learning faces in continuous spaces, a fuzzy inference model is constructed to represent the continuous space, and a branching deep reinforcement learning architecture with multiple parallel neural networks and a shared decision module is proposed. The approach combines optimal control with game theory, effectively overcomes the difficulty that the highly nonlinear differential game model is hard to solve with classical optimal control theory, and further improves the ability of deep reinforcement learning to learn discrete behaviors. A simulation example verifies the effectiveness of the algorithm.

Keywords:non-cooperative target  space rendezvous  spacecraft pursuit-evasion problem  continuous space  differential game  deep reinforcement learning  branching architecture
Received:2020-03-31
Revised:2020-10-25
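
The abstract names two ingredients: a fuzzy representation of the continuous space, and a branching architecture with parallel networks and a shared module. The sketch below illustrates both ideas in their generic form only. It is not the authors' implementation: the abstract does not specify the fuzzy sets, the network sizes, or how the parallel networks and the shared decision module are wired, so `triangular_memberships`, `BranchingQNetwork`, the three thrust levels, and the two-component state are all hypothetical choices made for illustration, and the weights are random rather than trained.

```python
import math
import random


def triangular_memberships(x, centers):
    """Membership degree of scalar x in each triangular fuzzy set.

    `centers` must be sorted ascending. Neighbouring sets overlap so that
    the degrees sum to 1 inside [centers[0], centers[-1]].
    """
    x = min(max(x, centers[0]), centers[-1])  # clamp to the covered range
    degrees = [0.0] * len(centers)
    for i in range(len(centers) - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            degrees[i] = 1.0 - w
            degrees[i + 1] = w
            break
    return degrees


class BranchingQNetwork:
    """Generic branching layout (an assumption, not the paper's network):
    one shared hidden layer feeds one linear Q-value head per action dimension."""

    def __init__(self, n_inputs, n_hidden, n_branches, n_bins, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.shared = matrix(n_hidden, n_inputs)            # shared module
        self.heads = [matrix(n_bins, n_hidden)               # parallel branches
                      for _ in range(n_branches)]

    def q_values(self, features):
        hidden = [math.tanh(sum(w * f for w, f in zip(row, features)))
                  for row in self.shared]
        return [[sum(w * h for w, h in zip(row, hidden)) for row in head]
                for head in self.heads]

    def greedy_action(self, features):
        # Each branch picks its own best bin independently.
        return [max(range(len(q)), key=q.__getitem__)
                for q in self.q_values(features)]


CENTERS = [-1.0, 0.0, 1.0]        # hypothetical fuzzy set centres
THRUST_LEVELS = [-1.0, 0.0, 1.0]  # hypothetical discretised thrust per axis

state = [0.25, -0.6]              # hypothetical normalised state components
features = [d for x in state for d in triangular_memberships(x, CENTERS)]
net = BranchingQNetwork(n_inputs=len(features), n_hidden=8,
                        n_branches=3, n_bins=len(THRUST_LEVELS))
action = [THRUST_LEVELS[i] for i in net.greedy_action(features)]
print(action)
```

With three action dimensions and three levels each, the branched heads produce 3 × 3 = 9 Q-values, whereas a single head over all joint actions would need 3³ = 27. The output count grows linearly with the number of action dimensions instead of exponentially, which is the usual motivation for a branching design.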
This article is indexed in Wanfang Data (万方数据) and other databases.