Deep Reinforcement Learning Guidance Law for Intercepting Endo-atmospheric Maneuvering Targets

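Cite this article:	QIU Xiaoqi, GAO Changsheng, JING Wuxing. Deep Reinforcement Learning Guidance Law for Intercepting Endo-atmospheric Maneuvering Targets[J]. Journal of Astronautics (宇航学报), 2022, 43(5): 685-695. DOI: 10.3873/j.issn.1000-1328.2022.05.013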

Authors:	QIU Xiaoqi (邱潇颀), GAO Changsheng (高长生), JING Wuxing (荆武兴)

Affiliation:	Department of Aerospace Engineering, Harbin Institute of Technology, Harbin 150001, China

Funding:	National Natural Science Foundation of China (12072090)

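Abstract:	To address the interception of high-speed maneuvering targets inside the atmosphere, this paper proposes a deep reinforcement learning (DRL) guidance law based on the twin delayed deep deterministic policy gradient (TD3) algorithm. The law maps engagement state information directly to the interceptor's commanded acceleration, making it an end-to-end, model-free guidance strategy. First, the engagement kinematics of the interceptor and the target are formulated as a Markov decision process (MDP) suitable for DRL algorithms. Then, by designing the engagement scenarios, action space, state space, and network structure required for training, and by introducing reward shaping and random state initialization, a complete DRL guidance algorithm is constructed. Simulation results show that, compared with proportional navigation (PN) and augmented proportional navigation (APN), the DRL guidance strategy achieves smaller miss distances while relaxing the accuracy required of midcourse guidance. It also shows good robustness and generalization, and its computational burden is small enough for it to run on an onboard missile computer.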
Keywords:	Missile guidance; Endo-atmospheric interception; Maneuvering target; Deep reinforcement learning; Markov decision
Received:	2021-07-20
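
The abstract names the ingredients of the method (an MDP built from the engagement kinematics, an acceleration-command action, reward shaping, random initial states, and PN/APN baselines) but gives no equations. The Python sketch below is a minimal, hypothetical illustration of how such pieces might fit together; it is not the authors' formulation. It assumes a planar point-mass engagement with constant speeds, no autopilot lag, and no aerodynamic or altitude effects. The names `Body`, `los_state`, `PlanarEngagementEnv` and `rollout`, the observation vector, the initial-condition ranges, the 300 m/s² acceleration limit, the 50 m/s² target maneuver and the shaped reward are all illustrative assumptions. The PN and APN functions use the textbook forms a_c = N·V_c·λ̇ and a_c = N·V_c·λ̇ + (N/2)·a_T⊥, where a_T⊥ is the target acceleration normal to the line of sight (LOS). A TD3 actor network trained on such an environment would take the observation from `_obs()` as input in place of the PN/APN callables; TD3 itself is not shown.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Body:
    """Planar point mass at constant speed; lateral acceleration only turns it."""
    x: float
    y: float
    v: float      # speed, m/s
    gamma: float  # heading (flight-path) angle, rad

    def step(self, a_lat: float, dt: float) -> None:
        self.gamma += a_lat / self.v * dt  # gamma_dot = a_lat / V
        self.x += self.v * math.cos(self.gamma) * dt
        self.y += self.v * math.sin(self.gamma) * dt


def los_state(m: Body, t: Body):
    """Range, LOS angle, range rate and LOS rate of the target seen from the interceptor."""
    rx, ry = t.x - m.x, t.y - m.y
    vx = t.v * math.cos(t.gamma) - m.v * math.cos(m.gamma)
    vy = t.v * math.sin(t.gamma) - m.v * math.sin(m.gamma)
    r = max(math.hypot(rx, ry), 1e-9)
    return r, math.atan2(ry, rx), (rx * vx + ry * vy) / r, (rx * vy - ry * vx) / (r * r)


def pn(m: Body, t: Body, n: float = 4.0, a_t: float = 0.0) -> float:
    """Proportional navigation a_c = N * V_c * lam_dot; a_t is unused."""
    _, _, r_dot, lam_dot = los_state(m, t)
    return n * (-r_dot) * lam_dot


def apn(m: Body, t: Body, n: float = 4.0, a_t: float = 0.0) -> float:
    """Augmented PN: PN plus (N/2) * target acceleration normal to the LOS."""
    _, lam, _, _ = los_state(m, t)
    return pn(m, t, n) + 0.5 * n * a_t * math.cos(t.gamma - lam)


class PlanarEngagementEnv:
    """Hypothetical MDP: observation -> commanded lateral acceleration."""

    def __init__(self, a_max=300.0, a_t=50.0, dt=0.01, k_shape=1.0):
        self.a_max, self.a_t, self.dt, self.k_shape = a_max, a_t, dt, k_shape

    def reset(self, seed=None):
        rng = random.Random(seed)
        # Random initialization over assumed ranges: headings and maneuver direction.
        self.m = Body(0.0, 0.0, 1000.0, rng.uniform(-0.1, 0.1))
        self.t = Body(10000.0, 1000.0, 600.0, math.pi + rng.uniform(-0.2, 0.2))
        self.a_t_now = self.a_t * rng.choice((-1.0, 1.0))
        self.miss = math.inf
        return self._obs()

    def _obs(self):
        r, lam, r_dot, lam_dot = los_state(self.m, self.t)
        return (r / 1e4, r_dot / 1e3, lam_dot * 10.0, lam)  # illustrative scaling

    def step(self, a_cmd):
        a_cmd = max(-self.a_max, min(self.a_max, a_cmd))  # acceleration limit
        self.m.step(a_cmd, self.dt)
        self.t.step(self.a_t_now, self.dt)  # constant-magnitude target maneuver
        r, _, r_dot, lam_dot = los_state(self.m, self.t)
        self.miss = min(self.miss, r)
        done = r_dot >= 0.0  # range no longer decreasing: closest approach passed
        # Assumed shaped reward: small per-step penalty on the LOS rate.
        reward = -self.k_shape * min(lam_dot ** 2, 1.0) * self.dt
        if done:
            # Closest approach of straight-line relative motion: r^2 |lam_dot| / |v_rel|.
            v_rel = max(math.hypot(r_dot, r * lam_dot), 1e-9)
            self.miss = min(self.miss, r * r * abs(lam_dot) / v_rel)
            reward -= self.miss  # terminal penalty on miss distance
        return self._obs(), reward, done


def rollout(policy, seed=0, max_steps=5000):
    """Run one episode with policy(m, t, a_t=...); return (miss distance, return)."""
    env = PlanarEngagementEnv()
    env.reset(seed)
    total, done, steps = 0.0, False, 0
    while not done and steps < max_steps:
        _, reward, done = env.step(policy(env.m, env.t, a_t=env.a_t_now))
        total += reward
        steps += 1
    return env.miss, total


for name, law in (("PN", pn), ("APN", apn)):
    miss, ret = rollout(law, seed=1)
    print(f"{name}: miss = {miss:.2f} m, return = {ret:.2f}")
```

Running the script prints the miss distance and episode return of each baseline from the same seeded initial state. APN is handed the true target acceleration here, which is more than a real seeker and estimator would supply, so any advantage it shows in this sketch is optimistic.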