GA-PPO Based Maneuvering Policy Optimization Algorithm

Maneuvering Policy Optimization Algorithm Based on Generative Adversarial Proximal Policy Optimization

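Citation:	FU Yupeng, DENG Xiangyang, ZHU Ziqiang, GAO Yang, ZHANG Limin. Maneuvering Policy Optimization Algorithm Based on Generative Adversarial Proximal Policy Optimization[J]. Journal of Naval Aeronautical Engineering Institute, 2023, 38(3): 257-261, 300.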

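Authors:	FU Yupeng, DENG Xiangyang, ZHU Ziqiang, GAO Yang, ZHANG Limin

Institution:	Naval Aviation University, Yantai, Shandong 264001, China; Naval Aviation University, Yantai, Shandong 264001, China; Tsinghua University, Beijing 100084, China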

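Abstract:	To address the low convergence efficiency and insufficient use of expert experience that traditional reinforcement learning algorithms exhibit when generating air combat maneuvering policies, this paper studies a policy generation algorithm based on generative adversarial proximal policy optimization. The algorithm adopts a discriminator-actor-critic (DAC) network framework, that is, a discriminator network, a policy network, and a value network. Building on the proximal policy optimization (PPO) algorithm, it trains the discriminator network on expert data and environment interaction data and feeds the result back to adjust the policy network, which constrains the policy to be optimized toward the expert policy and improves both the convergence efficiency and the utilization of expert experience. The simulation environment is an F-16 aerodynamic model built on the open-source JSBSim platform. Simulation results show that the proposed algorithm converges more efficiently than the PPO algorithm and that the generated policy model exhibits good intelligence.
Keywords:	generative adversarial imitation learning; proximal policy optimization; maneuvering decision; reinforcement learning; imitation learning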
GA-PPO Based Maneuvering Policy Optimization Algorithm

FU Yupeng, DENG Xiangyang, ZHU Ziqiang, GAO Yang, ZHANG Limin. GA-PPO Based Maneuvering Policy Optimization Algorithm[J]. Journal of Naval Aeronautical Engineering Institute, 2023, 38(3): 257-261, 300.

Authors:	FU Yupeng, DENG Xiangyang, ZHU Ziqiang, GAO Yang, ZHANG Limin

Institution:	Naval Aviation University, Yantai, Shandong 264001, China; Naval Aviation University, Yantai, Shandong 264001, China; Tsinghua University, Beijing 100084, China

Abstract:	To address the low convergence efficiency and insufficient use of expert data of traditional reinforcement learning algorithms in air combat maneuver decisions, an algorithm based on the generative adversarial technique is designed. The algorithm adopts the Discriminator-Actor-Critic (DAC) framework. Based on the Proximal Policy Optimization (PPO) algorithm, the discriminator is trained with expert data and environment interaction data while the policy network is trained, so that the constrained policy is optimized toward the expert policy, which improves the convergence of the algorithm and the utilization efficiency of expert experience. The simulation environment is based on the F-16 aircraft aerodynamic model on the JSBSim open-source platform. The simulation results show that the convergence efficiency of this algorithm is higher than that of the PPO algorithm, and the generated policy model has good intelligence.

Keywords:	Generative Adversarial Imitation Learning (GAIL); Proximal Policy Optimization (PPO); maneuvering decision; reinforcement learning; imitation learning
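
The abstract describes a discriminator trained on expert data and environment interaction data whose output is fed back to the policy update. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a logistic-regression discriminator, a GAIL-style reward of the form -log(1 - D), and the standard PPO clipped surrogate. All function names, the synthetic data, and the hyperparameters are hypothetical; the network architectures and the JSBSim F-16 environment used in the paper are not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_discriminator(expert_sa, policy_sa, lr=0.1, steps=500):
    """Fit D(s, a) = sigmoid(w . x + b) with labels 1 = expert, 0 = policy.

    Assumption: a linear discriminator stands in for the discriminator
    network mentioned in the abstract.
    """
    x = np.vstack([expert_sa, policy_sa])
    y = np.concatenate([np.ones(len(expert_sa)), np.zeros(len(policy_sa))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        grad = p - y  # gradient of binary cross-entropy w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def imitation_reward(sa, w, b, eps=1e-8):
    """GAIL-style reward: larger when D rates a state-action pair as expert-like."""
    d = sigmoid(sa @ w + b)
    return -np.log(1.0 - d + eps)

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate, averaged over the batch."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()

# Synthetic state-action features (hypothetical, 4-dimensional).
rng = np.random.default_rng(0)
expert_sa = rng.normal(loc=1.0, scale=0.5, size=(200, 4))
policy_sa = rng.normal(loc=-1.0, scale=0.5, size=(200, 4))

w, b = train_discriminator(expert_sa, policy_sa)
r_expert = imitation_reward(expert_sa, w, b).mean()
r_policy = imitation_reward(policy_sa, w, b).mean()
```

In a full training loop, `imitation_reward` would be computed on the policy's own rollouts and combined with, or substituted for, the environment reward before advantages are estimated and passed to `ppo_clip_objective`. How the paper weights these terms is not stated in the abstract.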
