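Maneuvering Policy Optimization Algorithm Based on Generative Adversarial Proximal Policy Optimization
Cite this article: FU Yupeng, DENG Xiangyang, ZHU Ziqiang, GAO Yang, ZHANG Limin. Maneuvering Policy Optimization Algorithm Based on Generative Adversarial Proximal Policy Optimization[J]. Journal of Naval Aeronautical Engineering Institute, 2023, 38(3): 257-261, 300.
Authors: FU Yupeng  DENG Xiangyang  ZHU Ziqiang  GAO Yang  ZHANG Limin
Institution: Naval Aviation University, Yantai, Shandong 264001; Naval Aviation University, Yantai, Shandong 264001; Tsinghua University, Beijing 100084
Abstract: To address the low convergence efficiency and insufficient use of expert experience of traditional reinforcement learning algorithms when generating air combat maneuver policies, a policy generation algorithm based on generative adversarial proximal policy optimization is studied. The algorithm adopts a discriminator-actor-critic (DAC) network framework. Building on the proximal policy optimization (PPO) algorithm, it trains a discriminator network on expert data and environment interaction data and feeds the result back to adjust the policy network, so that the constrained policy is optimized toward the expert policy, improving both convergence efficiency and the utilization of expert experience. The simulation environment is the F-16 aircraft aerodynamic model on the open-source JSBSim platform. Simulation results show that the proposed algorithm converges more efficiently than the PPO algorithm and that the generated policy model exhibits good intelligence.

Keywords: generative adversarial imitation learning  proximal policy optimization  maneuvering decision  reinforcement learning  imitation learning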

GA-PPO Based Maneuvering Policy Optimization Algorithm
FU Yupeng, DENG Xiangyang, ZHU Ziqiang, GAO Yang, ZHANG Limin. GA-PPO Based Maneuvering Policy Optimization Algorithm[J]. Journal of Naval Aeronautical Engineering Institute, 2023, 38(3): 257-261, 300.
Authors:FU Yupeng  DENG Xiangyang  ZHU Ziqiang  GAO Yang  ZHANG Limin
Institution: Naval Aviation University, Yantai, Shandong 264001, China; Naval Aviation University, Yantai, Shandong 264001, China; Tsinghua University, Beijing 100084, China
Abstract: To address the low convergence efficiency and insufficient use of expert data of traditional reinforcement learning algorithms in air combat maneuver decision-making, an algorithm based on generative adversarial techniques is designed. The algorithm adopts the Discriminator-Actor-Critic (DAC) framework. Building on the Proximal Policy Optimization (PPO) algorithm, the discriminator is trained with expert data and environment interaction data while the policy network is trained, so that the constrained policy is optimized toward the expert policy; this improves the convergence of the algorithm and the utilization efficiency of expert experience. The simulation environment is based on the F-16 aircraft aerodynamic model on the open-source JSBSim platform. The simulation results show that the convergence efficiency of this algorithm is higher than that of the PPO algorithm, and that the generated policy model exhibits good intelligence.
Keywords: Generative Adversarial Imitation Learning (GAIL)  Proximal Policy Optimization (PPO)  maneuvering decision  reinforcement learning  imitation learning
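
The abstract describes a discriminator trained on expert data and environment interaction data, with its output fed back into a PPO policy update. The following is a minimal sketch of that general idea, not the authors' implementation. The abstract gives no network architectures, reward form, or hyperparameters, so the linear logistic discriminator, the reward `-log(1 - D)`, the learning rate, the clip range, and all function names here are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def discriminator_step(w, expert_x, policy_x, lr=0.1):
    """One gradient step for a linear logistic discriminator (assumed form).

    Expert state-action features are labeled 1, policy-generated ones 0.
    """
    x = np.vstack([expert_x, policy_x])
    y = np.concatenate([np.ones(len(expert_x)), np.zeros(len(policy_x))])
    p = sigmoid(x @ w)
    grad = x.T @ (p - y) / len(y)  # gradient of the mean binary cross-entropy
    return w - lr * grad


def imitation_reward(w, x, eps=1e-8):
    """Assumed surrogate reward: larger when the sample looks like expert data."""
    return -np.log(1.0 - sigmoid(x @ w) + eps)


def ppo_clip_objective(ratio, advantage, clip=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip)
    return np.mean(np.minimum(ratio * advantage, clipped * advantage))


if __name__ == "__main__":
    # Synthetic features stand in for state-action pairs; real data would come
    # from expert demonstrations and rollouts in the simulation environment.
    rng = np.random.default_rng(0)
    expert = rng.normal(1.0, 0.5, size=(64, 3))
    policy = rng.normal(-1.0, 0.5, size=(64, 3))
    w = np.zeros(3)
    for _ in range(200):
        w = discriminator_step(w, expert, policy)
    print("mean reward, expert-like samples:", imitation_reward(w, expert).mean())
    print("mean reward, policy samples:     ", imitation_reward(w, policy).mean())
```

In a full training loop, the surrogate reward would typically be used (alone or combined with the environment reward) to compute the advantages passed to `ppo_clip_objective`; how the paper combines the two is not stated in the abstract.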