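Pursuit missions for UAV swarms based on the DDPG algorithm
Cite this article:ZHANG Yaozhong, XU Jialin, YAO Kangjia, LIU Jieling. Pursuit missions for UAV swarms based on the DDPG algorithm[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(10): 324000-324000. DOI: 10.7527/S1000-6893.2020.24000
Authors:ZHANG Yaozhong  XU Jialin  YAO Kangjia  LIU Jieling
Affiliation:1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072;2. Xi'an North Electro-optic Science & Technology Co. Ltd, Xi'an 710043
Abstract:Swarm applications of unmanned aerial vehicles (UAVs) have been a research hotspot in recent years. As the autonomous intelligence of UAVs continues to improve, swarm technology is bound to become one of the main trends in future UAV development. For the task of a UAV swarm cooperatively pursuing an incoming enemy target, a typical mission scenario is constructed. Based on the Deep Deterministic Policy Gradient (DDPG) algorithm, a guided reward function is designed that effectively addresses the sparse-reward problem of deep reinforcement learning in long-horizon tasks. A soft update strategy based on a moving average is introduced to reduce the parameter oscillation of the Eval network and the Target network during training, which improves training efficiency. Simulation results show that the trained UAV swarm performs the pursuit task against incoming enemy targets well, with a mission success rate of 95%. As a new concept of combat mode, UAV swarm technology has potential application value in the military field, and artificial intelligence algorithms show promise for the intelligent autonomous decision-making of UAV swarms.

Keywords:DDPG algorithm  UAV swarm  task decision-making  deep reinforcement learning  sparse rewards
Received:2020-03-21
Revised:2020-06-15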

Pursuit missions for UAV swarms based on DDPG algorithm
ZHANG Yaozhong,XU Jialin,YAO Kangjia,LIU Jieling. Pursuit missions for UAV swarms based on DDPG algorithm[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(10): 324000-324000. DOI: 10.7527/S1000-6893.2020.24000
Authors:ZHANG Yaozhong  XU Jialin  YAO Kangjia  LIU Jieling
Affiliation:1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China;2. Xi'an North Electro-optic Science & Technology Co. Ltd, Xi'an 710043, China
Abstract:Unmanned aerial vehicle (UAV) swarm technology has been a research hotspot in recent years. With the continuous advancement of UAV autonomous intelligence, swarm technology is bound to become one of the main trends in future UAV development. For cooperative pursuit missions in which a UAV swarm pursues an incoming enemy target, we establish a typical task scenario and, based on the Deep Deterministic Policy Gradient (DDPG) algorithm, design a guided reward function that effectively solves the sparse-reward problem of deep reinforcement learning in long-horizon missions. We introduce a moving-average-based soft update strategy to reduce parameter oscillations in the Eval network and the Target network during training, thereby improving training efficiency. Simulation results show that, after training, the UAV swarm can carry out the pursuit mission with a success rate of 95%. As a new combat mode, UAV swarm technology has potential application value in the military field, and artificial intelligence algorithms show some promise for autonomous decision-making in UAV swarms.
Keywords:DDPG algorithm  UAV swarms  task decision  deep reinforcement learning  sparse rewards  
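
The abstract names two techniques without giving their formulas: a moving-average soft update between the Eval and Target networks, and a guided reward that counters sparse rewards. The sketch below is only an illustration of what such components commonly look like in DDPG-style training, not the authors' implementation. The soft update uses the standard form target <- (1 - tau) * target + tau * eval. The reward is a generic distance-closing shaping term, which is an assumption here. The names `soft_update`, `guided_reward`, `tau`, `capture_radius`, `capture_bonus`, and `scale` are all hypothetical.

```python
def soft_update(target_params, eval_params, tau=0.01):
    """Moving-average (soft) update of target-network parameters.

    Each target parameter moves a fraction `tau` toward the matching
    eval parameter, so the target network changes slowly and smoothly.
    Parameters are plain lists of floats here for illustration.
    """
    return [(1.0 - tau) * t + tau * e
            for t, e in zip(target_params, eval_params)]


def guided_reward(prev_dist, curr_dist,
                  capture_radius=1.0, capture_bonus=10.0, scale=1.0):
    """Hypothetical dense 'guided' reward for a pursuit task.

    Instead of rewarding only the final capture (a sparse signal),
    every step is rewarded in proportion to how much the pursuer
    closed the distance to the target, plus a bonus on capture.
    """
    reward = scale * (prev_dist - curr_dist)   # > 0 when closing in
    if curr_dist <= capture_radius:            # terminal success bonus
        reward += capture_bonus
    return reward


# Usage: the target parameters drift gradually toward the eval parameters.
target = [0.0, 0.0]
eval_net = [1.0, -1.0]
for _ in range(100):
    target = soft_update(target, eval_net, tau=0.05)
```

A small `tau` keeps the Target network nearly constant between steps, which is what damps the parameter oscillation described in the abstract; the per-step distance term gives the agent a learning signal long before any capture occurs.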