
Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning
Authors: Wenhong ZHOU, Jie LI, Zhihong LIU, Lincheng SHEN
Affiliation: College of Intelligence Science and Technology, National University of Defense Technology
Funding: National Natural Science Foundation of China (No. 61906209)

Received: 24 March 2021

Citation: Wenhong ZHOU, Jie LI, Zhihong LIU, Lincheng SHEN. Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning [J]. Chinese Journal of Aeronautics, 2022, 35(7): 100-112.
Institution: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
Abstract: Multi-Target Tracking Guidance (MTTG) in unknown environments has great potential value for Unmanned Aerial Vehicle (UAV) swarm applications. Although Multi-Agent Deep Reinforcement Learning (MADRL) is a promising technique for learning cooperation, most existing methods do not scale well to decentralized UAV swarms because of their computational complexity or their need for global information. This paper proposes a decentralized MADRL method that uses the maximum reciprocal reward to learn cooperative tracking policies for UAV swarms. The method reshapes each UAV's reward with a regularization term defined as the dot product of the reward vector of all neighbor UAVs and the corresponding vector of dependencies between the UAV and its neighbors. The dependence between UAVs is captured directly by a Pointwise Mutual Information (PMI) neural network, without complicated aggregation statistics. An experience-sharing Reciprocal Reward Multi-Agent Actor-Critic (MAAC-R) algorithm is then proposed to learn a cooperative policy shared by all homogeneous UAVs. Experiments demonstrate that the proposed algorithm improves the UAVs' cooperation more effectively than the baseline algorithms and stimulates a rich variety of cooperative tracking behaviors in UAV swarms. In addition, the learned policy scales better to other scenarios with more UAVs and targets.
Keywords: Decentralized cooperation; Maximum reciprocal reward; Multi-agent actor-critic; Pointwise mutual information; Reinforcement learning
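
The reward reshaping described in the abstract (each UAV's own reward plus a regularization term given by the dot product of its neighbors' rewards and the corresponding dependency values) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `reciprocal_reward`, the `weight` coefficient, and the numeric inputs are hypothetical, and the dependency values, which the paper obtains from a PMI neural network, are simply passed in as plain numbers here.

```python
from typing import Sequence


def reciprocal_reward(
    own_reward: float,
    neighbor_rewards: Sequence[float],
    dependencies: Sequence[float],
    weight: float = 1.0,
) -> float:
    """Return own_reward + weight * dot(dependencies, neighbor_rewards).

    `weight` is an assumed scaling coefficient; the abstract does not say
    how the regularization term is weighted against the UAV's own reward.
    """
    if len(neighbor_rewards) != len(dependencies):
        raise ValueError("one dependency value is needed per neighbor")
    # Dot product of the neighbors' reward vector and the dependency vector.
    regularizer = sum(d * r for d, r in zip(dependencies, neighbor_rewards))
    return own_reward + weight * regularizer


# Hypothetical example: one UAV with two neighbors.
shaped = reciprocal_reward(1.0, [2.0, 4.0], [0.5, 0.25], weight=0.5)
print(shaped)
```

With these made-up numbers the regularization term is 0.5 * 2.0 + 0.25 * 4.0 = 2.0, so the reshaped reward is 1.0 + 0.5 * 2.0 = 2.0. A UAV with no neighbors keeps its own reward unchanged.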
This article is indexed in ScienceDirect and other databases.