Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning

Authors:	Wenhong ZHOU, Jie LI, Zhihong LIU, Lincheng SHEN

Funding:	National Natural Science Foundation of China (No. 61906209)

Received:	24 March 2021
Wenhong ZHOU, Jie LI, Zhihong LIU, Lincheng SHEN. Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning[J]. Chinese Journal of Aeronautics, 2022, 35(7): 100-112.

Institution:	College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China

Abstract:	Multi-Target Tracking Guidance (MTTG) in unknown environments has great potential value in applications of Unmanned Aerial Vehicle (UAV) swarms. Although Multi-Agent Deep Reinforcement Learning (MADRL) is a promising technique for learning cooperation, most existing methods do not scale well to decentralized UAV swarms because of their computational complexity or their need for global information. This paper proposes a decentralized MADRL method that uses the maximum reciprocal reward to learn cooperative tracking policies for UAV swarms. The method reshapes each UAV's reward with a regularization term defined as the dot product of the reward vector of all neighboring UAVs and the corresponding dependency vector between the UAV and its neighbors. The dependence between UAVs can be captured directly by a Pointwise Mutual Information (PMI) neural network, without complicated aggregation statistics. The experience-sharing Reciprocal Reward Multi-Agent Actor-Critic (MAAC-R) algorithm is then proposed to learn a cooperative policy shared by all homogeneous UAVs. Experiments demonstrate that the proposed algorithm improves the UAVs' cooperation more effectively than the baseline algorithms and stimulates a rich variety of cooperative tracking behaviors in UAV swarms. In addition, the learned policy scales better to other scenarios with more UAVs and targets.
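
The reward reshaping described in the abstract can be illustrated with a minimal sketch. It assumes the regularization term is simply added to the UAV's own reward, scaled by a hypothetical coefficient `weight`; the abstract does not state how the two are combined, and the function name, argument names, and numbers below are illustrative only. In the paper the dependency values come from the PMI neural network; here they are supplied by hand.

```python
def reshaped_reward(own_reward, neighbor_rewards, dependencies, weight=1.0):
    """Illustrative reward reshaping for one UAV.

    own_reward       -- the UAV's original reward (float)
    neighbor_rewards -- rewards of its neighboring UAVs
    dependencies     -- dependency of this UAV on each neighbor, same order
    weight           -- ASSUMPTION: scaling of the regularization term,
                        not taken from the abstract
    """
    if len(neighbor_rewards) != len(dependencies):
        raise ValueError("one dependency value is required per neighbor")
    # Regularization term: dot product of the neighbors' reward vector
    # and the dependency vector, as described in the abstract.
    regularizer = sum(r * d for r, d in zip(neighbor_rewards, dependencies))
    # ASSUMPTION: the term is added to the UAV's own reward.
    return own_reward + weight * regularizer


# Hand-picked example values: two neighbors with rewards 0.5 and 2.0.
example = reshaped_reward(1.0, [0.5, 2.0], [0.2, 0.1])
```

With no neighbors the regularization term is zero and the UAV keeps its own reward, which is consistent with the decentralized setting in which each UAV only uses information from its neighbors.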

Keywords:	Decentralized cooperation; Maximum reciprocal reward; Multi-agent actor-critic; Pointwise mutual information; Reinforcement learning