Allocation of composite mode on-orbit service resource based on improved DQN

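Cite this article:	LIU Bingyan, YE Xiongbing, ZHOU Chifei, LIU Biliu. Allocation of composite mode on-orbit service resource based on improved DQN[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(5): 323630-323630.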

Authors:	LIU Bingyan, YE Xiongbing, ZHOU Chifei, LIU Biliu

Affiliations:	1. Academy of Military Sciences, Beijing 100091; 2. Chinese People's Liberation Army Unit 32032, Beijing 100094

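Abstract:	To address the nonlinear multi-objective optimization problem of allocating resources before on-orbit servicing begins, an on-orbit resource allocation model under a composite service mode is constructed, and an on-orbit service resource allocation method is proposed based on improvements to the convergence and stability of the DQN (Deep Q-Network) method. The method handles composite service modes that contain both "one-to-many" and "many-to-one" assignments, gives priority to important service targets provided the expected success rate is met, and balances the overall benefit of the allocation against total energy-consumption efficiency, thereby achieving the combined goal of completing the mission as quickly as possible at the expected success rate and with fewer resources. Simulation experiments show that the improved DQN method can autonomously allocate spacecraft resources before mission execution according to the importance of the service targets, converges quickly with low training error, and has a clear comparative advantage in optimizing allocation benefit and total energy consumption.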
Keywords:	on-orbit servicing; integer programming; resource allocation; deep reinforcement learning; neural network
Received:	2019-11-04
Revised:	2019-11-28
Allocation of composite mode on-orbit service resource based on improved DQN

LIU Bingyan, YE Xiongbing, ZHOU Chifei, LIU Biliu. Allocation of composite mode on-orbit service resource based on improved DQN[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(5): 323630-323630.

Authors:	LIU Bingyan, YE Xiongbing, ZHOU Chifei, LIU Biliu

Institution:	1. Academy of Military Sciences, Beijing 100091, China; 2. 32032 Troops, Beijing 100094, China

Abstract:	To solve the nonlinear multi-objective optimization problem of resource allocation before on-orbit servicing, an on-orbit service resource allocation model under the composite service mode is constructed, and a resource allocation method based on a Deep Q-Network (DQN) with improved convergence and stability is proposed. The method copes with a composite service pattern that includes both "one-to-many" and "many-to-one" assignments. It prioritizes important service objects on the premise of satisfying the expected success rate while taking into account both the comprehensive benefit of resource allocation and the overall energy consumption efficiency, thereby achieving the combined goal of completing the task as quickly as possible with the expected success rate and less resource input. Simulation results show that the improved DQN method can autonomously allocate spacecraft resources before task execution according to the importance of the service objects. It converges quickly, has low training error, and shows clear comparative advantages in optimizing allocation benefit and overall energy consumption.

Keywords:	on-orbit servicing; integer programming; resource allocation; deep reinforcement learning; neural network
This article is indexed in Wanfang Data and other databases.
