
Large-Scale ADR Mission Optimization Method Based on Heuristic Reinforcement Learning
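Cite this article: YANG Jianan, HOU Xiaolei, HU Yu Hen, LIU Yong, PAN Quan, FENG Qian. Large-scale ADR mission optimization method based on heuristic reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524354-524354. DOI: 10.7527/S1000-6893.2020.24354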
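Authors: YANG Jianan, HOU Xiaolei, HU Yu Hen, LIU Yong, PAN Quan, FENG Qian
Affiliations: College of Automation, Northwestern Polytechnical University, Xi'an 710129, China; Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison 53706, USA
Funding: National Natural Science Foundation of China (61703343, 61790552); Natural Science Foundation of Shaanxi Province (2018JQ6070); Fundamental Research Funds for the Central Universities (3102018JCC003)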
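Abstract: With the rapid growth of spaceflight, space debris, especially in low Earth orbit, has become a threat that space missions cannot ignore. Given the urgency and cost of debris removal, low-orbit multi-debris Active Debris Removal (ADR) has become a necessary means of mitigating the situation. For large-scale multi-debris active removal mission planning, a Reinforcement Learning (RL) optimization method is first proposed on the basis of the maximal-reward model of mission planning, and the state, action, and reward function of the problem are defined within the RL framework. Second, building on an efficient heuristic factor, a dedicated improved Monte Carlo Tree Search (MCTS) algorithm is proposed; it uses MCTS as its core and adds efficient heuristic operators and an RL iteration process. Finally, the effectiveness of the proposed algorithm is verified on the full dataset of the Iridium 33 debris cloud. Compared with related MCTS variants and a greedy heuristic algorithm, the proposed method obtains better planning results more efficiently on the test dataset and strikes a good balance between exploration and exploitation.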
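The abstract says the state, action, and reward function are defined in an RL framework but does not spell them out here. The sketch below is one minimal toy formulation under loudly stated assumptions, not the paper's model. Each debris has a hypothetical scalar removal value, each transfer has a hypothetical cost taken from a given matrix, and the mission has a total budget. The state is (current debris, removed set, remaining budget), an action is the next debris to visit, and the reward is that debris's value. `greedy_plan` is a greedy heuristic in the spirit of the baseline the abstract compares against; the value-to-cost ratio it ranks by is an assumed heuristic.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class State:
    current: int               # debris the servicer is currently at
    removed: FrozenSet[int]    # debris already handled (includes the start)
    budget: float              # remaining transfer budget (hypothetical units)


class ADRProblem:
    """Toy maximal-reward ADR model; values, costs and budget are assumed inputs."""

    def __init__(self, values: Sequence[float], cost: Sequence[Sequence[float]], budget: float):
        self.values = list(values)
        self.cost = [list(row) for row in cost]
        self.budget = budget

    def initial_state(self, start: int) -> State:
        # The starting debris is treated as already handled and earns no reward.
        return State(start, frozenset([start]), self.budget)

    def actions(self, s: State) -> List[int]:
        # Feasible next targets: not yet removed and affordable with the remaining budget.
        return [j for j in range(len(self.values))
                if j not in s.removed and self.cost[s.current][j] <= s.budget]

    def step(self, s: State, j: int) -> Tuple[State, float]:
        # Move to debris j, pay the transfer cost, and collect its value as reward.
        nxt = State(j, s.removed | {j}, s.budget - self.cost[s.current][j])
        return nxt, self.values[j]


def greedy_plan(p: ADRProblem, start: int) -> Tuple[List[int], float]:
    """Greedy baseline: repeatedly take the feasible target with the best value/cost ratio."""
    s = p.initial_state(start)
    seq, total = [start], 0.0
    while True:
        acts = p.actions(s)
        if not acts:
            return seq, total
        j = max(acts, key=lambda a: p.values[a] / (p.cost[s.current][a] + 1e-9))
        s, r = p.step(s, j)
        seq.append(j)
        total += r


if __name__ == "__main__":
    cost = [[0, 2, 1, 4], [2, 0, 1, 3], [1, 1, 0, 2], [4, 3, 2, 0]]
    print(greedy_plan(ADRProblem([0, 5, 3, 8], cost, 5), 0))  # ([0, 2, 1, 3], 16.0)
```

A myopic ratio rule like this is cheap but cannot look ahead, which is the gap a tree search over the same state/action space is meant to close.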

Keywords: space debris removal; mission planning; reinforcement learning; heuristic operator; Monte Carlo tree search
Received: 2020-06-02
Revised: 2020-09-12

Heuristic enhanced reinforcement learning method for large-scale multi-debris active removal mission planning
YANG Jianan,HOU Xiaolei,HU Yu Hen,LIU Yong,PAN Quan,FENG Qian. Heuristic enhanced reinforcement learning method for large-scale multi-debris active removal mission planning[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524354-524354. DOI: 10.7527/S1000-6893.2020.24354
Authors: YANG Jianan, HOU Xiaolei, HU Yu Hen, LIU Yong, PAN Quan, FENG Qian
Affiliation:1. College of Automation, Northwestern Polytechnical University, Xi'an 710129, China;2. Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison 53706, USA
Abstract: The vigorous development of the space industry has made space debris a nonnegligible threat to future space activities. Active multi-Debris Removal (ADR) technology has become an indispensable means of alleviating this situation. For the large-scale multi-debris active removal mission planning problem, a Reinforcement Learning (RL) planning scheme is first proposed based on the maximal-reward optimization model of the ADR problem, and the state, action, and reward function of the problem are defined within the RL framework. Then, based on an efficient heuristic method, a specialized Monte Carlo Tree Search (MCTS) algorithm is presented, which uses MCTS as its core structure and incorporates efficient heuristic operators and a reinforcement learning iteration process. Finally, its effectiveness is tested on the complete, large-scale Iridium 33 debris cloud. The results show that this method outperforms the original MCTS algorithm and the heuristic greedy algorithm.
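The abstract names the ingredients of the planner (an MCTS core plus heuristic operators) without detailing them. Below is a minimal single-player UCT sketch on a hypothetical toy instance (assumed debris values, transfer-cost matrix, and budget), offered only as an illustration of how a heuristic can be plugged into MCTS, not as the authors' algorithm. Assumptions, all hypothetical: the value-per-cost `ratio` plays the role of the heuristic operator, untried actions are expanded in heuristic order, rollouts are epsilon-greedy on that heuristic, returns are normalized by the sum of values for UCB1, and the best complete sequence seen is returned. The RL iteration process mentioned in the abstract is not modeled; `iterations`, `c`, and `eps` are illustrative parameter names.

```python
import math
import random
from typing import Dict, List, Optional


class Node:
    def __init__(self, current: int, removed: frozenset, budget: float,
                 parent: Optional["Node"] = None):
        self.current, self.removed, self.budget = current, removed, budget
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.total = 0.0  # sum of normalized returns backed up through this node


def feasible(values, cost, current, removed, budget) -> List[int]:
    # Targets not yet removed and reachable with the remaining budget.
    return [j for j in range(len(values)) if j not in removed and cost[current][j] <= budget]


def ratio(values, cost, i, j) -> float:
    # Assumed heuristic operator: value gained per unit of transfer cost.
    return values[j] / (cost[i][j] + 1e-9)


def heuristic_mcts(values, cost, budget, start=0, iterations=500, c=1.0, eps=0.2, seed=0):
    rng = random.Random(seed)
    scale = sum(values) or 1.0  # keeps normalized returns in [0, 1] for UCB1
    root = Node(start, frozenset([start]), budget)
    best_seq, best_ret = [start], 0.0
    for _ in range(iterations):
        node, seq, ret = root, [start], 0.0
        # 1) Selection: descend through fully expanded nodes using UCB1.
        while True:
            acts = feasible(values, cost, node.current, node.removed, node.budget)
            untried = [a for a in acts if a not in node.children]
            if untried or not acts:
                break
            node = max(node.children.values(),
                       key=lambda ch: ch.total / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
            seq.append(node.current)
            ret += values[node.current]
        # 2) Expansion: add the untried action the heuristic ranks highest.
        if untried:
            a = max(untried, key=lambda j: ratio(values, cost, node.current, j))
            child = Node(a, node.removed | {a}, node.budget - cost[node.current][a], node)
            node.children[a] = child
            node = child
            seq.append(a)
            ret += values[a]
        # 3) Simulation: epsilon-greedy rollout guided by the same heuristic.
        cur, removed, b = node.current, node.removed, node.budget
        rollout, total = list(seq), ret
        while True:
            acts = feasible(values, cost, cur, removed, b)
            if not acts:
                break
            if rng.random() < eps:
                j = rng.choice(acts)
            else:
                j = max(acts, key=lambda k: ratio(values, cost, cur, k))
            b -= cost[cur][j]
            removed = removed | {j}
            cur = j
            rollout.append(j)
            total += values[j]
        if total > best_ret:
            best_seq, best_ret = rollout, total
        # 4) Backpropagation of the normalized return up to the root.
        while node is not None:
            node.visits += 1
            node.total += total / scale
            node = node.parent
    return best_seq, best_ret


if __name__ == "__main__":
    cost = [[0, 2, 1, 4], [2, 0, 1, 3], [1, 1, 0, 2], [4, 3, 2, 0]]
    print(heuristic_mcts([0, 5, 3, 8], cost, 5.0, iterations=300))
```

The heuristic enters at two points, expansion order and rollout policy, while UCB1 keeps some exploration of actions the heuristic ranks poorly; `eps` and `c` trade off how strongly the search trusts the heuristic.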
Keywords: active debris removal; mission planning; reinforcement learning; heuristic operator; Monte Carlo tree search