
Guidance Method of Planetary Soft Landing with GPS Model-Based Reinforcement Learning
Authors: ZHANG Yangkang  SUN Chen  PAN Binfeng
Affiliation: National Key Laboratory of Aerospace Flight Dynamics, School of Astronautics, Northwestern Polytechnical University
Funding: Equipment Pre-Research Laboratory Fund (6142210200312)
Abstract: Because planets are far from Earth, measurement-and-control delays and errors are large, and the flight environment is complex and hard to predict in advance, autonomous guidance for planetary soft landing currently faces challenges such as difficult horizontal position estimation, scarce navigation reference information, and landing on complex terrain. To address these challenges, a model-based reinforcement learning guidance method based on the guided policy search (GPS) algorithm is proposed. With this method, a lander whose initial state is perturbed can still land at the designated position while satisfying the constraints, without re-planning. First, an iterative linear quadratic regulator (iLQR) is used as the controller to generate the initial trajectory; second, a multilayer neural network is used to fit the guidance policy; finally, the controller supervises policy learning, which then converges to a feasible policy. Simulations of soft landing on a planetary surface show that the algorithm achieves fast soft landing under varied initial states after only a few training iterations. On the one hand, this demonstrates the data efficiency of model-based reinforcement learning; on the other hand, it indicates that reinforcement learning has broad application prospects in deep space exploration.

Keywords: iterative linear quadratic regulator (iLQR)  guided policy search (GPS)  model-based reinforcement learning  planetary soft landing
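
The three steps named in the abstract (controller generates trajectories, a policy is fitted, the controller supervises the policy) can be illustrated with a heavily simplified toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: a 1-D double-integrator lander whose control is the net vertical acceleration, a finite-horizon LQR standing in for iLQR (for linear dynamics and quadratic cost the two coincide), a linear least-squares policy standing in for the multilayer neural network, and a single supervised fit standing in for the iterative GPS loop. The step size, horizon, cost weights, and initial states are likewise invented.

```python
import numpy as np

DT, T = 0.1, 200                       # step size [s] and horizon (assumed values)
A = np.array([[1.0, DT], [0.0, 1.0]])  # state x = [altitude, vertical velocity]
B = np.array([[0.5 * DT**2], [DT]])    # control u = net vertical acceleration
Q, R = np.eye(2), np.eye(1)            # quadratic cost weights (assumed)

def lqr_gains():
    """Backward Riccati recursion; returns K_0..K_{T-1} with u_t = -K_t x_t."""
    P, gains = Q.copy(), []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]                 # recursion runs backward in time

def rollout(x0, policy):
    """Simulate T steps; returns visited states, applied controls, final state."""
    x, xs, us = np.asarray(x0, dtype=float), [], []
    for t in range(T):
        u = policy(t, x)
        xs.append(x)
        us.append(u)
        x = A @ x + B @ u
    return np.array(xs), np.array(us), x

# Step 1: the controller (LQR here, iLQR in the paper) generates trajectories
# from several perturbed initial states.
gains = lqr_gains()
teacher = lambda t, x: -gains[t] @ x
rng = np.random.default_rng(0)
starts = np.array([10.0, -1.0]) + rng.normal(scale=[2.0, 0.5], size=(8, 2))
data = [rollout(x0, teacher) for x0 in starts]
X = np.vstack([d[0] for d in data])    # states visited by the controller
U = np.vstack([d[1] for d in data])    # controls the controller applied

# Steps 2-3: fit a policy to the controller's state/control pairs
# (linear least squares here, a neural network in the paper).
theta, *_ = np.linalg.lstsq(X, U, rcond=None)
student = lambda t, x: theta.T @ x

# Run the learned policy from an unseen initial state, with no re-planning.
_, _, x_final = rollout([12.0, -2.0], student)
print("terminal state error:", np.linalg.norm(x_final))
```

In this toy setting the fitted policy is a single state-feedback law, so it applies to any initial state without solving a new trajectory problem, which is the property the abstract attributes to the learned guidance policy. Constraints such as thrust limits or a nonnegative altitude are not modeled here.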