Abstract: | Autonomous guidance for planetary soft landing faces several obstacles: the distance from Earth causes large delay errors in the measurement and control system, and the flight environment is complex and hard to predict in advance. As a result, current guidance technology struggles with horizontal position estimation, a lack of navigation reference information, and landing on difficult terrain. To address these issues, this paper proposes a model-based reinforcement learning guidance method built on guided policy search (GPS). With this method, a lander whose initial state is disturbed does not need to re-plan and can still reach the specified terminal condition while satisfying the constraints. The method has three steps: first, an iterative linear quadratic regulator (iLQR) serves as the controller and generates the initial trajectory; second, a multi-layer neural network is used to fit the guidance policy; finally, the controller supervises policy learning until it converges to a feasible policy. The method is verified in simulation, using soft landing on a planetary surface as the example. The simulation results show that, after only a few training iterations, the algorithm rapidly achieves soft landing from changed initial states. These results demonstrate the data efficiency of model-based reinforcement learning and suggest that reinforcement learning has broad application prospects in deep space exploration. |