
基于Sarsa(λ)强化学习的空间机械臂路径规划研究
引用本文:徐帷,卢山.基于Sarsa(λ)强化学习的空间机械臂路径规划研究[J].宇航学报,2019,40(4):435-443.
作者姓名:徐帷  卢山
作者单位:1. 上海航天控制技术研究所,上海 201109;2. 上海市空间智能控制技术重点实验室,上海 201109
基金项目:上海市科技人才计划(17XD1420700);上海市自然科学基金(16ZR1415600)
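Abstract (translated from the Chinese): For on-orbit operating environments in which the target characteristics are unknown, this paper studies a path planning strategy for a typical space manipulator. The Sarsa(λ) reinforcement learning method is used to achieve autonomous path planning and intelligent decision-making for target tracking and obstacle avoidance. The method treats each link of the manipulator as a decision-making agent. Each agent perceives a two-dimensional state made up of the target deviation and the obstacle distance level, a fitted reward function consistent with human experience is designed, and the rotation actions of each link are trained by reinforcement. The resulting state-action value function table of each agent serves as the decision basis for online path planning. The method is applied to a path planning task for a multi-degree-of-freedom space manipulator. Simulation results show that the algorithm achieves stable tracking of a moving target together with obstacle avoidance within a limited number of training runs, and that the learned state-action value function tables give the agents a strong capacity for later online autonomous adjustment, which verifies the robustness and intelligence of the algorithm.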

Keywords: reinforcement learning; Sarsa method; space manipulator; path planning
Received: 2018-07-10

Analysis of Space Manipulator Route Planning Based on Sarsa(λ) Reinforcement Learning
XU Wei, LU Shan. Analysis of Space Manipulator Route Planning Based on Sarsa(λ) Reinforcement Learning [J]. Journal of Astronautics, 2019, 40(4): 435-443.
Authors: XU Wei, LU Shan
Institution: 1. Shanghai Institute of Spaceflight Control Technology, Shanghai 201109, China; 2. Shanghai Key Laboratory of Aerospace Intelligent Control Technology, Shanghai 201109, China
Abstract: For on-orbit manipulation environments in which the target characteristics are unknown, the route planning strategy of a typical space manipulator is studied. The Sarsa(λ) reinforcement learning algorithm is used to achieve autonomous route planning and intelligent decision-making for target tracking and obstacle avoidance. The method treats each arm of the manipulator system as a decision agent. Each agent perceives a two-dimensional state consisting of the target deviation and the degree of obstacle distance, a fitted reward function consistent with human experience is designed, and the rotation actions of each arm are trained by reinforcement. The final state-action value function table of each agent can then be used as the decision basis for online route planning of the manipulator. The method is applied to a route planning task for a space manipulator with multiple degrees of freedom. The simulation results show that the algorithm achieves stable tracking of a moving target and simultaneous obstacle avoidance within a limited number of training runs. In addition, the state-action value function table that each agent obtains through learning provides a strong capacity for subsequent online autonomous adjustment, which validates the robustness and intelligence of the algorithm.
Keywords: Reinforcement learning; Sarsa method; Space manipulator; Route planning
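
The abstract describes one tabular Sarsa(λ) learner per arm, indexed by a two-dimensional state (target deviation, obstacle distance level). The following is a minimal sketch of what such a learner might look like, not the authors' implementation. The class name `SarsaLambdaAgent`, the three-action set (rotate negative, hold, rotate positive), the accumulating-trace variant, the hyperparameter values, and the toy environment `toy_step` with its hand-written reward are all assumptions made for illustration; the abstract gives none of these details.

```python
import numpy as np


class SarsaLambdaAgent:
    """Tabular Sarsa(lambda) learner for one arm (one decision agent)."""

    def __init__(self, n_dev, n_obs, n_actions=3, alpha=0.1, gamma=0.9,
                 lam=0.8, epsilon=0.1, seed=0):
        # State-action value table indexed by (deviation bin, obstacle bin, action).
        self.q = np.zeros((n_dev, n_obs, n_actions))
        self.e = np.zeros_like(self.q)  # eligibility traces, same shape as q
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.rng = np.random.default_rng(seed)

    def start_episode(self):
        self.e.fill(0.0)  # traces are reset at the start of every episode

    def act(self, state):
        """Epsilon-greedy choice over the rotation actions."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.q.shape[-1]))
        return int(np.argmax(self.q[state]))

    def update(self, s, a, r, s_next, a_next, done):
        """One on-policy backup with accumulating traces; returns the TD error."""
        target = r if done else r + self.gamma * self.q[s_next + (a_next,)]
        delta = target - self.q[s + (a,)]
        self.e[s + (a,)] += 1.0                # mark the visited pair
        self.q += self.alpha * delta * self.e  # credit all recently visited pairs
        self.e *= self.gamma * self.lam        # decay every trace
        return delta


def toy_step(state, action):
    """Hypothetical 1-D toy: actions 0/1/2 shift the deviation bin by -1/0/+1."""
    dev, _ = state
    dev = min(max(dev + action - 1, 0), 4)
    obs = 1 if dev == 4 else 0          # pretend an obstacle sits near bin 4
    reward = -abs(dev - 2) - 5.0 * obs  # hand-written reward, not the paper's
    return (dev, obs), reward, dev == 2  # bin 2 means zero deviation


def train_toy(agent, episodes=200, max_steps=50):
    """Run Sarsa(lambda) episodes on the toy problem and return the agent."""
    for _ in range(episodes):
        agent.start_episode()
        dev0 = int(agent.rng.choice([0, 1, 3, 4]))
        s = (dev0, 1 if dev0 == 4 else 0)
        a = agent.act(s)
        for _ in range(max_steps):
            s_next, r, done = toy_step(s, a)
            a_next = agent.act(s_next)
            agent.update(s, a, r, s_next, a_next, done)
            if done:
                break
            s, a = s_next, a_next
    return agent
```

After training, for example with `train_toy(SarsaLambdaAgent(n_dev=5, n_obs=2))`, the table `agent.q` plays the role of the state-action value function table that the abstract describes as the basis for online decisions: a greedy lookup `np.argmax(agent.q[state])` selects the rotation for the current state. A multi-arm setup in the spirit of the abstract would hold one such agent per arm.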