A policy iteration method for improving robot assembly trajectory efficiency
Institution: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China
Abstract: Bolt assembly is a vital yet difficult task for robots that replace astronauts in extra-vehicular activities (EVA), but trajectory efficiency during insertion of the wrench into the hex hole of the bolt still needs to be improved. In this paper, a policy iteration method based on reinforcement learning (RL) is proposed, in which trajectory efficiency improvement is formulated as an RL-based objective optimization problem. First, a projection from raw data to the state-action space is established, and a policy iteration initialization method is designed on the basis of this projection to provide the initial policy for iteration. Policy iteration based on the protective policy is then applied to continuously evaluate and optimize the action-value function of all state-action pairs until convergence. To verify the feasibility and effectiveness of the proposed method, a non-contact demonstration experiment with human supervision is performed. Experimental results show that both the initialization policy and the generated policy can be obtained by the policy iteration method with a limited number of demonstrations. A comparison of experiments with two different assembly tolerances shows that the converged generated policy achieves higher trajectory efficiency than the conservative one. In addition, the method can ensure safety during training and improves the utilization efficiency of demonstration data.
Keywords: Bolt assembly; Policy initialization; Policy iteration; Reinforcement learning (RL); Robotic assembly; Trajectory efficiency
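The abstract does not specify the state space, reward, or the form of the protective policy, so the following is only a toy sketch of the general idea under invented assumptions, not the authors' method. It assumes a 1-D lateral offset (mm) between wrench and hex hole projected onto seven discrete bins (`project`), three motion actions, a unit cost per step as a stand-in for trajectory efficiency, and an action mask (`allowed`) as a stand-in for the protective policy. The initialization policy takes the most frequent demonstrated action in each projected state; tabular policy iteration then alternates evaluation of Q(s, a) and greedy improvement until the policy stops changing. All names, constants, and demonstration data (`project`, `allowed`, `init_policy`, `DEMOS`, ...) are hypothetical.

```python
import numpy as np

# Toy, hypothetical setup: the lateral offset (mm) between wrench and hex hole
# is projected onto N_STATES discrete bins; bin GOAL means "aligned, insert now".
N_STATES = 7
GOAL = 3
ACTIONS = (-1, 0, +1)   # move one bin left, hold, move one bin right
GAMMA = 0.9
STEP_COST = -1.0        # every extra motion step lowers trajectory efficiency


def project(raw_offset, lo=-3.5, hi=3.5):
    """Project a raw offset reading onto a discrete state index (hypothetical 1 mm bins)."""
    edges = np.linspace(lo, hi, N_STATES + 1)
    return int(np.clip(np.digitize(raw_offset, edges) - 1, 0, N_STATES - 1))


def step(s, a):
    """Deterministic toy transition; the goal bin is absorbing and cost-free."""
    if s == GOAL:
        return s, 0.0
    return int(np.clip(s + a, 0, N_STATES - 1)), STEP_COST


def allowed(s):
    """Stand-in for a protective constraint: only hold at the goal, never leave the workspace."""
    if s == GOAL:
        return [ACTIONS.index(0)]
    return [i for i, a in enumerate(ACTIONS) if 0 <= s + a < N_STATES]


def init_policy(demos):
    """Initialization policy: most frequent demonstrated action per projected state."""
    counts = np.zeros((N_STATES, len(ACTIONS)))
    for raw_offset, action in demos:
        counts[project(raw_offset), ACTIONS.index(action)] += 1
    policy = np.full(N_STATES, ACTIONS.index(0))   # unvisited states: hold
    seen = counts.sum(axis=1) > 0
    policy[seen] = counts[seen].argmax(axis=1)
    return policy


def evaluate(policy, tol=1e-10):
    """Iterative evaluation of Q(s, a) for a fixed policy until the update is below tol."""
    Q = np.zeros((N_STATES, len(ACTIONS)))
    while True:
        Q_new = np.empty_like(Q)
        for s in range(N_STATES):
            for i, a in enumerate(ACTIONS):
                s2, r = step(s, a)
                Q_new[s, i] = r + GAMMA * Q[s2, policy[s2]]
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new


def policy_iteration(demos, max_iters=50):
    """Alternate evaluation and greedy improvement over allowed actions until stable."""
    policy = init_policy(demos)
    for _ in range(max_iters):
        Q = evaluate(policy)
        new_policy = policy.copy()
        for s in range(N_STATES):
            ok = allowed(s)
            new_policy[s] = ok[int(np.argmax(Q[s, ok]))]
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, Q


# Hypothetical demonstrations as (raw offset in mm, action) pairs; the operator
# sometimes pauses, which makes the initialization policy conservative.
DEMOS = [(2.1, -1), (1.0, 0), (1.1, 0), (0.9, -1), (0.1, 0),
         (-2.9, +1), (-2.0, +1), (-1.1, +1), (0.0, 0)]

print("init :", [ACTIONS[i] for i in init_policy(DEMOS)])
policy, Q = policy_iteration(DEMOS)
print("final:", [ACTIONS[i] for i in policy])
```

With these made-up demonstrations, the initialization policy holds in two off-target bins (one because the operator paused there most often, one because it was never visited), whereas the converged policy moves one bin toward the hole from every state. Restricting improvement to `allowed` actions is one simple way to keep every intermediate policy inside a safe set; the paper's actual protective policy may be defined quite differently.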
This article is indexed in ScienceDirect and other databases.