
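Incremental Reinforcement Learning Flight Control with Adaptive Learning Rate
Cite this article: LIU Junhui, SHAN Jiayuan, RONG Jili, ZHENG Xiong. Incremental Reinforcement Learning Flight Control with Adaptive Learning Rate[J]. Journal of Astronautics, 2022, 43(1): 111-121.
Authors: LIU Junhui  SHAN Jiayuan  RONG Jili  ZHENG Xiong
Affiliations: 1. School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081; 2. Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education, Beijing Institute of Technology, Beijing 100081; 3. China Academy of Launch Vehicle Technology, Beijing 100076
Funding: National Natural Science Foundation of China (62103049, 61873031)
Abstract: Incremental reinforcement learning (IRL) flight control laws that use a preset learning rate have a high failure rate and cannot maintain stable control when the dynamic characteristics of the flight vehicle vary over a wide range. To address this problem, an adaptive learning rate incremental reinforcement learning (ALRIRL) control method is proposed. First, a stability evaluation function for the control system is constructed using wavelet analysis and is used to assess the stability of the controller. Then, an online iterative method for computing the learning rate is designed based on gradient descent to improve the convergence of the reinforcement learning controller. Finally, the ALRIRL algorithm is verified through Monte Carlo tests and numerical simulation under random initial states and random dynamic pressure variation. The simulation results show that the proposed method adaptively adjusts the learning rate according to the oscillation of the reference state tracking error, achieves stable tracking control of the flight attitude, and raises the success rate of the reinforcement learning flight controller. The method reduces the dependence of IRL flight control algorithms on a preset learning rate hyperparameter and broadens the application of IRL to flight vehicles whose dynamic parameters vary over a wide range.

Keywords: Adaptive learning rate  Wavelet analysis  Flight control  Incremental reinforcement learning
Received: 2021-02-05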

Incremental Reinforcement Learning Flight Control with Adaptive Learning Rate
LIU Junhui, SHAN Jiayuan, RONG Jili, ZHENG Xiong. Incremental Reinforcement Learning Flight Control with Adaptive Learning Rate[J]. Journal of Astronautics, 2022, 43(1): 111-121.
Authors: LIU Junhui  SHAN Jiayuan  RONG Jili  ZHENG Xiong
Affiliation: 1. School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China; 2. Key Laboratory of Dynamics and Control of Flight Vehicle, Ministry of Education, Beijing Institute of Technology, Beijing 100081, China; 3. China Academy of Launch Vehicle Technology, Beijing 100076, China
Abstract: Existing incremental reinforcement learning (IRL) flight control with a preset learning rate has a high failure rate under autonomous learning and cannot stably control a flight vehicle whose dynamics vary over a wide range. An online adaptive learning rate based incremental reinforcement learning (ALRIRL) control method is proposed. First, a cost function based on wavelet analysis is constructed to evaluate the stability of the controller. Then, using the gradient descent method, an online iterative update of the learning rate is designed to improve the convergence of IRL. Finally, nonlinear numerical simulation and Monte Carlo tests are carried out under random initial states and random dynamic pressure variation to demonstrate the effectiveness of the proposed ALRIRL. The simulation results show that the proposed method adaptively adjusts the learning rate according to the control performance monitored in real time, maintains the attitude stability of the flight vehicle, and improves the success rate of IRL. The proposed method reduces the dependence of the IRL flight control algorithm on the preset learning rate and broadens the application of IRL to flight vehicles with large variations in dynamic parameters.
Keywords: Adaptive learning rate  Wavelet analysis  Flight control  Incremental reinforcement learning
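
The abstract describes monitoring oscillation in the tracking error with a wavelet-based function and adjusting the learning rate online, but it gives no formulas. The sketch below only illustrates the general idea under stated assumptions: the names `haar_detail_energy` and `adapt_learning_rate`, the one-level Haar detail energy used as an oscillation measure, the threshold, and the multiplicative shrink/grow rule are all hypothetical. In particular, the shrink/grow rule is a simple heuristic swapped in for the paper's gradient-descent update, which is not specified here.

```python
import numpy as np


def haar_detail_energy(error_window):
    """Mean squared one-level Haar detail coefficient of a tracking-error window.

    Assumption: a stand-in oscillation measure, not the paper's evaluation function.
    The window needs at least two samples.
    """
    x = np.asarray(error_window, dtype=float)
    x = x[: len(x) - len(x) % 2]                 # drop a trailing sample if the length is odd
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # Haar high-pass (detail) coefficients
    return float(np.mean(detail ** 2))


def adapt_learning_rate(lr, error_window, threshold=1e-3,
                        shrink=0.5, grow=1.05, lr_min=1e-4, lr_max=1.0):
    """Shrink lr when the error window oscillates, otherwise grow it slowly.

    Assumption: a multiplicative heuristic, not the paper's gradient-descent update.
    """
    if haar_detail_energy(error_window) > threshold:
        lr *= shrink
    else:
        lr *= grow
    return min(lr_max, max(lr_min, lr))          # keep lr inside [lr_min, lr_max]


# Illustrative inputs: a smoothly decaying error and a sample-to-sample oscillation.
smooth = np.exp(-np.linspace(0.0, 5.0, 256))
oscillating = 0.1 * np.tile([1.0, -1.0], 32)
print(adapt_learning_rate(0.1, smooth))
print(adapt_learning_rate(0.1, oscillating))
```

With these inputs the smooth window leaves the learning rate slightly larger, while the oscillating window halves it. A real controller would need thresholds and bounds tuned to its own sampling rate and error scale.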