Software defined satellite attitude control algorithm based on deep reinforcement learning
Cite this article:XU Ke,WU Fengge,ZHAO Junsuo.Software defined satellite attitude control algorithm based on deep reinforcement learning[J].Journal of Beijing University of Aeronautics and Astronautics,2018,44(12):2651-2659.
Authors:XU Ke  WU Fengge  ZHAO Junsuo
Institution:Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract:Deep reinforcement learning (DRL), a recent class of machine-learning-based control algorithms, has shown outstanding performance in intelligent control domains such as robotics and unmanned aerial vehicles, whereas satellite attitude control still relies widely on the traditional PID control algorithm. As satellites become smaller and more intelligent, and as software defined satellites emerge, traditional control algorithms find it increasingly difficult to meet the attitude control system's requirements for adaptability, autonomy, and robustness. This paper therefore studies an attitude control algorithm based on deep reinforcement learning. The algorithm is model-based and converges faster than model-free algorithms. Compared with traditional control strategies, it requires no prior knowledge of the satellite's physical or orbital parameters and offers strong adaptability and autonomous control capability, so it can meet the need of software defined satellites to adapt to different hardware environments and to be developed and deployed rapidly. In addition, by introducing a target network and a parallelized heuristic search algorithm, the algorithm is optimized in terms of network accuracy and computation speed; these improvements are verified by simulation experiments.
Keywords:reinforcement learning  deep learning  intelligent control  satellite attitude control  software defined satellite
Received:2018-06-13
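
The abstract mentions a target network but does not say how it is maintained. As a minimal sketch of the general idea only, and not the authors' implementation, the snippet below shows the two update rules commonly used in DRL for a target network: a soft (Polyak) update that moves the target parameters a fraction `tau` toward the online parameters, and a hard update that copies them outright. The function names, the flat-list parameter representation, and the value of `tau` are all assumptions made for illustration.

```python
from typing import List


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Return target parameters moved a fraction tau toward the online ones.

    Hypothetical helper: parameters are modelled as flat lists of floats.
    """
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def hard_update(online: List[float]) -> List[float]:
    """Return a fresh copy of the online parameters (periodic full sync)."""
    return list(online)


# Illustrative values only; a real network would have many more parameters.
online = [1.0, -2.0, 0.5]
target = [0.0, 0.0, 0.0]

# Three soft updates with tau = 0.5 close 1 - 0.5**3 = 87.5% of the gap.
for _ in range(3):
    target = soft_update(target, online, tau=0.5)

synced = hard_update(online)
```

Because the target parameters lag behind the online ones, the values used as training targets change slowly, which is the usual motivation for this device; whether the paper uses the soft or the hard variant is not stated in the abstract.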
This article is indexed in Wanfang Data and other databases.