1.
罗庆  张涛  单鹏  张文涛  刘子豪 《航空学报》2021,42(8):525792-525792
Reconfiguration blueprints define how system hardware and software resources are reallocated when faults occur, and they are key to achieving reconfiguration-based fault tolerance in Integrated Modular Avionics (IMA) systems. A reconfiguration-blueprint generation method based on improved Q-learning is proposed. It jointly considers multiple optimization objectives, including load balancing, reconfiguration impact, reconfiguration time, and reconfiguration degradation, and applies a simulated-annealing framework to improve the exploration strategy, enhancing the convergence of conventional Q-learning. Experimental results show that, compared with simulated annealing, differential evolution, and conventional Q-learning, the proposed improved Q-learning algorithm is more efficient and generates higher-quality reconfiguration blueprints.
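The abstract does not give the exact exploration rule, but a simulated-annealing-improved exploration strategy for Q-learning is commonly realised with a Metropolis acceptance test. The following is a minimal sketch of that idea, assuming a tabular `Q` keyed by `(state, action)` and a temperature `T` that the caller cools over episodes; the function name and interface are illustrative, not the paper's:

```python
import math
import random

def metropolis_action(Q, state, actions, T):
    """Choose an action with a Metropolis-style exploration rule: a random
    candidate replaces the greedy action with probability
    exp((Q(s, candidate) - Q(s, greedy)) / T), so exploration fades as T cools."""
    greedy = max(actions, key=lambda a: Q[(state, a)])
    candidate = random.choice(actions)
    delta = Q[(state, candidate)] - Q[(state, greedy)]
    if delta >= 0 or random.random() < math.exp(delta / T):
        return candidate
    return greedy
```

At high temperature almost any candidate is accepted (broad exploration); as `T` approaches zero the rule degenerates to pure greedy exploitation, which is what gives the annealed schedule its convergence benefit.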
2.
Q-learning-based random access with Non-Orthogonal Multiple Access (NOMA) (NORA-QL) is an effective technique for ubiquitous access by massive numbers of devices in the Internet of Things. To address the low transmission energy efficiency and limited overload capacity that remain in NORA-QL, an improved method suited to satellite communication networks (I-NORA-QL) is proposed. To reduce transmission power consumption, I-NORA-QL exploits the global … broadcast by the satellite.
3.
《中国航空学报》2020,33(2):672-687
This paper investigates a switching control strategy based on Q-learning for the altitude motion of a morphing aircraft with variable-sweep wings. The morphing process is regarded as a function of the system states, and a corresponding altitude motion model is established. The designed controller is divided into an outer part and an inner part. The outer part is devised by combining the back-stepping method with the command filter technique, eliminating the 'explosion of complexity' problem; the integrator structure of the altitude motion model is exploited to simplify the back-stepping design, and disturbance observers inspired by the extended state observer are devised to estimate the system disturbances. The control input switches from the outer part to the inner part once the altitude tracking error converges to a small value and a linear approximation of the altitude motion model applies. The inner part is generated by the Q-learning algorithm, which learns the optimal command in the presence of unknown system matrices and disturbances. It is proved rigorously that all signals of the closed-loop system remain bounded under the developed control method and that controller switching occurs only once. Finally, comparative simulations are conducted to validate the improved control performance of the proposed scheme.
4.
To balance exploration and exploitation in Q-learning, an improved Q-learning algorithm based on an exploration-region expansion strategy is proposed, building on Metropolis-criterion Q-learning. It removes the blindness of exploring the entire environment from the initial moment and improves learning efficiency. A self-determined termination condition is added to the algorithm, avoiding redundant learning after the optimal path has been found and saving learning time. Simulation experiments demonstrate the effectiveness of the algorithm.
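One plausible reading of the exploration-region expansion strategy is to confine the agent to a region around the start that grows as training proceeds, so early episodes are not wasted wandering the whole map. The sketch below applies that idea to a toy grid world; the region shape, growth schedule, and reward values are assumptions for illustration, not the paper's settings:

```python
import random
from collections import defaultdict

def q_learn_expanding(start, goal, size, episodes=300,
                      alpha=0.5, gamma=0.9, eps=0.2):
    """Grid-world Q-learning in which exploration is confined to a square
    region around the start that grows each episode, instead of blindly
    exploring the whole environment from the first step."""
    Q = defaultdict(float)
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    for ep in range(episodes):
        radius = min(size, 2 + ep // 10)          # expanding exploration region
        s = start
        for _ in range(4 * size * size):
            a = random.choice(moves) if random.random() < eps else \
                max(moves, key=lambda m: Q[(s, m)])
            nxt = (s[0] + a[0], s[1] + a[1])
            # stay on the grid and inside the current exploration region
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or \
               max(abs(nxt[0] - start[0]), abs(nxt[1] - start[1])) > radius:
                nxt = s
            r = 10.0 if nxt == goal else -1.0
            best = max(Q[(nxt, m)] for m in moves)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = nxt
            if s == goal:
                break
    return Q
```

The paper's self-determined termination condition would replace the fixed `episodes` count, stopping once the greedy path stabilises.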
5.
To address problems that commonly arise in UAV image target tracking, such as changes in target orientation, target occlusion, and insufficient sample diversity, a target tracking algorithm for UAV aerial imagery based on a morphology-adaptive network is proposed. First, a data-driven method augments the dataset with occluded samples and samples at multiple rotation angles, improving sample diversity. The proposed morphology-adaptive network model improves a deep belief network with a rotation-invariance constraint to extract deep features with strong representational power, allowing the model to adapt automatically to changes in target morphology. A deep-feature transformation algorithm obtains a pre-localization region for the target, a search mechanism based on Q-learning adaptively and precisely localizes the target, and a deep-forest classifier extracts the target's class information, yielding high-accuracy tracking results. Comparative experiments on multiple datasets show that the algorithm achieves high tracking accuracy, adapts to morphological changes such as target rotation and occlusion, and exhibits good accuracy and robustness.
6.
《中国航空学报》2021,34(9):11-23
Unmanned Aerial Vehicle (UAV)-enabled Aerial Base Stations (UABSs) have been widely studied for future communications. However, multi-UAV networks raise a series of challenges, such as interference management, trajectory design, and resource allocation, and performance differences among UABSs further increase the complexity. In this paper, the joint downlink transmission power control and trajectory design problem in a multi-type UABS communication network is investigated. To satisfy the Signal-to-Interference-plus-Noise Ratio (SINR) requirements of users, each UABS needs to adjust its position and transmission power. Based on the interactions among multiple communication links, a non-cooperative Mean-Field-Type Game (MFTG) is proposed to model the joint optimization problem. A Nash equilibrium solution is then obtained in two steps: first, the users in the given area are clustered to obtain the initial deployment of the UABSs; second, a Mean-Field Q-learning (MFQ) algorithm is proposed to solve the discrete MFTG problem. Finally, the effectiveness of the approach is verified through simulations, which show that it simplifies the solution process and effectively reduces the energy consumption of each UABS.
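The abstract names Mean-Field Q-learning but not its update rule. A common form of the MFQ update conditions each agent's value on its own action and a discretised mean action of its neighbours, with a Boltzmann policy in the target; the sketch below shows one such update step under those assumptions (table shape, bin indexing, and parameter names are illustrative, not taken from the paper):

```python
import numpy as np

def mfq_update(Q, s, a, r, s_next, k, k_next,
               alpha=0.1, gamma=0.9, beta=1.0):
    """One Mean-Field Q-learning update. Q has shape
    (n_states, n_actions, n_mean_bins): each agent values its own action
    against a discretised mean action k of its neighbours. The target uses
    a Boltzmann policy over the agent's actions at the next mean action."""
    logits = beta * Q[s_next, :, k_next]
    pi = np.exp(logits - logits.max())        # numerically stable softmax
    pi /= pi.sum()
    v_next = float(pi @ Q[s_next, :, k_next])
    Q[s, a, k] += alpha * (r + gamma * v_next - Q[s, a, k])
    return Q
```

Conditioning on the mean action is what keeps the table size independent of the number of UABSs, which is why mean-field methods scale to dense multi-agent networks.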
7.
周彬  郭艳  李宁  钟锡健 《航空学报》2021,42(9):325109-325109
With the widespread use of UAVs, their flight energy consumption and on-board computing capacity have become bottlenecks, making UAV path planning research increasingly important. In many cases a UAV cannot obtain the exact location of the target point or environment information in advance and thus often cannot plan an effective flight path. To address this problem, a UAV path planning method based on guided-reinforcement Q-learning is proposed. The method defines the reward value from received signal strength and iteratively optimizes the path with the Q-learning algorithm, and a 'guided reinforcement' principle is introduced to accelerate convergence of the learning algorithm. Simulation results show that the method achieves autonomous UAV navigation and fast path planning; compared with traditional algorithms, it greatly reduces the number of iterations and yields shorter planned paths.
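A minimal sketch of the core idea, assuming a log-distance path-loss model and a grid world: the reward is the gain in received signal strength between successive positions, so the agent is guided toward a source whose coordinates it never observes. The propagation exponent, grid discretisation, and function names are assumptions for illustration:

```python
import math
import random
from collections import defaultdict

def rss(pos, src, n=2.0):
    """Received signal strength under a log-distance path-loss model
    (dB relative to 1 m); it grows as the UAV approaches the source."""
    return -10.0 * n * math.log10(max(math.dist(pos, src), 1e-3))

def plan_by_signal(start, src, size, episodes=300,
                   alpha=0.5, gamma=0.9, eps=0.2):
    """Q-learning whose reward is the *gain* in received signal strength,
    steering the UAV toward a target whose coordinates are never observed."""
    Q = defaultdict(float)
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    for _ in range(episodes):
        s = start
        for _ in range(4 * size):
            a = random.choice(moves) if random.random() < eps else \
                max(moves, key=lambda m: Q[(s, m)])
            nxt = (min(max(s[0] + a[0], 0), size - 1),
                   min(max(s[1] + a[1], 0), size - 1))
            r = rss(nxt, src) - rss(s, src)   # guided reward: signal gain
            best = max(Q[(nxt, m)] for m in moves)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = nxt
            if s == src:
                break
    return Q
```

Because the signal-gain reward is dense (every step is informative), convergence is much faster than with a sparse goal-only reward, which matches the iteration-count reduction the abstract reports.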
8.
This paper presents a novel approach based on multi-agent reinforcement learning for spacecraft formation-flying reconfiguration tracking problems. In this scheme, the spacecraft learn the control strategy via transfer learning, and a new generalized discounted value function is introduced for the tracking problems. Owing to the digital nature of spacecraft computer systems, local optimal controllers are developed for the spacecraft in discrete time, and the stability of the controller is proven. Two Q-learning algorithms are proposed, in each of which the optimal control solution is learned online without knowledge of the system dynamics. In the first algorithm, each agent learns the optimal control independently; in the second, each agent shares the learned information with the other agents. A collision avoidance capability is also provided. The effectiveness of the presented schemes is verified through simulations, and the two algorithms are compared with each other.
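The contrast between the paper's two algorithms, independent learning versus information sharing, can be sketched on a toy problem. Below, a simple chain MDP stands in for the tracking dynamics (the actual spacecraft model, discounted value function, and collision avoidance are not reproduced), and sharing is modelled as all agents writing into one common Q-table; every name here is illustrative:

```python
import random
from collections import defaultdict

def step(s, a, n):
    """Toy chain MDP standing in for the tracking dynamics: move left or
    right; reaching the right end (the reference state) pays a reward."""
    nxt = min(max(s + a, 0), n - 1)
    return nxt, (1.0 if nxt == n - 1 else 0.0)

def train_agents(n_agents=3, n=8, episodes=40, share=True,
                 alpha=0.5, gamma=0.9, eps=0.3):
    """share=False gives each agent its own Q-table (independent learning);
    share=True makes all agents write into one common table, so experience
    gathered by any one agent is immediately reused by the others."""
    common = defaultdict(float)
    tables = [common if share else defaultdict(float) for _ in range(n_agents)]
    for _ in range(episodes):
        for Q in tables:                      # each agent runs one episode
            s = 0
            for _ in range(3 * n):
                a = random.choice([1, -1]) if random.random() < eps else \
                    max([1, -1], key=lambda m: Q[(s, m)])
                nxt, r = step(s, a, n)
                best = max(Q[(nxt, m)] for m in [1, -1])
                Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
                s = nxt
                if r > 0:
                    break
    return tables
```

With a shared table, each agent effectively trains on `n_agents` times as much experience per round, which is the mechanism behind the faster learning of the second algorithm.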
9.
《中国航空学报》2023,36(4):338-353
Reinforcement Learning (RL) techniques are being studied to solve Demand and Capacity Balancing (DCB) problems and to fully exploit their computational performance. A locally generalised Multi-Agent Reinforcement Learning (MARL) method for real-world DCB problems is proposed. It can deploy trained agents directly to unseen scenarios in a specific Air Traffic Flow Management (ATFM) region and quickly obtain a satisfactory solution. In this method, the agents of all flights in a scenario form a multi-agent decision-making system based on partial observation. A trained agent with a customised neural network can be deployed directly on the corresponding flight, allowing the agents to solve the DCB problem jointly. A cooperation coefficient is introduced into the reward function to adjust an agent's cooperation preference in the multi-agent system, thereby controlling how flight delay time is distributed. A multi-iteration mechanism is designed for the DCB decision-making framework to deal with the non-stationarity of MARL and to ensure that all hotspots are eliminated. Experiments based on large-scale, high-complexity real-world scenarios verify the effectiveness and efficiency of the method. Statistically, the proposed method is shown to generalise within the scope of the flights and sectors of interest, and its optimisation performance outperforms standard computer-assisted slot allocation and state-of-the-art RL-based DCB methods. A sensitivity analysis preliminarily reveals the effect of the cooperation coefficient on delay time allocation.
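The abstract does not give the reward function, but the role of the cooperation coefficient can be illustrated with a hypothetical form: the agent is penalised for its own delay, for a fraction (the coefficient) of its peers' average delay, and heavily for any remaining hotspot. All names and penalty magnitudes below are assumptions, not the paper's values:

```python
def dcb_reward(own_delay, peer_delays, hotspots,
               kappa=0.5, hotspot_penalty=100.0):
    """Hypothetical DCB reward with cooperation coefficient kappa:
    kappa = 0 leaves the agent purely selfish about its own delay, while
    larger kappa weights the average delay of the other flights, shifting
    how the total delay is distributed across the fleet."""
    peer_avg = sum(peer_delays) / len(peer_delays) if peer_delays else 0.0
    return -(own_delay + kappa * peer_avg) - hotspot_penalty * hotspots
```

Under such a reward, raising the coefficient makes agents more willing to absorb delay themselves to reduce peers' delays, which is consistent with the sensitivity analysis the abstract describes.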