
Brain-inspired target-driven navigation based on STDP reward modulation
Cite this article: DAI Jiawei, XIONG Zhi, CHAO Lijun, YANG Chuang. Brain-inspired target-driven navigation based on STDP reward modulation[J]. Navigation Positioning & Timing, 2023, 10(2): 47-56.
Authors: DAI Jiawei  XIONG Zhi  CHAO Lijun  YANG Chuang
Institution: Navigation Research Center, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Funding: National Natural Science Foundation of China (61873125); National Defense Basic Scientific Research Program (JCKY2020605C009); University Innovation Fund project (xcxjh20210334)
Abstract: Animals have an excellent capacity for autonomous spatial localization and navigation, and can localize themselves and make navigation decisions without prior information about the environment. To address target-driven navigation of an agent in continuous space, a navigation algorithm based on the biological spike-timing-dependent plasticity (STDP) learning rule is studied. First, the physiological mechanisms underlying target-driven navigation decisions in animals are analyzed. On this basis, place cell and action cell models based on a spiking neural network are constructed. The weights between action cells are updated by a lateral competition model, while the synaptic weights of the feedforward connections from place cells to action cells are adjusted by the STDP learning rule, modulated by the reward signal from the environment. The spiking activity of the action cell population represents the agent's movement direction and speed. Finally, the proposed algorithm is verified by simulation experiments. The simulation results show that the proposed brain-inspired target-driven navigation algorithm achieves a planning time of about 30 ms in a single-obstacle environment, and that its average planned path length is 15.9% shorter than that of the traditional Q-learning reinforcement learning method.

Keywords: Brain-inspired target-driven navigation  Spike-timing-dependent plasticity  Agents  Spiking neural network  Place cell  Action cell
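
The pipeline summarized in the abstract (place cells feeding action cells, reward-modulated STDP on the feedforward weights, and a population readout of direction and speed) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian place-cell tuning, the exponential STDP window, the eligibility trace, and every function name and parameter value (`sigma`, `tau`, `lr`, `decay`, and so on) are assumptions chosen for illustration, and the lateral competition between action cells is omitted.

```python
import math

def place_cell_rates(pos, centers, sigma=0.2):
    """Assumed Gaussian tuning: a place cell fires most when the agent is at its center."""
    x, y = pos
    return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            for cx, cy in centers]

def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Assumed exponential STDP window; dt = t_post - t_pre in ms.
    Pre-before-post (dt >= 0) potentiates, post-before-pre depresses."""
    if dt >= 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)

def update_eligibility(trace, dt, decay=0.9):
    """Decay the synapse's eligibility trace and add the latest spike-pair contribution."""
    return decay * trace + stdp_window(dt)

def reward_modulated_update(w, trace, reward, lr=0.01, w_min=0.0, w_max=1.0):
    """Change the weight only when a reward arrives, in proportion to the trace."""
    return min(w_max, max(w_min, w + lr * reward * trace))

def decode_action(action_rates):
    """Population vector readout: action cell k is assumed to prefer direction 2*pi*k/N.
    Returns (direction in radians, speed normalized to [0, 1])."""
    n = len(action_rates)
    total = sum(action_rates)
    if total == 0:
        return 0.0, 0.0
    vx = sum(r * math.cos(2 * math.pi * k / n) for k, r in enumerate(action_rates))
    vy = sum(r * math.sin(2 * math.pi * k / n) for k, r in enumerate(action_rates))
    return math.atan2(vy, vx), math.hypot(vx, vy) / total
```

In this sketch, a place-to-action synapse whose presynaptic spike precedes the postsynaptic spike accumulates a positive trace, so a later positive reward strengthens it while a zero reward leaves it unchanged. That dependence on reward is what separates a reward-modulated rule from plain STDP.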