Quadratic Reward-Penalty Learning Automaton

Cite this article:	Liu Xiao. Quadratic Reward-Penalty Learning Automaton[J]. Aeronautical Computer Technique, 1999, 29(2): 47-49.

Author:	Liu Xiao

Affiliation:	China Aeronautical Computing Technique Research Institute, Xi'an 710068

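Abstract:	A nonlinear reinforcement algorithm for reward-penalty learning automata is studied. Unlike the linear reward-penalty model (L_RP), the new model updates its action selection probabilities with a quadratic function. As a result, its learning performance is better than that of L_RP, and it exhibits different behaviors and characteristics in different environments.
Keywords:	artificial intelligence; reinforcement learning; learning automata
Revised:	1998-12-08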
Quadratic Reward-Penalty Learning Automaton

Liu Xiao. Quadratic Reward-Penalty Learning Automaton[J]. Aeronautical Computer Technique, 1999, 29(2): 47-49.

Authors:	Liu Xiao

Abstract:	This paper studies a nonlinear reinforcement algorithm for reward-penalty learning automata. Unlike the linear reward-penalty model (L_RP), the presented algorithm updates the action selection probabilities with a quadratic function. The learning performance of the new model is superior to that of L_RP. Additionally, the proposed automaton exhibits different behaviours and properties in different environments.

Keywords:	Artificial intelligence; Reinforcement learning; Learning automata
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
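
For orientation, the linear reward-penalty baseline that the abstract compares against can be sketched as below. This is the standard textbook form of the linear scheme, not code from the paper. The abstract does not state the quadratic update function, so it is not reproduced here. The names `lrp_update` and `simulate`, the step sizes `a` and `b`, and the environment model (each action rewarded with a fixed probability) are illustrative assumptions.

```python
import random


def lrp_update(p, chosen, rewarded, a=0.1, b=0.1):
    """One step of the standard linear reward-penalty update.

    p        -- current action probabilities (sums to 1)
    chosen   -- index of the action just taken
    rewarded -- True if the environment rewarded the action
    a, b     -- reward and penalty step sizes (assumed values; the
                symmetric scheme uses a == b)
    """
    r = len(p)
    q = []
    for j, pj in enumerate(p):
        if rewarded:
            # Reward: move probability mass toward the chosen action.
            q.append(pj + a * (1.0 - pj) if j == chosen else (1.0 - a) * pj)
        else:
            # Penalty: shrink the chosen action, spread mass over the others.
            q.append((1.0 - b) * pj if j == chosen
                     else b / (r - 1) + (1.0 - b) * pj)
    return q


def simulate(reward_probs, steps=5000, a=0.05, b=0.05, seed=0):
    """Run the automaton in a hypothetical stationary random environment,
    where action i is rewarded with probability reward_probs[i]."""
    rng = random.Random(seed)
    n = len(reward_probs)
    p = [1.0 / n] * n
    for _ in range(steps):
        chosen = rng.choices(range(n), weights=p)[0]
        rewarded = rng.random() < reward_probs[chosen]
        p = lrp_update(p, chosen, rewarded, a, b)
    return p
```

Both branches are linear in the current probabilities and keep them summing to one. According to the abstract, the proposed model replaces this linear update with a quadratic one.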