Approximate policy iteration using regularised Bellman residual minimisation
In this paper we present an approximate policy iteration (API) method, called API-BRM, that uses an effective incremental implementation of support vector regression (SVR) to approximate the value function, enabling generalisation in continuous (or large) state-space reinforcement learning (RL) problems. RL is a methodology for solving complex and uncertain decision problems, usually modelled as Markov decision processes. API-BRM is formalised as a non-parametric regularisation problem derived from Bellman residual minimisation (BRM), which reduces the variance of the estimation problem. API-BRM is incremental and can be applied to RL in the on-line agent-interaction framework. Because it is based on non-parametric SVR, API-BRM finds the global solution of the regression problem, with convergence guarantees to the optimal solution. To find the optimal policy, a value function must be defined that specifies the total reward an agent can expect from its current state after taking a given action; the agent then uses this value function to choose which action to take. Experimental evidence and performance results on well-known RL benchmarks are presented.
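As a rough illustration of the idea described above (approximate policy iteration with a regularised non-parametric regressor for the value function), the sketch below runs fitted value backups on a toy chain MDP, using kernel ridge regression with an RBF kernel as a stand-in for the paper's incremental SVR. The toy MDP, the kernel width, and the regularisation constant are all invented for illustration; this is not the authors' API-BRM algorithm.

```python
import numpy as np

GAMMA, N_STATES = 0.9, 5  # toy settings, chosen for illustration only

def step(s, a):
    """Deterministic chain MDP: actions -1/+1 move along states 0..4;
    reward 1 is collected whenever the agent is at the rightmost state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

# One transition sample (s, a, s', r) per state-action pair.
samples = [(s, a, *step(s, a)) for s in range(N_STATES) for a in (-1, 1)]
X = np.array([[s, a] for s, a, _, _ in samples], float)

def rbf(A, B, width=1.0):
    """RBF (Gaussian) kernel matrix between two sets of (state, action) points."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * width ** 2))

K = rbf(X, X)
lam = 1e-3                      # ridge regularisation strength (assumed value)
alpha = np.zeros(len(samples))  # dual coefficients of the kernel regressor

def q(states_actions):
    """Approximate Q-value at the given (state, action) points."""
    return rbf(np.asarray(states_actions, float), X) @ alpha

# Approximate policy iteration loop: regress the bootstrapped Bellman
# targets with a regularised kernel fit, then improve the policy greedily.
for _ in range(60):
    y = np.array([r + GAMMA * max(q([[s2, a2]])[0] for a2 in (-1, 1))
                  for _, _, s2, r in samples])
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), y)

# Greedy policy with respect to the fitted Q: move right towards the reward.
policy = {s: max((-1, 1), key=lambda a: q([[s, a]])[0])
          for s in range(N_STATES)}
```

The regularisation term (`lam`) plays the role the paper assigns to the regularised BRM objective: it keeps the non-parametric fit well-posed, and because the objective is convex the fit has a single global solution.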
Document Type: Research Article
Affiliations: Computer Science Department, Universitat Politecnica de Catalunya, Barcelona, Spain
Publication date: March 3, 2016