Reinforcement learning of a continuous motor sequence with hidden states

Authors: Arie, Hiroaki (1); Ogata, Tetsuya (2); Tani, Jun (3); Sugano, Shigeki (1)

Source: Advanced Robotics, Volume 21, Number 10, 2007, pp. 1215-1229(15)

Publisher: VSP, an imprint of Brill

Abstract:

Reinforcement learning is a scheme for unsupervised learning in which robots are expected to acquire behavioral skills through self-exploration based on reward signals. Conventional reinforcement learning algorithms are difficult to apply to robot motion control tasks, however, because most of them assume a discrete state space and complete observability of the state. Real-world environments are often only partially observable, so robots have to estimate unobservable hidden states. This paper proposes a method that addresses both problems by combining a reinforcement learning algorithm with a learning algorithm for a continuous time recurrent neural network (CTRNN). The CTRNN can learn spatio-temporal structures in a continuous time and space domain, and can preserve the contextual flow by self-organizing an appropriate internal memory structure. This enables the robot to deal with the hidden state problem. We carried out an experiment on the pendulum swing-up task without rotational speed information. The task was accomplished within several hundred trials using the proposed algorithm. In addition, it is shown that the rotational speed of the pendulum, which is treated as a hidden state, is estimated and encoded in the activation of a context neuron.
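
To illustrate why a recurrent internal state can help with hidden states, the following is a minimal sketch of a generic leaky-integrator CTRNN advanced with a forward-Euler step. It is not the authors' implementation: the abstract gives no equations, so the update rule, the function names (`ctrnn_step`, `run`), the network size, the weights, and the time constants are all assumptions chosen for illustration. The sketch only shows that two input histories ending in the same observation leave the network in different internal states, which is the property a context neuron would need in order to encode something like rotational speed.

```python
import math

def ctrnn_step(u, x, w_in, w_rec, tau, dt):
    """One forward-Euler step of a leaky-integrator CTRNN (assumed form):
    tau * du_i/dt = -u_i + sum_j w_rec[i][j] * tanh(u_j) + sum_k w_in[i][k] * x_k
    u: internal states, x: current observation (both plain lists of floats).
    """
    y = [math.tanh(v) for v in u]  # neuron activations
    new_u = []
    for i in range(len(u)):
        drive = sum(w_rec[i][j] * y[j] for j in range(len(u)))
        drive += sum(w_in[i][k] * x[k] for k in range(len(x)))
        new_u.append(u[i] + (dt / tau) * (-u[i] + drive))
    return new_u

def run(observations, tau=2.0, dt=0.1):
    """Feed a sequence of scalar observations into a 2-neuron network.
    The weights below are hypothetical, hand-picked values, not learned ones."""
    w_in = [[1.0], [0.5]]
    w_rec = [[0.0, 0.5], [-0.5, 0.0]]
    u = [0.0, 0.0]
    for obs in observations:
        u = ctrnn_step(u, [obs], w_in, w_rec, tau, dt)
    return u

# Both sequences end with the same observation (1.0) but have different histories.
state_a = run([0.0, 0.0, 1.0])
state_b = run([1.0, 1.0, 1.0])
print(state_a)
print(state_b)
```

Because the final internal states differ even though the last observation is identical, a controller reading those states can in principle distinguish situations that look the same from the current sensor reading alone. In the paper the weights are learned rather than fixed by hand as they are here.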

Keywords: RECURRENT NEURAL NETWORK; REINFORCEMENT LEARNING; ACTOR-CRITIC METHOD; PERCEPTUAL ALIASING PROBLEM; PENDULUM SWING-UP

Document Type: Research article

DOI: 10.1163/156855307781389365

Affiliations: (1) Department of Mechanical Engineering, Waseda University, 3-4-1 Okubo Shinjuku-ku, Tokyo 169-8555, Japan; (2) Graduate School of Informatics, Kyoto University, Yoshida-honmachi Sakyo-ku, Kyoto 606-8501, Japan; (3) Brain Science Institute, RIKEN, 2-1 Hirosawa Wako-shi, Saitama 351-0198, Japan
