Learning algorithm
Parallel learning architecture
Recent neuroscience studies have revealed heterogeneous learning modules that work in parallel to achieve fast and robust learning of skilled behaviors.
We propose a novel framework termed cooperative competitive concurrent learning with importance sampling (CLIS). CLIS selects an appropriate module for action and uses importance sampling to accurately improve the policies of all learning modules, including those that were not selected.
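The role of importance sampling can be illustrated with a minimal sketch. It assumes each module holds a softmax policy over a shared discrete action set, and it reweights a simple policy-gradient step by the ratio between the module's own action probability and that of the module that actually acted. The function names, the preference-list layout, and the update rule are assumptions made for illustration, not the CLIS algorithm itself.

```python
import math

def softmax(prefs, temperature=1.0):
    """Return action probabilities for a list of action preferences."""
    m = max(prefs)
    exps = [math.exp((p - m) / temperature) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def importance_weights(module_prefs, selected, action):
    """Ratio pi_i(action) / pi_selected(action) for every module i.

    module_prefs: one preference list per module (hypothetical layout).
    selected: index of the module whose policy generated the action.
    """
    behavior_prob = softmax(module_prefs[selected])[action]
    return [softmax(prefs)[action] / behavior_prob for prefs in module_prefs]

def update_all_modules(module_prefs, selected, action, reward, lr=0.1):
    """Apply an importance-weighted policy-gradient step to every module."""
    rhos = importance_weights(module_prefs, selected, action)
    for prefs, rho in zip(module_prefs, rhos):
        probs = softmax(prefs)  # probabilities before this module is updated
        for a in range(len(prefs)):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * rho * reward * grad_log
    return rhos
```

The selected module always receives a weight of 1, while a module that would rarely have chosen the executed action receives a small weight, so its policy changes only slightly on that sample.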
Natural TD algorithm
Information geometry has recently been introduced to improve the efficiency of gradient estimation in policy-gradient reinforcement learning. However, its computational cost is too high for implementation on robotic systems. We therefore propose an efficient implementation of natural policy gradient estimation that introduces an additional estimator approximating the temporal difference error as a function of the state and action.
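A minimal sketch of this idea follows, under several assumptions: the task is reduced to a single-state bandit, the additional estimator is linear in the gradient of the log-policy (the standard "compatible" features), and its weights are used directly as the natural gradient direction. The function name, step sizes, and reward values are hypothetical, and the sketch is not the exact algorithm referred to above.

```python
import math
import random

def softmax(theta):
    """Return action probabilities for policy parameters theta."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def natural_td_bandit(rewards, steps=3000, alpha=0.05, beta=0.1, seed=0):
    """Natural-gradient policy search on a bandit (illustrative only)."""
    rng = random.Random(seed)
    n = len(rewards)
    theta = [0.0] * n  # policy parameters
    w = [0.0] * n      # weights of the TD-error estimator
    v = 0.0            # baseline value estimate
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choices(range(n), weights=pi)[0]
        delta = rewards[a] - v  # TD error of a one-step episode
        # Compatible features: gradient of log pi(a) with respect to theta.
        psi = [(1.0 if i == a else 0.0) - pi[i] for i in range(n)]
        # Least-mean-squares step so that w . psi tracks the TD error.
        err = delta - sum(wi * pi_i for wi, pi_i in zip(w, psi))
        w = [wi + beta * err * pi_i for wi, pi_i in zip(w, psi)]
        v += beta * delta
        # The estimator weights serve as the natural gradient direction.
        theta = [t + alpha * wi for t, wi in zip(theta, w)]
    return softmax(theta), w
```

The point of the construction is that no Fisher information matrix is formed or inverted: the regression onto the log-policy gradient does that work implicitly, which is what keeps the per-step cost low.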
Reward function
Learning from intrinsic and extrinsic rewards
Vectorized rewards
State representation
Hierarchical mixture of experts
A real robot receives a wealth of varied, high-dimensional sensory inputs and must construct task-relevant state representations from its own online experience. We propose a method that extracts task-relevant features by applying the idea of a Hierarchical Mixture of Experts (HMoE).
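For readers unfamiliar with the architecture, the forward pass of a two-level HMoE can be sketched as follows. The sketch assumes linear experts and softmax gates; all names and the weight layout are hypothetical, and it shows only how gate outputs combine expert outputs, not how task-relevant features are learned.

```python
import math

def softmax(scores):
    """Return normalized gate probabilities for a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Dot product of a weight vector with the input vector."""
    return sum(w * xi for w, xi in zip(weights, x))

def hmoe_output(x, top_gate, sub_gates, experts):
    """Two-level hierarchical mixture of linear experts.

    top_gate: one weight vector per branch.
    sub_gates[i]: one weight vector per expert in branch i.
    experts[i][j]: weight vector of expert j in branch i.
    """
    g_top = softmax([linear(v, x) for v in top_gate])
    output = 0.0
    mixing_weights = []
    for i, g_i in enumerate(g_top):
        g_sub = softmax([linear(v, x) for v in sub_gates[i]])
        for j, g_ij in enumerate(g_sub):
            weight = g_i * g_ij  # product of gate probabilities along the path
            mixing_weights.append(weight)
            output += weight * linear(experts[i][j], x)
    return output, mixing_weights
```

Because the gates depend on the input, different experts dominate in different regions of the sensory space, which is what allows each expert to specialize.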
Learning and evolution
Embodied evolution of survival
Evolution of neural controllers
We consider how the complexity of evolved neural controllers depends on the environment, using the foraging behavior of the Cyber Rodent in two different environments. In the first environment, each fruit can be seen only from limited directions, and different groups of fruits become ripe in different periods. In the second environment, fruits inside a zone are rewarding and those outside are aversive. After evolution, agents with recurrent neural controllers outperformed those with feed-forward controllers by effectively using the memory of border passage. Simulation and experimental results confirmed that evolution selects neural controllers of appropriate complexity, in both size and structure.
Evolution of meta-parameters
The performance of learning systems depends critically on a number of meta-parameters that control how the detailed system parameters change with learning. In most previous approaches, the meta-parameters are determined based on the experience of the experimenters. However, humans and animals can learn novel behaviors in a wide variety of environments without the help of experimenters or supervisors. We propose a new method that uses an evolutionary approach to determine the values of meta-parameters such as the learning rate and the temperature for exploration.
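The general shape of such a search can be sketched with a small evolutionary loop. The sketch assumes a hypothetical two-armed bandit as the learning task, total reward as fitness, truncation selection, and log-normal mutation; the parameter ranges and function names are illustrative and are not taken from the method described above.

```python
import math
import random

def run_learner(learning_rate, temperature, rng, steps=200):
    """Fitness: total reward of a softmax learner with the given meta-parameters."""
    rewards = [0.2, 0.8]  # hypothetical two-armed task
    q = [0.0, 0.0]
    total = 0.0
    for _ in range(steps):
        m = max(q)
        weights = [math.exp((qa - m) / temperature) for qa in q]
        a = rng.choices([0, 1], weights=weights)[0]
        r = rewards[a]
        q[a] += learning_rate * (r - q[a])
        total += r
    return total

def evolve_meta_parameters(generations=20, pop_size=10, seed=0):
    """Evolve (learning_rate, temperature) pairs by selection and mutation."""
    rng = random.Random(seed)
    pop = [(rng.uniform(0.01, 1.0), rng.uniform(0.05, 2.0))
           for _ in range(pop_size)]
    for _ in range(generations):
        # Rank individuals by the reward their learner collects.
        scored = sorted(pop, key=lambda p: run_learner(p[0], p[1], rng),
                        reverse=True)
        parents = scored[: pop_size // 2]
        children = []
        for lr, temp in parents:
            # Log-normal mutation, clipped to the allowed ranges.
            child_lr = min(1.0, max(0.01, lr * math.exp(rng.gauss(0, 0.2))))
            child_temp = min(2.0, max(0.05, temp * math.exp(rng.gauss(0, 0.2))))
            children.append((child_lr, child_temp))
        pop = parents + children
    return pop[0]  # best-ranked individual of the final evaluation
```

Calling `evolve_meta_parameters()` returns one `(learning_rate, temperature)` pair. Because fitness is measured from a single stochastic run per individual, the ranking is noisy; averaging several runs per individual would be a natural refinement.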
Evolution of hierarchical architectures in RL
Hierarchical structure is often introduced into reinforcement learning to cope with large-scale problems. However, a limitation of hierarchical reinforcement learning algorithms is that the structure has to be given by the designer in advance. We present an evolutionary approach that constructs the structure automatically by combining MAXQ hierarchical reinforcement learning with genetic programming.
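To make concrete the kind of structure that is being searched over, the sketch below evaluates the MAXQ value decomposition on a task tree: the value of a composite task is the best child value plus a completion term, and the value of a primitive action is its expected reward. Encoding the hierarchy as a dictionary, along with the task names and numbers in the usage example, is a hypothetical choice; the sketch evaluates a fixed tree and does not perform learning or evolution.

```python
def maxq_value(task, state, hierarchy, primitive_reward, completion):
    """Evaluate V(task, state) under the MAXQ decomposition.

    hierarchy: maps each composite task to its list of child tasks.
    primitive_reward: maps (action, state) to an expected reward.
    completion: maps (task, state, child) to a completion value.
    """
    children = hierarchy.get(task)
    if not children:  # primitive action: value is its expected reward
        return primitive_reward[(task, state)]
    # Composite task: best child value plus the cost-to-complete term.
    return max(
        maxq_value(child, state, hierarchy, primitive_reward, completion)
        + completion.get((task, state, child), 0.0)
        for child in children
    )

# Usage with a hypothetical two-level hierarchy.
hierarchy = {"root": ["navigate", "eat"], "navigate": ["left", "right"]}
primitive_reward = {("left", "s"): -1.0, ("right", "s"): -0.5, ("eat", "s"): 0.0}
completion = {("root", "s", "navigate"): 2.0, ("root", "s", "eat"): 1.0}
root_value = maxq_value("root", "s", hierarchy, primitive_reward, completion)
```

In an evolutionary setting, a tree like `hierarchy` would be the genotype that genetic programming operators modify, with the performance of the resulting learner serving as fitness.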



