The ability to learn in complex environments is a key reason why humans are effective problem solvers. Thus, to bring the behaviour of machine intelligence closer to the human level, we must first advance its ability to learn under hard, intricate conditions.

Maze problems, usually represented as grid-like two-dimensional areas that may contain objects of varying number and kind, serve as a simplified virtual model of the real world. Their relative simplicity allows us to control the learning process and trace the behaviour of the learning agent at every stage. At the same time, maze environments admit a virtually unlimited range of graduated complexity levels, enabling researchers to use environments as simple or as complex as they need. These two factors make maze environments a good research paradigm for many navigation-based problems in Artificial Intelligence, from domestic appliance robots and autopilots for the automotive industry to network routing agents and autonomous walking robots for space research.
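To make the grid representation concrete, below is a minimal sketch in Python, assuming the Woods-style encoding common in the LCS literature ('T' for an obstacle, 'F' for the food/goal, '.' for an empty square) and an agent that perceives its eight neighbouring squares. The maze layout and function names are illustrative, not taken from the publications below.

    # Illustrative grid maze in a Woods-style encoding (layout is made up):
    # 'T' = obstacle (tree), 'F' = food (goal), '.' = empty square.
    MAZE = [
        "TTTTT",
        "T..FT",
        "T.T.T",
        "T...T",
        "TTTTT",
    ]

    def perceive(maze, row, col):
        """Return the agent's local perception: the eight surrounding
        squares read clockwise from north (a common LCS convention)."""
        offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
                   (1, 0), (1, -1), (0, -1), (-1, -1)]
        return "".join(maze[row + dr][col + dc] for dr, dc in offsets)

    print(perceive(MAZE, 2, 1))  # -> '..T..TTT'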

Learning is a psychological phenomenon. Despite this, there has been relatively little research on applying established psychological principles to the design of autonomous agent learning algorithms for maze problems. Likewise, although maze problems have a long history of use in learning research, there has been little analysis of their complexity. This research therefore has two aims: first, to improve our understanding of the nature and structure of maze environments; and second, to apply psychological principles in constructing a new learning agent that solves mazes better than existing algorithms.

First, we introduce new metrics for classifying the complexity of mazes, based on agent-independent and agent-dependent characteristics of maze environments. We analyze 50 mazes used in the literature with these metrics and then introduce 351 new maze environments, including 271 of increased difficulty. The purpose of this extensive set of maze environments is to provide a suitable evaluation testbed for alternative learning agent architectures.
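The metric definitions themselves are given in reference 1 below; purely as a hedged illustration of one agent-independent characteristic, the sketch below counts aliasing squares, i.e. empty squares whose eight-neighbour perceptions coincide, so that an agent with only local senses cannot tell them apart. The function names and example maze are assumptions for illustration, and the actual metrics may be defined differently.

    from collections import defaultdict

    def local_view(maze, r, c):
        # Eight-neighbour perception, clockwise from north,
        # as in the earlier sketch.
        offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
                   (1, 0), (1, -1), (0, -1), (-1, -1)]
        return "".join(maze[r + dr][c + dc] for dr, dc in offsets)

    def count_aliasing_squares(maze):
        """Count empty squares that share an identical local view with at
        least one other square -- one plausible agent-independent measure
        of maze difficulty (illustrative only)."""
        groups = defaultdict(list)
        for r in range(1, len(maze) - 1):
            for c in range(1, len(maze[0]) - 1):
                if maze[r][c] == ".":
                    groups[local_view(maze, r, c)].append((r, c))
        return sum(len(g) for g in groups.values() if len(g) > 1)

    corridor = [
        "TTTTTTT",
        "T.TFT.T",
        "T.T.T.T",
        "T.....T",
        "TTTTTTT",
    ]
    print(count_aliasing_squares(corridor))  # -> 4 (two aliased pairs)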

To fulfil our second goal, we introduce the psychological model of Associative Perception Learning, integrate it into the Reinforcement Learning framework and define a new Learning Classifier System, AgentP. The system learns through a process of explicit imprinting and organization of images of the environment, and uses a deterministic ID system to differentiate aliasing squares. The rule structure of AgentP is similar to that of ACS, in that a rule includes both the initial and the resulting state. However, AgentP differs from ACS and other LCS in several key ways (a structural sketch follows the list below):

  • Like ACS, AgentP employs a state-action-state rule structure, and like XCSM and XCSMH it employs a memory structure (the ID/Fixing system). Unlike any previously proposed LCS, it uses the two in conjunction.
  • Like ACS, AgentP performs differentiation to determine whether the resulting state from a given initial state-action pair is predicted correctly, and adjusts rules accordingly. Unlike ACS, AgentP also performs backward differentiation: it adjusts rules that predicted the observed resulting state but incorrectly matched the initial state.
  • AgentP uses a distance-based reward distribution mechanism.
  • AgentP does not attempt to learn generalizations of states. The system can be run in two learning modes: Self-Adjusting (labile) and Gradual (conservative). The modes are inspired by the different types of nervous system observed in animals and humans, specifically the mobility characteristic.
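The following is a minimal structural sketch of an AgentP-style rule in Python; it is one reading of the description above (detailed in reference 2 below), not the authors' implementation, and every field and function name is an assumption.

    from dataclasses import dataclass

    @dataclass
    class Rule:
        """State-action-state classifier in the spirit of AgentP
        (field names are illustrative, not the authors' code)."""
        initial_state: str    # imprinted image before the action, e.g. '..T..TTT'
        action: int           # one of the eight movement directions
        resulting_state: str  # image the rule predicts after the action
        initial_id: int = 0   # 0 = no ID fixed; otherwise the square's ID mark
        resulting_id: int = 0

    def matches(rule: Rule, perception: str, square_id: int) -> bool:
        # A rule applies when the perceived image agrees with its initial
        # state and, once an ID has been fixed, the IDs agree as well --
        # this is how aliasing squares with identical images are told apart.
        return (rule.initial_state == perception
                and (rule.initial_id == 0 or rule.initial_id == square_id))

Under this reading, forward differentiation would correct a rule's resulting state and ID when its prediction fails, while backward differentiation would adjust the initial-state side of rules that predicted the observed resulting state from the wrong starting square.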

References

  1. Bagnall, A.J. and Zatuchna, Z.V., On the Classification of Maze Problems, in Bull, L. and Kovacs, T. (eds.), Applications of Learning Classifier Systems, Studies in Fuzziness and Soft Computing, Springer, pp. 307-316, 2005
  2. Zatuchna, Z.V., AgentP Model: Learning Classifier System with Associative Perception, Proceedings of PPSN VIII, the 8th Parallel Problem Solving from Nature International Conference, pp. 1172-1182, 2004
  3. Zatuchna, Z.V. and Bagnall, A.J., Classifier System: Self-adjusting vs. Gradual Approach, Proceedings of the 2005 Congress on Evolutionary Computation, 2005
  4. Zatuchna, Z.V. and Bagnall, A.J., A Reinforcement Learning Agent with Associative Perception, Symposium on Associative Learning and Reinforcement Learning at AISB'06: Adaptation in Artificial and Biological Systems, 2006
  5. Zatuchna, Z.V. and Bagnall, A.J., Modelling of Temperament in an Associative Reinforcement Learning Agent, Symposium on Associative Learning and Reinforcement Learning at AISB'06: Adaptation in Artificial and Biological Systems, 2006
  6. Zatuchna, Z.V., Towards the Axioms of Consciousness: Modelling the Rat in a Maze, Symposium on Machine Consciousness at AISB'06: Adaptation in Artificial and Biological Systems, 2006

Research Team

Dr Tony Bagnall, Mrs Zhanna Zatuchna