[Bristol CS | ISL | Tim Kovacs | Text ]

Strength or Accuracy: Credit Assignment in Learning Classifier Systems

Tim Kovacs

PhD Thesis, 2002. School of Computer Science. University of Birmingham. Birmingham, U.K. Supervisor: Manfred Kerber.

Abstract

Classifier systems are an intriguing approach to a broad range of machine learning problems, based on automated generation and evaluation of condition/action rules. In reinforcement learning tasks they simultaneously address the two major problems of learning a policy and generalising over it (and related objects, such as value functions). Despite over 20 years of research, however, classifier systems have met with mixed success, for reasons which were often unclear. Finally, in 1995 Stewart Wilson claimed a long-awaited breakthrough with his XCS system, which differs from earlier classifier systems in a number of respects, the most significant of which is the way in which it calculates the value of rules for use by the rule generation system. Specifically, XCS (like most classifier systems) employs a genetic algorithm for rule generation, and the way in which it calculates rule fitness differs from earlier systems. Wilson described XCS as an accuracy-based classifier system and earlier systems as strength-based. The two differ in that in strength-based systems the fitness of a rule is proportional to the return (reward/payoff) it receives, whereas in XCS it is a function of the accuracy with which return is predicted. The difference is thus one of credit assignment, that is, of how a rule's contribution to the system's performance is estimated. XCS is a Q-learning system; in fact, it is a proper generalisation of tabular Q-learning, in which rules aggregate states and actions. In XCS, as in other Q-learners, Q-values are used to weight action selection. In rule generation, a rule's weight is an inverse function of the error of its Q-value; thus XCS finds rules which minimise errors in the Q-function, and is able to accurately approximate it. Wilson argued XCS's superiority over earlier systems, and demonstrated improved performance on two tasks. However, the theoretical basis of XCS's results -- and indeed those of earlier systems -- needed development. Consequently, this thesis has been devoted to the development of theory to explain how and why classifier systems behave as they do. In particular, a theory underlying the difference between strength and accuracy-based fitness has been sought. In order to compare XCS with more traditional classifier systems, SB-XCS, XCS's strength-based twin, is introduced. SB-XCS is designed to resemble XCS as much as possible, except that it uses the Q-value of a rule as its weight in both action selection and rule generation. It is shown that this minor algorithmic change has profound and subtle implications on the way in which SB-XCS operates, and on its capacity to adapt. Two types of non-sequential learning task are identified, distinguished by whether the reward function is biased or unbiased. (An unbiased reward function is constant over the action(s) in each state which maximise reward.) It is argued that although SB-XCS can be expected to adapt to unbiased reward functions, its ability to adapt to biased reward functions is limited. Further, it is claimed that SB-XCS will only adapt well to trivial sequential decision tasks. SB-XCS's difficulties are attributed to two forms of pathological relationships between rules, termed strong and fit overgeneral rules. The analysis of XCS and SB-XCS presented herein supports the study of accuracy-based fitness, and fitness sharing in strength-based systems as directions for future work.