[Bristol CS
| ISL
| Tim Kovacs
| Text
]
Strength or Accuracy: Credit Assignment in Learning Classifier Systems
Tim Kovacs
PhD Thesis, 2002. School of Computer Science. University of Birmingham. Birmingham, U.K. Supervisor: Manfred Kerber.
Abstract
Classifier systems are an intriguing approach to a broad range of machine
learning problems, based on automated generation and evaluation of
condition/action rules. In reinforcement learning tasks they
simultaneously address the two major problems of learning a policy and
generalising over it (and related objects, such as value functions).
Despite over 20 years of research, however, classifier systems have met
with mixed success, for reasons which were often unclear. Finally, in 1995
Stewart Wilson claimed a long-awaited breakthrough with his XCS system,
which differs from earlier classifier systems in a number of respects, the
most significant of which is the way in which it calculates the value of
rules for use by the rule generation system. Specifically, XCS (like most
classifier systems) employs a genetic algorithm for rule generation, and
the way in which it calculates rule fitness differs from earlier systems.
Wilson described XCS as an accuracy-based classifier system and earlier
systems as strength-based. The two differ in that in strength-based
systems the fitness of a rule is proportional to the return
(reward/payoff) it receives, whereas in XCS it is a function of the
accuracy with which return is predicted. The difference is thus one of
credit assignment, that is, of how a rule's contribution to the system's
performance is estimated. XCS is a Q-learning system; in fact, it is a
proper generalisation of tabular Q-learning, in which rules aggregate
states and actions. In XCS, as in other Q-learners, Q-values are used to
weight action selection. In rule generation, a rule's weight is an inverse
function of the error of its Q-value; thus XCS finds rules which minimise
errors in the Q-function, and is able to accurately approximate it. Wilson
argued XCS's superiority over earlier systems, and demonstrated improved
performance on two tasks. However, the theoretical basis of XCS's results
-- and indeed those of earlier systems -- needed development.
Consequently, this thesis has been devoted to the development of theory to
explain how and why classifier systems behave as they do. In particular,
a theory underlying the difference between strength and accuracy-based
fitness has been sought. In order to compare XCS with more traditional
classifier systems, SB-XCS, XCS's strength-based twin, is introduced.
SB-XCS is designed to resemble XCS as much as possible, except that it
uses the Q-value of a rule as its weight in both action selection and rule
generation. It is shown that this minor algorithmic change has profound
and subtle implications on the way in which SB-XCS operates, and on its
capacity to adapt. Two types of non-sequential learning task are
identified, distinguished by whether the reward function is biased or
unbiased. (An unbiased reward function is constant over the action(s) in
each state which maximise reward.) It is argued that although SB-XCS can
be expected to adapt to unbiased reward functions, its ability to adapt to
biased reward functions is limited. Further, it is claimed that SB-XCS
will only adapt well to trivial sequential decision tasks. SB-XCS's
difficulties are attributed to two forms of pathological relationships
between rules, termed strong and fit overgeneral rules. The analysis of
XCS and SB-XCS presented herein supports the study of accuracy-based
fitness, and fitness sharing in strength-based systems as directions for
future work.