In this paper we present initial results from the use of Genetic Programming in Holland's learning Classifier System architecture. Rules are of the form condition(s)/action, in which the conditions are represented as binary strings and the actions are represented by S-expressions. The basic Classifier System architecture used here is more analogous to Wilson's ZCS system than to Holland's original formalism, since no internal message list exists; stimulus-response rules are developed here. Using two well-known classification tasks, it is shown that our approach can develop useful feature extractors for the K-nearest-neighbour algorithm. We also show that the use of niche-based evolutionary search can improve performance.
Ensemble techniques have proved to be very successful in boosting the performance of several types of machine learning methods. In this paper, we illustrate their usefulness in combination with GAssist, a Pittsburgh-style Learning Classifier System. Two types of ensembles are tested. First we evaluate an ensemble for consensus prediction. In this case several rule sets learnt using GAssist with different initial random seeds are combined using a flat voting scheme, in a fashion similar to bagging. The second type of ensemble, a hierarchical one, is intended to deal more efficiently with ordinal classification problems: that is, problems where the classes have some intrinsic order between them and, in case of misclassification, it is preferable to predict a class that is close to the correct one within the class intrinsic order. The ensemble for consensus prediction is evaluated using 25 datasets from the UCI repository. The hierarchical ensemble is evaluated using a Bioinformatics dataset. Both methods significantly improve the performance and behaviour of GAssist in all the tested domains.
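The flat voting scheme described in this abstract can be sketched as follows. This is a minimal illustration of bagging-style consensus prediction, not GAssist's actual implementation; the callable-per-rule-set interface is an assumption made for the example.

```python
from collections import Counter

def consensus_predict(rule_sets, instance):
    """Flat majority vote: each independently trained rule set casts one
    vote, and the class with the most votes wins (ties broken by the
    first class encountered). `rule_sets` is a list of callables, each
    mapping an instance to a predicted class label."""
    votes = Counter(rs(instance) for rs in rule_sets)
    return votes.most_common(1)[0][0]
```

In a GAssist-style setting, each callable would be one rule set evolved from a different initial random seed, so the ensemble averages out seed-dependent variance much as bagging averages out resampling variance.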
Evolutionary computation has started to receive significant attention during the last decade, although its origins can be traced back to the late 1950s. This article surveys the history as well as the current state of this rapidly growing field. We describe the purpose, the general structure, and the working principles of different approaches, including genetic algorithms (GA) [with links to genetic programming (GP) and classifier systems (CS)], evolution strategies (ES), and evolutionary programming (EP), by analysis and comparison of their most important constituents (i.e., representations, variation operators, reproduction, and selection mechanisms). Finally, we give a brief overview of the manifold application domains, although this necessarily must remain incomplete.
The objective of this study is to synthesize a learning model capable of successful and effective operation in hard-to-model environments. Here, we present a structurally simple and functionally flexible model. The model follows the learning patterns experienced by humans. The novelty of the adaptive model lies in its knowledge base, dual learning strategy, and flexible reasoning. The knowledge base is allowed to grow for as long as the agent lives. Learning is brought about by the interaction between two qualitatively different activities leaving long-term and short-term marks on the behavior of the agent.
This work has no abstract.
This paper describes an autonomous adaptive agent model of the UK market in electricity, where the agents represent electricity generating companies. We briefly describe the UK market in electricity generation, then detail the simplifications we have made. Our current model consists of a single adaptive agent bidding against several nonadaptive agents. The adaptive agent uses a hierarchical agent structure with two Learning Classifier Systems to evolve market bidding rules to meet two objectives. We detail how the agent interacts with its environment, the particular problems this environment presents to the agent and the agent and classifier architectures we used in our experiments. We present the results and conclude that using our structure can improve performance.
A model of the UK market in electricity combining key factors influencing generator bidding is proposed and a hierarchical multi-objective adaptive agent architecture using case based reasoning and learning classifier systems is described. Experimentation shows that the adaptive agents learn bidding strategies that have been observed in the real world, and that in some market scenarios the agents appear to be learning the benefits of cooperating to receive increased long term rewards. The potential of the adaptive agent model is illustrated by experimentation with an alternative market structure.
Classifier systems are well-tested vehicles for implementing genetic algorithms in machine learning environments. This paper presents a novel system architecture that transforms a classifier system's knowledge representation from message-based structures to self-organizing neural networks. These networks have been integrated with a classifier system to produce a Hybrid Learning System (HLS) that exhibits adaptive behaviour when driven by low-level environmental feedback. Problems are represented within HLS as objects characterized by environmental features. Objects controlled by the system have preset goals set against a subset of their features, and the system has to achieve these goals by developing a behavioural repertoire that efficiently explores and exploits the problem environment. Three types of knowledge structures evolve during this adaptive process: a cognitive map of useful regularities within the environment (encoded in a self-organizing network); classifier behaviour calibrated against feature states and targets (encoded in a set of self-organizing feature maps); and a population of complex behaviours (evolved from a gene pool supplied as part of the initial problem specification).
Whilst the development of Learning Classifier Systems has produced excellent results in some fields of application, it has been widely noted that problems emerge when seeking to establish higher levels of knowledge (see Barry (1993) for a relevant review). Tsotsos (1995) suggests that research into the operation of the Visual Cortex shows a hierarchical decomposition of processing more structured than a simple Subsumption Architecture arrangement. Whilst the LCS can provide both memory and planning by the use of tags and rule chains, it provides a flat rule space. Various approaches have been taken to introducing structure to the LCS. We examine these approaches and identify three major lines of research: multiple interacting LCS; a single LCS with a structured population; and structured encoding of rules. We illustrate that the first two of these areas have been interpreted quite differently, and seek to draw out common findings from the different approaches. We round off our examination of the area by a more detailed look at the work of Dorigo and Schnepf (1992), using a Hybrid Classifier System to examine the performance claims of Dorigo and Schnepf's architecture.
Whilst XCS (Wilson, 1998) has been shown to be more robust and reliable than previous LCS implementations (Kovacs, 1996, 1997), Lanzi (1997) identified a potential problem in the application of XCS to certain simple multi-step non-Markovian environments. The 'Aliasing Problem' occurs when the environment provides the same message for two states in environmental positions that generate different constant payoffs. This prevents classifiers forming a correct payoff prediction for that message. This paper introduces a sub-class of the aliasing problem termed the 'Consecutive State Problem' and uses the subclass to identify the effects of consecutive state aliasing on the learning of the State * Action * Payoff mapping within XCS. It is shown that aliasing states can prevent the formation of classifiers covering preceding states due to the trade-off of accuracy for match set occupancy made by the classifiers covering the aliasing states. This can be prevented by identifying a condition encoding which makes such match set 'piracy' improbable. However, under conditions of intense competition for population space, where the classifier covering the aliased states cannot gain additional match set occupancy, these classifiers will not be maintained within the population. Barry (1999) uses these findings to identify a solution to the Consecutive State Problem which is less heavyweight than the more general solution proposed by Lanzi (1997, 1998).
The 'Aliasing Problem' within XCS (Wilson, 1995, 1998), first identified by Lanzi (1997), does not only appear whenever the aliased states occur in separate environmental locations but also when they occur consecutively (Barry, 1999). Lanzi (1997, 1998) introduced a mechanism that could solve the Aliasing Problem through the use of memory mechanisms within XCS (Wilson, 1995; Cliff and Ross, 1994). Whilst this mechanism is a solution to the general problem of aliasing, it is a heavyweight solution. By limiting the scope of a solution to the Consecutive State Problem, which is shown to be a sub-problem of the Aliasing Problem, a simpler solution is proposed, and is shown to adequately address this problem. The application of a potential solution utilising explicit action duration identification is discussed and shown to be inadequate both as a solution to the Consecutive State Problem and for more general use within XCS.
In investigating the Consecutive State Problem within XCS (Barry, 1999) it was suggested that a possible solution lay in allowing the XCS to persist with a single action over the aliased states. It was shown that this technique was sufficient to overcome the Consecutive State Problem as long as mechanisms were also provided which prevented the persistent application of 'Null Actions'. An alternative solution based on the work of Cobb and Grefenstette (1991) was discussed which sought to extend the action of each classifier so that each classifier could specify the duration that the action should be applied for. It was noted that this was an inadequate solution for the Consecutive State Problem because XCS would still explore the possibility of an action which persisted into but not beyond the aliased states. This work now applies these ideas to a number of non-aliased multiple step environments. It demonstrates that, given a suitable exploration strategy, action persistence can be utilised within XCS to enable the selection of a pathway to a reward state which entails the minimum number of different actions. It is also shown that a modification to the learning mechanism restores the ability of XCS to select the pathway to a reward state with the minimum number of steps whilst minimising the number of actions used.
Within Michigan-style Learning Classifier Systems based upon Holland's model (Holland et al., 1986), support for learning in delayed-reward multiple-step environments was through the co-operation of classifiers within rule-chains. Despite the successful use of this LCS model in direct-reward environments (Wilson, 1985, 1987; Parodi and Bonelli, 1990; Holmes, 1997; Dorigo and Colombetti, 1994), the application of LCS to delayed-reward Markovian environments has been problematic. There is now a persuasive body of evidence that suggests that the use of strength as a fitness metric for the Genetic Algorithm (Kovacs and Kerber, 2000; Kovacs, 2000a), the use of rule-chains to establish multiple-step policies (Riolo, 1987b, 1989a; Forrest and Miller, 1990; Compiani et al, 1990), and the lack of mechanisms to encourage the development of co-operative populations (Booker, 1988; Smith, 1991; Smith and Goldberg, 1991) all contribute to the inability of this LCS model to perform well within these environments. XCS (Wilson, 1995, 1998) presents solutions to each of these issues and initial results have shown the considerable promise of the approach (Kovacs, 1996, 1997; Lanzi, 1999c; Saxon and Barry, 1999a). In this work it is shown that whilst the XCS action-chaining mechanisms are effective for short action-chains, the combination of the use of discounted payoff and generalisation prevents XCS from learning optimal solutions in environments requiring even moderately sized action chains. In response it is hypothesised that the structuring of the solution, possibly hierarchically, can be used to reduce the required action chain length. A framework for hierarchical LCS research is proposed using a review of previous LCS hierarchical or structured approaches (Barry, 1993, 1996), and this work is compared to developments within the Reinforcement Learning community. Within a hierarchical solution low-level action chains may suffer when re-used if different payments are given to the action chains.
An investigation into the Aliasing Problem (Lanzi, 1998a) reveals a subset of the problem, termed the Consecutive State Problem (Barry, 1999a), that will admit to a simple solution, which is empirically demonstrated (Barry, 1999b). It is shown that XCS is also able to learn the optimal state * action * duration * payoff mapping when a mechanism providing persistent actions is added (Barry, 2000), and that although this cannot be used as a solution to the aliasing problem it does provide a means of increasing the range of action chains. Two forms of pre-identified hierarchical structures are introduced and it is shown that these allow multiple XCS instances to learn a hierarchical model that can be applied to operate successfully within environments requiring long action chains.
Paper is an extended abstract
A market-based algorithm is presented which autonomously apportions complex tasks to multiple cooperating agents giving each agent the motivation of improving performance of the whole system. A specific model, called The Hayek Machine is proposed and tested on a simulated Blocks World (BW) planning problem. Hayek learns to solve more complex BW problems than any previous learning algorithm. Given intermediate reward and simple features, it has learned to efficiently solve arbitrary BW problems. The Hayek Machine can also be seen as a model of evolutionary economics.
Both symbolic and subsymbolic models contribute important insights to our understanding of intelligent systems. Classifier systems are low-level learning systems that are also capable of supporting representations at the symbolic level. In this paper, we explore in detail the issues surrounding the integration of programmed and learned knowledge in classifier-system representations, including comprehensibility, ease of expression, explanation, predictability, robustness, redundancy, stability, and the use of analogical representations. We also examine how these issues speak to the debate between symbolic and subsymbolic paradigms. We discuss several dimensions for examining the tradeoffs between programmed and learned representations, and we propose an optimization model for constructing hybrid systems that combine positive aspects of each paradigm.
Connectionist networks and the Classifier System (CFS) provide two important examples of massively parallel knowledge representation systems for which successful learning algorithms have been developed. We present a construction that shows how the behavior of a large class of connectionist networks can be reproduced in a CFS. We then use this construction to derive a version of Rumelhart et al.'s back-propagation algorithm for the CFS, and conclude with remarks about critical differences between connectionist networks and the CFS that this analysis highlights.
This paper reports on an experiment of learning and forecasting on the foreign exchange market by means of an Artificial Intelligence methodology (a 'Classifier System') which simulates learning and adaptation in complex and changing environments. The experiment has been run for two different exchange rates, the US dollar-Deutschmark rate and the US dollar-yen rate, representative of two possibly different market environments. A fictitious 'artificial agent' is first trained on a monthly data base from 1973 to 1990, and then tested out-of-sample from 1990 to 1992. Its forecasting performance is then compared with the performance of decision rules which follow the prescription of various economic theories on exchange rate behaviour, and the performance of forecasts given by VAR estimations of the exchange rate's determinants.
In this paper, we propose an approach to a single-step Classifier System in which the useful population is built by progressively specializing classifiers. It has been applied to a classification task in a medical domain. To permit the system to explore alternatives without making decisions early in the learning stages, all the classifiers that might be selected are triggered and receive the reward corresponding to their action. The payoff function involves the classifier's performance, its specificity, and the system's performance (its robustness). Genetic operators are activated with a probability which depends on the system's robustness. During the test stages, no further learning takes place and the system's performance is measured by the percentage of correct classifications made on the second set of examples. When the measure of performance is at its highest, the population is stabilized and contains the correct classifiers (the payoff function and genetic operators have no further effect on classifiers). This approach achieves convergence more quickly and makes it possible to obtain a final accurate population without over-specializing.
We discuss some issues concerning the application of learning classifier systems to real-valued applications. In particular, we focus on the possibility of classifying data by crisp and fuzzy intervals, showing the effect of their granularity on the learning performance. We introduce the concept of sensorial cluster and we discuss the difference between cluster aliasing and perceptual aliasing. We show the impact of different choices on the performance of both crisp and fuzzy learning classifier systems applied to a mobile, autonomous, robotic agent.
In this paper, we discuss situations arising with reinforcement learning algorithms, when the reinforcement is delayed. The decision to consider delayed reinforcement is typical in many applications, and we discuss some motivations for it. Then, we summarize Q-Learning, a popular algorithm to deal with delayed reinforcement, and its recent extensions to use it to learn fuzzy logic structures (Fuzzy Q-Learning). Moreover, we present how a reinforcement learning algorithm we have developed in the past (ELF - Evolutionary Learning of Fuzzy rules) implements an extension of the popular Q-Learning algorithm for the distribution of delayed reinforcement when the controller to be learnt is a Fuzzy Logic Controller (FLC). Finally, we present some examples of the application of ELF to learning FLCs that implement behaviors for an autonomous agent.
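The Q-Learning update summarized in this abstract can be sketched as a single tabular step. This is a generic textbook formulation for illustration, not the ELF or Fuzzy Q-Learning code; the dictionary-based table and the state/action names are assumptions made for the example.

```python
def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step for delayed reinforcement:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    `Q` is a dict keyed by (state, action); missing entries default to 0."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_target = reward + gamma * best_next
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (td_target - old)
```

The discounted term gamma * max_a' Q(s', a') is what propagates a delayed reward backwards along the chain of visited states; fuzzy extensions such as Fuzzy Q-Learning distribute this same target across the fuzzy rules that contributed to the action.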
We discuss the problem of learning fuzzy rules using Evolutionary Learning techniques, such as Genetic Algorithms and Learning Classifier Systems. We present ELF, a system able to evolve a population of fuzzy rules to obtain a sub-optimal Fuzzy Logic Controller. ELF tackles some of the problems typical of the Evolutionary Learning approach: competition and cooperation between fuzzy rules, evolution of general fuzzy rules, imperfect reinforcement programs, fast evolution for real-time applications, dynamic evolution of the focus of the search. We also present some of the results obtained from the application of ELF to the development of Fuzzy Logic Controllers for autonomous agents and for the classical cart-pole problem.
We have implemented a tool to compare different modules of Reinforcement Learning algorithms applied to Learning Classifier Systems (LCS). We focus on three main classes of modules: credit assignment modules, exploration policies, and evolutionary strategies. For each class we have implemented many of the proposals found in the literature and also some new algorithms that we have designed. In this paper, we present the results of the application of our tool to both fuzzy and crisp LCSs that learn behaviors for simulated autonomous agents. Fuzzy LCSs can be considered a successful approach to cope with real-valued input and output in a real environment. Many investigations can be carried out with this tool in this experimental setting. This paper is focused on the comparison among different credit assignment algorithms and on their performance in learning both crisp and fuzzy models. Our experiments show that the more complex credit assignment algorithms (such as, for instance, TD(lambda)) generally have better performance than the more basic ones (such as Q-learning or Bucket Brigade), also when applied to LCSs. Moreover, fuzzy LCSs seem to require a larger computational effort, but also show more robustness.
We present a class of Learning Classifier Systems that learn fuzzy rule-based models, instead of interval-based or Boolean models. We discuss some motivations to consider Learning Fuzzy Classifier Systems (LFCS) as a promising approach to learn mappings from real-valued input to real-valued output, based on the data interpretation implemented by fuzzy sets. We describe some of the approaches explicitly or implicitly referring to this research area, presented in the literature since the beginning of the last decade. We also show how the general LFCS model can be considered as a framework for a wide range of systems, each implementing in a different way the modules composing the basic architecture. We also mention some of the applications of LFCS presented in the literature, which show the potential of this type of system. Finally, we introduce a general methodology to extend reinforcement distribution algorithms usually not designed to learn fuzzy models. This opens new application possibilities.
In this paper, we describe a Classifier System, Newboole, and we present its experimental comparison with two widely used learning algorithms, CN2 (logic reduction system) and Back Propagation (neural net), on three medical domains. The experimental results, obtained in the context of learning from preclassified examples, demonstrate two main points: firstly, that all three systems perform very similarly on the induction tasks, with a slight advantage for the Back Propagation algorithm; secondly, that a Classifier System can provide comprehensible solutions in the form of a reasonable number of ``symbolic'' decision rules, which is not the case using Back Propagation.
Genetics-based machine learning systems are considered by a majority of machine learners to be slow-rate learning systems. In this paper, we propose an improvement of Wilson's classifier system BOOLE that shows how the learning rates of genetics-based machine learning systems can be greatly improved. This modification consists of a change to the reinforcement component. We then compare the respective performances of this modified BOOLE, called NEWBOOLE, and a neural net using back propagation on a difficult Boolean learning task, the multiplexer function. The results of this comparison show that NEWBOOLE obtains significantly faster learning rates.
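The multiplexer function used as the benchmark above can be sketched compactly. This is the standard Boolean k-address-bit multiplexer (6-bit for k = 2, 11-bit for k = 3, and so on), written here as a minimal reference implementation for illustration rather than code from the paper.

```python
def multiplexer(bits):
    """Boolean multiplexer: the first k bits form an address that selects
    one of the remaining 2**k data bits; the output is that data bit.
    k is inferred from len(bits), which must equal k + 2**k."""
    k = 0
    while k + 2 ** k < len(bits):
        k += 1
    assert k + 2 ** k == len(bits), "length must be k + 2**k"
    address = int("".join(str(b) for b in bits[:k]), 2)
    return bits[k + address]
```

For the 6-multiplexer, the first two bits address one of four data bits, giving 64 input cases; a learner must discover which data position each address selects, which is why the function is a classic test of generalisation in classifier systems.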
Classifier systems are massively parallel, message-passing, rule-based systems that learn through credit assignment (the bucket brigade algorithm) and rule discovery (the genetic algorithm). They typically operate in environments that exhibit one or more of the following characteristics: (1) perpetually novel events accompanied by large amounts of noisy or irrelevant data; (2) continual, often real-time requirements for action; (3) implicitly or inexactly defined goals; and (4) sparse payoff or reinforcement obtainable only through long action sequences. Classifier systems are designed to absorb new information continuously from such environments, devising sets of competing hypotheses (expressed as rules) without significantly disturbing capabilities already acquired. This paper reviews the definition, theory, and extant applications of classifier systems, comparing them with other machine learning techniques, and closing with a discussion of advantages, problems, and possible extensions of classifier systems.
Classifier systems must continuously infer useful categories and other generalizations in the form of classifier taxa from the steady stream of messages received and transmitted. This paper describes ways to use the genetic algorithm more effectively in discovering such patterns. Two issues are addressed. First, a flexible criterion is advocated for deciding when a message matches a classifier taxon. This is shown to improve performance over a wide range of categorization problems. Second, a restricted mating policy and crowding algorithm are introduced. These modifications lead to the growth and dynamic management of subpopulations correlated with the various pattern categories in the environment.
Most classifier systems learn a collection of stimulus-response rules, each of which directly acts on the problem-solving environment and accrues strength proportional to the overt reward expected from the behavioral sequences in which the rule participates. GOFER is an example of a classifier system that builds an internal model of its environment, using rules to represent objects, goals, and relationships. The model is used to direct behavior, and learning is triggered whenever the model proves to be an inadequate basis for generating behavior in a given situation. This means that overt external rewards are not necessarily the only or the most useful source of feedback for inductive change. GOFER is tested in a simple two-dimensional world where it learns to locate food and avoid noxious stimulation.
Recent work by Quinlan (1988) and Grefenstette (1988) has raised doubts about the ability of classifier systems to learn concepts or long temporal sequences of rules efficiently. This paper shows how the use of learning triggers can greatly increase the performance of a classifier system to the point that it compares much more favorably with other learning systems. We introduce a new classifier system called Gofer-1 that demonstrates how to trigger rule discovery in an effective manner.
Legitimate concerns have been raised about the expressive adequacy of the classifier language. This paper shows that many of those concerns stem from the inadequacies of the binary encodings typically used with classifier systems, not the classifier language per se. In particular, we describe some straightforward binary encodings for attribute-based instance spaces. These encodings give classifier systems the ability to represent ordinal and nominal attributes as expressively as most symbolic machine learning systems, without sacrificing the building blocks required by the genetic algorithm.
Paper is an extended abstract
Classifier systems have traditionally used explicit measures of utility (strength, predicted payoff, accuracy, etc.) to quantify the performance and fitness of classifier rules. Much of the effort in designing and implementing these systems has focused on getting these utilities ``right''. One alternative worth exploring is the idea of using endogenous fitness; that is, reinforcing successful performance with ``resources'' that rules need in order to reproduce. Under this regime, the best rules are those that accumulate the most resources over their lifetime and, consequently, have the most offspring. This paper describes a classifier system designed along these lines. Rules have no associated utility measure, just a resource reservoir. When enough resources have been accumulated, the rule reproduces and the reservoir is reduced. Preliminary tests of this system on the multiplexor problem show that it performs as well as utility based classifier systems such as XCS.
Paper is an extended abstract
Classifier systems have traditionally used explicit measures of utility (strength, predicted payoff, accuracy, etc.) to quantify the performance and fitness of classifier rules. Much of the effort in designing and implementing these systems has focused on getting these utilities ``right''. One alternative worth exploring is the idea of using endogenous fitness; that is, reinforcing successful performance with ``resources'' that rules need in order to reproduce. Under this regime, the best rules are those that accumulate the most resources over their lifetime and, consequently, have the most offspring. This paper describes a classifier system designed along these lines. Rules have no associated utility measure. Instead, each rule has one or more reservoirs that can be used to store resources. When enough resources have been accumulated, a rule utilizes some of its resources to reproduce and the reservoir level is reduced accordingly. Preliminary tests of this system on the multiplexor problem show that it performs as well as utility-based classifier systems such as XCS.
Previous work has shown the potential advantages of using endogenous fitness schemes in classifier systems. The basic idea behind endogenous fitness is to reinforce successful system performance with ``resources'' that rules need in order to reproduce. Instead of storing explicit quantitative estimates of performance, each rule has one or more reservoirs that are used to store resources. When enough resources have been accumulated, a rule utilizes some of its resources to reproduce and the reservoir level is reduced accordingly. This paper extends this concept to accommodate environments having delayed rewards. Reinforcement learning techniques for solving average-reward Markovian decision processes are combined with a simple endogenous fitness scheme in a classifier system. We describe initial tests of this approach on state space search problems used in previous classifier system studies.
The classifier system framework is a general-purpose approach to learning and representation designed to exhibit non-brittle behavior in complex, continually varying environments. Broadly speaking, classifier systems are expected to avoid brittle behavior because they implement processes that build and refine models of the environment. One of the most important of these processes is categorization. As Holland [5] has pointed out (p. 598), "Categorization is the system's major weapon for combating the environment's perpetual novelty. The system must readily generate categories for input messages, and it must be able to generate categories relevant to its internal processes". Research in classifier systems has focused almost exclusively on finding generalizations for input messages. However, generalizations of actions will also be required in order to build effective models of the environment. This paper introduces a new encoding for actions in classifier rules that lends itself to representing abstract actions.
While there has been some attention given recently to the issues of function approximation using learning classifier systems (e.g. [13, 3]), few studies have looked at the quality of the value function approximation computed by a learning classifier system when it solves a reinforcement learning problem [1, 8]. By contrast, considerable attention has been paid to this issue in the reinforcement learning literature [12]. One of the fundamental assumptions underlying algorithms for solving reinforcement learning problems is that states and state-action pairs have well-defined values that can be computed and used to help determine an optimal policy. The quality of those approximations is a critical factor in determining the success of many algorithms in solving reinforcement learning problems.
The aim of this project is to improve the quality and consistency of coiling in a steel hot strip mill. Signal processing will be used to gather information on the parameter characteristics of the mill downcoilers as an aid to operator and engineering decision making. The Artificial Intelligence (AI) paradigm of Learning Classifier Systems (LCS) is proposed for the signal processing.
The aim of this project is to improve the quality and consistency of coiling in a hot strip mill at British Steel Strip Products, Integrated Works. The Artificial Intelligence paradigm of Learning Classifier Systems is proposed for the processing of plant data. The stochastic computational technique of LCS will produce off-line rules to aid operator and engineering decision making. These rules link the plant inputs (plant condition, strip properties and associated variables) to coil outputs (presentation, including telescoping and pinching) in a form that is capable of being verified and validated. This is central to the initial operation, where on-line data will produce off-line rules that are critically evaluated by a human operator before implementation. Improvements to a basic LCS that allow operation on industrial data are detailed. Initial experimental results show that the technique of LCS has the potential to become a very useful tool for processing industrial data. Improvements in availability, coil presentation and ultimately customer satisfaction will result in a cost benefit to British Steel Plc.
The aim of this project is to improve the quality and consistency of coiling in a steel hot strip mill at British Steel Strip Products, Integrated Works. The artificial intelligence paradigm of learning classifier systems (LCS) is proposed for the processing of plant data. Improvements to a basic LCS that allow operation on industrial data are detailed. Initial experimental results show that the technique of LCS has the potential to become a very useful tool for processing industrial data. The stochastic computational technique will produce off-line rules to aid operator and engineering decision making. Improvements in availability, coil presentation and ultimately customer satisfaction will result in cost benefits to British Steel Plc.
This paper describes the development of an Industrial Learning Classifier System for application in the steel industry. The real domain problem was the prediction and diagnosis of product quality issues in a steel hot strip mill. The properties of the data from this environment include multi-modality (much parameter interaction), poor separation between fault levels and high dimensionality (many parameters). The method used to develop the Learning Classifier System technique, based on deterministic simulated data, is presented. The advances made in the technique that assist its functioning in this type of industrial environment are given. The novel methods developed are core to the Learning Classifier System technique and are not 'fixes' for given problems. They address the fitness measure, encoding alphabet, population scope, phases of training, genetic operators, life limits and removal of taxation schemes. These improvements allow the industrial LCS to function correctly in the simulated domain. Encouraging results from diagnosis of real data are presented; however, further work is needed for greater accuracy and to allow the prediction function to be used on-line. Learning Classifier Systems represent a potentially useful tool that combines the transparency of symbolic approaches (such as Decision Trees) with the learning ability of connectionist approaches (such as Artificial Neural Networks) to machine learning.
Learning Classifier Systems (LCS) have received considerable attention in the research community, yet few have been applied in practice. This paper describes the development of an LCS for monitoring data produced by a hot strip mill at British Steel Strip Products. The problems associated with applying a theoretical technique in a practical environment are discussed, with particular attention being given to the pre-processing of voluminous real data. The appropriate choice of alphabet for the LCS is also discussed, with a comparison of two alphabets, namely the ternary alphabet and the real-numbered approach, being included.
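The two alphabets compared above can be illustrated with a minimal matching sketch; the signal names and interval bounds below are hypothetical, chosen only to show the difference in representation:

```python
def match_ternary(condition, bits):
    """Ternary alphabet {0, 1, #}: '#' is a wildcard (don't care),
    so real sensor values must first be discretised into bits."""
    return all(c == '#' or c == b for c, b in zip(condition, bits))

def match_interval(bounds, values):
    """Real-numbered alphabet: each gene is a (low, high) interval
    and matches when the raw sensor value falls inside it, avoiding
    the discretisation step the ternary alphabet requires."""
    return all(lo <= v <= hi for (lo, hi), v in zip(bounds, values))

# Hypothetical mill signals: coil temperature (degC) and strip speed (m/s).
print(match_ternary("1#0", "110"))                               # True
print(match_interval([(600, 700), (9.5, 12.0)], [652.0, 10.3]))  # True
```

The interval form also makes rule boundaries directly readable by engineers, which matters for the verification and validation step the project requires.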
The search for continual improvement in industry has identified the resource of plant data. Learning Classifier Systems (LCSs) were anticipated to be capable of exploiting plant data for cost benefit. The initial LCS performed poorly on simulated data, as complex domains caused greedy and unstable performance. The aim of the project was to develop the LCS technique into a robust tool for application to industry. The utilisation of performance methods, where appropriate, was achieved by splitting the LCS rule-base into three training phases. Another advance was to separate the fitness measure into component functions, thus enabling optimal control of the LCS. Combining rule accuracy with the degree of domain match allowed the rule discovery to evenly search all niches of the rule base, whilst still exerting a generalisation pressure. Motivated by experiments with real data, a morphing genetic operator to improve search rates, an evaluation limit to enable graceful improvement of hierarchies and a child limit to prevent convergence to a sub-optimal performance level were created. Implementing a real-numbered alphabet simplified rule interpretation, automatically adjusted condition ranges to avoid aliasing and formed correct rule boundaries. Further simplification of the internal parameters removed all taxation, which greatly simplified the use of the industrial LCS. Optimal prediction and correct diagnosis of the complex simulated data were achieved. The real data sets from British Steel covered plant conditions and output quality. Diagnosis of the input-output relationships that could assist operators, engineers and managers was possible and yielded encouraging results. However, inadequacies in data quality and in the technique allowed only 80% prediction accuracy, which gave insufficient confidence for plant predictive use.
Although the LCS technique is still not fully developed, the effective learning, transparency and co-operation in rules has many potential benefits for industry.
In this paper we suggest a general approach to using the genetic algorithm (GA) [1] to evolve complex control systems. It has been shown [2] that although the GA may be used to evolve simple controllers, it is not able to cope with the evolution of controllers for more complex problems. We present an architecture of co-evolving communicating classifier systems [3] as a general solution to this, where the only restriction is that each classifier system is responsible for one simple behaviour. Thus the ecology of sub-problems evolves its own organisational structure at the same time as its constituents evolve their solutions. Whether this structure ends up as a democratic soup, a hierarchy, or something in between is determined by co-evolution rather than prescribed a priori by a creator. We use the trail-following ``tracker task'' to compare the performance of a single classifier system, responsible for the control of the whole system and evolved for this task, with the performance of a co-evolved controller using our approach. The resulting interactions of the classifier systems are also examined.
The use and benefits of self-adaptive mutation operators are well-known within evolutionary computing. In this paper we examine the use of self-adaptive mutation in Michigan-style Classifier Systems with the aim of improving their performance as controllers for autonomous mobile robots. Initially, we implement the operator in the ZCS classifier and examine its performance in two ``animat'' environments. It is shown that, although no significant increase in performance is seen over results presented in the literature using a fixed rate of mutation, the operator adapts to approximately this rate regardless of the initial range. The same operator is then implemented in the more sophisticated XCS classifier, with its performance examined on another animat task. Again it is shown that no real improvements in performance are obtained over previous results with a fixed mutation rate, but that the operator adapts to a suitable rate.
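Self-adaptive mutation of the kind examined above typically encodes the mutation rate on the classifier itself and mutates the rate before applying it, for example with an ES-style log-normal step. The sketch below assumes a ternary-alphabet classifier; the function name and parameter values are illustrative:

```python
import math
import random

def self_adaptive_mutate(genome, mu, tau=0.1):
    """Mutate a ternary-genome classifier whose mutation rate mu is
    itself carried on the classifier and mutated first (ES-style
    log-normal self-adaptation; an illustrative sketch)."""
    # 1. Mutate the mutation rate itself.
    mu = mu * math.exp(tau * random.gauss(0.0, 1.0))
    mu = min(max(mu, 1e-4), 1.0)      # clamp to a sensible range
    # 2. Apply the new rate gene by gene.
    alphabet = "01#"
    new_genome = [
        random.choice(alphabet) if random.random() < mu else g
        for g in genome
    ]
    return "".join(new_genome), mu

random.seed(0)
g, m = self_adaptive_mutate("01#10#", mu=0.05)
print(len(g), 0 < m <= 1.0)  # 6 True
```

Because the rate is inherited and selected along with the rule, the population can drift toward whatever rate the environment currently favours, which is the adaptation effect the abstract reports.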
Nature is full of examples of both inter- and intra-species cooperation, from the workings of ant colonies to the cleaning symbiosis seen between the Pederson shrimp and the fish of the Bahamas. The fields of Artificial Intelligence and Artificial Life have consequently focused on these phenomena as a means of dealing with complex systems in which agents must cooperate to achieve certain goals. In this thesis we examine the performance of the Genetic Algorithm when applied to systems of this type. That is, we examine the use of Evolutionary Computing techniques within cooperative multiagent environments. In the process we investigate some aspects of the natural phenomenon of symbiosis on which we base many elements of the work, in particular the conditions under which various aspects of symbiotic associations occur. In extending the Genetic Algorithm to cooperative multiagent environments we introduce two macro-level operators (megamutations) to allow for greater integration between agents: the forming of hereditary endo-symbiosis and the horizontal transfer of genes between such symbionts. Our results indicate that hereditary endo-symbiosis will form between agents evolving from within a window of the chaotic region of their attribute space and that gene transfer will occur from within a larger overlapping window. These operators are used within a generic rule-based framework, a simplified version of Pittsburgh-style Classifier Systems, which we alter to allow direct systemic communication to evolve between the thus-represented agents. We find that uninterpreted communication protocols will emerge between such agents using our framework. This work therefore contributes to the implementation of the Genetic Algorithm within complex systems.
This paper examines the performance of the ZCS Michigan-style classifier system in multi-agent environments. Using an abstract multi-agent model, the effects of varying aspects of the performance, reinforcement and discovery components are examined. It is shown that small modifications to the basic ZCS architecture can improve its performance in environments with significant inter-agent dependence. Further, it is suggested that classifier systems have characteristics which make them better suited to such non-stationary problem domains than other forms of reinforcement learning. Results from initial use of ZCS as an adaptive economic trading agent within an artificial double-auction market are then presented, with the findings from the abstract model shown to improve the efficiency of the traders and hence the overall market.
This paper presents results from on-going investigations into the performance of the Michigan-style classifier system in a complex multi-agent environment. Using a simplified model of a continuous double-auction market place, the use of ZCS as an adaptive economic trading agent is examined. It is shown that a number of small changes to the basic system greatly improve its performance, resulting in improvements in the overall efficiency of the market. It is also shown that the role of the rule-discovery component of the classifier system is particularly critical in such a closely-coupled multi-agent environment.
Paper is an extended abstract
Paper is an extended abstract
Learning Classifier Systems use evolutionary algorithms to facilitate rule-discovery, where rule fitness is traditionally payoff based and assigned under a sharing scheme. Most current research has shifted to the use of an accuracy-based scheme where fitness is based on a rule's ability to predict the expected payoff from its use. Learning Classifier Systems that build anticipations of the expected states following their actions are also a focus of current research. This paper presents a simple but effective learning classifier system of this last type, using payoff-based fitness, with the aim of enabling the exploration of their basic principles, i.e., in isolation from the many other mechanisms they usually contain. The system is described and modelled, before being implemented. Comparisons to an equivalent accuracy-based system show similar performance. The use of self-adaptive mutation in such systems in general is then considered.
Learning consists in the acquisition of knowledge. In Reinforcement Learning this is knowledge about how to reach a maximum of environmental reward. We are interested in the acquisition of knowledge that consists in having expectations of behavioral consequences. Behavioral consequences depend on the current situation, so it is necessary to learn in which situation S which behavior/reaction R leads to which behavioral consequences C. In other words, SRC units are learned. It was the psychologist Edward Tolman (1932) who first stated that animals learn SRC units. Seward (1949) showed that rats are able to learn in the absence of reward, confirming Tolman's assumption. Learning in the absence of reinforcement is called `latent learning' and cannot be explained by usual reinforcement learning techniques. In the field of Learning Classifier Systems (LCS), latent learning is realized in Riolo's CFSC2 (Riolo, 1991) and Stolzmann's ACS (Stolzmann, 1997, 1998). Both authors demonstrate the performance of their learning algorithms with a simulation of Seward's experiment. This experiment consists in a learning phase without any reward followed by a test phase where the rats have to use the knowledge they acquired during the learning phase to do action-planning. Action-planning and latent learning occur at different times. This paper focuses on the integration of action-planning and latent learning in ACS. Using an example of learning the hand-eye co-ordination of a robot arm in conjunction with a camera, it will be shown that a combination of action-planning and latent learning in ACS induces a substantial reduction in the number of trials required to learn a complete model of a prototypical environment.
A concise description of the XCS classifier system's parameters, structures, and algorithms is presented as an aid to research. The algorithms are written in modularly structured pseudo code with accompanying explanations.
An Anticipatory Classifier System (ACS) is a learning mechanism based on learning classifier systems and the cognitive model of ``Anticipatory Behavioral Control''. By comparing perceived consequences with its own expectations (anticipations), an ACS is able to learn in multi-step environments. To date, the ACS has proven its abilities in various problems of that kind. It is able to learn latently (i.e. to learn without getting any reward) and it is able to distinguish between non-Markov states. Additionally, an ACS is capable of incrementally building a cognitive map that can be used to do action-planning. Although the ACS has proven to scale up in suitable environments, it depends on certain environmental properties. It believes itself to be the only agent that can change the perceptions received from an environment. Any environmental change is considered and believed to be caused by the executed actions. The ACS learns from the changes by using fixed mechanisms. This paper reveals the properties of an environment that the current ACS assumes to be given. By investigating the problems the current ACS encounters when these properties are violated, we aim to provide a better understanding of the ACS and to generate ideas for improving it. We propose some of these ideas and discuss the important ones in more detail.
The anticipatory classifier system (ACS) combines the learning classifier system framework with the learning theory of anticipatory behavioral control. The result is an evolutionary system that builds an environmental model and further applies reinforcement learning techniques to form an optimal behavioral policy in the model. After providing some background as well as outlining the objectives of the system, we explain all of the involved processes in detail. Furthermore, we analyze the deficiency of over-specialization in the anticipatory learning process (ALP), the main learning mechanism in the ACS. Consequently, we introduce a genetic algorithm (GA) to the ACS that is meant for the generalization of over-specialized classifiers. We show that it is possible to form a symbiosis between a directed specialization and a genetic generalization mechanism, achieving a learning mechanism that evolves a complete, accurate, and compact description of a perceived environment. Results in three different environmental settings confirm the usefulness of the genetic algorithm in the ACS. Finally, we discuss future research directions with the ACS and anticipatory systems in general.
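The condition-action-effect rule structure and the anticipation check at the heart of the ALP can be sketched roughly as follows; this is a toy illustration under assumed conventions, not the authors' implementation:

```python
class ACSClassifier:
    """A condition-action-effect rule as in anticipatory systems:
    the effect part predicts the next perception (illustrative)."""
    def __init__(self, condition, action, effect, quality=0.5):
        self.condition, self.action, self.effect = condition, action, effect
        self.quality = quality

    def anticipate(self, state):
        """Apply the effect part: '#' here means 'attribute unchanged'."""
        return ''.join(s if e == '#' else e
                       for e, s in zip(self.effect, state))

    def update(self, state, next_state, beta=0.05):
        """Raise quality when the anticipation matched the actual next
        perception, lower it otherwise. No reward is involved, which
        is why this kind of update supports latent learning."""
        ok = self.anticipate(state) == next_state
        self.quality += beta * ((1.0 if ok else 0.0) - self.quality)
        return ok

cl = ACSClassifier(condition="01", action="right", effect="1#")
print(cl.anticipate("01"), cl.update("01", "11"))  # 11 True
```

The over-specialization the abstract discusses arises when such rules carry more specified (non-'#') positions than the environment requires; the GA's role is to push back toward generality.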
The Anticipatory Classifier System (ACS) is a learning classifier system that is based on the cognitive mechanism of anticipatory behavioral control. Besides the common reward learning, the ACS is able to learn latently (i.e. to learn in an environment without getting any reward) which is not possible with reinforcement learning techniques. Furthermore, it forms a complete internal representation of the environment and thus it is able to use cognitive processes such as reasoning and planning. Latest research observed that the ACS is not generating accurate, maximally general rules reliably (i.e. rules which are accurate and also as general as possible), but it is sometimes generating over-specialized rules. This paper shows how a genetic algorithm can be used to overcome this present pressure of over-specification in the ACS mechanism with a genetic generalization pressure. The ACS works then as a hybrid which learns latently, forms a cognitive map, and evolves accurate, maximally general rules.
The Anticipatory Classifier System (ACS) is able to form a complete internal representation of an environment. Unlike most other classifier system and reinforcement learning approaches, it is able to learn latently (i.e. to learn in an environment without getting any reward) and to form an internal model of the perceived environment. After the observation that the model is not necessarily maximally general, a genetic generalization pressure was introduced to the ACS. This paper focuses on the different mechanisms in the anticipatory learning process, which realizes the specialization pressure, and in the genetic algorithm, which realizes the genetic generalization pressure. The capability of generating maximally general rules and evolving a completely converged population is investigated in detail. Furthermore, the paper presents a first comparison with the XCS classifier system in different mazes and the multiplexer problem.
Recently, a genetic algorithm (GA) was introduced to the Anticipatory Classifier System (ACS) which surmounted the occasional problem of over-specification of rules. This paper investigates the resulting generalization capabilities further by monitoring in detail the performance of the ACS in the highly challenging multiplexer task. Moreover, by comparing the ACS to the XCS in this task it is shown that the ACS generates accurate, maximally general rules and its population converges to those rules. Besides the observed ability of latent learning and the formation of an internal environmental representation, this ability of generalization adds a new advantage to the ACS in comparison with similar approaches.
Paper is an extended abstract
This paper takes initial steps toward a theory of generalization and learning in the learning classifier system XCS. We start from Wilson's generalization hypothesis, which states that XCS has an intrinsic tendency to evolve accurate, maximally general classifiers. We analyze the different evolutionary pressures in XCS and derive a simple equation that supports the hypothesis theoretically. The equation is tested with a number of experiments that confirm the model of generalization pressure that we provide. Then, we focus on the conditions, termed "challenges," that must be satisfied for the existence of effective fitness or accuracy pressure in XCS. We derive two equations that suggest how to set the population size and the covering probability so as to ensure the development of fitness pressure. We argue that when the challenges are met, XCS is able to evolve problem solutions reliably. When the challenges are not met, a problem may provide intrinsic fitness guidance or the reward may be biased in such a way that the problem will still be solved. The equations and the influence of intrinsic fitness guidance and biased reward are tested on large Boolean multiplexer problems. The paper is a contribution to understanding how XCS functions and lays the foundation for research on XCS's learning complexity.
It has been shown empirically that the XCS classifier system solves typical classification problems in a machine learning competitive way. However, until now, no learning time estimate has been derived analytically for the system. This paper introduces a time estimate that bounds the learning time of XCS until maximally accurate classifiers are found. We assume a domino convergence model in which each attribute is successively specialized to the correct value. It is shown that learning time in XCS scales polynomially in problem length and problem complexity and thus in a machine learning competitive way.
We investigate rule matching in learning classifier systems for problems involving binary and real inputs. We consider three rule encodings: the widely used character-based encoding, a specificity-based encoding, and a binary encoding used in Alecsys. We compare the performance of the three algorithms both on matching alone and on typical test problems. The results on matching alone show that population generality influences the performance of the matching algorithms based on string representations in different ways: character-based encoding becomes slower and slower as generality increases, while specificity-based encoding becomes faster and faster. The results on typical test problems show that the specificity-based representation can halve the time required for matching, but also that binary encoding is about ten times faster on the most difficult problems. Moreover, we extend specificity-based encoding to real inputs and propose an algorithm that can halve the time required for matching real inputs using an interval-based representation.
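The difference between character-based and specificity-based matching can be illustrated as follows; only the idea is reproduced here, not the paper's actual algorithms:

```python
def match_char(condition, state):
    """Character-based matching: scan every position of the
    condition string; '#' matches anything."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def to_specificity(condition):
    """Specificity-based encoding: keep only the specified
    (non-'#') positions, so a very general rule is matched
    with very few comparisons."""
    return [(i, c) for i, c in enumerate(condition) if c != '#']

def match_spec(spec, state):
    return all(state[i] == c for i, c in spec)

cond, state = "1##0#1", "110011"
spec = to_specificity(cond)       # [(0, '1'), (3, '0'), (5, '1')]
print(match_char(cond, state), match_spec(spec, state))  # True True
```

This makes the reported trend intuitive: character-based cost grows with string length regardless of generality, whereas specificity-based cost shrinks as rules become more general (fewer specified positions).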
Recent advances in XCS technology have shown that self-adaptive mutation can be highly useful to speed up the evolutionary progress in XCS. Moreover, recent publications have shown that XCS can also be successfully applied to challenging real-valued domains including datamining, function approximation, and clustering. In this paper, we combine these two advances and investigate self-adaptive mutation in the XCS system for function approximation with hyperellipsoidal condition structures, referred to as XCSF in this paper. It has been shown that XCSF solves function approximation problems with an accuracy, noise robustness, and generalization capability comparable to other statistical machine learning techniques and that XCSF outperforms simple clustering techniques to which linear approximations are added. This paper shows that the right type of self-adaptive mutation can further improve XCSF's performance, solving problems more reliably and with less parameter dependence. We analyze various types of self-adaptive mutation and show that XCSF with self-adaptive mutation ranges, differentiated for the separate classifier condition values, yields the most robust performance results. Future work may further investigate the properties of the self-adaptive values and may integrate advanced self-adaptation techniques.
The XCS classifier system was developed by Wilson (1995). Its learning mechanism is based on the accuracy of its reward prediction. This method leads to the formation of accurate, maximally general classifiers. This paper explains how to download, compile and use the XCS code version 1.0 written in ANSI C. It discusses how to select various parameter settings, how to add and remove certain procedures in the XCS, how to apply the XCS in the multiplexer environment and diverse woods environments, and how to add code to apply the XCS in other environments. The code provides the mechanisms introduced by Wilson (1995) and the enhancements published by Wilson (1998).
The XCSJava 1.0 implementation of the XCS classifier system in Java is freely available from the IlliGAL anonymous ftp-site. The implementation covers the basic features of the XCS classifier system and provides a multiplexer and maze environment for testing purposes. This paper explains how to download, compile, and run the code. Moreover, it explains the object-oriented approach in the implementation and the possible parameter manipulation as well as the environmental interface to hook in other test environments. In addition to the source code, an executable package of the version as well as an XCSJava 1.0 API documentation are provided.
This book offers a comprehensive introduction to learning classifier systems (LCS) – or more generally, rule-based evolutionary online learning systems. LCSs learn interactively – much like a neural network – but with an increased adaptivity and flexibility. This book provides the necessary background knowledge on problem types, genetic algorithms, and reinforcement learning as well as a principled, modular analysis approach to understand, analyze, and design LCSs. The analysis is exemplarily carried through on the XCS classifier system – the currently most prominent system in LCS research. Several enhancements are introduced to XCS and evaluated. An application suite is provided including classification, reinforcement learning and data-mining problems. Reconsidering John Holland’s original vision, the book finally discusses the current potentials of LCSs for successful applications in cognitive science and related areas.
Genetic Algorithms and Classifier Systems are often used in simulations of biology-like and evolutionary behaviors. The basic example is Wilson's Woods7 world. In this environment it is interesting to study several questions: can cooperative behaviors evolve among the organisms present in the world? How and when do such behaviors evolve? Some preliminary results show the conditions under which cooperative behavior rules develop rapidly. In particular, we have pointed out the likelihood of the following observations: (a) cooperative behavior develops more easily if the initial population starts from the same point; (b) there exist thresholds below which cooperative behavior cannot evolve, and these thresholds depend on the population size.
This paper describes a fuzzy classifier system using the Pittsburgh model. In this model genetic operations and fitness assignment apply to complete rule-sets, rather than to individual rules, thus overcoming the problem of conflicting individual and collective interests of classifiers. The fuzzy classifier system presented here dynamically adjusts both membership functions and fuzzy relations. A modified crossover operator for particular use in Pittsburgh-style fuzzy classifier systems, with variable length rule-sets, is introduced and evaluated. Experimental results of the new system, which appear encouraging, are presented and discussed.
A fuzzy classifier system is described which explicitly represents time in the classifier syntax by augmenting individual classifiers with temporal tags. This feature allows the learning algorithm - in this case the genetic algorithm - to explore and exploit temporal features of the environment in which the classifier system might be expected to operate. The proposed temporal fuzzy classifier system is applied to a multi-agent distributed control task - adaptive distributed routing in packet-switched communications networks.
Distributed routing control in telecommunication networks is a challenging problem. A networked assembly of geographically dispersed routing controllers is required to route traffic across the network in such a way as to avoid congestion. Measured state information for each controller is delayed and necessarily available only on occasion. Interactions between routing controllers are highly non-linear, and instability is a serious problem. A hybrid technique for distributed routing is proposed based on a synthesis of shortest-path routing, machine learning and fuzzy control. An architecture is described for a novel temporal fuzzy classifier system which forms the basis for each routing controller. Experimental results are presented which compare the new technique with two extant routing methods -- non-adaptive shortest-hop routing and adaptive shortest-path routing.
To manifest anticipatory behaviour that goes beyond simple stimulus-response, classifier systems must evolve internal reasoning processes based on couplings via internal messages. A major challenge that has been encountered in engendering internal reasoning processes in classifier systems has been the discovery and maintenance of long classifier chains. This paper proposes a modified version of the traditional classifier system, called the delayed action classifier system (DACS), devised specifically for learning anticipatory or predictive behaviour. DACS operates by delaying the action (i.e. posting of messages) of appropriately tagged, matched classifiers by a number of execution cycles which is encoded on the classifier. Since classifier delays are encoded on the classifier genome, a GA is able to explore simultaneously the spaces of actions and delays. Results of experiments comparing DACS to a traditional classifier system in terms of the dynamics of classifier reinforcement and system performance using the bucket brigade are presented and examined. Experiments comparing DACS with a traditional classifier system on a simple prediction problem, which appear encouraging, are described and considered. Areas for further work using the delayed-action classifier notion are suggested and briefly discussed.
Systems for controlling and improving traffic flow have been studied for several years now. The usefulness of these systems lies in their ability to modify and change the signals of traffic lights. It is not enough to intervene when the situation has reached a critical point such as a traffic jam; the system has to work out how the traffic will flow. The ideal solution would be a system that anticipates and foresees the situation on the roads based on a model of motorists' behaviour. This research shows how best to utilise classifier systems so that it is possible to create a model that is similar to that of the real world.
This paper describes experiments using multiple classifier system (CS) agents to play the iterated prisoner's dilemma (IPD) under various conditions. Our main interest is in how, and under what circumstances, cooperation is most likely to emerge through competition between these agents. Experiments are conducted with agents playing fixed strategies and other agents individually and in tournaments, with differing CS parameters. Performance improves when reward is stored and averaged over longer periods, and when a genetic algorithm (GA) is used more frequently. Increasing the memory of the system improves performance to a point, but long memories proved difficult to reinforce fully and performed less well.
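As a rough illustration of the setup described above, the standard IPD payoffs and a memory-based state encoding for a CS agent might look like the following; the encoding convention (C as '1', D as '0') is an assumption for illustration:

```python
# Standard IPD payoff matrix for the row player: T=5, R=3, P=1, S=0.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def play_round(my_move, their_move):
    """Score one round of the iterated prisoner's dilemma."""
    return PAYOFF[(my_move, their_move)]

def state_from_memory(history, memory=2):
    """Encode the opponent's last `memory` moves as a bit-string
    condition for a classifier, padding early rounds with 'C'."""
    padded = (['C'] * memory + history)[-memory:]
    return ''.join('1' if m == 'C' else '0' for m in padded)

history = ['C', 'D', 'C']
print(state_from_memory(history), play_round('C', 'D'))  # 01 0
```

Increasing `memory` lengthens the condition string, which matches the abstract's observation that longer memories enlarge the rule space and make full reinforcement harder.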
A novel Linear Genetic Programming (LGP) paradigm called Genetic Parallel Programming (GPP) has been proposed to evolve parallel programs based on a Multi-ALU Processor. It is found that GPP can evolve parallel programs for data classification problems. In this paper, five binary-class UCI Machine Learning Repository databases are used to test the effectiveness of the proposed GPP-classifier. The main advantages of employing GPP for data classification are: 1) speeding up the evolutionary process by parallel hardware fitness evaluation; and 2) discovering parallel algorithms automatically. Experimental results show that the GPP-classifier evolves simple classification programs with good generalization performance. The accuracies of these evolved classifiers are comparable to those of other existing classification algorithms.
Nowadays, many artificial-intelligence trading models divide the process into three separate subprocesses: training, validation and application, but such models cannot meet the demands of today's trading environment. A new online learning algorithm, the extended classifier system (XCS), is used in the futures extended classifier trading mechanism (FXCTM) to satisfy traders' requirements. This paper verifies that FXCTM provides very good forecasting ability for futures market trading. It also discusses how the size of the XCS population affects the results of the model. Finally, the simulation results show that this model can obtain a clear profit from the futures market.
This research attempts to integrate the existing ideas in two fields: reinforcement learning algorithms based on the methods of temporal differences (TD), in particular Q-learning, and genetics-based machine learning, in particular classifier systems (CS). Close relations between the bucket brigade credit assignment algorithm used in classifier systems and TD methods, several widely realized drawbacks of CS, and good theoretical properties of TD, gave the initial motivation for developing a learning architecture that would combine TD-based temporal credit assignment algorithms with genetics-based adaptive knowledge representation. This paper presents a simple instantiation of this idea, called GBQL (Genetics-Based Q-Learning). This learning architecture may be expected to be a promising alternative for stimulus-response classifier systems on one hand, and for the implementations of Q-learning using other knowledge representation methods (e.g., connectionist networks) on the other hand.
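For reference, the tabular Q-learning update that such a hybrid builds on can be sketched as below; the variable names are illustrative and not taken from the paper:

```python
# One tabular Q-learning update (illustrative sketch).
# Q[s][a] estimates the long-term discounted reward of taking action a
# in state s; alpha is the learning rate, gamma the discount factor.

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
    return Q[s][a]

# Toy two-state example: one rewarded step nudges the estimate upward.
Q = {0: {"left": 0.0, "right": 0.0}, 1: {"left": 0.0, "right": 0.0}}
q_update(Q, 0, "right", 1.0, 1)  # Q[0]["right"] moves from 0.0 to 0.1
```

Loosely speaking, in a genetics-based system a population of rules with strength values plays a role analogous to this table, with the genetic algorithm discovering and generalizing the entries rather than enumerating them.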
Classifier systems are genetics-based learning systems using the paradigm of reinforcement learning. The most challenging case, delayed reinforcement, involves a difficult temporal credit assignment problem. Standard classifier systems solve this problem using the bucket brigade algorithm. In this paper we show how to make the temporal credit assignment process faster by augmenting this algorithm with refinements borrowed from a related family of reinforcement learning algorithms based on the methods of temporal differences (TD). Such algorithms usually converge significantly faster when used in combination with TD(lambda). As a natural consequence of the easily noticeable similarity between the bucket brigade and TD(0), the BB(lambda) algorithm is derived using the standard technique of eligibility traces. The TTD(lambda,m) procedure, which eliminates eligibility traces and implements an approximation of TD(lambda) in a computationally efficient way, has also been ported to the context of classifier systems, yielding the TBB(lambda,m) algorithm. The two resulting novel algorithms provide promising and, surprisingly, so far completely unexplored possibilities for making learning classifier systems learn faster under conditions of delayed reinforcement.
The reinforcement learning paradigm differs significantly from the traditional supervised learning paradigm. In each particular input situation, an agent must generate an action. It then receives a reinforcement value from the environment, providing a measure of the agent's performance. The task for the agent is to maximize the reinforcement values it receives in the long term. Reinforcement learning agents are adaptive, reactive, and self-improving. To formulate a particular task as a reinforcement learning task one just has to design an appropriate reinforcement function, specifying the goal of the task. This makes the paradigm widely applicable, especially in such domains as game playing, automatic control, and robotics. The reinforcement value received by the agent at a particular time step may reflect the positive or negative consequences of actions taken several steps before. In order to deal with such delayed reinforcement one needs some algorithms for temporal credit assignment. This thesis concentrates on a class of algorithms based on Sutton's temporal differences (TD) methods. The AHC and Q-learning algorithms are well-known instances of this class. The TTD procedure is proposed for the efficient and general implementation of this class of algorithms, as an alternative to the traditional so-called eligibility traces implementation, which is found to suffer from both inefficiency and lack of generality. Important practical issues in using these algorithms are discussed. The problem of learning with multidimensional actions is addressed and a simple way to generalize TD-based algorithms appropriately is presented. It is argued that existing one-dimensional algorithms are hardly applicable to tasks with vector actions and that the proposed extensions, despite their simplicity, constitute a promising approach to this problem, though they require further work. A novel genetics-based reinforcement learning architecture is introduced.
It combines Q-learning with genetics-based knowledge representation and rule discovery mechanisms. For the class of learning problems considered in this thesis, it can be a promising alternative to Holland's classifier systems with the bucket brigade temporal credit assignment algorithm. For all described algorithms experimental results are presented, illustrating their performance. Several important open problems in reinforcement learning are identified and directions for future research are outlined.
The paradigm of reinforcement learning provides an appealing framework for developing intelligent adaptive systems. The learner interacts with a possibly unknown and stochastic environment by observing its states and performing actions. It receives scalar reinforcement, or reward values, which provide a relative measure of the quality of the executed actions. The learner's task is to identify an optimal decision policy, i.e., a state-action mapping that leads to the maximization of the rewards it receives in the long term. Reinforcement values may be sparse and delayed with respect to the actions which contributed to them. A common approach to learning from delayed rewards is to use TD(lambda) methods for predicting future rewards in each state. Q-learning is currently the most popular and best theoretically understood TD-based reinforcement learning algorithm, but a variety of other related algorithms can be used. There have been a few impressive practical applications of reinforcement learning, but the existing algorithms still suffer from important deficiencies. This thesis examines possible ways of overcoming some of them, and thus making it easier to develop successful intelligent systems based on the reinforcement learning paradigm. Probably the most painful problem to be addressed is the relatively slow convergence of reinforcement learning algorithms. Although using TD(lambda>0) is known to usually give a considerable learning speedup, in practice TD(0) is still often used, because positive lambda increases the computational expense enormously, particularly for realistic tasks, with large state spaces. This is because TD(lambda) is implemented using eligibility traces, maintained and updated at each time step for all states. 
In this thesis the effects of the eligibility traces implementation are analyzed and an alternative implementation is derived, called the TTD procedure, which closely approximates TD(lambda) in a computationally efficient way, so that one can use lambda>0 at essentially the same cost as TD(0). This novel technique is theoretically shown to be approximately equivalent to, and empirically demonstrated to perform at least as well as eligibility traces, while it gives impressive computational savings. This is the major contribution of the dissertation around which the remaining contributions are concentrated. The theoretical analysis of TTD leads to a number of additional interesting results. The proposed technique is shown to be covered by the existing TD(lambda) convergence theory, by proving its error-reduction property. It is extended to variable lambda, to allow one to select lambda values adaptively, which has been suggested by some prior work. Reinforcement learning speedup techniques proposed by other authors, based on experience replay, are shown to be equivalent to special variable lambda forms of TTD. A TTD analog is derived for replacing eligibility traces, recently proposed by other authors and argued to have some important advantages over traditional, accumulating eligibility traces. Finally, a version of TTD is presented for average-reward reinforcement learning, alternative to the standard discounted-reward framework, adopted by this work. To apply reinforcement learning algorithms to tasks with large, especially continuous state spaces, it is usually necessary to combine them with learning function approximators to generalize over the state space. For a particular, but widely used class of function approximators, known as parameter-estimation methods, TTD is rederived in a gradient form and shown to be equivalent to the corresponding version of eligibility traces. 
Empirical results are presented for the combination of TTD and CMAC, a function approximator particularly well suited to reinforcement learning, which show that it learns successfully and requires much less computation than eligibility traces.
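To make the cost argument concrete, a textbook TD(lambda) prediction step with accumulating eligibility traces can be sketched as below (illustrative names, not the thesis's notation); note the per-step loop over all traced states, which is exactly the expense the TTD procedure is designed to avoid:

```python
# TD(lambda) prediction with accumulating eligibility traces
# (illustrative sketch). V maps states to value estimates, e to traces.

def td_lambda_step(V, e, s, reward, s_next, alpha=0.1, gamma=0.9, lam=0.8):
    delta = reward + gamma * V[s_next] - V[s]  # one-step TD error
    e[s] += 1.0                                # accumulate trace for s
    for state in V:                            # touch every traced state...
        V[state] += alpha * delta * e[state]   # ...to propagate the error
        e[state] *= gamma * lam                # ...and decay its trace
    return delta

V = {0: 0.0, 1: 0.0}
e = {0: 0.0, 1: 0.0}
td_lambda_step(V, e, 0, 1.0, 1)  # a reward of 1 on the step 0 -> 1
```

The inner loop runs over the whole state space on every time step, which is why positive lambda is so expensive in the eligibility traces implementation for large tasks.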
Different animals employ different strategies for sampling sensory data. The strategies are often closely constrained by environmental considerations, such as the animal's ecological niche. In animals that can see, differences in sampling strategy manifest themselves as differences in field of view and in spatially variant sampling (so-called ``foveal'' vision). In analysing adaptive behaviour in animals, or attempting to design autonomous robots, mechanisms for exploring variations in sensory sampling strategy will be required. This paper describes our work exploring a minimal system for investigating the effects of variations in patterns of sensory sampling. We have re-implemented Wilson's (1986) animat, and then experimented with altering its sensory sampling pattern (i.e. its sensory field). Empirical results are presented which demonstrate that alterations in the sensory field pattern can have a significant effect on the animat's observable behaviour (and hence also on the internal mechanisms which generate the behaviours). Analysis of our results involves characterising the interaction between the animat's sensory field and the environment within which the animat resides. We found that the animat's observed behaviour can, at least in part, be explained as a result of the animat cautiously moving in a manner which maximises the generation of new information from the environment over time. The paper concludes with a discussion of the generality of the results, and reflections on the prospects for further work.
In a recent paper, Wilson (1994b) described a `zeroth-level' classifier system (ZCS). ZCS employs a reinforcement learning technique comparable to Q-Learning (Watkins, 1989). This paper presents results from the first reconstruction of ZCS. Having replicated Wilson's results, we extend ZCS in a manner suggested by Wilson: the original formulation of ZCS has no memory mechanisms, but Wilson (1994b) suggested how internal `temporary memory' registers could be added. We show results from adding one-bit and two-bit memory registers to ZCS. Our results demonstrate that ZCS can efficiently exploit memory facilities in non-Markov environments. We also show that the memoryless ZCS can converge on near-optimal stochastic solutions in non-Markov environments. Following the discussion of adding memory, we present results from trials using ZCS in Markov environments requiring increasingly long chains of actions before reward is received. Our results indicate that inaccurate over-general classifiers can interact with the classifier-generation mechanisms to cause catastrophic breakdowns in overall system performance. Basing classifier fitness on accuracy may alleviate this problem. We conclude that the memory mechanism in its current form is unlikely to scale well for situations requiring large amounts of temporary memory. Nevertheless, the ability to find stochastic solutions when there is insufficient memory might offset this problem to some extent.
This paper explores the effect of explicitly searching for the persistence of each decision in a time-dependent sequential decision task. In prior studies, Grefenstette et al. show the effectiveness of SAMUEL, a genetic algorithm-based system, in solving a simulation problem in which an agent learns how to evade a pursuing predator. In their work, the agent applies a control action at each time step. This paper examines a reformulation of the problem: the agent learns not only the level of response of a control action, but also how long to apply that control action. By examining this problem, the work shows that it is appropriate to choose a representation of the state space that compresses time information when solving a time-dependent sequential decision problem. By compressing time information, critical events in the decision sequence become apparent.
Learning plays a vital role in the development of situated agents. In this paper, we explore the use of reinforcement learning to "shape" a robot to perform a predefined target behavior. We connect both simulated and real robots to Alecsys, a parallel implementation of a learning classifier system with an extended genetic algorithm. After classifying different kinds of Animat-like behaviors, we explore the effects on learning of different types of agent's architecture (monolithic, flat and hierarchical) and of training strategies. In particular, hierarchical architecture requires the agent to learn how to coordinate basic learned responses. We show that the best results are achieved when both the agent's architecture and the training strategy match the structure of the behavior pattern to be learned. We report the results of a number of experiments carried out both in simulated and in real environments, and show that the results of simulations carry smoothly to real robots. While most of our experiments deal with simple reactive behavior, in one of them we demonstrate the use of a simple and general memory mechanism. As a whole, our experimental activity demonstrates that classifier systems with genetic algorithms can be practically employed to develop autonomous agents.
This paper is concerned with training an agent to perform sequential behavior. In previous work we have been applying reinforcement learning techniques to control a reactive robot. Obviously, a purely reactive system is limited in the kind of interactions it can learn. In particular, it can only learn what we call pseudo-sequences, that is, sequences of actions in which the transition signal is generated by the appearance of a sensorial stimulus. We discuss the difference between pseudo-sequences and proper sequences, and the implications that these differences have on training procedures. A result of our research is that, in the case of proper sequences, for learning to be successful the agent must have some kind of memory; moreover, it is often necessary to let the trainer and the learner communicate. We therefore study the influence of communication on the learning process. First we consider trainer-to-learner communication, introducing the concept of a reinforcement sensor, which lets the learning robot explicitly know whether the last reinforcement was a reward or a punishment; we also show how the use of this sensor induces the creation of a set of error recovery rules. Then we introduce learner-to-trainer communication, which is used to disambiguate indeterminate training situations, that is, situations in which observation of the learner's behavior alone does not provide the trainer with enough information to decide whether the learner is performing a right or a wrong move. All the design choices we make are discussed and compared by means of experiments in a simulated world.
This paper is concerned with training an agent to perform sequential behavior. In previous work we have been applying reinforcement learning techniques to control a reactive agent. Obviously, a pure reactive system is limited in the kind of interactions it can learn. In particular, it can learn what we call pseudo-sequences, that is sequences of actions in which each action is selected on the basis of current sensory stimuli; on the contrary, it cannot learn proper sequences, in which actions have to be selected also on the basis of some internal state. Moreover, it is a result of our research that effective learning of proper sequences is improved by letting the agent and the trainer communicate. First we consider trainer-to-agent communication, introducing the concept of reinforcement sensor, which lets the learning robot explicitly know whether the last reinforcement was a reward or a punishment; we also show how the use of this sensor makes error recovery rules emerge. Then we introduce agent-to-trainer communication, which is used to disambiguate ambiguous training situations, that is situations in which the observation of the agent's behavior does not provide the trainer with enough information to decide whether the agent's move is right or wrong. We also show an alternative solution of the problem of ambiguous situations, which involves learning to coordinate behavior in a simpler, unambiguous setting, and then transferring what has been learnt to a more complex situation. All the design choices we make are discussed and compared by means of experiments in a simulated world.
Classifier systems are discussed as high-dimensional dynamical systems. Their learning abilities and long term behavior are analyzed in a letter prediction task domain. We find that the system can develop different types of solutions, sometimes heavily relying on its dynamical properties. A taxonomy of the system solutions is outlined, and some problems due to the activity of the genetic operators are discussed, as well as ways to solve or alleviate them.
Classifier systems are rule-based adaptive systems whose learning capabilities emerge from processes of selection and competition within a population of rules (classifiers). These processes are governed by the values of numerical variables which measure the fitness of each rule. The system's adaptivity is ensured by a fitness reallocation mechanism (the bucket brigade algorithm) and by genetic algorithms which are responsible for the internal dynamics of the system. In this paper we discuss classifier systems as dynamical systems, the main focus being on the asymptotic dynamics due to the bucket brigade, abstracting from the action of the genetics. This topic is discussed with reference to a specific task domain, in which the system is used as a detector of statistical properties of periodic or fluctuating external environments. We also describe a major consequence of the genetics on the bucket brigade dynamics, namely the proliferation of individual rules into subpopulations of equivalent classifiers: we then show that this can eventually lead to undesired stochastic behavior or to the destabilization of correct solutions devised by the system.
This paper describes an application of genetic algorithms (GA's) to classify epidemiological data, which is often challenging to classify due to noise and other factors. For such complex data (that requires a large number of very specific rules to achieve a high accuracy), smaller rule sets, composed of more general rules, may be preferable, even if they are less accurate. The GA presented here allows the user to encourage smaller rule sets by setting a parameter. The rule sets found are also compared to those created by standard decision-tree algorithms. The results illustrate tradeoffs involving the number of rules, descriptive accuracy, predictive accuracy, and accuracy in describing and predicting positive examples across different rule sets.
In this paper we describe genetic algorithms and fuzzy logic, presenting them as tools to model control processes and to design intelligent, automatic control systems. We describe the application of genetic algorithms to the design of fuzzy logic controllers, as well as learning classifier systems and their development in a fuzzy environment: the fuzzy learning classifier systems.
In recent years, a great number of publications have explored the use of genetic algorithms as a tool for designing fuzzy systems. Genetic Fuzzy Systems explores and discusses this symbiosis of evolutionary computation and fuzzy logic. The book summarizes and analyzes the novel field of genetic fuzzy systems, paying special attention to genetic algorithms that adapt and learn the knowledge base of a fuzzy-rule-based system. It introduces the general concepts, foundations and design principles of genetic fuzzy systems and covers the topic of genetic tuning of fuzzy systems. It also introduces the main learning approaches: the Michigan, Pittsburgh and iterative-learning methods. Finally, it explores hybrid genetic fuzzy systems such as genetic fuzzy clustering or genetic neuro-fuzzy systems and describes a number of applications from different areas. Genetic Fuzzy Systems represents a comprehensive treatise on the design of fuzzy-rule-based systems using genetic algorithms, both from a theoretical and a practical perspective. It is a valuable compendium for scientists and engineers concerned with research and applications in the domain of fuzzy systems and genetic algorithms.
Learning classifier systems (LCSs) have existed for nearly twenty years (Holland & Reitman, 1978). Research efforts in reinforcement learning (RL), evolutionary computation (EC), and neural networks have enhanced the original LCS paradigm. New ideas from these areas have created a ``renaissance'' period for the LCS. This paper highlights some key LCS advancements and the fields that inspired them. One inspiration, from neural networks, is examined for a novel LCS approach to autonomous mobile robots. A simple, LCS-controlled robot simulation is presented. This simulation shows the potential benefits of combining biological paradigms and hybridizing ideas in the LCS. Future directions for LCS research are discussed.
The learning classifier system (LCS) is an application of the genetic algorithm to machine learning. Artificial neural networks (ANNs) perform mappings of input vectors to outputs in much the same way an LCS does. This chapter introduces the LCS paradigm and provides literature references for further investigation. Through the use of LCS principles, an ANN becomes a variable-structure production system, capable of making complex input-output mappings similar to those of an LCS. The evolutionary process of a single ANN facilitates a broad understanding of how evolution may help rule-based (or neuron-based) systems. An evolutionary approach to ANN structure is reviewed and its similarities to the LCS are discussed. A simple extension to Smith and Cribbs' (1994) and Cribbs' (1995) work on an ANN and LCS analogy is presented. The experiment presented removes the nonlinearity of the ANN's output layer to assess the nonlinear effects of the GA's partitioning within the hidden layer. The results indicate that GA-induced nonlinearity actively participates in the solution of a difficult Boolean problem -- the six multiplexor problem.
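For readers unfamiliar with the benchmark, the six multiplexor is the Boolean function in which the first two bits of a 6-bit string address one of the remaining four data bits; a minimal sketch (the address-bit ordering shown is one common convention):

```python
def multiplexer6(bits):
    """6-multiplexor: bits[0:2] address one of the data bits bits[2:6]."""
    address = bits[0] * 2 + bits[1]  # two address bits, MSB first
    return bits[2 + address]         # output the addressed data bit

multiplexer6([1, 0, 0, 0, 1, 0])  # address 0b10 = 2 selects bits[4] -> 1
```

The function is a standard LCS and GA test problem because no single linear rule solves it, yet it decomposes cleanly into general condition/action rules.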
Neural networks are machine learning systems based on simple, localized responses to external stimuli. They can respond to the same stimuli that classifier systems respond to, and they alter their internal structure on the basis of reinforcement from an external source. The learning techniques used by researchers in the neural network field have traditionally been quite different from those used by genetic algorithm researchers, however. The tension between these similarities and differences has led researchers to wonder what the formal relationship between the two systems is. This is one of two papers showing that there is a sense in which these two types of machine learning systems are equivalent. In a companion paper, it is shown that any classifier system may be transformed into a neural network that is isomorphic in function. In this paper, it is shown that any neural network can be transformed into a classifier system that is isomorphic in function, although several modifications must be made to standard classifier system practice for this transformation to work. The present paper also considers a different transformation procedure described by Belew and Gherrity that accomplishes this task in a different way. The paper concludes with a discussion of these transformation procedures and their import.
Paper is an extended abstract
Classifier systems are sub-symbolic or dynamic approaches to machine learning. These systems have been studied rather extensively. In this thesis some theoretical results about the long-term behaviour and the computational abilities of classifier systems are derived. Then some experiments are undertaken. The first experiment entails the implementation of a simple logic function, a multiplexer, in a simple classifier system. It is shown that this task can be learned very well. The second task that is taught to the system is a mushroom-classification problem that has been researched with other learning systems. It is shown that this task can be learned. The last problem is the parity problem. First it is shown that this problem does not scale linearly with its number of bits in a straightforward classifier system. An attempt is made to solve it with a multilayer classifier system, but this is found to be almost impossible. Explanations are given of why this should be the case. Then some thought is given to analogies between classifier systems and neural networks. It is indicated that there are mappings between certain classifier systems and certain neural networks. It is suggested that this is a main concern for future classifier systems research.
In this article, we explore the use of genetic algorithms (GAs) as a key element in the design and implementation of robust concept learning systems. We describe and evaluate a GA-based system called GABIL that continually learns and refines concept classification rules from its interaction with the environment. The use of GAs is motivated by recent studies showing the effects of various forms of bias built into different concept learning systems, resulting in systems that perform well on certain concept classes (generally, those well matched to the biases) and poorly on others. By incorporating a GA as the underlying adaptive search mechanism, we are able to construct a concept learning system that has a simple, unified architecture with several important features. First, the system is surprisingly robust even with minimal bias. Second, the system can be easily extended to incorporate traditional forms of bias found in other concept learning systems. Finally, the architecture of the system encourages explicit representation of such biases and, as a result, provides for an important additional feature: the ability to dynamically adjust system bias. The viability of this approach is illustrated by comparing the performance of GABIL with that of four other more traditional concept learners (AQ14, C4.5, ID5R, and IACL) on a variety of target concepts. We conclude with some observations about the merits of this approach and about possible extensions.
Genetic algorithms represent a class of adaptive search techniques that have been intensively studied in recent years. Much of the interest in genetic algorithms is due to the fact that they provide a set of efficient domain-independent search heuristics which are a significant improvement over traditional ``weak methods'' without the need for incorporating highly domain-specific knowledge. There is now considerable evidence that genetic algorithms are useful for global function optimization and NP-hard problems. Recently, there has been a good deal of interest in using genetic algorithms for machine learning problems. This paper provides a brief overview of how one might use genetic algorithms as a key element in learning systems.
SEAGUL (Sensitive Evolutionary Adaptable Genetic Unique Learner) is a genetic algorithm that creates production rules that pick the winners of horse races. SEAGUL uses data from the Daily Racing Form, a newspaper that is found at all race tracks and is available to the general public, to generate these rules. SEAGUL deviates from orthodox genetic algorithms in several areas. It has a pre-defined procedure for generating the initial population, it creates inviolable components that cannot be modified through mutation, it does not use the bucket brigade algorithm, and it optimizes its rule set by analyzing variables individually and then collectively.
Paper is an extended abstract
This work describes a control architecture based on a hierarchical classifier system. This architecture, which uses both reactive and planning rules, implements a motivationally autonomous animat that chooses the actions it will perform according to the expected consequences of the alternatives. The adaptive faculties of this animat are illustrated through various examples.
This paper describes how an animat endowed with the MonaLysa control architecture can build a cognitive map that merges into a hierarchical framework not only topological links between landmarks, but also higher-level structures, control information, and metric distances and orientations. The paper also describes how the animat can use such a map to locate itself, even if it is endowed with noisy dead-reckoning capacities. MonaLysa's mapping and self-positioning capacities are illustrated by results obtained in three different environments and four noise-level conditions. These capacities appear to be gracefully degraded when the environment grows more challenging and when the noise level increases. In the discussion, the current approach is compared to others with similar objectives, and directions for future work are outlined.
This work describes a control architecture based on a hierarchical classifier system. This system, which learns both reactive and planning rules, implements a motivationally autonomous animat that chooses the actions it performs according to its perception of the external environment, to its physiological or internal state, to the consequences of its current behavior, and to the expected consequences of its future behavior. The adaptive faculties of this architecture are illustrated within the context of a navigation task, through various experiments with a simulated and a real robot.
This paper describes how the MonaLysa control architecture implements a route-following navigation strategy. Two procedures that allow map building and self-positioning are described, and experimental results are provided that demonstrate that such procedures are robust with respect to noise. This approach is compared to others with similar objectives, and directions for future work are outlined.
This thesis is centered on MonaLysa, the control architecture of a motivationally autonomous animat. This architecture implements a motivational system that selects the actions and goals of an artificial agent, according to its internal state, the stimuli from the environment, and its evaluation of the long term consequences of its behavioral choices. This architecture is based on an original hierarchical classifier system, that efficiently learns several action plans and builds an internal representation of the animat's environment. The functionalities of MonaLysa are illustrated within the context of the navigation of a simulated animat and of a real robot. In the first part of this work, the animat has to reach a goal and to avoid the obstacles it encounters on its way. We demonstrate that MonaLysa can efficiently learn a general reactive behaviour, notably because it can dynamically change its current goal when the animat encounters an obstacle. Moreover, MonaLysa exploits its interactions with the environment to learn alternative plans and to deduce an optimal path towards its goal; it is also able to modify the organization of its plans so as to adapt to environmental changes. In the second part of this work, the animat has to explore its environment, when various amounts of noise are added to the normal functioning of its odometric sensors. In this context, MonaLysa is able to learn a reliable spatial representation of its environment, while maintaining a correct estimate of its position. This spatial representation is very robust with respect to noise and can adapt to any environment. The generality of the approach presented herein opens the way to many applications, which are outlined at the end of this work.
Reinforcement learning is a class of problems in which an autonomous agent acting in a given environment improves its behavior by progressively maximizing a function computed solely from a succession of scalar rewards received from the environment. Q-learning and classifier systems (CS) are two of the most widely used methods for solving reinforcement learning problems. Notwithstanding their popularity and their shared goal, they have often been considered in the past as two different models. In this paper we first show that the classifier system, when restricted to a sharp simplification called the discounted max very simple classifier system (DMAX-VSCS), boils down to tabular Q-learning. It follows that DMAX-VSCS converges to the optimal policy, as proved by Watkins & Dayan (1992), and that it can benefit from the experimental and theoretical work dedicated to improving Q-learning and to facilitating its use in concrete applications. In the second part of the paper, we show that three of the restrictions we need to impose on the CS to derive its equivalence with Q-learning, that is, no internal states, no don't-care symbols, and no structural changes, turn out to be so essential that they have recently been rediscovered and reimplemented by Q-learning practitioners. Finally, we sketch further similarities among ongoing work within both research contexts. The main contribution of the paper is therefore to make explicit the strong similarities existing between Q-learning and classifier systems, and to show that experience gained with research in one domain can usefully direct future research in the other.
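The tabular Q-learning algorithm to which DMAX-VSCS is shown to reduce can be sketched as follows. This is a minimal illustration only, not the paper's implementation; the toy corridor environment and all parameter values are assumptions:

```python
import random

def q_learning(states, actions, step, reward, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning, the setting whose convergence Watkins & Dayan proved."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(100):  # bounded episode length
            # epsilon-greedy action selection
            a = (random.choice(actions) if random.random() < epsilon
                 else max(actions, key=lambda a: Q[(s, a)]))
            s2 = step(s, a)
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (reward(s, a, s2) + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

# Toy corridor: states 0..4, actions move left/right, reward for standing on state 4.
random.seed(0)
states, actions = list(range(5)), [-1, 1]
Q = q_learning(states, actions,
               step=lambda s, a: min(max(s + a, 0), 4),
               reward=lambda s, a, s2: 1.0 if s2 == 4 else 0.0)
```

After training, the greedy policy moves right in every state of the corridor, which is the optimal policy the convergence result guarantees in the limit.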
Learning plays a vital role in the development of situated agents. In this paper, we explore the use of reinforcement learning to shape a robot to perform a predefined target behavior. We connect both simulated and real robots to ALECSYS, a parallel implementation of a learning classifier system with an extended genetic algorithm. After classifying different kinds of Animat-like behaviors, we explore the effects on learning of different types of agent architecture (monolithic, flat, and hierarchical) and of training strategies. In particular, the hierarchical architecture requires the agent to learn how to coordinate basic learned responses. We show that the best results are achieved when both the agent's architecture and the training strategy match the structure of the behavior pattern to be learned. We report the results of a number of experiments carried out both in simulated and in real environments, and show that the results of simulations carry over smoothly to real robots. While most of our experiments deal with simple reactive behavior, in one of them we demonstrate the use of a simple and general memory mechanism. As a whole, our experimental activity demonstrates that classifier systems with genetic algorithms can be practically employed to develop autonomous agents.
Intelligent robots should be able to use sensor information to learn how to behave in a changing environment. As environmental complexity grows, the learning task becomes more and more difficult. We face this problem using an architecture based on learning classifier systems and on the structural properties of animal behavioural organization, as proposed by ethologists. After a description of the learning technique used and of the organizational structure proposed, we present experiments that show how behaviour acquisition can be achieved. Our simulated robot learns to follow a light and to avoid hot, dangerous objects. While these two simple behavioural patterns are learnt independently, coordination is attained by means of a learning coordination mechanism. Again, this capacity is demonstrated by performing a number of experiments.
A major problem with learning systems is how to tackle real-world problems. A distinctive characteristic of many real-world problems is that they present a complexity that cannot be ``user-defined'', and which is generally orders of magnitude higher than in toy systems. The use of more powerful parallel machines is a way to attack this problem from two sides: through an increase in the performance of standard algorithms, and through the design of a new structural organization of the learning system -- an organization that should allow better control over environmental complexity. In order to explore these potentialities we have built a tool, ALECSYS, that can be used to implement parallel learning classifier systems in a modular fashion. In ALECSYS, parallelism is used both to increase system performance, by what we call low-level parallelization, and to allow the use of many different learning classifier systems simultaneously, by what we call high-level parallelization. In the paper we first present the system organization and the algorithms used, then we report some simulation results, and finally we give some hints for further work.
A major problem with learning systems is how to tackle real-world problems. A distinctive characteristic of many real-world problems is that they present a complexity that cannot be user-defined, and which is generally orders of magnitude higher than in toy systems. The use of more powerful parallel machines is a way to attack this problem from two sides: through an increase in the performance of standard algorithms, and through the design of a new structural organization of the learning system - an organization that should allow better control over environmental complexity. In order to explore these potentialities we have built a tool, ALECSYS, that can be used to implement parallel learning classifier systems in a modular fashion. In ALECSYS, parallelism is used both to increase system performance, by what we call low-level parallelization, and to allow the use of many different learning classifier systems simultaneously, by what we call high-level parallelization. In the paper we first present the system organization and the algorithms used, then we report some simulation results, and finally we give some hints for further work.
It is well known that standard learning classifier systems, when applied to many different domains, exhibit a number of problems: payoff oscillation, difficulty in regulating the interplay between the reward system and the background genetic algorithm (GA), instability of rule chains, and instability of default hierarchies, among others. ALECSYS is a parallel version of a standard learning classifier system (CS) and, as such, suffers from these same problems. In this paper we propose some innovative solutions to some of these problems. We introduce the following original features. Mutespec is a new genetic operator used to specialize potentially useful classifiers. Energy is a quantity introduced to measure global convergence, so that the genetic algorithm is applied only when the system is close to a steady state. Dynamic adjustment of the cardinality of the classifier set speeds up the performance phase of the algorithm. We present simulation results of experiments run in a simulated two-dimensional world in which a simple agent learns to follow a light source.
In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constraints which are addressed by three main tools: parallelism, a distributed architecture, and training. Parallelism is useful to speed up computation and to increase the flexibility of the learning system design. A distributed architecture makes it possible to decompose the overall task into a set of simpler learning tasks. Finally, training provides guidance to the system while learning, reducing the number of cycles required to learn. These tools and the issues they raise are first studied in simulation, and then the experience gained with simulations is used to implement the learning system on the real robot. Results have shown that with this approach it is possible to let the AutonoMouse, a small real robot, learn to approach a light source under a number of different noise and lesion conditions.
Learning Classifier Systems (LCS) consist of three components: function approximation, reinforcement learning, and classifier replacement. In this paper we formalize the function approximation part by providing a clear problem definition, a formalization of the LCS function approximation architecture, and a definition of the function approximation aim. Additionally, we provide definitions of optimality and of the conditions a classifier needs to fulfil to be optimal. As a demonstration of the usefulness of the framework, we derive from first principles commonly used algorithmic approaches that aim at reaching optimality, and introduce a new Kalman filter-based method that outperforms all currently implemented methods, in addition to providing further insight into the probabilistic basis of the localized model that a classifier provides. A global function approximation in LCS is achieved by combining the classifiers' localized models; for this combination, based on the maximum likelihood of a combination of all classifiers, we provide an approach that is simpler than the one used in current LCS. The formalizations in this paper act as the foundation of a formal framework, currently under active development, that includes all three LCS components, promising a better formal understanding of current LCS and the development of better LCS algorithms.
In this paper we promote a new methodology for designing LCS that is based on first identifying their underlying model and then using standard machine learning methods to train this model. This leads to a clear identification of the LCS model and makes explicit the assumptions made about the data, and it promises advances in the theoretical understanding of LCS by transferring the understanding of the applied machine learning methods to LCS. Additionally, it allows us, for the first time, to give a formal and general, that is, representation-independent, definition of the optimal set of classifiers that LCS aim at finding. To demonstrate the feasibility of the proposed methodology we design a Bayesian LCS model by borrowing concepts from the related Mixtures-of-Experts model. The quality of a set of classifiers, and consequently also the optimal set of classifiers, is defined by the application of Bayesian model selection, which turns finding this set into a principled optimisation task. Using a simple Pittsburgh-style LCS, a set of preliminary experiments demonstrates the feasibility of this approach.
This book provides a comprehensive introduction to the design and analysis of Learning Classifier Systems (LCS) from the perspective of machine learning. LCS are a family of methods for handling unsupervised learning, supervised learning and sequential decision tasks by decomposing larger problem spaces into easy-to-handle subproblems. Contrary to commonly approaching their design and analysis from the viewpoint of evolutionary computation, this book instead promotes a probabilistic model-based approach, based on their defining question "What is an LCS supposed to learn?". Systematically following this approach, it is shown how generic machine learning methods can be applied to design LCS algorithms from the first principles of their underlying probabilistic model, which is in this book -- for illustrative purposes -- closely related to the currently prominent XCS classifier system. The approach is holistic in the sense that the uniform goal-driven design metaphor essentially covers all aspects of LCS and puts them on a solid foundation, in addition to enabling the transfer of the theoretical foundation of the various applied machine learning methods onto LCS. Thus, it not only advances the analysis of existing LCS but also puts forward the design of new LCS within that same framework.
This work has no abstract.
We present a probabilistic formulation of UCS (a sUpervised Classifier System). UCS is shown to be a special case of mixture of experts where the experts are learned independently and later combined during prediction. In this work, we develop the links between the constituent components of UCS and a mixture of experts, thus lending UCS a strong analytical background. We find during our analysis that mixture of experts is a more generic formulation of UCS and possesses more generalization capability and flexibility than UCS, which is also verified using empirical evaluations. This is the first time that a simple probabilistic model has been proposed for UCS and we believe that this work will form a useful tool to analyse Learning Classifier Systems and gain useful insights into their working.
Estimating the usefulness of rules in a classifier system raises the credit-apportionment problem. Usually, payoff apportionment is performed by the bucket brigade algorithm. However, some works have shown that this algorithm presents some difficulties. Generally, the condition part of a rule is defined over an alphabet containing a ``don't care'' symbol, so the same rule can fire in different contexts. Under such conditions, overly general classifiers cannot be used, because their strength management becomes incoherent. The solution we propose here can solve this problem: general classifiers belonging to a sequence that ends in success are dynamically specialized. In order not to store all the actions of the sequence, the bucket brigade algorithm is applied to the specificity of the newly created rules. Thus, the closer a classifier is to the end of the solution sequence, the more specific it is. This new algorithm is presented here and applied to an autonomous mobile robot which must learn how to move in an environment with obstacles.
The search performed by a classifier system's genetic algorithm for new, potentially improved rules is guided by the relative strength of the rules in the extant rule base. This paper identifies three general types of rule whose presence in a plan can affect the relative strength of rules in a rule base and thereby potentially compromise the effectiveness of the genetic algorithm. The nature and extent of relative strength distortion is investigated, and a method to combat the distortion, which involves an adaptation of the standard bucket brigade, is proposed.
Symbolic knowledge representation schemes have been suggested as one way to improve the performance of classifier systems in the context of complex, real-world problems. The main reason for this is that, unlike the traditional binary string representation, high-level languages facilitate the exploitation of problem-specific knowledge. However, the two principal genetic operators, crossover and mutation, are, in their basic form, ineffective with regard to discovering useful rules in such representations. Moreover, the operators do not take into account any environmental cues which may benefit the rule discovery process. A further source of inefficiency in classifier systems concerns their capacity for forgetting valuable experience by deleting previously useful rules. In this paper, solutions to both of these problems are addressed. First, in respect of the suitability of crossover and mutation, a new set of operators, specifically tailored for a high-level language, is proposed. Second, to alleviate the problem of forgetfulness, an approach based on the way some enzyme systems facilitate the repair of genes in biological systems is investigated.
The search for novel and useful patterns within large databases, known as data mining, has become of great importance owing to the ever-increasing amounts of data collected by large organizations. In particular, emphasis is placed on heuristic search methods able to discover patterns that are hard or impossible to detect using standard query mechanisms and classical statistical techniques. In this paper an evolutionary system capable of extracting explicit classification rules is presented. The results are compared with those obtained by other approaches.
The term connectionism is usually applied to neural networks. There are, however, many other models that are mathematically similar, including classifier systems, immune networks, autocatalytic chemical reaction networks, and others. In view of this similarity, it is appropriate to broaden the term connectionism. I define a connectionist model as a dynamical system with two properties: (1) The interactions between the variables at any given time are explicitly constrained to a finite list of connections. (2) The connections are fluid, in that their strength and/or pattern of connectivity can change with time. This paper reviews the four examples listed above and maps them into a common mathematical framework, discussing their similarities and differences. It also suggests new applications of connectionist models, and poses some problems to be addressed in an eventual theory of connectionist systems.
NEXTPITCH, a learning classifier system using genetic algorithms, inductively learns to predict the next note in a childhood melody. Just as a listener develops expectations of what is to follow based on what has been heard, NEXTPITCH models human music learning by developing rules that represent the actual pitch transitions in the melody. This paper introduces our representation of music and our description of classifier formats. Our results are analysed using analysis-of-variance statistical routines. Information theory is used to partition melodies into classes so that we may examine the applicability of the results from one set of melodies to another.
This paper addresses the question of the impact of the representation of a domain on the performance of a learning classifier system. NEXTPITCH, a learning classifier system using genetic algorithms, inductively learns to predict the next note of a Bach chorale. This paper presents an analysis of different representations of specific features of Western tonal music. Our results are analysed using analysis-of-variance statistical routines.
NEXTPITCH, a learning classifier system (LCS) using genetic algorithms, inductively learns to predict the next note in a melody. This paper addresses the issues of (1) the impact of the representation of a domain on the performance of an LCS and (2) the classification of the input to an LCS in order to determine performance.
In the field of mathematical modelling of economic phenomena, analytical models assuming the existence of a stable equilibrium are very popular. It is often expected that the quantities defining the state of the system gradually approach the equilibrium and remain there afterwards. Moreover, it is often assumed that the interacting individuals behave perfectly rationally --- i.e. they always take the decisions that maximize their utility. Such approaches, however, have only limited suitability for modelling economic systems subject to permanent and successively accelerating change. Moreover, they are only partially capable of describing individuals that possess not only intelligence but also emotions. Therefore, the theory of complex adaptive systems applies computer simulations whose implementation does not necessarily depend on the existence of equilibria or perfectly rational agents. At the beginning of this thesis, the necessity and the practicality of employing complex adaptive systems for describing recent economic events are discussed. Then methods that are suited to implementing such systems --- in particular classifier systems and genetic algorithms --- are explained. Subsequently, some examples are provided to illustrate the possibilities and also the restrictions of such procedures. Based on these elaborations, two comprehensive economic models are formulated. The first model addresses the problem of communication within a firm developing a new product. In big enterprises it is often a considerable challenge to establish an effective and efficient flow of information. Moreover, in distributed decision-making processes conflicting objectives may occur. Many different groups of employees cooperate in the process of designing a new product. The tasks required to successfully introduce a new product involve employees from market research, engineering, scheduling, maintenance, and many others.
These people may possess different viewpoints and different technical languages. To solve this problem, Hauser suggested a communication scheme called the ``House of Quality''. This thesis introduces a simulation based on the ``House of Quality'', which was implemented in MATLAB. The decision makers are implemented as classifier systems and apply genetic algorithms to learn a meaningful solution. To evaluate the rules generated by this adaptive learning process, the obtained results are compared with the results gained by full enumeration. It turns out that the genetic algorithms indeed create quite good decision rules. These rules illustrate how the responsible individuals should react to the situations they encounter. The second model examines a heterogeneous market of goods that can be substituted for each other. The task of the learning agents is to place their products in the market such that the number of customers who buy their products is maximized. Four classes of agents occur in this market. Two of these groups contain learning agents. The first group observes the positions (= the needs) of the customers directly. The second group observes the positions of the suppliers and their profits. The decisions for the next planning period are based on these observations. Additionally, there is a group of suppliers placing their products according to a ``random walk'', and another supplier who always takes over the position of the most successful seller of the previous period. To compare these strategies, three different behaviour patterns of the demand side are taken into consideration: i) static, ii) cyclic, and iii) random walk. To allow accurate conclusions about the relation between customer behaviour and the success of a certain selling strategy, all customers exhibit the same behaviour pattern within one particular simulation.
It turned out that in cases i) and iii) the strategy of imitating the most successful seller is optimal --- under the assumption that only one supplier follows that strategy. If the customers behave according to ii), on the other hand, the agents observing the customers directly are the winners.
Genetic algorithms are generating a great deal of interest today. They take their inspiration from the ways plants and animals evolve. Developed by John Holland, genetic algorithms use the ideas and language of genetics -- genes, chromosomes, mutations, etc. Holland's unique method for performing adaptive searches has received a great deal of attention, and numerous applications which make use of genetic algorithms have been developed. The algorithm behind evolution solves the problem of producing species able to thrive in particular environments. This same genetic algorithm can solve many other kinds of problems as well. It forms the basis of an emerging field of computer programming that could challenge expert systems. Research is under way on adapting the genetic algorithm to such applications as running pump stations on oil pipelines, eliminating distortion from x-ray images, and designing very-large-scale integrated (VLSI) computer chips. This paper begins with an overview of genetic algorithms, followed by a summary of the history of their development. The next three sections of the paper discuss the major areas of genetic algorithm research: applications of genetic algorithms, theoretical work, and classifier systems. Next, genetic algorithms are compared to other related methods, and their strengths are discussed. The final section discusses the future of genetic algorithms.
This project simulates a simple commodity economy in which artificially intelligent adaptive agents learn to trade with one another. Two versions of the economy are studied in detail. The first consists of three types of agent and three types of commodity, and the second version includes the addition of `fiat money'. The agents make decisions using a classifier system, capable of being rewarded and punished according to the relative success of the economic strategies generated. The economic environment that the agents inhabit is taken from Marimon et al. (1989), but a different classifier system is used, to investigate whether an alternative implementation affects the results of the simulation. The results were not fully replicated, and the differences between the two implementations are analysed, giving rise to novel observations about the environment and the original research.
We consider: the role of selectionist reinforcement learning in classifier systems; the fuzzy matching and activation of rules; and the evolution of communication within and between classifier systems.
This paper describes how genetic algorithms are being used to engineer control and classification systems for industrial and commercial applications. Using multiple burner combustion control and credit risk assessment as examples it illustrates how expert human knowledge can be complemented by searching large amounts of data using genetic algorithms in knowledge-based machine learning systems. It goes on to discuss recent research on parallel and distributed genetic algorithms aimed at tackling large complex problems.
It is shown how co-evolving populations of individual rules can outperform evolving a population of complete sets of rules with the genetic algorithm in learning classifier systems. A rule-based control system is presented which uses only the genetic algorithm for learning individual control rules with immediate reinforcement after the firing of each rule. How this has been used for an industrial control problem is described as an example of its operation. The refinement of the system to deal with delayed reward is presented and its operation on the cart-pole balancing problem described. A comparison is made of the performance of the refined system using only selection and mutation to learn individual rules with that of the genetic algorithm to learn a complete set of rules. A comparison is also made of the performance of the refined system using only selection to learn individual rules with that of the bucket-brigade and other reinforcement algorithms on the same task.
The paper presents examples of emergent behavior in classifier systems, focusing on symbolic reasoning and learning. These behaviors are related to global dynamical properties such as state cycles, basins of attraction, and phase transitions. A mapping is defined between classifier systems and an equivalent dynamical system (Boolean networks). The mapping provides a way to understand and predict emergent classifier system behaviors by observing the dynamical behavior of the Boolean networks. The paper reports initial results and discusses the implications of this approach for classifier systems.
Paper is an extended abstract
Machine rule induction was examined on a difficult categorization problem by applying a Holland-style classifier system to a complex letter recognition task. A set of 20,000 unique letter images was generated by randomly distorting pixel images of the 26 uppercase letters from 20 different commercial fonts. The parent fonts represented a full range of character types including script, italic, serif, and Gothic. The features of each of the 20,000 characters were summarized in terms of 16 primitive numerical attributes. Our research focused on machine induction techniques for generating IF-THEN classifiers in which the IF part was a list of values for each of the 16 attributes and the THEN part was the correct category, i.e., one of the 26 letters of the alphabet. We examined the effects of different procedures for encoding attributes, deriving new rules, and apportioning credit among the rules. Binary and Gray-code attribute encodings that required exact matches for rule activation were compared with integer representations that employed fuzzy matching for rule activation. Random and genetic methods for rule creation were compared with instance-based generalization. The strength/specificity method for credit apportionment was compared with a procedure we call ``accuracy/utility.''
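The Gray-code attribute encoding compared above has the property that consecutive integer values differ in exactly one bit, unlike plain binary, where a small change in value can flip many bits at once. A minimal sketch of the standard reflected Gray code (the 4-bit width is an illustrative assumption, not the paper's encoding width):

```python
def to_gray(n: int) -> int:
    """Reflected binary Gray code: consecutive integers differ in one bit."""
    return n ^ (n >> 1)

def bits(n: int, width: int = 4) -> str:
    """Render an integer as a fixed-width bit string."""
    return format(n, f"0{width}b")

# Plain binary: 7 -> 0111 and 8 -> 1000 differ in all four bits,
# while their Gray codes 0100 and 1100 differ in a single bit.
gray7, gray8 = bits(to_gray(7)), bits(to_gray(8))
```

Under an exact-match rule activation scheme, this keeps neighbouring attribute values close in Hamming distance, which is one motivation for comparing such encodings against fuzzy-matched integer representations.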
This document has no abstract.
This paper details our attempt to find control knowledge of multi-input systems using a Fuzzy Classifier System (FCS). Simulations are done to show that the FCS can find fuzzy rules for collision avoidance in steering a ship. This paper presents new payoffs and credits for building antecedent parts of fuzzy rules which have truth values larger than zero and for finding fuzzy control rules which achieve the collision avoidance steering. The results show that the FCS can discover fuzzy rules for the multi-input system.
A description of the problems, successes, and failures encountered whilst attempting to encourage a Classifier System to learn to play a simple board game well. Classifier Systems are a kind of free-for-all Production Rule System in which the pattern-matching rules compete on the basis of their (modifiable) strength values, and the population of rules is altered by a Genetic Algorithm. They have shown promise in problems where there is very little specific (i.e. useful) information available from the environment, and the internal adjustments proceed without explicit direction from the environment (or the programmer). In this thesis an attempt is made to `coerce' a variant of Goldberg's Simple Classifier System to `learn' how to play a simple board game (called Dodgems). Various options were tried, among them: different internal representations, adding more powerful move operators, forcing every move to be valid, and others. The results, whilst not startling, do indicate increased performance with the use of the enhanced move operators over the initial representations. Larger population sizes appear to be beneficial. There is also a discussion of the problems involved in choosing the relevant data to study the internal workings of the Classifier System.
Paper is an extended abstract
MACS (Modular Anticipatory Classifier System) is a new Anticipatory Classifier System. With respect to its predecessors, ACS, ACS2 and YACS, the latent learning process in MACS is able to take advantage of new regularities. Instead of anticipating all attributes of the perceived situations in the same classifier, MACS only anticipates one attribute per classifier. In this paper we describe how the model of the environment represented by the classifiers can be used to perform active exploration and how this exploration policy is aggregated with the exploitation policy. The architecture is validated experimentally. Then we draw more general principles from the architectural choices giving rise to MACS. We show that building a model of the environment can be seen as a function approximation problem which can be solved with Anticipatory Classifier Systems such as MACS, but also with accuracy-based systems like XCS or XCSF, organized into a Dyna architecture.
A new and original trend in the learning classifier system (LCS) framework is focussed on latent learning. These new LCSs call upon classifiers with a (condition), an (action) and an (effect) part. In psychology, latent learning is defined as learning without getting any kind of reward. In the LCS framework, this process is in charge of discovering classifiers which are able to anticipate accurately the consequences of actions under some conditions. Accordingly, the latent learning process builds a model of the dynamics of the environment. This model can be used to improve the policy learning process. This paper describes YACS, a new LCS performing latent learning, and compares it with ACS.
A Holland classifier system is an adaptive, general purpose machine learning system which is designed to operate in noisy environments with infrequent and often incomplete feedback. Examples of such environments are financial markets, stock management systems, or chemical processes. In financial markets, a Holland classifier system would develop trading strategies, in a stock management system order heuristics, and in a chemical plant it would perform process control. In this paper we describe a Holland classifier system and present the implementation of its components, namely the production system, the bucket brigade algorithm, the genetic algorithm, and the cover detector, cover effector and triggered chaining operator. Finally, we illustrate the working of a Holland classifier system by learning to find a path with a high payoff in a simple finite state world.
This book integrates fuzzy rule-languages with genetic algorithms, genetic programming, and classifier systems with the goal of obtaining fuzzy rule-based expert systems with learning capabilities. The main topics are first introduced by solving small problems, then a prototype implementation of the algorithm is explained, and last but not least the theoretical foundations are given. The second edition takes into account the rapid progress in the application of fuzzy genetic algorithms with a survey of recent developments in the field. The chapter on genetic programming has been revised. An exact uniform initialization algorithm replaces the heuristic presented in the first edition. A new method of abstraction, compound derivations, is introduced.
PANIC (Parallelism And Neural networks in Classifier systems), an Evolutionary Rule Based System (ERBS) to evolve behavioral strategies codified by sets of rules, is presented. PANIC assigns credit to the rules through a new mechanism, Q-Credit Assignment (QCA), based on Q-learning. By taking into account the context where a rule is applied, QCA is more accurate than classical methods when a single rule can fire in different situations. QCA is implemented through a multi-layer feed-forward neural network.
PANIC (Parallelism And Neural networks In Classifier Systems) is a parallel system to evolve behavioral strategies codified by sets of rules. It integrates several adaptive techniques and computational paradigms, such as genetic algorithms, neural networks, temporal difference methods and classifier systems, to define a powerful and robust learning system. To allocate credit to rules, we propose a new mechanism, Q-Credit Assignment (QCA), based on the temporal difference method Q-learning. To overcome the sharing rule problem, posed by traditional credit assignment strategies in rule based systems, QCA evaluates a rule depending on the context where it is applied. The mechanism is implemented through a multi-layer, feed-forward neural network. To overcome the heavy computational load of this approach, a decentralized and asynchronous parallel model of the genetic algorithm for massive parallel architecture has been devised.
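The context-dependent credit assignment at the heart of QCA rests on the Q-learning update. A minimal tabular sketch is given below; note that the paper implements the value function with a multi-layer feed-forward neural network rather than a table, and all names and constants here are illustrative assumptions, not the authors' implementation:

```python
# Illustrative tabular sketch of the Q-learning update underlying QCA.
# The actual system approximates this value function with a neural network;
# ALPHA, GAMMA and the function signature are hypothetical.
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor

def q_update(q, state, rule, reward, next_state, next_rules):
    """Update the estimated value of firing `rule` in context `state`.

    Because the value is indexed by (state, rule), the same rule can
    receive different credit in different contexts -- the point of QCA.
    """
    best_next = max(q.get((next_state, r), 0.0) for r in next_rules)
    old = q.get((state, rule), 0.0)
    q[(state, rule)] = old + ALPHA * (reward + GAMMA * best_next - old)
    return q
```

A single update from an empty table moves the fired rule's context-specific value a fraction ALPHA of the way toward the observed reward.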
A Classifier System is used to learn control and profit optimisation of a batch chemical reaction. Ability to learn different market conditions and changes to reaction parameters is demonstrated. The Profit Sharing algorithm is used for Apportionment of Credit. The greater effectiveness of the use of the genetic algorithm over Apportionment of Credit alone or the random replacement of low strength rules is also shown. The Classifier System is unusual in having more than one action per rule.
Genetic Algorithms have been proposed by many authors for Machine Learning tasks. In fact, they are appealing for several different reasons, such as the flexibility, the great exploration power, and the possibility of exploiting parallel processing. Nevertheless, it is still controversial whether the genetic approach can really provide effective solutions to learning tasks, in comparison to other algorithms based on classical search strategies. In this paper we try to clarify this point and we overview the work done with respect to the task of learning classification programs from examples. The state of the art emerging from our analysis suggests that the genetic approach can be a valuable alternative to classical approaches, even if further investigation is necessary in order to come to a final conclusion.
This paper describes REGAL, a distributed genetic algorithm-based system, designed for learning First Order Logic concept descriptions from examples. The system is a hybrid between the Pittsburgh and the Michigan approaches, as the population constitutes a redundant set of partial concept descriptions, each evolved separately. In order to increase effectiveness, REGAL is specifically tailored to the concept learning task; hence, REGAL is task-dependent, but, on the other hand, domain-independent. The system proved to be particularly robust with respect to parameter setting across a variety of different application domains. REGAL is based on a selection operator, called the Universal Suffrage operator, provably allowing the population to asymptotically converge, on average, to an equilibrium state, in which several species coexist. The system is presented both in a serial and in a parallel version, and a new distributed computational model is proposed and discussed. The system has been tested on a simple artificial domain, for the sake of illustration, and on several complex real-world and artificial domains, in order to show its power, and to analyze its behavior under various conditions. The results obtained so far suggest that genetic search may be a valuable alternative to logic-based approaches to learning concepts, when no (or little) a priori knowledge is available and a very large hypothesis space has to be explored.
The increasing amount of information available is encouraging the search for efficient techniques to improve data mining methods, especially those which consume great computational resources, such as evolutionary computation. Efficacy and efficiency are two critical aspects of knowledge-based techniques. The incorporation of knowledge into evolutionary algorithms (EAs) should provide either better solutions (efficacy) or equivalent solutions in shorter time (efficiency), with respect to the same evolutionary algorithm without such knowledge. In this paper, we categorize and summarize some of the knowledge-incorporation techniques for evolutionary algorithms and present a novel data structure, called the efficient evaluation structure (EES), which helps the evolutionary algorithm to provide decision rules using fewer computational resources. The EES-based EA is tested and compared to another EA system, and the experimental results show the quality of our approach, reducing the computational cost by about 50% while maintaining the global accuracy of the final set of decision rules.
Paper is an extended abstract
This paper juxtaposes the probability matching paradox of decision theory and the magnitude of reinforcement problem of animal learning theory to show that simple classifier system bidding structures are unable to match the range of behaviors required in the deterministic and probabilistic problems faced by real cognitive systems. The inclusion of a variance-sensitive bidding (VSB) mechanism is suggested, analyzed, and simulated to enable good bidding performance over a wide range of nonstationary probabilistic and deterministic environments.
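A variance-sensitive bid of the kind suggested above might, for example, discount a rule's mean payoff estimate by a penalty proportional to its payoff variability. The sketch below is purely illustrative; the function name, parameter `risk_coeff`, and the exact form of the penalty are assumptions, not the paper's formulation:

```python
# Hypothetical sketch of a variance-sensitive bid (VSB): a rule's effective
# bid is its mean observed payoff minus a risk penalty proportional to the
# standard deviation of that payoff. Names and the penalty form are assumed.
import math

def vsb_bid(payoffs, risk_coeff=1.0):
    """Compute a bid from a rule's payoff history.

    A rule with consistent payoffs bids close to its mean; a rule with
    highly variable payoffs is penalized, letting the system behave
    appropriately in both deterministic and probabilistic environments.
    """
    mean = sum(payoffs) / len(payoffs)
    var = sum((p - mean) ** 2 for p in payoffs) / len(payoffs)
    return mean - risk_coeff * math.sqrt(var)
```

Two rules with the same mean payoff then bid differently: the rule paid a steady 10 outbids the rule paid 0 or 20 with equal frequency.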
This work has no abstract
Nickel cadmium batteries are an important source of power for aerospace applications. One such application is being developed at the Marshall Space Flight Center (MSFC) for use with the Hubble Space Telescope. A battery testbed has been built at MSFC to aid in that development. In addition, the Nickel Cadmium Battery Expert System (NICBES) was developed by Martin Marietta Corporation to assist NASA engineers in battery management. This paper describes an extension to NICBES which will make it more effective as a battery management tool. The extension involves the incorporation of classifier system machine learning techniques into a subsystem of NICBES. The principal reason for suggesting this extension is the nature of battery management itself. There is still much which is unknown about these batteries and the factors affecting their performance [2]. Hence, battery management might be said to be as much an art as a science and relies heavily on the expertise of the battery manager. NICBES is an attempt to incorporate that battery expertise into an expert system. One difficulty, however, is that battery behavior is likely to change over time in unforeseen ways. This detracts from the usefulness of the expert system. Consequently, the battery manager who is using NICBES as a tool would be required to make changes to the expert system in order to accommodate the changed parameters of battery behavior. This should be the function of the knowledge engineer, however, not the battery expert. This is an example of the familiar problem of knowledge acquisition in knowledge engineering. The solution presented here is to use machine learning techniques to help overcome the knowledge acquisition problem. The expert system then interacts at a high level with the battery manager and undertakes adaptation on itself in order to determine new rules conforming to the changed parameters of the power system.
The basic principles of learning classifier systems based on genetic algorithms will be presented first. Next, a brief description of NICBES will be given, particularly the advice subsystem to which the learning component will be added. A discussion of specific techniques by which machine learning can be incorporated into this particular rule-based expert system will follow. This discussion will come under the headings of the bit-string representation of rules, the initial rule population, an evaluation function for this system, and the genetic operators. Finally, some comments will be made concerning the implementation of a user interface for a system such as this.
Symbolic induction is a promising approach to constructing decision models by extracting regularities from a data set of examples. The predominant type of model is a classification rule (or set of rules) that maps a set of relevant environmental features into specific categories or values. Classifying loan risk based on borrower profiles, consumer choice from purchase data, or supply levels based on operating conditions are all examples of this type of model-building task. Although current inductive approaches, such as ID3 and CN2, perform well on certain problems, their potential is limited by the incremental nature of their search. Genetic algorithms (GA) have shown great promise on complex search domains, and hence suggest a means for overcoming these limitations. However, effective use of genetic search in this context requires a framework that promotes the fundamental model-building objectives of predictive accuracy and model simplicity. In this article we describe COGIN, a GA-based inductive system that exploits the conventions of induction from examples to provide this framework. The novelty of COGIN lies in its use of training set coverage to simultaneously promote competition in various classification niches within the model and constrain overall model complexity. Experimental comparisons with NewID and CN2 provide evidence of the effectiveness of the COGIN framework and the viability of the GA approach.
Promoting and maintaining diversity is a critical requirement of search in learning classifier systems (LCSs). What is required of the genetic algorithm (GA) in an LCS context is not convergence to a single global maximum, as in the standard optimization framework, but instead the generation of individuals (i.e., rules) that collectively cover the overall problem space. COGIN (COverage-based Genetic INduction) is a system designed to exploit genetic recombination for the purpose of constructing rule-based classification models from examples. The distinguishing characteristic of COGIN is its use of coverage of training set examples as an explicit constraint on the search, which acts to promote appropriate diversity in the population of rules over time. By treating training examples as limited resources, COGIN creates an ecological model that simultaneously accommodates a dynamic range of niches while encouraging superior individuals within a niche, leading to concise and accurate decision models. Previous experimental studies with COGIN have demonstrated its performance advantages over several well-known symbolic induction approaches. In this paper, we examine the effects of two modifications to the original system configuration, each designed to inject additional diversity into the search: increasing the carrying capacity of training set examples (i.e., increasing coverage redundancy) and increasing the level of disruption in the recombination operator used to generate new rules. Experimental results are given that show both types of modifications to yield substantial improvements to previously published results.
Evolutionary learning methods have been found to be useful in several areas in the development of intelligent robots. In the approach described here, evolutionary algorithms are used to explore alternative robot behaviors within a simulation model as a way of reducing the overall knowledge engineering effort. This paper presents some initial results of applying the SAMUEL genetic learning system to a collision avoidance and navigation task for mobile robots.
The problem of learning decision rules for sequential tasks is addressed, focusing on the problem of learning tactical decision rules from a simple flight simulator. The learning method relies on the notion of competition and employs genetic algorithms to search the space of decision policies. Several experiments are presented that address issues arising from differences between the simulation model on which learning occurs and the target environment on which the decision rules are ultimately tested.
Genetic algorithms assign credit to building blocks based on the performance of the knowledge structures in which they occur. If the knowledge structures are rules sets, then the bucket brigade algorithm provides a means of performing additional credit assignment at the level of individual rules. This paper explores one possibility for using the fine-grained feedback provided by the bucket brigade in genetic learning systems that manipulate sets of rules.
In rule discovery systems, learning often proceeds by first assessing the quality of the system's current rules and then modifying rules based on that assessment. This paper addresses the credit assignment problem that arises when long sequences of rules fire between successive external rewards. The focus is on the kinds of rule assessment schemes which have been proposed for rule discovery systems that use genetic algorithms as the primary rule modification strategy. Two distinct approaches to rule learning with genetic algorithms have been previously reported, each approach offering a useful solution to a different level of the credit assignment problem. We describe a system, called RUDI, that exploits both approaches. We present analytic and experimental results that support the hypothesis that multiple levels of credit assignment can improve the performance of rule learning systems based on genetic algorithms.
A system called SAMUEL is described for learning rules to control a process, given only a weak model of the process consisting of a set of sensors, a set of control variables, and feedback mechanism that provides intermittent performance measurements. This paper focuses on features that distinguish SAMUEL from previous systems for learning rules with genetic algorithms. In particular, a restricted high level rule language is used, and genetic operators suitable for the language are presented. An empirical study shows that SAMUEL can learn rules to control a challenging dynamic process.
Genetic algorithms gain much of their power from mechanisms derived from the field of population genetics. However, it is possible, and in some cases desirable, to augment the standard mechanisms with additional features not available in biological systems. In this paper, we examine the use of Lamarckian learning operators in the SAMUEL architecture. The use of the operators is illustrated on three tasks in multi-agent environments.
Machine learning offers the possibility of designing intelligent systems that refine and improve their initial knowledge through their own experience. This article focuses on the problem of learning sequential decision rules for multi-agent environments. We describe the SAMUEL learning system that uses genetic algorithms and other competition based techniques to learn decision strategies for autonomous agents. One of the main themes in this research is that the learning system should be able to take advantage of existing knowledge where available. This article describes some of the mechanisms for expressing existing knowledge in SAMUEL, and explores some of the issues in selecting constraints for the learning system.
SAMUEL is an experimental learning system that uses genetic algorithms and other learning methods to evolve reactive decision rules from simulations of multi-agent environments. The basic approach is to explore a range of behavior within a simulation model, using feedback to adapt its decision strategies over time. One of the main themes in this research is that the learning system should be able to take advantage of existing knowledge where available. This has led to the adoption of rule representations that ease the expression of existing knowledge. A second theme is that adaptation can be driven by competition among knowledge structures. Competition is applied at two levels in SAMUEL. Within a strategy composed of decision rules, rules compete with one another to influence the behavior of the system. At a higher level of granularity, entire strategies compete with one another, driven by a genetic algorithm. This article focuses on recent elaborations of the agent model of SAMUEL that are specifically designed to respond to multiple external agents. Experimental results are presented that illustrate the behavior of SAMUEL on two multi-agent predator-prey tasks.
Truly autonomous vehicles will require both projective planning and reactive components in order to perform robustly. Projective components are needed for long-term planning and replanning where explicit reasoning about future states is required. Reactive components allow the system to always have some action available in real-time, and themselves can exhibit robust behavior, but lack the ability to explicitly reason about future states over a long time period. This work addresses the problem of creating reactive components for autonomous vehicles. Creating reactive behaviors (stimulus-response rules) is generally difficult, requiring the acquisition of much knowledge from domain experts, a problem referred to as the knowledge acquisition bottleneck. SAMUEL is a system that learns reactive behaviors for autonomous agents. SAMUEL learns these behaviors under simulation, automating the process of creating stimulus-response rules and therefore reducing the bottleneck. The learning algorithm was designed to learn useful behaviors from simulations of limited fidelity. Current work is investigating how well behaviors learned under simulation environments work in real world environments. In this paper, we describe SAMUEL, and describe behaviors that have been learned for simulated autonomous aircraft, autonomous underwater vehicles, and robots. These behaviors include dog fighting, missile evasion, tracking, navigation, and obstacle avoidance.
Evolutionary algorithms incorporate principles from biological population genetics to perform search, optimization, and learning. This article discusses issues arising in the application of evolutionary algorithms to problems in robotics.
The ability to recognize quickly and accurately what one encounters is fundamental to normal intelligent human behaviour. How people perform the task of learning the categories that objects in the world fit into is still an unanswered question. However, in this thesis I follow up an idea that Holland et al. (1986) proposed and present genetics-based machine learning as a model of perceptual category learning. Genetics-based machine learning has a certain amount of popularity within computer science, yet the ideas have been slow to cross the boundaries of disciplines. The main drive of this research is to adopt the theoretical position of Holland et al. (1986) and see if genetics-based machine learning can realistically be used as a model of learning in humans. Within Psychology the domain of category learning has grown as an area of interest, for categorization is considered basic to all our intellectual abilities (Estes 1994). Categorization ``is the process of assigning objects (of whatever kind) to categories (which are collections of objects which are grouped together for some purpose)'' (Lamberts 1997). There is a benchmark set of results within perceptual category learning from Shepard et al. (1961). These tasks have a well defined difficulty ordering and serve as a starting point for any model of perceptual category learning. In psychology neural networks are often used for cognitive modelling. However, Sen (1996) used a simple classifier system (Newboole) to model the classic Shepard et al. (1961) tasks. I will begin with a replication of this study and then move on to use another, more advanced classifier system (XCS: Wilson 1995) to see if this provides a better model of perceptual category learning. Also, I introduce a category switch task that has not previously been evaluated with these systems.
Results show that although simple classifier systems can qualitatively capture results from humans, they fail to show elegant solutions to problems and are limited in the tasks they can model. XCS can solve a greater variety of problems as it can handle multi-step as well as single-step problems, and it models both the Shepard and the switch problems. XCS develops a covering map of knowledge, giving the system much more complete knowledge of the classification problem compared to simpler systems. XCS learns not only what is a correct classification but also what is an incorrect classification. Also, due to the derivation of a covering map of knowledge, XCS finds simple, elegant solutions to problems. Some classifier systems (XCS) may offer a realistic alternative to neural networks in cognitive modelling.
Traditionally within classifier systems the ability of a classifier to obtain reward (as measured by its strength) indicates the fitness of the classifier within the rule population. However, Wilson (1995) proposed a new approach to fitness in terms of a classifier's prediction accuracy. This paper presents experiments with two different classifier systems: Newboole (Bonelli et al. 1990) and XCS (Wilson 1995). Both systems demonstrate qualitative matches to data from perceptual category learning in humans. However, the different methods of fitness evaluation of classifiers alter the knowledge the systems learn and maintain. When fitness is based upon strength (Newboole) the system acquires knowledge to solve the classification problem. But when fitness is based on accuracy (XCS) the system acquires a more complete knowledge of the problem space. Further experiments show that the optimal covering map (Kovacs 1997) of knowledge that emerges in XCS allows the system to compensate rapidly in a dynamic classification environment. This is also more similar to human performance on comparable tasks.
Classifier systems are simple production systems working on binary messages of fixed length. Genetic algorithms are employed in classifier systems in order to discover new classifiers. We use methods of the computational complexity theory in order to analyse the inherent difficulty of learning in classifier systems. Hence our results do not depend on special (possibly genetic) learning algorithms. The paper formalises this rule discovery or learning problem for classifier systems which has been proved to be hard in general. It will be proved that restrictions on two distinct learning problems lead to problems in NC, i.e. problems which are efficiently solvable in parallel.
Genetic algorithms are employed in classifier systems in order to discover new classifiers. The paper formalises this rule discovery or learning problem for classifier systems and uses methods of computational complexity theory to analyse its inherent difficulty. It is proved that two distinct learning problems are NP-complete, i.e. not likely to be solvable efficiently. The practical relevance of these theoretical results is briefly discussed.
This paper describes a classifier tool that uses a genetic algorithm to perform rule induction. The genetic algorithm uses the Michigan approach, is domain independent and is able to process continuous and discrete attributes. Optimizations include the use of phenotypic sharing (with linear complexity) to direct the search. Accuracy results are compared with those of 33 other algorithms on 32 datasets. The difference in accuracy is not statistically significant at the 10% level when compared with the best of the other 33 algorithms. The implementation allows the configuration of many parameters, and is intended to be improved with the inclusion of new operators.
Classifier systems are now viewed as disappointing because of problems such as the rule strength vs. rule set performance problem and the credit assignment problem. In order to solve these problems, we have developed a hybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model-free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure of Jaakkola et al. to find a locally optimal stochastic policy when a set of rule conditions is given, and uses a GA to search for the best set of rule conditions.
Induction is tested in a population of XCS-based agents in the frame of the ``El Farol'' bar problem. Two reward schemes are used, selfish and co-operative, the latter proving the better for the purposes of the experiment.
Paper is an extended abstract
The coordination between the sensor and motor systems is an essential feature of autonomous intelligent systems. This thesis investigates the evolutionary approach to sensorimotor control using learning classifier systems. A simple classifier system is used to solve the problem of coordinating a pair of eyes and an arm in order to catch an object. First, an analysis of the previous approaches based on neural networks is given. Then follows a review of what a simple classifier system is, as well as the specific implementation of the simple classifier system used to solve this problem. Afterwards an analysis of the results is presented. Finally, a review of the advantages and disadvantages of this approach compared with previous ones is offered. Results have shown that classifier systems are a promising tool which solves this sensorimotor coordination problem; further work needs to be done to determine the limitations of this approach.
Artificial Intelligence faces the challenge of creating computational systems that exhibit behaviour similar to human behaviour. This is no easy task and is a source of headaches for many researchers. In the present work we have tried to study some of the contributions being made in the field of Genetic Algorithms and their integration with techniques based on Fuzzy Sets. Specifically, we analyse one of the learning paradigms of GAs, Classifier Systems, and some of their fuzzy approximations, namely several models of Fuzzy Classifier Systems. It is important to note that these systems are of special interest because of the implications they may have for the development of autonomous and adaptive learning control systems that can act ``intelligently'', in a manner similar to human beings.
The type of cognitive system (CS) studied here has four basic parts: (1) a set of interacting elementary productions, called classifiers, (2) a performance algorithm that directs the action of the system in the environment, (3) a simple learning algorithm that keeps a record of each classifier's success in bringing about rewards, and (4) a more complex learning algorithm, called the genetic algorithm, that modifies the set of classifiers so that variants of good classifiers persist and new, potentially better ones are created in a provably efficient manner. Two ``proof-of-principle'' experiments are reported. One experiment shows CS's performance in a maze when it has only the ability to adjust the predictions about ensuing rewards of classifiers (similar to adjusting the ``weight'' of a classifier) vs. when the power of the genetic algorithm is added. Criterion was achieved an order of magnitude more rapidly when the genetic algorithm was operative. A second experiment examines transfer of learning. Placed in a more difficult maze, CS with experience in the simpler maze reaches criterion an order of magnitude more rapidly than CS without prior experience.
We asked ``What is a Learning Classifier System'' to some of the best-known researchers in the field. These are their answers.
Expert systems are powerful when working within the domain-specific boundaries of the initial design. It is widely agreed that the usefulness of such systems would be greatly enhanced if they could be made more versatile or less brittle - tolerant of changes in the domain and underlying model. This paper suggests 7 criteria for escaping brittleness (combination, parallelism, declarative and procedural information, categorization, synchronic and diachronic pointing, gracefulness and confirmation) and gives an example of a class of general purpose systems, classifier systems, that attempt to meet these criteria.
This paper does not have an abstract.
Massively parallel, rule-based systems offer both a practical and a theoretical tool for understanding systems that act usefully in complex environments [see, for example, refs 1-4]. However, these systems pose a number of problems of a high order of difficulty -- problems that can be broadly characterized as problems in nonlinear dynamics. The difficulties stem from the fact that the systems are designed to act in environments with complex transition functions -- environments that, in all circumstances of interest, are far from equilibrium. Interactions with the environment thus face the systems with perpetual novelty, and the usual simplifications involving fixed points, limit cycles, etc., just do not apply. Learning procedures (adaptive algorithms) offer a way of combating these difficulties, but an understanding of the possibilities is not a simple matter. The key question is easy enough to state informally: What kinds of environmental regularity can be exploited by learning? However, if answers that are both useful and widely applicable are to be forthcoming, the question must be reformulated in a way that gives it precision without losing generality. The usual tool for this task is a mathematical framework that suitably encompasses the subject. It is the purpose of this paper to explore a framework that gives a precise definition to the notion of an environmental regularity and then treats learning procedures as procedures for revising rules in response to detected environmental regularities. In this context, procedures for revising rules become more than a convenience, they take a central place in the design. Whether carried out by a human or a machine, rule revision requires the solution of two broad problems. First, one must rate rules as to their usefulness to the system as a whole -- the apportionment of credit problem.
Then one must devise new rules that serve the system better than the least useful of the rules already in place -- the rule discovery problem. Though these two problems are sometimes treated separately, they are closely interrelated. A machine learning approach is used here to illustrate the interaction of apportionment of credit and rule discovery algorithms, and then the overall system is abstracted and translated to the mathematical framework. To give the framework a concrete subject matter, section 1 introduces a particular class of highly parallel, rule-based systems called classifier systems. The next section....
Message-passing, rule-based production systems in which many rules are active simultaneously offer attractive possibilities for the exploitation of general-purpose machine learning algorithms. In such systems each rule can be looked upon as a tentative hypothesis about some aspect of the task environment, competing against other plausible hypotheses being entertained at the same time. In this context there are two major tasks for machine learning algorithms: (1) apportionment of credit and (2) rule discovery. The apportionment-of-credit algorithm(s) must assign ``strength'' to rules on the basis of their observed usefulness to the system. The problem is complicated by the difficulty of determining which of a cluster of rules active in an early, ``stage-setting'' capacity has contributed to a later useful outcome (e.g., rules controlling early moves that make possible later a triple jump in checkers). If strengths can be assigned appropriately, then they can be used to determine a rule's ability to win against competing rules, and they can be used to determine the rule's likelihood of being used as a ``parent'' for new rules. Surprisingly, for credit apportionment algorithms of the bucket-brigade variety, one can prove fixed-point theorems that provide some guarantees of an appropriate apportionment. The task of rule discovery depends critically upon the discovery of good ``building blocks'' for generating plausible rules (hypotheses). A parallel system designed with machine learning in mind must permit a constant flux of new rules to be tested and exploited or discarded. Moreover this flux must not disturb the system's behavior in task environments for which it has well-practiced, effective procedures. Genetic algorithms, using the strengths as ``fitnesses'', offer subtle ways of discovering good building blocks, and there are new versions of theorems from mathematical genetics that enable us to understand this discovery process.
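The bucket-brigade idea described in this abstract can be sketched in a few lines. This is an illustrative toy, not the paper's formulation: the bid ratio, the rule names, and the single-chain structure are assumptions made for clarity.

```python
# Illustrative bucket-brigade credit assignment over a chain of rules.
# Each rule that fires pays a bid (a fraction of its strength) to the
# rule that fired on the previous step, so "stage-setting" rules are
# eventually rewarded; the environment pays only the final rule.

BID_RATIO = 0.1  # fraction of strength bid each time a rule fires (assumed)

def bucket_brigade(strengths, chain, reward):
    """strengths: dict rule -> strength; chain: rules in firing order."""
    strengths = dict(strengths)
    for earlier, later in zip(chain, chain[1:]):
        bid = BID_RATIO * strengths[later]
        strengths[later] -= bid      # later rule pays its bid...
        strengths[earlier] += bid    # ...to its stage-setting predecessor
    strengths[chain[-1]] += reward   # environment pays the last rule in the chain
    return strengths

s = bucket_brigade({"r1": 10.0, "r2": 10.0, "r3": 10.0},
                   ["r1", "r2", "r3"], reward=5.0)
# r1 gains through r2's bid even though only r3 sees the external reward
```

Repeated over many episodes, this passing of strength backwards is what lets early "stage-setting" rules (the checkers example above) accumulate credit.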
Theoretical questions about classifier systems, with rare exceptions, apply equally to other adaptive nonlinear networks (ANNs) such as the connectionist models of cognitive psychology, the immune system, economic systems, ecologies and genetic systems. This paper discusses pervasive properties of ANNs and the kinds of mathematics relevant to questions about these properties. It discusses relevant functional extensions of the basic classifier system and extensions of the extant mathematical theory. An appendix briefly reviews some of the key theorems about classifier systems.
This paper, after a general introduction to the area, discusses the architecture and learning algorithms that permit automatic parallel, distributed lookahead to emerge in classifier systems. Simple additions to a ``standard'' classifier system suffice, principally a new register called the virtual strength register, and a provision to use the bucket brigade credit-assignment algorithm in ``virtual'' mode to modify values in this register. With these additions, current actions are decided on the basis of expected values associated with the ``lookahead cones'' of possible alternatives.
A new technique for improving the classification performance of learning classifier systems (LCS) was developed and applied to a real-world data mining problem. EpiCS, a stimulus-response LCS, was adapted to perform prevalence-based bootstrapping, wherein data from training and testing sets were sampled according to the prevalence of the individual classes, rather than randomly using the class distribution inherent in the data. Prevalence-based bootstrapping was shown to improve classification performance significantly on training and testing (p<0.0001). Furthermore, this procedure was shown to enhance EpiCS's classification performance on testing compared to C4.5 when similar bootstrapping procedures were applied to the latter.
The learning classifier system (LCS) integrates a rule-based system with reinforcement learning and genetic algorithm-based rule discovery. This investigation reports on the design, implementation, and evaluation of EpiCS, a LCS adapted for knowledge discovery in epidemiologic surveillance. Using data from a large, national child automobile passenger protection program, EpiCS was compared with C4.5 and logistic regression to evaluate its ability to induce rules from data that could be used to classify cases and to derive estimates of outcome risk, respectively. The rules induced by EpiCS were less parsimonious than those induced by C4.5, but were potentially more useful to investigators in hypothesis generation. Classification performance of C4.5 was superior to that of EpiCS (P<0.05). However, risk estimates derived by EpiCS were significantly more accurate than those derived by logistic regression (P<0.05).
Missing data pose a potential threat to learning and classification in that they may compromise the ability of a system to develop robust, generalized models of the environment in which they operate. This investigation reports on the effects of three approaches to covering these data using an XCS-style learning classifier system. Using fabricated datasets representing a wide range of missing value densities, it was found that missing data do not appear to adversely affect LCS learning and classification performance. Furthermore, three types of missing value covering were found to exhibit similar efficiency on these data, with respect to convergence rate and classification accuracy.
The use of a genetics-based classifier system (CS) in generating epidemiologic hypotheses was investigated. In addition, epidemiologic analytical techniques were used to evaluate the performance of a CS in this problem domain. Five component studies were implemented, using epidemiologic surveillance data over a range of prevalences. The evaluation study investigated the use of the area under the receiver operating characteristic curve (*) as an alternative to crude accuracy (CA) during the training period. The classification study examined the ability of the CS to classify unencountered patients. The reproducibility study demonstrated the stochastic processes underlying CS performance during training and testing. The payoff-penalty parameterization study investigated the effects of differential penalty for false negative and false positive decisions on learning rate and classification ability. The risk assessment study examined the ability of a CS to derive estimates of risk for purposes of classification. At 50% prevalence, * was identical to CA over the entire training period; with decreasing prevalence, CA increasingly overestimated the learning rate, while * provided more accurate depictions of this measure. Across all four prevalences investigated, the CS was able to classify unseen patients well, with *s ranging from 0.95 at 50% prevalence to 0.78 at 10%. The classifier populations after training indicated considerable generalization; decision rules were discernible on visual examination. When trained and tested using 1,000 different data sets drawn from the same pool, the CS was fairly consistent in terms of learning rate and classification ability, although with sufficient variation to warrant investigating the use of bootstrapping techniques. Biasing the ratio of false positive to false negative (FP:FN) decisions affected the learning rate relative to prevalence. 
Learning rate was most enhanced at 25% and 10% prevalence by a FP:FN ratio of 4:1 and 10:1, respectively. Across all four prevalences, the CS was able to produce risk estimates that consistently outperformed decision rules derived using logistic regression. The CS was shown to be a useful adjunct to hypothesis generation during epidemiologic surveillance.
A learning classifier system, EpiCS, was used to derive a continuous measure of disease risk in a series of 250 individuals. Using the area under the receiver-operating characteristic curve, this measure was compared with the risk estimate derived for the same individuals by logistic regression. Over 20 training-testing trials, risk estimates derived by EpiCS were consistently more accurate (mean area=0.97, SD=0.01) than that derived by logistic regression (mean area=0.89, SD=0.02). The areas for the trials with minimum and maximum classification performance on testing were significantly greater (p=0.019 and p<0.001, respectively) than the area for the logistic regression curve. This investigation demonstrated the ability of a learning classifier system to produce output that is clinically meaningful in diagnostic classification.
The effect of biasing negative reinforcement levels on learning rate and classification accuracy in a learning classifier system (LCS) was investigated. Simulation data at five prevalences (base rates) were used to train and test the LCS. Erroneous decisions made by the LCS during training were punished differentially according to type: false positive (FP) or false negative (FN), across a range of four FP:FN ratios. Training performance was assessed by learning rate, determined from the number of iterations required to reach 95% of the maximum area under the receiver operating characteristic (ROC) curve obtained during learning. Learning rates were compared across the three biased ratios with those obtained at the unbiased ratio. Classification performance of the LCS at testing was evaluated by means of the area under the ROC curve. During learning, differences were found between the biased and unbiased penalty schemes, but only at unequal base rates. A linear relationship between bias level and base rate was suggested. With unequal base rates, biasing the FP:FN ratio improved the learning rate. Little effect was observed on testing the LCS with novel cases.
A ``metric toolkit'' to evaluate learning classifier system performance is proposed. The metrics are shown to be superior to crude accuracy in evaluating classification performance, especially for data with unequal numbers of positive and negative cases. In addition, these metrics provide information to the researcher that is not available from crude accuracy. When used appropriately, these metrics provide accurate depictions of learning classifier system performance during training and testing in supervised learning environments.
Applying a learning classifier system to two-class decision problems requires a special approach to performance evaluation. This paper presents a suite of quantitative tools that addresses the evaluation requirements of two-class problems. These metrics, borrowed from the domain of medical decision making, are proposed as adjuncts to commonly used evaluation methods such as crude accuracy (``percent correct''). They include sensitivity, specificity, area under the receiver operating characteristic curve, and predictive value. These metrics are shown to be superior to crude accuracy in evaluating learning classifier system performance, especially when applied to data with unequal numbers of positive and negative cases. In addition, these metrics provide information to the researcher that is not available from crude accuracy. When used appropriately, these metrics provide accurate depictions of learning classifier system performance during training and testing in supervised learning environments.
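The metrics named in this abstract have standard definitions in terms of the two-class confusion matrix. The following sketch (counts and variable names are illustrative, not taken from the paper) shows why crude accuracy can mislead when positive and negative cases are unbalanced:

```python
# Sensitivity, specificity, predictive values, and crude accuracy
# computed from confusion-matrix counts (tp/fp/tn/fn).
def two_class_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "crude_accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# With 10% prevalence, crude accuracy looks respectable (0.9) even
# though half the positive cases are missed (sensitivity 0.5) --
# exactly the failure mode of "percent correct" on unequal classes.
m = two_class_metrics(tp=5, fp=5, tn=85, fn=5)
```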
Paper is an extended abstract
A stimulus-response learning classifier system (LCS), EpiCS, was developed from the BOOLE and NEWBOOLE models to address the needs of knowledge discovery in databases used in clinical research. Two specific needs were investigated: the derivation of accurate estimates of disease risk, and the ability to deal with rare clinical outcomes. EpiCS was shown to have excellent classification accuracy, compared to logistic regression, when using risk estimates as the primary means for classification. This was especially true in data with low disease prevalence. EpiCS was designed to accommodate differential negative reinforcement when false positive or false negative decisions were made by the system. This feature was investigated to determine its effect on learning rate and classification accuracy. Tested across a range of disease prevalences, the learning rate improved when erroneous decisions were differentially negatively reinforced. However, classification accuracy was not affected by differential negative reinforcement.
Fitness sharing has been shown to be an effective niching mechanism in genetic algorithms (GAs). Sharing allows GAs to maintain multiple, cooperating ``species'' in a single population for many generations under severe selective pressure. While recent studies have shown that the maintenance time for niching equilibrium is long, it has never been shown that the time it takes to reach equilibrium is sufficiently fast. While experiments indicate that selection under fitness sharing drives the population to equilibrium just as fast and as effectively as selection alone drives the simple GA to a uniform population, we can now show analytically that this is the case.
Niching can allow a diverse population to cooperatively represent a single, distributed solution to the problem at hand. Successful niching mechanisms must promote both cooperation (i.e., co-existence of separate ``species'' for each desired niche), and competition (i.e., intensive search for the best species for each niche, and for the best niches). In this paper we seek the competitive-cooperative boundary in the space of possible niche relationships, that will allow us to successfully predict which pairs of interacting niches will survive under GA selection and which niche pairs will be resolved to yield a single winner. By combining extant models of niching equilibrium, niche maintenance, and convergence, we define the regions of cooperation and competition on a map of niching scenarios varying along the dimensions of niche overlap and relative niche fitness. We verify this predictive map of niching failure/success, and discuss its utility in allowing us to control for the competitive evolution of desired types of cooperation. Although our models are specific to the niching mechanism we call resource sharing, we believe the development of competitive-cooperative control maps is important for niching theory in general.
We approach the difficult task of analyzing the complex behavior of even the simplest learning classifier system (LCS) by isolating one crucial subfunction in the LCS learning algorithm: covering through niching. The LCS must maintain a population of diverse rules that together solve a problem (e.g., classify examples). To maintain a diverse population while applying the GA's selection operator, the LCS must incorporate some kind of niching mechanism. The natural way to accomplish niching in an LCS is to force competing rules to share resources (i.e., rewards). This implicit LCS fitness sharing is similar to the explicit fitness sharing used in many niched GAs. Indeed, the LCS implicit sharing algorithm can be mapped onto explicit fitness sharing with a one-to-one correspondence between algorithm components. This mapping is important because several studies of explicit fitness sharing, and of niching in GAs generally, have produced key insights and analytical tools for understanding the interaction of the niching and selection forces. We can now bring those results to bear in understanding the fundamental type of cooperation (a.k.a. weak cooperation) that an LCS must promote.
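Explicit fitness sharing, as referred to in the abstracts above, has a compact conventional form: each individual's raw fitness is divided by its niche count. A minimal sketch, in which the triangular sharing function, the sigma value, and the toy "positions" are conventional choices assumed for illustration:

```python
# Explicit fitness sharing: raw fitness divided by niche count,
# where the niche count sums a sharing function over the population.
def sharing(d, sigma):
    # triangular sharing function: full share at d = 0, none beyond sigma
    return max(0.0, 1.0 - d / sigma)

def shared_fitness(pop, fitness, distance, sigma):
    out = {}
    for i in pop:
        niche_count = sum(sharing(distance(i, j), sigma) for j in pop)
        out[i] = fitness(i) / niche_count  # crowded niches depress fitness
    return out

# Three individuals at "positions" 0, 1 and 10: the first two crowd one
# niche and split its payoff; the third has a niche to itself and keeps
# its full fitness -- the coexistence pressure sharing is meant to create.
f = shared_fitness([0, 1, 10], fitness=lambda x: 10.0,
                   distance=lambda a, b: abs(a - b), sigma=2.0)
```

In an LCS the sharing is implicit: rules in the same match set divide the reward, which plays the role of the niche-count division here.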
The credit-apportionment problem in the classifier system refers to the problem of assigning credit or blame to each rule involved in achieving a certain goal. This paper presents an enhanced credit-apportionment algorithm: the context-array bucket-brigade algorithm for solving the credit apportionment problems in the classifier system. This algorithm separates the contexts (the circumstances in which rules can fire) into context subsets. Then it employs array-valued bids (strengths) to estimate rule usefulness at the context subset level. The essence of this algorithm is that (i) it improves credit-apportionment and (ii) it provides explicit information for rule discovery. Favorable results have been observed in the tests of two versions of the algorithm.
Paper is an extended abstract
This work has no abstract
This master's thesis explores the application of genetic algorithms as a supervised machine learning technique. It studies different classification approaches (Michigan and Pittsburgh), and generates different alternatives for individual codification and evaluation functions. The classification focuses on problems described using real-valued attributes. All these approaches are modelled under a Pittsburgh philosophy. To validate them, and to analyse their performance, they are applied to solve a real-world problem: ``The automatic classification of mammary biopsy images''.
This paper describes the application of Machine Learning (ML) techniques to a real world problem: the Automatic Diagnosis (classification) of Mammary Biopsy Images. The starting point consists of a set of data (solved cases) provided by the Signal Theory Research Group of our University [9]. The techniques applied are Genetic Algorithms (GA) and Case-Based Reasoning (CBR). The paper compares our results with previous ones obtained using Neural Networks (NN) [10]. The main goals are: to efficiently solve classification problems of such a type, to compare different alternatives for ML and to study hybrid systems. The paper also introduces the systems we developed for solving this kind of classification problems: GeB-CS (Genetic Based Classifier System) for a GA approach, and CaB-CS (Case-Based Classifier System) for a CBR approach.
Utilising the expressive power of S-Expressions in Learning Classifier Systems often prohibitively increases the search space due to the increased flexibility of the encoding. This work shows that selecting appropriate S-Expression functions through domain knowledge improves scaling in problems, as expected. It is also known that simple alphabets perform well on relatively small problems in a domain, e.g. the ternary alphabet in the 6, 11 and 20 bit MUX domain. Once fit ternary rules had been formed, it was investigated whether higher-order learning was possible and whether this staged learning facilitated the selection of appropriate functions in complex alphabets, e.g. the selection of S-Expression functions. This novel methodology is shown to produce compact results (135-MUX) and exhibits potential for scaling well (1034-MUX), but is only a small step towards introducing abstraction to LCS.
This paper shows how linguistic classification knowledge can be extracted from numerical data for pattern classification problems with many continuous attributes by genetic algorithms. Classification knowledge is extracted in the form of linguistic if-then rules. In this paper, emphasis is placed on the simplicity of the extracted knowledge. The simplicity is measured by two criteria: the number of extracted linguistic rules and the length of each rule (i.e., the number of antecedent conditions involved in each rule). The classification ability of extracted linguistic rules, which is measured by the classification rate on given training patterns, is also considered. Thus our task is formulated as a linguistic rule extraction problem with three objectives: to maximize the classification rate, to minimize the number of extracted rules, and to minimize the length of each rule. For tackling this problem, we propose a multi-objective genetics-based machine learning (GBML) algorithm, which is a hybrid algorithm of Michigan approach and Pittsburgh approach. Our hybrid algorithm is basically a Pittsburgh-style algorithm with variable string length. A Michigan-style algorithm is combined as a kind of mutation for partially modifying each string.
Fuzzy control has been widely used in industry for its high degree of performance in human-computer interactions. The DNA coding method, one of the coding methods in Genetic Algorithms, is based on biological DNA and a mechanism of development from the artificial DNA. This method has redundancy and overlapping of genes, and it is suitable for knowledge representation. In this paper, we propose a Parallel Genetic Algorithm using the DNA coding method, and apply it to the acquisition of fuzzy control rules for a mobile robot with a multiple input/output system. This method can select input variables from many candidates and tune membership functions. The simulation results show that the robot can reach the goal quickly and efficiently. Effective fuzzy rules for the mobile robot are acquired using this method, while the length of the chromosomes in the population is automatically adjusted.
Over the last three years, we developed an inductive learning environment called DELVAUX for classification tasks that learns PROSPECTOR-style, Bayesian rules from sets of examples, using a genetic algorithm to evolve a population consisting of rule-sets. Several problems complicate the search for the best rule-set. First, the search space explored by DELVAUX is enormously large, which makes it difficult to predict if a particular solution is a good one. The second problem is convergence on outliers that perform well in training but not in testing. This paper describes efforts to alleviate these two problems, centering on multi-rule-set learning techniques that learn multiple rule-sets, and proposes several decision-making schemes that are employed by the multi-rule-set learning environment to derive a decision. Empirical results are presented that compare the single rule-set learning environment of DELVAUX with several multi-rule-set learning environments that use different decision-making schemes. Moreover, a more sophisticated fitness function for the multi-rule-set learning approach is introduced, and a genetic algorithm approach that finds the `best' multi-rule-set for a given set of rule-sets is discussed.
In this paper, we discuss the underlying hardware and supportable learning paradigms provided by the GA-1 system. GA-1 is a system currently under development which offers unique opportunities for research into large-scale rule learning with genetic algorithms (GAs). The base hardware is the IXM2 parallel associative memory machine which enables high performance processing by using 64 T800 transputers and associative memories providing 256K parallelism. Various population/subpopulation models, mating strategies, and generation models can be implemented to investigate architectures for high performance GA-based systems. Regardless of these options, however, GA-based rule learning takes maximum advantage of the hardware through extensive use of associative memory for bit-vector matching. Preliminary experiments indicate that GA-1 exhibits high execution speeds for such an approach.
This paper surveys the Visual Auction. The Visual Auction is a pedagogical and research tool, which provides a dynamic visual representation of the matching and auction process in a classifier system. The tool allows for the visual representation of both the matching process and the process of determining the winner of the auction. The tool can be used pedagogically, for it quickly demonstrates how antecedent matching occurs in a classifier system and shows how the matched classifiers compete to win the auction. As a research tool, it assists the researcher in implementing a classifier system by providing more in-depth knowledge of the auction process, thus providing insight into how parameter settings or bid equations could be modified to generate more efficient learning.
Despite two decades of work, learning classifier systems researchers have had relatively little to say on the subject of what makes a problem difficult for a classifier system. One focus of our work has been the issue of what makes a problem difficult for XCS -- Wilson's recent accuracy-based classifier system. This document outlines the approach taken, provides some initial results and outlines possible directions for future work.
Paper is an extended abstract
Despite two decades of work learning classifier systems researchers have had relatively little to say on the subject of what makes a problem difficult for a classifier system. Wilson's accuracy-based XCS, a promising and increasingly popular classifier system, is, we feel, the natural first choice of classifier system with which to address this issue. To make the task more tractable we limit our considerations to a restricted, but very important, class of problems. Most significantly, we consider only single step reinforcement learning problems and the use of the standard binary/ternary classifier systems language. In addition to distinguishing several dimensions of problem complexity for XCS, we consider their interactions, identify bounding cases of difficulty, and consider complexity metrics for XCS. Based on these results we suggest a simple template for ternary single step test suites to more comprehensively evaluate classifier systems.
We present a bibliography of all works we could find on Learning Classifier Systems (LCS) -- the genetics-based machine learning systems introduced by John Holland. With over 400 entries, this is at present the largest bibliography on classifier systems in existence. We include a list of LCS resources on the world wide web.
With over 600 entries, this is by far the most comprehensive bibliography of the machine learning systems introduced by John Holland.
This work investigates some uses of self-monitoring in classifier systems (CS) using Wilson's recent XCS system as a framework. XCS is a significant advance in classifier systems technology which shifts the basis of fitness evaluation for the Genetic Algorithm (GA) from the strength of payoff prediction to the accuracy of payoff prediction. Initial work consisted of implementing an XCS system in Pop-11 and replicating published XCS multiplexer experiments from (Wilson 1995, 1996a). In subsequent original work, the XCS Optimality Hypothesis, which suggests that under certain conditions XCS systems can reliably evolve optimal populations (solutions), is proposed. An optimal population is one which accurately maps inputs to actions to reward predictions using the smallest possible set of classifiers. An optimal XCS population forms a complete mapping of the payoff environment in the reinforcement learning tradition, in contrast to traditional classifier systems which only seek to maximise classifier payoff (reward). The more complete payoff map allows XCS to deal with payoff landscapes with more than 1 niche (i.e. those with more than 2 payoff levels) which traditional payoff-maximising CS find very difficult. This makes XCS much more suitable as the foundation of animat control systems than traditional CS. In support of the Optimality Hypothesis, techniques were developed which allow the system to highly reliably evolve optimal populations for logical multiplexer functions. A technique for auto-termination of learning was also developed to allow the system to recognise when an optimal population has been evolved. The self-monitoring mechanisms involved in this work are discussed in terms of the design space of adaptive systems.
In a standard genetic algorithm a chromosome can be fully evaluated (assigned a fitness) immediately. In classifier systems, however, a chromosome can only be fully evaluated after many interactions with the environment, since a chromosome may generalise over many environmental states. In this work it is suggested that evolutionary systems which cannot fully evaluate candidate solutions immediately will benefit from protecting them from deletion until they have been well evaluated. A new technique which protects poorly evaluated chromosomes outperforms both techniques from (Wilson, 1995) on two types of boolean function and a delayed reward problem. Next a weeding operator which deletes low fitness rules is introduced and found to improve performance on one boolean function. Results indicate the XCS classifier system is able to learn boolean functions for which no (or few) useful generalisations can be made over the input string, despite its drive towards accurate generalisation.
Wilson's recent XCS classifier system forms complete mappings of the payoff environment in the reinforcement learning tradition thanks to its accuracy based fitness. According to Wilson's Generalization Hypothesis, XCS has a tendency towards generalization. With the XCS Optimality Hypothesis, I suggest that XCS systems can evolve optimal populations (representations); populations which accurately map all input/action pairs to payoff predictions using the smallest possible set of non-overlapping classifiers. The ability of XCS to evolve optimal populations for boolean multiplexer problems is demonstrated using condensation, a technique in which evolutionary search is suspended by setting the crossover and mutation rates to zero. Condensation is automatically triggered by self-monitoring of performance statistics, and the entire learning process is terminated by autotermination. Combined, these techniques allow a classifier system to evolve optimal representations of boolean functions without any form of supervision. A more complex but more robust and efficient technique for obtaining optimal populations called subset extraction is also presented and compared to condensation.
This paper extends the work presented in (Kovacs, 1996) on evolving optimal solutions to boolean reinforcement learning problems using Wilson's recent XCS classifier system. XCS forms complete mappings of the payoff environment in the reinforcement learning tradition thanks to its accuracy based fitness, which, according to Wilson's Generalization Hypothesis, also gives XCS a tendency towards accurate generalization. (Kovacs, 1996) introduced the XCS Optimality Hypothesis which suggests that XCS systems can evolve optimal populations (representations); populations which accurately map all input/action pairs to payoff predictions using the smallest possible set of non-overlapping classifiers. The ability of XCS to evolve optimal populations for boolean multiplexer problems was demonstrated in (Kovacs, 1996) using condensation, a technique in which evolutionary search is suspended by setting the crossover and mutation rates to zero. Condensation is automatically triggered by self-monitoring of performance statistics, and the entire learning process is terminated by autotermination. Combined, these techniques allow a classifier system to evolve optimal representations of boolean functions without any form of supervision. The present work shows how condensation can be greatly accelerated by truncating each action set around its most numerous member. Following this, a more complex but more robust and efficient technique for obtaining optimal populations called subset extraction is presented and compared to condensation.
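The condensation mechanism described in the two abstracts above (suspend evolutionary search once performance statistics plateau, so that selection alone condenses the population) can be sketched as follows. The class, its plateau test, thresholds, and parameter names are illustrative assumptions, not Kovacs's implementation:

```python
# Sketch of automatically triggered condensation: monitor a performance
# statistic and, once it has been flat for a full window, zero the
# crossover and mutation rates so only selection continues to act.
class CondensationTrigger:
    def __init__(self, window=3, epsilon=1e-3):
        self.window, self.epsilon, self.history = window, epsilon, []

    def update(self, performance, params):
        """params: dict with 'crossover_rate' and 'mutation_rate'."""
        self.history.append(performance)
        recent = self.history[-self.window:]
        if len(recent) == self.window and max(recent) - min(recent) < self.epsilon:
            params["crossover_rate"] = 0.0  # suspend evolutionary search;
            params["mutation_rate"] = 0.0   # selection alone condenses the population
        return params

trigger = CondensationTrigger(window=3)
params = {"crossover_rate": 0.8, "mutation_rate": 0.04}
for perf in [0.90, 0.99, 1.00, 1.00, 1.00]:  # performance plateaus at 1.00
    params = trigger.update(perf, params)
```

Autotermination, as described above, would then be a second monitor of the same kind that halts learning once the condensed population stops changing.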
The issue of deletion schemes for classifier systems has received little attention. In a standard genetic algorithm a chromosome can be evaluated (assigned a reasonable fitness) immediately. In classifier systems, however, a chromosome can only be fully evaluated after many interactions with the environment, since a chromosome may generalise over many environmental states. A new technique which protects poorly evaluated chromosomes outperforms both techniques from (Wilson, 1995) in two very different single step problems. Results indicate the XCS classifier system is able to learn single step problems for which no (or few) useful generalisations can be made over the input string, despite its drive towards accurate generalisation.
Wilson's XCS is a clear departure from earlier classifier systems in the way it calculates the fitness of classifiers for use in the genetic algorithm. Despite the growing body of work on XCS and the advantages claimed for it, there has been no detailed comparison of XCS and traditional strength-based systems. We distinguish different definitions of overgenerality for strength and accuracy-based fitness and analyse some implications of the use of accuracy, including an advantage in exploration. We analyse the formation of strong overgenerals, a major problem for strength-based systems, and show that they require biased reward functions. We also show that all non-trivial multi step environments have biased reward functions and thus suffer from strong overgenerals. We conclude that strength-based systems are not suitable for multi step environments or indeed many single step environments.
Wilson's XCS is a clear departure from earlier classifier systems in terms of the way it calculates the fitness of classifiers for use in the genetic algorithm. Despite the growing body of work on XCS and the advantages claimed for it, there has been no detailed comparison of XCS and traditional strength-based systems. This work takes a step towards rectifying this situation by surveying a number of issues related to the change in fitness. I distinguish different definitions of overgenerality for strength and accuracy-based fitness and analyse some implications of the use of accuracy, including an apparent advantage in addressing the explore/exploit problem. I analyse the formation of strong overgenerals, a major problem for strength-based systems, and illustrate their dependence on biased reward functions. I consider motivations for biasing reward functions in single step environments, and show that non-trivial multi step environments have biased Q-functions. I conclude that XCS's accuracy-based fitness appears to have a number of significant advantages over traditional strength-based fitness.
We analyse the concept of strong overgeneral rules, the Achilles' heel of traditional Michigan-style learning classifier systems, using both the traditional strength-based and newer accuracy-based approaches to rule fitness. We argue that different definitions of overgenerality are needed to match the goals of the two approaches, present minimal conditions and environments which will support strong overgeneral rules, demonstrate their dependence on the reward function, and give some indication of what kind of reward functions will avoid them. Finally, we distinguish fit overgeneral rules, show how strength and accuracy-based fitness differ in their response to fit overgenerals and conclude by considering possible extensions to this work.
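The dependence of strong overgenerals on biased reward functions can be illustrated with a small numerical sketch. The states, actions, and reward values below are hypothetical, chosen only to make the effect visible; they are not taken from the paper.

```python
# Two-state, two-action task with a biased reward function:
# the correct action is worth 1000 in state s0 but only 100 in s1.
rewards = {
    ("s0", "a"): 1000,  # correct in s0
    ("s0", "b"): 0,
    ("s1", "a"): 0,     # incorrect in s1
    ("s1", "b"): 100,   # correct in s1
}

def strength(matched_states, action):
    """Strength = average reward a rule receives over the states it matches."""
    return sum(rewards[(s, action)] for s in matched_states) / len(matched_states)

# An overgeneral rule matches both states but advocates 'a' everywhere.
overgeneral = strength(["s0", "s1"], "a")   # (1000 + 0) / 2 = 500
# The accurate, specific competitor for s1.
correct_s1 = strength(["s1"], "b")          # 100

# Under strength-based fitness the overgeneral outcompetes the accurate
# rule in s1 -- a "strong overgeneral". With equal rewards for correct
# actions (an unbiased function), its strength could not exceed 100.
print(overgeneral > correct_s1)  # True
```

An accuracy-based system would instead penalise the overgeneral rule for the high variance of its payoff (1000 in s0, 0 in s1), which is the intuition behind the differing responses of the two fitness schemes.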
Using data from the world's most comprehensive Learning Classifier Systems (LCS) bibliography, we examine trends in LCS publication and attempt to account for them. We find support for the notions of a classic period and an ongoing LCS renaissance, and find that Wilson's XCS has become a major focus of LCS research.
This article lists currently available sources of information on classifier systems and classifier systems research, both on-line and in print. The need for new resources, and improvements to certain existing ones, are suggested.
This work suggests two ways of looking at Michigan classifier systems; as Genetic Algorithm-based systems, and as Reinforcement Learning-based systems, and argues that the former is more suitable for traditional strength-based systems while the latter is more suitable for accuracy-based XCS. The dissociation of the Genetic Algorithm from policy determination in XCS is noted, and the two types of Michigan classifier system are contrasted with Pittsburgh systems.
We consider the issues of how a classifier system should learn to represent a Boolean function, and how we should measure its progress in doing so. We identify four properties which may be desirable of a representation; that it be complete, accurate, minimal and non-overlapping. We distinguish two categories of learning metric, introduce new metrics and evaluate them. We demonstrate the superiority of population state metrics over performance metrics in two situations, and in the process find evidence of XCS's strong bias against overlapping rules.
Continuation processes in chemical and/or biotechnical plants always generate a large amount of time series data. However, since conventional process models are described as a set of control models, it is difficult to explain complicated and active plant behaviors. To uncover complex plant behaviors, this paper proposes a new method of developing a process response model from continuous time-series data. The method consists of the following phases: (1) Reciprocal correlation analysis; (2) Process response model; (3) Extraction of control rules; (4) Extraction of a workflow; and (5) Detection of outliers. The main contribution of the research is to establish a method to mine a set of meaningful control rules from a Learning Classifier System using the Minimum Description Length criterion and a Tabu search method. The proposed method has been applied to an actual process of a biochemical plant and has shown its validity and effectiveness.
ATNoSFERES is a Pittsburgh style Learning Classifier System (LCS) in which the rules are represented as edges of an Augmented Transition Network. Genotypes are strings of tokens of a stack-based language, whose execution builds the labeled graph. The original ATNoSFERES, using a bitstring to represent the language tokens, has been favorably compared in previous work to several Michigan style LCS architectures in the context of Non Markov problems. Several modifications of ATNoSFERES are proposed here, the most important one conceptually being a representational change: each token is now represented by an integer, hence the genotype is a string of integers; several other modifications of the underlying grammar language are also proposed. The resulting ATNoSFERES-II is validated on several standard animat Non Markov problems, on which it outperforms all previously published results in the LCS literature. The reasons for these improvements are carefully analyzed, and some assumptions about the underlying mechanisms are proposed in order to explain these good results.
We analyze XCS learning capabilities in stochastic environments where the result of agent actions can be uncertain. We show that XCS can cope when the degree of environmental uncertainty is limited. We analyze our experimental results and propose an extension to XCS, called XCS_mu, which can learn optimal solutions for higher degrees of uncertainty. We test XCS_mu when the uncertainty affects agent actions in the whole environment and when the uncertainty is limited to some areas. Finally, we show that XCS_mu is a proper extension of XCS in that it coincides with XCS when applied to deterministic environments.
In 1989 Wilson and Goldberg presented a critical review of the first ten years of learning classifier system research. With this paper we review the subsequent ten years of learning classifier systems research, discussing the main achievements and the major research directions pursued in those years.
Wilson's (1994) bit-register memory scheme was incorporated into the XCS classifier system and investigated in a series of non-Markov environments. Two extensions to the scheme proved important for reaching optimal performance in the harder environments. The first was an exploration strategy in which exploration of external actions was probabilistic as in Markov environments, but internal ``actions'' (register settings) were selected deterministically. The second was use of a register having more bit-positions than were strictly necessary to resolve environmental aliasing. The origins and effects of the two extensions are discussed.
Wilson's (1994) bit-register memory scheme was incorporated into the XCS classifier system and investigated in a series of non-Markov environments. Two extensions to the scheme were important in obtaining near-optimal performance in the harder environments. The first was an exploration strategy in which exploration of external actions was probabilistic as in Markov environments, but internal ``actions'' (register settings) were selected deterministically. The second was use of a register having more bit-positions than were strictly necessary to resolve environmental aliasing. The origins and effects of the two extensions are discussed.
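The bit-register memory scheme discussed in the two abstracts above can be sketched in a few lines. The representation below is a simplified illustration, not the papers' implementation: a classifier carries an internal condition over the register alongside its external condition, and both must match (condition strings and register sizes here are arbitrary examples).

```python
def matches(condition, string):
    """Ternary condition match: '#' is a don't-care position."""
    return all(c == '#' or c == s for c, s in zip(condition, string))

def classifier_matches(external_cond, register_cond, percept, register):
    """In the memory scheme, a classifier matches only when its external
    condition matches the current percept AND its internal condition
    matches the current contents of the bit register. Register-setting
    'actions' (not shown) write new bits into the register, letting the
    register disambiguate perceptually aliased states."""
    return matches(external_cond, percept) and matches(register_cond, register)

# Example: the same aliased percept "011" is handled differently
# depending on what was previously written into the 2-bit register.
print(classifier_matches("01#", "1#", "011", "10"))  # True
print(classifier_matches("01#", "1#", "011", "01"))  # False: register differs
```

Using a register with more bit-positions than strictly necessary, as the abstracts note, simply enlarges the `register` string while leaving this matching logic unchanged.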
This paper introduces XCSF extended with tile coding prediction: each classifier implements a tile coding approximator; the genetic algorithm is used to adapt both classifier conditions (i.e., to partition the problem) and the parameters of each approximator; thus XCSF evolves an ensemble of tile coding approximators instead of the typical monolithic approximator used in reinforcement learning. The paper reports a comparison between (i) XCSF with tile coding prediction and (ii) plain tile coding. The results show that XCSF with tile coding always reaches optimal performance, usually learns as fast as the best parametrized tile coding, and can be faster than the typical tile coding setting. In addition, the analysis of the evolved tile coding ensembles shows that XCSF actually adapts local approximators following what is currently considered the best strategy to adapt the tile coding parameters in a given problem.
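A minimal one-dimensional tile coding approximator, of the kind each classifier implements in the system described above, can be sketched as follows. All parameter values (number of tilings, tiles, learning rate) are illustrative, not those used in the paper.

```python
class TileCoding1D:
    """Minimal 1-D tile coding approximator: several offset tilings each
    map the input to one tile; the prediction is the sum of the active
    tiles' weights."""
    def __init__(self, n_tilings=4, n_tiles=8, lo=0.0, hi=1.0):
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.lo, self.width = lo, (hi - lo) / n_tiles
        self.weights = [[0.0] * (n_tiles + 1) for _ in range(n_tilings)]

    def _active(self, x):
        # Each tiling is shifted by a fraction of the tile width.
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings
            yield t, min(int((x - self.lo + offset) / self.width), self.n_tiles)

    def predict(self, x):
        return sum(self.weights[t][i] for t, i in self._active(x))

    def update(self, x, target, alpha=0.1):
        # Distribute the prediction error evenly across the active tiles.
        step = alpha / self.n_tilings * (target - self.predict(x))
        for t, i in self._active(x):
            self.weights[t][i] += step

approx = TileCoding1D()
for _ in range(100):
    approx.update(0.3, 2.0)
print(round(approx.predict(0.3), 3))  # converges toward 2.0
```

In the XCSF variant the genetic algorithm would additionally evolve parameters such as the number of tiles per tiling for each classifier, alongside the classifier's condition.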
We study how different prediction update algorithms influence the performance of XCSF. We consider three classical parameter estimation algorithms (NLMS, RLS, and Kalman filter) and four gain adaptation algorithms (K1, K2, IDBD, and IDD). The latter have been shown to perform comparably to the best algorithms (RLS and Kalman), but they have a lower complexity. We apply these algorithms to update classifier prediction in XCSF and compare the performances of the seven versions of XCSF on a set of real functions. Our results show that the best known algorithms still perform best: XCSF with RLS and XCSF with Kalman perform significantly better than the others. In contrast, when added to XCSF, gain adaptation algorithms perform comparably to NLMS, the simplest estimation algorithm and the one used in the original XCSF. Nevertheless, algorithms that perform similarly generalize differently. For instance, XCSF with the Kalman filter evolves more compact solutions than XCSF with RLS, and gain adaptation algorithms allow better generalization than NLMS.
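NLMS, the simplest of the estimators compared above, updates a classifier's linear prediction with a step normalized by the input's squared norm. The sketch below shows the update in isolation on a toy regression; the step size and the toy target function are illustrative assumptions, not values from the paper.

```python
def nlms_update(w, x, target, eta=0.2, eps=1e-6):
    """Normalized least mean squares update of a linear prediction w.x:
    the correction is scaled by the squared norm of the input, making
    the effective step size insensitive to the input's magnitude."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    error = target - pred
    norm2 = sum(xi * xi for xi in x) + eps  # eps avoids division by zero
    return [wi + eta * error * xi / norm2 for wi, xi in zip(w, x)]

# Fit y = 3*x1 + 1 using an augmented input (x0 = 1 plays the bias role,
# as in XCSF's usual linear prediction).
w = [0.0, 0.0]
for _ in range(200):
    for x1 in (0.0, 0.5, 1.0):
        w = nlms_update(w, [1.0, x1], 3 * x1 + 1)
print([round(wi, 2) for wi in w])  # approaches [1.0, 3.0]
```

RLS and the Kalman filter replace the scalar normalization with a full per-classifier gain matrix, which is what buys their faster convergence at higher computational cost.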
XCS with computed prediction, namely XCSF, extends XCS by replacing the classifier prediction with a parametrized prediction function. Although several types of prediction functions have been introduced, so far XCSF models are still limited to evolving classifiers with the same prediction function. In this paper, we introduce XCSF with heterogeneous predictors, XCSFHP, which allows the evolution of classifiers with different types of prediction function within the same population. We compared XCSFHP to XCSF on several problems. Our results suggest that XCSFHP generally performs as well as XCSF with the most appropriate prediction function for the given problem. In particular, XCSFHP seems able to evolve, in each problem subspace, the most adequate type of prediction function.
This paper presents an approach to analyze population evolution in classifier systems using a symbolic representation. Given a sequence of populations, representing the evolution of a solution, the method simplifies the classifiers in the populations by reducing them to their “canonical form”. Then, it extracts all the subexpressions that appear in all the classifier conditions and, for each subexpression, it computes the number of occurrences in each population. Finally, it computes the trend of all the subexpressions considered. The expressions which show an increasing trend through the course of evolution are viewed as building blocks that the system has used to construct the solution.
We analyze generalization with the XCS classifier system when the system is applied to animat problems in grid-worlds. The aim of the paper is to give a unified view of generalization with XCS, in order to explain some of the phenomena reported in the literature. Initially, we extend the results previously presented in the literature by applying XCS to three environments. Our results confirm what has already been reported, showing that the generalization mechanism of XCS may prevent the system from converging to optimal performance. Accordingly, we study XCS's generalization mechanism, analyzing Wilson's generalization hypothesis and the conditions under which it may fail. We draw a hypothesis to explain the results we report: XCS fails to learn an optimal solution when, due to the environment structure and the exploration strategy employed, the system is not able to explore all the environmental niches frequently. We validate this hypothesis by introducing a new exploration strategy, which we call teletransportation, that guarantees frequent exploration of all the areas of the environment. Teletransportation is not introduced as a real solution to the problems we evidence, since it is not feasible for real applications; rather, we exploit it as a tool to validate our hypothesis. Nevertheless, as we subsequently show, the ideas behind teletransportation can be implemented by integrating XCS with a model of the environment, learned by experience, in a Dyna architecture. We end the paper by discussing another important aspect of generalization in XCS: the conditions under which XCS may fail to produce a compact representation of a learned task. We show this is likely to happen in environments where there is no direct relation between the number of don't care symbols in a classifier condition and the number of environmental conditions the classifier matches.
Accordingly, we discuss the role of subsumption deletion for the problem of evolving a compact representation of the learned task.
We analyze the generalization behavior of the XCS classifier system in environments in which only a few generalizations can be done. Experimental results presented in the paper show that the generalization mechanism of XCS can prevent it from learning even simple tasks in such environments. We present a new operator, named Specify, which contributes to the solution of this problem. XCS with the Specify operator, named XCSS, is compared to XCS in terms of performance and generalization capabilities in different types of environments. Experimental results show that XCSS can deal with a greater variety of environments and that it is more robust than XCS with respect to population size.
XCS is a classifier system recently introduced by Wilson that differs from Holland's framework in that classifier fitness is based on the accuracy of the prediction instead of the prediction itself. According to the original proposal, XCS has no internal message list as traditional classifier systems do; hence XCS learns only reactive input/output mappings, which are optimal in Markovian environments. When the environment is partially observable, i.e. non-Markovian, XCS evolves suboptimal solutions; in order to evolve an optimal policy in such environments the system needs some sort of internal memory mechanism. In this paper, we add an internal memory mechanism to the XCS classifier system. We then test XCS with internal memory, named XCSM, in non-Markovian environments of increasing difficulty. The experimental results we present show that XCSM is able to evolve optimal solutions in simple environments, while in more complex problems the system needs special operators or special exploration strategies. We also show that the performance of XCSM is very stable with respect to the size of the internal memory involved in learning; indeed, in complex non-Markovian environments XCSM's performance proves more stable when more bits than necessary are employed. Finally, we extend some of the results presented in the literature for classifier systems in non-Markovian problems, applying XCSM to environments which require the agent to perform sequences of actions in the internal memory. The results presented suggest that the exploration strategies currently employed in the study of XCS are too simple to be used with XCSM; accordingly, other exploration strategies should be investigated in order to develop better classifier systems.
We add internal memory to the XCS classifier system. We then test XCS with internal memory, named XCSM, in non-Markovian environments with two and four aliasing states. Experimental results show that XCSM can easily converge to optimal solutions in simple environments; moreover, XCSM's performance is very stable with respect to the size of the internal memory involved in learning. However, the results we present show that in more complex non-Markovian environments, XCSM may fail to evolve an optimal solution. Our results suggest that this happens because the exploration strategies currently employed with XCS are not adequate to guarantee convergence to an optimal policy with XCSM in complex non-Markovian environments.
We analyze the memory mechanism of XCSM, the extension of XCS with internal memory. Our aim is to explain some of the results reported in the literature, which show that XCSM fails to learn an optimal policy in complex partially observable environments. The analysis we present reveals that XCSM's memory management strategy cannot guarantee convergence to an optimal solution. We thus extend XCSM by introducing a novel hierarchical exploration technique and modifying the technique used for updating internal memory. We apply the novel version of XCSM, called XCSMH, to a set of partially observable environments of different complexity. Our results show that XCSMH is able to learn an optimal policy in all the environments, outperforming XCSM in more difficult problems.
We analyze generalization with the XCS classifier system when the system is applied to animat problems in grid-worlds. The aim of the paper is to give a unified view of generalization with XCS, in order to explain some of the phenomena reported in the literature. First, we extend the results previously presented in the literature by applying XCS to two environments. Our results confirm what has already been reported, showing that the generalization mechanism of XCS may prevent the system from converging to optimal performance. Accordingly, we study XCS's generalization mechanism, analyzing the conditions under which it may fail to evolve an optimal solution. We draw a hypothesis to explain the results reported so far: XCS fails to learn an optimal solution when, due to the environment structure and the exploration strategy employed, the system is not able to explore all the environmental niches frequently. We test our hypothesis by introducing a new exploration strategy, which we call teletransportation, that guarantees frequent exploration of all the areas of the environment. We then apply XCS with teletransportation to the environments previously introduced, in order to validate our hypothesis experimentally.
This thesis investigates a learning paradigm which merges the concept of learning by interaction, from the field of reinforcement learning, with the concept of learning through evolution, introduced in 1978 by John H. Holland, the father of genetic algorithms, who called this paradigm learning classifier systems. This work is devoted to the study of a particular model of learning classifier system called XCS, introduced by Stewart W. Wilson in 1995, in which the most interesting ideas of Holland's paradigm co-exist with more recent ideas borrowed from reinforcement learning. The aim of this thesis is to take XCS beyond the very first results presented by Wilson in his original paper and to investigate the peculiarities of this new model of learning classifier systems with respect to other learning paradigms from reinforcement learning and evolutionary computation. In doing this, I propose a number of extensions to Wilson's original framework that improve its performance and its applicability to a larger number of problems. Initially, I take XCS beyond its very first environments and parameter settings, and I show that in certain difficult sequential environments XCS performance can fail dramatically. I propose an extension to XCS that improves its performance in such types of applications. I present an experimental analysis of generalization in XCS to understand the failures in performance I observed, and to justify the improvements that the extension I propose introduces. Then, I introduce an extension to XCS for dealing with the possible incomplete perceptions that the system may experience in certain applications. This is done in two steps. First, following an idea suggested by Wilson, I implement a simple extension to XCS for dealing with incomplete perception. I show that this is capable of learning optimal behaviors in particular cases but not in general ones.
I analyze this result and present an experimental study which points out the reasons underlying the poor performance of this extension. I suggest how the system should be extended in order to evolve optimal solutions in a wider set of applications. I implement this new extension of XCS and present experimental results showing that the new system can indeed learn in more difficult applications. Finally, I analyze XCS behavior in applications where the results of the agent's actions can be affected by uncertainty. I show that Wilson's XCS can deal with these types of applications when the uncertainty on agent actions is limited; otherwise the system may fail to converge to optimal performance. I study this phenomenon and develop an explanation which results in an extension of XCS that is able to deal with higher degrees of uncertainty. I end the thesis by presenting some initial results of the research directions I am currently pursuing.
The XCS classifier system represents a major advance in classifier systems research because (1) it has a sound and accurate generalization mechanism, and (2) its learning mechanism is based on Q-learning, a recognized learning technique. In taking XCS beyond its very first environments and parameter settings, we show that, in certain difficult sequential (``animat'') environments, performance is poor. We suggest that this occurs because in the chosen environments, some conditions for proper functioning of the generalization mechanism do not hold, resulting in overly general classifiers that cause reduced performance. We hypothesize that one such condition is a lack of sufficiently wide exploration of the environment during learning. We show that if XCS is forced to explore its environment more completely, performance improves dramatically. We propose a technique based on Sutton's Dyna concept, through which wider exploration would occur naturally. Separately, we demonstrate that the compactness of the representation evolved by XCS is limited by the number of instances of each generalization actually present in the environment. The paper shows that XCS's generalization mechanism is effective, but that the conditions under which it works must be clearly understood.
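The Dyna-based exploration proposed above can be sketched as a planning phase that replays transitions from a learned model as extra value updates. This is an illustrative sketch of the Dyna idea, not the paper's system; the toy model, parameters, and update counts below are all assumptions.

```python
import random

def dyna_planning_step(Q, model, actions, n_planning=10, alpha=0.5, gamma=0.9):
    """One Dyna-style planning phase: transitions remembered in a learned
    model (s, a) -> (r, s') are replayed as extra Q-learning updates, so
    all recorded states keep being 'visited' even when direct experience
    concentrates on only a few of them -- the wider exploration the
    paper argues the generalization mechanism needs."""
    for _ in range(n_planning):
        (s, a), (r, s2) = random.choice(list(model.items()))
        target = r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

# Toy usage: two states with one recorded transition each.
actions = ("stay", "go")
model = {(0, "go"): (0.0, 1), (1, "go"): (1.0, 1)}
Q = {(s, a): 0.0 for s in (0, 1) for a in actions}
random.seed(0)
dyna_planning_step(Q, model, actions, n_planning=200)
print(Q[(1, "go")] > Q[(0, "stay")])  # True: replay propagates the reward
```

In a classifier system the table `Q` would be replaced by the classifier population's predictions, with the same replayed transitions driving both reinforcement and the genetic algorithm's niche visits.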
We present an initial study of alternative representations of classifier conditions. Instead of considering the representations that have been suggested in the literature, S-expressions or real-coding, we focus on a messy representation of classifier conditions, which has the interesting characteristic of being independent of the sensor position coding scheme. We develop an extension of the XCS classifier system in which variable-length messy chromosomes replace the original bitstring representation. With a series of experiments we show that, to reach optimal performance, covering, matching, and mutation must be adequately defined in order to avoid overgeneralization due to underspecification of classifier conditions. These conclusions appear to be very general since they are independent of the messy representation scheme. Accordingly, they can be used as guidelines for general variable-length representations like S-expressions.
In this paper we present the results of the second part of our research, which is aimed at the study of alternative representations of classifier conditions. In particular, we introduce an extension of the XCS classifier system in which the bitstring representation is replaced by S-expressions. We show that XCS with S-expressions, SXCS, can reach optimal performance in applications of different types and complexity. The results we present also suggest that great care must be taken in choosing the representation language; in particular, we show that in certain cases the use of ``or'' clauses may lead to unstable performance. Overall, our initial results show that this is a promising approach for the future development of a general-purpose representation of classifier conditions, and that there are still many issues to be investigated.
Perceptual aliasing is a serious problem for adaptive agents. Internal memory is a promising approach to extend reinforcement learning algorithms to problems involving perceptual aliasing. In this paper we investigate the effectiveness of internal memory for tackling perceptual aliasing problems with adaptive agents and reinforcement learning. Specifically, we try to give a unified view of some interesting results that have been presented in different frameworks, i.e.: tabular reinforcement learning and learning classifier systems.
In this paper we approach the problem of defining learning classifier systems from the perspective of reinforcement learning. The peculiarity of this approach is that it does not assume any knowledge of learning classifier systems but tries to develop classifier systems ``from scratch'', i.e., starting from one of the best-known reinforcement learning techniques: Q-learning. We begin by considering some basic elements of reinforcement learning: a problem modelled as a Markov Decision Process and tabular Q-learning. We introduce a formal framework to define a general-purpose rule-based representation, which we use to implement tabular Q-learning. We analyze different methods for adding generalization capabilities to the rule-based representation. We argue that genetic algorithms are probably the most general method for adding generalization to classifiers, although they might not be the only solution.
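The tabular Q-learning starting point described above can be sketched on a toy problem. The corridor environment, parameter values, and random behaviour policy below are illustrative assumptions, not the paper's setup.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9):
    """Tabular Q-learning on a deterministic corridor: states 0..n-1,
    actions left (-1) and right (+1), reward 1 only on reaching the last
    state. Behaviour is uniformly random: Q-learning is off-policy, so a
    random behaviour policy still lets it learn the optimal Q-function
    via Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    random.seed(0)
    Q = {(s, a): 0.0 for s in range(n_states) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = random.choice((-1, 1))
            s2 = min(max(s + a, 0), n_states - 1)  # walls clamp the move
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, -1)], Q[(s2, 1)])
                                  - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
# After learning, the greedy policy moves right in every non-terminal state.
print(all(Q[(s, 1)] > Q[(s, -1)] for s in range(4)))  # True
```

A rule-based reimplementation of this table, as the paper develops, would replace each `(s, a)` entry with a fully specific rule, with generalization then merging entries that share the same prediction.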
The synthesis of artifacts reproducing behaviors and properties of living beings is one of the main goals of Artificial Life. These artificial entities often evolve according to algorithms based on models of modern genetics. Evolutionary algorithms generally produce micro-evolution in these entities, by mutation and crossover applied on their genome. The aim of this paper is to present Non-Homogeneous Classifier Systems, NHCS, integrating the process of macro-evolution. A NHCS is a derived type of classical classifier systems, CS. In a standard CS, all classifiers are built on the same structure and own the same properties. With a NHCS, the behavior of an artificial creature is defined by the co-evolution between several differently structured classifiers making up its organism. These agents, moving in a 2D discrete environment with obstacles and resources, must adapt themselves and breed to build viable populations. Finally, ecological niches and particular behaviors, individual and collective, appear according to initial parameters of agents and environment.
The synthesis of artifacts reproducing behaviors and properties of living beings is one of the main goals of Artificial Life. These artificial entities often evolve according to algorithms based on models of modern genetics. Evolutionary algorithms generally produce micro-evolution in these entities, by applying mutation and crossover on their genotype. The aim of this paper is to present Non-Homogeneous Classifier Systems, NHCS, integrating the process of macro-evolution. A NHCS is a derived type of classical Classifier Systems, CS. In a CS, all classifiers are built on the same structure and own the same properties. With a NHCS, the behavior of artificial creatures is defined by the co-evolution between several differently structured classifiers. These agents, moving in a 2D environment with obstacles and resources, must adapt themselves and breed to build viable populations. Finally, ecological niches and specific behaviors, individual and collective, appear according to initial parameters of agents and environment.
This paper studies decision-making with rules of thumb in the context of dynamic decision problems and compares it to dynamic programming. A rule is a fixed mapping from a subset of states into actions. Rules are compared by averaging over past experiences. This can lead to favoring rules which are only applicable in good states. Correcting this good state bias requires solving the dynamic program. We provide a general framework and characterize the asymptotic properties. We apply it to provide a candidate explanation for the sensitivity of consumption to transitory income.
Current trading strategy learning models often proceed in three separate phases, i.e., training, validation, and application (testing). After a specific time span of application, a new learning process is started to adapt the trading strategy to the new environment states. The time span of application is usually fixed and determined according to experience. This may result in earning losses as compared to the perfect trading strategy, which trades at each turning point of the stock price movement. Some learning methods, such as neural networks, are hard to explain intuitively and unstable in some dynamic environment states. Other learning models, like simple genetic algorithms, result in a single trading rule which is applied for a specific time span without being adapted even when the environment has changed. This paper adopts the learning classifier systems (LCS) technique to provide a dynamic trading strategy learning model (DTSLM), which learns continuously and instantly while making real predictions, and produces a trading rule set to deal with different environment states. The simulation results show that this model can obtain a remarkable profit.
We investigate classifier system learning of Boolean concepts. We introduce a symmetric reward-penalty mechanism, speciation, generality thresholds and rule evaluation by queries. These enable the classifier system to learn the twenty-input multiplexer significantly faster than previously reported for classifier systems. Conversely, we provide theoretical analyses that suggest that classifier systems are not competitive with the best known learning algorithms for stationary deterministic Boolean problems. We suggest instead that they are particularly well suited to non-stationary problems for which the target concept evolves over time.
Although fuzzy logic controllers and expert systems have been successfully applied in many complex industrial processes, they experience a deficiency in knowledge acquisition and rely to a great extent on empirical and heuristic knowledge, which in many cases cannot be objectively elicited. Among the problems to be resolved in fuzzy controller design are the determination of the linguistic state space, definition of the membership functions of each linguistic term, and the derivation of the control rules. Some of these problems can be solved by application of machine learning. First, it is desirable to simplify and automate the specification of linguistic rules. Secondly, it is also desirable that modification of control rules be possible in order to cope with previously unknown process dynamics, or with changes in them. Machine learning methods have in recent years emerged from the use of learning algorithms modelled on natural and biological systems. These methods attempt to abstract the advanced mechanisms of learning exhibited by such systems, which can, consequently, be applied to intelligent control. One of these new algorithms is the genetic algorithm, which is modelled on the processes of natural evolution. This paper develops the application of genetic algorithm techniques for fuzzy controller design. Genetic algorithms are used to automate and introduce objective criteria in defining fuzzy controller parameters.
This paper describes an extension of a GA-based, separate-and-conquer propositional rule induction algorithm called SIA. While the original algorithm is computationally attractive and able to handle both nominal and continuous attributes efficiently, our algorithm further improves it by taking into account recent advances in the rule induction and evolutionary computation communities. The refined system has been compared to other GA-based and non-GA-based rule learning algorithms on a number of benchmark datasets from the UCI Machine Learning Repository. Results show that the proposed system can achieve higher performance while still producing a smaller number of rules.
ALE is a classification-oriented model based on cooperative agent aggregates spread over a two-dimensional board. The model solves classification problems described by continuous-valued inputs. This paper describes ALE, focusing on a key point: resource allocation. The main contribution is that accurate resource allocation clearly improves convergence speed while at the same time reducing the complexity of the agent aggregates.
Learning classifier systems tend to inherit, a priori, a given knowledge representation language for expressing the concepts to learn. Hence, even before getting started, this choice biases what can be learned, becoming critical for some real-world applications like data mining. However, such bias may be minimized by hybridizing different knowledge representations via evolutionary mixing. This paper presents a first attempt to produce an evolutionary framework that evolves mixed decision trees of heterogeneous knowledge representations.
This paper reviews a competent Pittsburgh LCS that automatically mines important substructures of the underlying problems, taking problems that were intractable with first-generation Pittsburgh LCSs and rendering them tractable. Specifically, we propose a χ-ary extended compact classifier system (χeCCS) which uses (1) a competent genetic algorithm (GA) in the form of the χ-ary extended compact genetic algorithm, and (2) a niching method in the form of restricted tournament replacement, to evolve a set of maximally accurate and maximally general rules. Besides showing that linkage exists on the multiplexer problem, and that χeCCS scales exponentially with the number of address bits (building block size) and quadratically with the problem size, this paper also explores non-traditional rule encodings. Gene expression encodings, such as the Karva language, can also be used to build χeCCS probabilistic models. However, results show that the traditional ternary encoding {0,1,#} presents better scalability than the gene-expression-inspired ones for problems requiring binary conditions.
In this paper we introduce XCSF with support vector prediction: the problem of learning the prediction function is solved as a support vector regression problem, and each classifier exploits a Support Vector Machine to compute the prediction. In XCSF with support vector prediction, XCSFsvm, the genetic algorithm adapts classifier conditions, classifier actions, and the SVM kernel parameters. We compare XCSF with support vector prediction to XCSF with linear prediction on the approximation of four test functions. Our results suggest that XCSF with support vector prediction, compared to XCSF with linear prediction, (i) is able to evolve accurate approximations of more difficult functions, (ii) has better generalization capabilities, and (iii) learns faster.
The estimation of the classifier error plays a key role in accuracy-based learning classifier systems. In this paper we study the current definition of the classifier error in XCSF and discuss the limitations of the algorithm that is currently used to compute the classifier error estimate from online experience. Subsequently, we introduce a new definition for the classifier error and apply the Bayes Linear Analysis framework to find a more accurate and reliable error estimate. This results in two incremental error estimate update algorithms that we compare empirically to the performance of the currently applied approach. Our results suggest that the new estimation algorithms can improve the generalization capabilities of XCSF, especially when the action-set subsumption operator is used.
We propose a new learning agent architecture for collaborative learning. Simple reinforcement architectures have limitations when learning complicated tasks in multi-agent environments. Therefore, we propose splitting the learning mechanism into three separate layers to learn the required behavior, each organized as a Classifier System [Holland86]. The architecture can learn to communicate with other agents, to make plans, and to select actions based on the plans and other agents' behavior. We show that these agents can select cooperative actions as a collaborative group.
A selectionist recognition system is presented that categorizes inputs generated by pattern generators.
This work has no abstract.
This paper analyzes the suitability of reinforcement learning for both programming and adapting situated agents. In the first part of the paper we discuss two specific reinforcement learning algorithms: Q-learning and the Bucket Brigade. We introduce a special case of the Bucket Brigade, and analyze and compare its performance to Q-learning in a number of experiments. The second part of the paper discusses the key problems of reinforcement learning: time and space complexity, input generalization, sensitivity to parameter values, and selection of the reinforcement function. We address the tradeoff between the amount of built-in and learned knowledge in the context of the number of training examples required by a learning algorithm. Finally, we suggest directions for future research.
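The tabular Q-learning rule compared here against the Bucket Brigade can be sketched in a few lines (the dictionary-based table, state/action encodings, and parameter values are illustrative assumptions, not the paper's implementation):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward the received reward
    plus the discounted value of the best action in the next state."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

The Bucket Brigade differs in that an activated rule pays part of its strength back to its predecessor, rather than bootstrapping from the best next-state estimate as above.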
Approaches for rule-based control often rely heavily on the pre-classification of the state space for their success. In the pre-determined regions, individual rules or groups of rules may be learned. Clearly, the success of such strategies depends on the quality of the partitioning of the state space. When no such a priori partitioning is available, it is a significantly more difficult task to learn an appropriate division of the state space as well as the associated rules. Yet another layer of potential difficulty is the nature of the reinforcement applied to the rules, since it is not always possible to generate an immediate reinforcement signal to supply judgement on the efficacy of activated rules. One approach to combining the joint goals of partitioning the state space and discovering the associated rules is to use a genetic algorithm employing a restricted mating policy to generate rule clusters which dominate regions of the state space, thereby effecting the required partitioning. Such rule clusters are termed niches. A niching algorithm, which includes a `creche' facility to protect `inexperienced' classifiers, is presented together with results on partitioning a simple 2D state space using an immediate reward scheme. Details of how the algorithm may be modified to incorporate a delayed reinforcement scheme on a real-world beam-balancing control problem are reported.
This article describes a learning classifier system (LCS) approach to relational reinforcement learning (RRL). The system, Foxcs-2, is a derivative of XCS that learns rules expressed as definite clauses over first-order logic. By adopting the LCS approach, Foxcs-2, unlike many RRL systems, is a general, model-free and “tabula rasa” system. The change in representation from bit-strings in XCS to first-order logic in Foxcs-2 necessitates modifications, described within, to support matching, covering, mutation and several other functions. Evaluation on inductive logic programming (ILP) and RRL tasks shows that the performance of Foxcs-2 is comparable to other systems. Further evaluation on RRL tasks highlights a significant advantage of Foxcs-2’s rule language: in some environments it is able to represent policies that are genuinely scalable; that is, policies that are independent of the size of the environment.
Machine learning methods usually represent knowledge and hypotheses using attribute-value languages, principally because of their simplicity and demonstrated utility over a broad variety of problems. However, attribute-value languages have limited expressive power and for some problems the target function can only be expressed as an exhaustive conjunction of specific cases. Such problems are handled better with inductive logic programming (ILP) or relational reinforcement learning (RRL), which employ more expressive languages, typically languages over first-order logic. Methods developed within these fields generally extend upon attribute-value algorithms; however, many attribute-value algorithms that are potentially viable for RRL, the younger of the two fields, remain to be extended. This thesis investigates an approach to RRL derived from the learning classifier system XCS. In brief, the new system, FOXCS, generates, evaluates, and evolves a population of ``condition-action'' rules that are definite clauses over first-order logic. The rules are typically comprehensible enough to be understood by humans and can be inspected to determine the acquired principles. Key properties of FOXCS, which are inherited from XCS, are that it is general (applies to arbitrary Markov decision processes), model-free (rewards and state transitions are ``black box'' functions), and ``tabula rasa'' (the initial policy can be unspecified). Furthermore, in contrast to decision tree learning, its rule-based approach is ideal for incrementally learning expressions over first-order logic, a valuable characteristic for an RRL system. Perhaps the most novel aspect of FOXCS is its inductive component, which synthesizes evolutionary computation and first-order logic refinement for incremental learning. New evolutionary operators were developed because previous combinations of evolutionary computation and first-order logic were non-incremental. 
The effectiveness of the inductive component was empirically demonstrated by benchmarking on ILP tasks, which found that FOXCS produced hypotheses of comparable accuracy to several well-known ILP algorithms. Further benchmarking on RRL tasks found that the optimality of the policies learnt were at least comparable to those of existing RRL systems. Finally, a significant advantage of its use of variables in rules was demonstrated: unlike RRL systems that did not use variables, FOXCS, with appropriate extensions, learnt scalable policies that were genuinely independent of the dimensionality of the task environment.
Reinforcement learning (RL) consists of methods that automatically adjust behaviour based on numerical rewards and penalties. While use of the attribute-value framework is widespread in RL, it has limited expressive power. Logic languages, such as first-order logic, provide a more expressive framework, and their use in RL has led to the field of relational RL. This thesis develops a system for relational RL based on learning classifier systems (LCS). In brief, the system generates, evolves, and evaluates a population of condition-action rules, which take the form of definite clauses over first-order logic. Adopting the LCS approach allows the resulting system to integrate several desirable qualities: model-free and "tabula rasa" learning; a Markov Decision Process problem model; and importantly, support for variables as a principal mechanism for generalisation. The utility of variables is demonstrated by the system's ability to learn genuinely scalable behaviour - behaviour learnt in small environments that translates to arbitrarily large versions of the environment without the need for retraining.
In this paper, we study the means of developing an imitation process allowing to improve learning in the framework of learning classifier systems. We present three different approaches in the way a behavior observed may be taken into account through a guidance interaction: two approaches using a model of this behavior, and one without modelling. Those approaches are evaluated and compared in different environments when they are applied to three major classifier systems: ZCS, XCS and ACS. Results are analyzed and discussed. They highlight the importance of using a model of the observed behavior to enable an efficient imitation. Moreover, they show the advantages of taking this model into account by a specialized internal action. Finally, they bring new results of comparison between ZCS, XCS and ACS.
A methodology is described for studying the dynamical behavior of classifier systems. The methodology is useful because of the current lack of analytical results describing interactions among the various components of classifier systems. A mapping is defined between classifier systems and an equivalent dynamical system (Boolean networks). The mapping provides a way to understand and predict classifier system behaviors by observing the dynamical behavior of the Boolean networks. The paper reports initial results produced by the methodology and discusses the implications of this approach for classifier systems.
Genetic algorithms are computational models of evolution that play a central role in many artificial-life models. We review the history and current scope of research on genetic algorithms in artificial life, using illustrative examples in which the genetic algorithm is used to study how learning and evolution interact, and to model ecosystems, immune systems, cognitive systems, and social systems. We also outline a number of open questions and future directions for genetic algorithms in artificial-life research.
Human economic decisions are characterized by a number of factors which make them difficult to model with standard mathematical tools. Decisions can be more easily described by a set of rules, and some of them may be rules of thumb. Economic behaviour is adaptive, in that people are able to adjust to a changing environment. It is argued in this paper that the classifier system framework is a suitable means of modelling human economic decisions. A case of a simple economic decision of finding an optimal price is discussed, which is later made more complex by introducing an input variable that affects the optimal price. It is shown that classifier systems can be used in both tasks, and their performance is compared to human decisions in the same set of circumstances.
There are few contributions to robot autonomous navigation applying Learning Classifier Systems (LCS) to date. The primary objective of this work is to analyse the performance of the strength-based LCS and the accuracy-based LCS, named EXtended Learning Classifier System (XCS), when applied to two distinct robotic tasks. The first task is purely reactive, which means that the action to be performed can rely only on the current status of the sensors. The second one is non-reactive, which means that the robot might use some kind of memory to be able to deal with aliasing states. This work presents a rule evolution analysis, giving examples of evolved populations and their peculiarities for both systems. A review of LCS derivatives in robotics is provided together with a discussion of the main findings and an outline of future investigations.
This paper is an extended abstract.
There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.
This paper presents a reinforcement learning algorithm, ``Partitioning Q-learning'', designed for generating adaptive behavior in a reactive system with local perception in a complex and changing environment. The algorithm includes two dynamics: the learning algorithm, based on the Q-learning and Bucket Brigade algorithms, and the structural dynamics (the partitioning of regions of the state-action space), which models the acquisition of expert knowledge. The combination of these two dynamics is intended to solve the problem of the combinatorial explosion of the number of quality values (Q-values) to be estimated (the generalization problem) by dividing the state-action space into a minimal number of homogeneous regions using the formalism of Classifier Systems. The algorithm is applied to the simulation of a reactive robot which tries to cut weeds and to avoid plants in a cultivated field.
Generality is a recurrent theme in automated inductive systems. Induction of general patterns/rules is of course complicated by several factors. For example, higher levels of uncertainty and error are naturally introduced by generality. Moreover, it is not clear what sort of trade-off should be sought between increasing generality and decreasing predictive power. As a result, specific criteria to guide the search for useful general rules do not abound. In this paper, I reconsider these issues in the context of the generalized, fuzzy-like classifier system first proposed by Frey and Slate (1991) and later equipped with a Bayesian learning component by Muruzabal (1998). A crucial feature of this approach is that uncertainty is probabilistically measured at each classifier in the population. A new reinforcement policy exploiting this probabilistic structure and priming cooperation among general classifiers is introduced and shown to promote the stability of niches of reasonably high predictive power. The underlying genetic algorithm contributes effectively to learning although it somehow counteracts the built-in bias towards generality.
It is difficult to represent free-form shape features in a solid geometric model which is capable of holding and manipulating them after synthesis. In this paper, we propose a new model of representation of free-form shape features for this task. The key concept of this study is the Shape Feature Generation Process model (SFGP model), which consists of numerous sets of rules using a classifier system (CS). Ideas from developmental biology are applied to develop a computational model of the representation, called the cell division model. In this model, the rules are evolved through interaction between cells and their growing free-form shapes. Finally, a computer program is developed to evaluate the model by combining two existing shapes and verifying that the shape features are preserved in the combined shapes. It is demonstrated that the model can produce a variety of combined shapes with original, often exaggerated, features, and that the rules which hold the features can be specified.
In this paper we show, in a constructive way, that there are problems for which the use of genetic algorithm based learning systems can be at least as effective as traditional symbolic or connectionist approaches. To this aim, the system REGAL* is briefly described, and its application to two classical benchmarks for Machine Learning is discussed, comparing the results with the best ones published in the literature.
The detection of intrusions over computer networks (i.e., network access by non-authorized users) can be cast as the task of detecting anomalous patterns of network traffic. In this case, models of normal traffic have to be determined and compared against the current network traffic. Data mining systems based on Genetic Algorithms can contribute powerful search techniques for the acquisition of patterns of the network traffic from the large amount of data made available by audit tools. We compare models of network traffic acquired by a system based on a distributed genetic algorithm with the ones acquired by a system based on greedy heuristics. We also provide empirical evidence that a representation change of the network data can result in a significant increase in the classification performance of the traffic models. Network data made available from the Information Exploration Shootout project and the 1998 DARPA Intrusion Detection Evaluation have been chosen as the experimental testbed.
We present a Genetic Algorithm (GA)-based method for automatically inducing control rules for a dynamic physical system: a simulated pole-cart system. The task is to move a wheeled cart, with a rigid pole hinged on top of it, along a bounded straight track without the pole falling beyond a predefined vertical angle and without the cart going off the ends of the track limits. This is achieved by applying a force of fixed magnitude to the left or right of the cart. The dynamics of the system are unknown to the GA. The only information for evaluating performance is a failure signal indicating that the pole-cart system is out of control. This presents a genuinely difficult credit-assignment problem. We compare the performance of the method with other learning algorithms for the same task. We also compare the ability of the algorithms to adapt to changing conditions. We conclude that the genetic algorithm is both effective and robust.
In this paper, an application of learning classifier systems is presented. An artificial multi-agent environment has been designed. A mate-finding problem, a learning task inspired by nature, is considered, which requires cooperation between two distinct agents to achieve the goal. The main feature of our system is the existence of two parallel learning subsystems which have to agree on a common communication protocol to succeed in accomplishing the task. Apart from standard learning algorithms, a unification mechanism has been introduced to encourage coordinated behavior among agents belonging to the same class. Experimental results are presented which demonstrate the effectiveness of this mechanism and the learning capabilities of classifier systems.
The relationship between genetics-based learning, neural network learning, and typical AI-type symbolic learning approaches is highlighted by showing how each of the approaches can be mapped onto a common mathematical framework. This involves mapping the respective representations onto a structure called a lattice. We describe how the graphical representation of a lattice is constructed and explain how it models the learning processes manifested in all three approaches.
This paper provides a deep insight into the learning mechanisms of UCS, a learning classifier system (LCS) derived from XCS that works under a supervised learning scheme. A complete description of the system is given with the aim of being useful as an implementation guide. In addition, we review the fitness computation, based on the individual accuracy of each rule, and introduce a fitness sharing scheme to UCS. We analyze the dynamics of UCS, both with and without fitness sharing, over five binary-input problems widely used in the LCS framework. XCS is also included in the comparison to analyze the differences in behavior between the two systems. Results show the benefits of fitness sharing in all the tested problems, especially those with class imbalances. Comparison with XCS highlights the differences in dynamics between the two systems.
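The effect of a fitness sharing scheme of this kind can be illustrated schematically: each rule's accuracy-based fitness is normalized by the total fitness of the rules competing in the same niche, so rules crowding one niche split the reward rather than each receiving it in full. This is a minimal sketch of the sharing idea, not UCS's exact update:

```python
def shared_fitnesses(raw_fitnesses):
    """Divide each rule's raw fitness by the niche total, so that
    rules occupying the same niche share it instead of each
    receiving its raw value in full."""
    total = sum(raw_fitnesses)
    if total == 0.0:
        return [0.0] * len(raw_fitnesses)
    return [f / total for f in raw_fitnesses]
```

Sharing of this form penalizes over-populated niches, which is why it helps on class-imbalanced problems where minority-class niches would otherwise be crowded out.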
This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio (the ratio between the number of instances of the majority class and the minority class that are sampled to XCS) on population initialization, and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with the standard configuration scales exponentially. The causes that are potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers' parameters, mutation, and subsumption are analyzed, and improvements in XCS's mechanisms are proposed to effectively and efficiently handle imbalanced problems. Once the recommendations are incorporated into XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.
This paper presents Fuzzy-UCS, a Michigan-style Learning Fuzzy-Classifier System designed for supervised learning tasks. Fuzzy-UCS combines the generalization capabilities of UCS with the good interpretability of fuzzy rules to evolve highly accurate and understandable rule sets. Fuzzy-UCS is tested on a large collection of real-world problems, and compared to UCS and three highly-used machine learning techniques: the decision tree C4.5, the support vector machine SMO, and the fuzzy boosting algorithm Fuzzy LogitBoost. The results show that Fuzzy-UCS is highly competitive with respect to the four learners in terms of performance, and that the fuzzy representation permits a much better understandability of the evolved knowledge. These promising results of the online architecture of Fuzzy-UCS allow for further research and application of the system to new challenging problems.
During the last decade, research on Genetic-Based Machine Learning has resulted in several proposals of supervised learning methodologies that use evolutionary algorithms to evolve rule-based classification models. Usually, these new GBML approaches are accompanied by little experimentation and there is a lack of comparisons among different proposals. Besides, the competitiveness of GBML systems with respect to non-evolutionary, highly-used machine learning techniques has only been partially studied. This paper reviews the state of the art in GBML, selects some of the best representatives of different families, and compares the accuracy and the interpretability of their models. The paper also analyzes the behavior of the GBML approaches with respect to some of the most influential machine learning techniques that belong to different learning paradigms such as decision trees, support vector machines, instance-based classifiers, and probabilistic classifiers. The experimental observations emphasize the suitability of GBML systems for performing classification tasks. Moreover, the analysis points out the strengths of the different systems, which can be used as recommendation guidelines on which systems should be employed depending on whether the user prefers to maximize the accuracy or the interpretability of the models.
This paper presents a learning methodology based on a substructural classification model to solve decomposable classification problems. The proposed method consists of three important components: (1) a structural model, which represents salient interactions between attributes for given data, (2) a surrogate model, which provides a functional approximation of the output as a function of attributes, and (3) a classification model, which predicts the class for new inputs. The structural model is used to infer the functional form of the surrogate, whose coefficients are estimated using linear regression methods. The classification model uses a maximally-accurate, least-complex surrogate to predict the output for given inputs. The structural model that yields an optimal classification model is searched using an iterative greedy search heuristic. Results show that the proposed method successfully detects the interacting variables in hierarchical problems, groups them into linkage groups, and builds maximally accurate classification models. The initial results on non-trivial hierarchical test problems indicate that the proposed method holds promise, and they also shed light on several improvements that could enhance its capabilities.
In this article, we present a new approach to fuzzy classifier system learning, and experiment with it on three different functions. This new approach is generic in the sense that it not only allows learning of fuzzy rules, but also of membership functions and output weights. Moreover, our algorithm is simple and yet exhibits good results.
ALECSYS, an implementation of a learning classifier system (LCS) on a network of transputers, was used to train a robot arm to solve a light-approaching task. This task, as well as more complicated ones, has already been learnt by ALECSYS implemented on AutonoMouse, a small autonomous robot. The main differences between the present and previous applications are, one, that the robot arm has asymmetric constraints on its effectors, and, two, that given its higher number of internal degrees of freedom and its non-anthropomorphic shape, it was not obvious, as it was with the AutonoMouse, where to place the visual sensors and what sort of proprioceptive information (the angular position of the arm joints) to provide to support learning. We report results of a number of exploratory simulations of the robot arm's relative success in learning to perform the light-approaching task with a number of combinations of visual and proprioceptive sensors. On the basis of these trials it was possible to derive a near-optimum combination of sensors, which is now being implemented on a real robot arm (an IBM 7547 with a SCARA geometry). Finally, the implications of these findings, particularly with reference to the LCS-based evolutionary approach to learning, are discussed.
The adaptive capabilities of biological systems have provided the inspiration for numerous developments in computer science; this article focuses on evolutionary computation. Following a review of the general selectionist agenda in evolutionary computing, a number of new biological sources of ideas are discussed. Two experimental designs are presented: directed adaptation and an enhancement to Genetic Based Learning in a Classifier System called the Life History GA. Some preliminary results using these ideas are then given. The concluding section reflects on some general issues associated with the design of evolutionary systems.
Recent developments in computer science have made it possible to simulate whole populations of artificial agents within the confines of a single personal computer. These artificial agents can be programmed to act in ways that mimic the behaviour of physical, biological and economic agents. Such developments come at a time when researchers in strategic management have been complaining about a lack of data to test theoretical predictions. There is a perception that the ability to generate novel hypotheses has greatly exceeded the ability to empirically test them. This dissertation investigates whether agent-based simulation can be used to test hypotheses in strategic management. SELESTE is a highly-abstract artificial world that was developed using concepts from Amit and Schoemaker's (1993) integrated theory of strategy. SELESTE provided the environment, or strategic landscape, for our artificial agents to explore. The agents themselves were modelled using an algorithm from artificial intelligence known as a learning classifier system. Artificial agents in SELESTE were shown to behave in similar ways to human agents. Three studies were selected to showcase the range of problems that agent-based simulation can address. The first problem investigated whether differences in the cognitive capacity of firms led to sustainable differences in performance. The importance of differences in cognitive capacity was shown to decline as the absolute level of cognitive capacity increased. The second study investigated the conditions under which imitation proved a superior strategy to innovation. It was revealed that imitation was a powerful strategy under all but the most extreme conditions. The final study revealed that firms that divided their operations into multiple profit-centres or 'patches' performed better than firms organised as a single profit-centre. It was concluded that agent-based simulation represented a useful method for exploring problems in strategic management. 
Calls were made to establish a research program in agent-based simulation to build on these findings and to refine existing techniques.
We present two architectures, each designed to search 2-Dimensional mazes in order to locate a ``goal'' position, both of which perform on-line learning as the search proceeds. The first architecture is a form of Adaptive Heuristic Critic which uses a Genetic Algorithm to determine the Action Policy and a Radial Basis Function Neural Network to store the acquired knowledge of the Critic. The second is a stimulus-response Classifier System (CS) which uses a Genetic Algorithm, applied ``Michigan'' style, for rule generation and the ``Bucket Brigade'' algorithm for rule reinforcement. Experiments conducted using agents based upon each architectural model lead us to a comparison of performance, and some observations on the nature and relative levels of abstraction in the acquired knowledge.
We describe two architectures that autonomously acquire fuzzy control rules to provide reactive behavioural competencies in a simulated mobile robotics application. One architecture is a `Pittsburgh'-style Fuzzy Classifier System (Pitt1). The other architecture is a `Michigan'-style Fuzzy Classifier System (Mich1). We tested the architectures on the ability to acquire an ``investigative'' obstacle avoidance competency. We found that Mich1 implemented a more local incremental search than the other architecture. In simpler environments Mich1 was typically able to find adequate solutions with significantly fewer fitness evaluations. Since fitness evaluation can be very time-consuming in this application, this economy can be a strong advantage. However, when the rule set must implement a competency in more complex environments, the situation is somewhat different. The superior ability of Pitt1 to retain a number of schemata in the population during the process of optimisation is then a crucial strength.
We present a coevolutionary approach to learning sequential decision rules which appears to have a number of advantages over non-coevolutionary approaches. The coevolutionary approach encourages the formation of stable niches representing simpler subbehaviors. The evolutionary direction of each subbehavior can be controlled independently, providing an alternative to evolving complex behavior using intermediate training steps. Results are presented showing a significant learning rate speedup over a noncoevolutionary approach in a simulated robot domain. In addition, the results suggest the coevolutionary approach may lead to emergent problem decompositions.
In this paper, we introduce a case-based method of initializing genetic algorithms that are used to guide search in changing environments. This is incorporated in an anytime learning system. Anytime learning is a general approach to continuous learning in a changing environment. The agent's learning module continuously tests new strategies against a simulation model of the task environment, and dynamically updates the knowledge base used by the agent on the basis of the results. The execution module includes a monitor that can dynamically modify the simulation model based on its observations of the external environment; an update to the simulation model causes the learning system to restart learning. Previous work has shown that genetic algorithms provide an appropriate search mechanism for anytime learning. This paper extends the approach by including strategies learned under similar environmental conditions in the initial population of the genetic algorithm. Experiments show that case-based initialization of the population results in a significantly improved performance.
We discuss a case-based method of initializing genetic algorithms that are used to guide search in changing environments. This is incorporated in an anytime learning system. Anytime learning is a general approach to continuous learning in a changing environment. A genetic algorithm with a case-based component provides an appropriate search mechanism for anytime learning. When the genetic algorithm is restarted, strategies which were previously learned under similar environmental conditions are included in the initial population of the genetic algorithm. We have evaluated the system by comparing performance with and without the case-based component, and case-based initialization of the population results in a significantly improved performance.
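The case-based initialization described above can be sketched as follows. The function and field names are illustrative, not taken from the papers, and a plain bit-string strategy representation is assumed: stored cases are ranked by similarity to the current environment, the most similar strategies seed the population, and the remaining slots are filled randomly.

```python
import random

def case_based_init(case_library, env_features, pop_size, similarity, n_cases=5):
    """Seed a GA population with strategies learned under similar conditions.

    case_library: list of (features, strategy) pairs saved from earlier runs.
    env_features: feature vector describing the current environment.
    similarity:   function scoring how alike two feature vectors are.
    (A sketch of the idea, not the authors' code.)
    """
    # Rank stored cases by similarity to the current environment.
    ranked = sorted(case_library,
                    key=lambda c: similarity(c[0], env_features),
                    reverse=True)
    seeds = [strategy for _, strategy in ranked[:n_cases]]
    # Fill the remaining slots with random bit-string strategies.
    strat_len = len(seeds[0]) if seeds else 16
    while len(seeds) < pop_size:
        seeds.append([random.randint(0, 1) for _ in range(strat_len)])
    return seeds[:pop_size]
```

A negative Hamming distance serves as a simple similarity function for bit-vector environment descriptors.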
This paper stems from the observation that in a game of poker, if our opponents follow a strategy and do not play randomly, there must be some set of rules such that if we follow those rules we will maximise our profits. Perhaps a Genetic Algorithm would be able to discover such a set of rules. This report looks into how feasible such an approach might be, and after some analysis, makes an attempt at an implementation.
A learning classifier system complex is developed in order to accomplish the broader goal of developing a methodology to perform generalized zeroth-order two- and three-dimensional shape optimization. Specifically, the methodology has the objective of determining the optimal boundary to minimize mass while satisfying constraints on stress and geometry. Even with the enormous advances in shape optimization no method has proven to be satisfactory across the broad spectrum of optimization problems facing the modern engineer. Similarly, the available software in the field of learning classifier systems is so embryonic that a new software package had to be developed for this application. The shape optimization via hypothesizing inductive classifier system complex (SPHINcsX) instantiates the methodology in a software package overcoming many of the limitations of today's conventional shape optimization techniques, while advancing the state-of-the-art in classifier system software tools.
A methodology to perform generalized zeroth-order two- and three-dimensional shape optimization utilizing a learning classifier system is developed and applied. To this end, the applicability of machine learning to mechanical engineering is investigated. Specifically, the methodology has the objective of determining the optimal boundary to minimize mass while satisfying constraints on stress and geometry. Even with the enormous advances in shape optimization no method has proven to be satisfactory across the broad spectrum of optimization problems facing the modern engineer. The methodology developed in this book is based upon a classifier system (CS) and exploits the CS's adaptability and generality. It thereby overcomes many of the limitations of today's conventional shape optimization techniques. A CS learns rules, postulated as if-then statements, in order to improve its performance in an arbitrary environment (which for this investigation consists of stress and mass information from components). From this input, and from a population of initially randomly generated rules, the classifier system is expected to learn to make the appropriate component shape modifications to reach a minimum mass design while satisfying all stress constraints. The CS learns by utilizing the design improvement success or failure feedback. Nearly all shape optimization algorithms developed to date depend on sensitivity information in order to function. This research does not present sensitivity information to the classifier system. Thus, the classifier system must not only learn from a clean slate, but confronts the additional challenge of learning without information that most other shape optimization algorithms deem essential. Therefore, the main deliverable is a zeroth-order shape optimization methodology.
After a review of mechanical engineering shape optimization methods, an explanatory presentation of CSs and their underlying genetic algorithm (GA) describes how classifier systems learn from feedback and the GA. With this foundation set, the coupling of the shape optimization domain with the classifier system proceeds to form the Shape oPtimization via Hypothesizing Inductive classifier system compleX (SPHINcsX). The complex learns shape optimization by its application to a suite of sizing optimization problems. The most tangible artifact of this research is the successful development of the zeroth-order shape optimization complex. The complex proved adept at solving both two- and three-dimensional shape optimization problems. The research also provides a demonstrative example of the power and flexibility of machine learning in general and CSs in particular -- how they may be leveraged as tools for mechanical engineering design, and insights into their proper application.
In Holland-type classifier systems the bucket brigade algorithm allocates strength (``credit'') to classifiers that lead to rewards from the environment. This paper presents results that show the bucket brigade algorithm basically works as designed -- strength is passed down sequences of coupled classifiers from those classifiers that receive rewards directly from the environment to those that are stage setters. Results indicate it can take a fairly large number of trials for a classifier system to respond to changes in its environment by reallocating strength down competing sequences of classifiers that implement simple reflex and non-reflex behaviors. However, ``bridging classifiers'' are shown to dramatically decrease the number of times a long sequence must be executed in order to reallocate strength to all the classifiers in the sequence. Bridging classifiers were also shown to be one way to avoid problems caused by sharing classifiers across competing sequences.
Learning systems that operate in environments with huge numbers of states must be able to categorize the states into equivalence classes that can be treated alike. Holland-type classifier systems can learn to categorize states by building default hierarchies of classifiers (rules). However, for default hierarchies to work properly classifiers that implement exception rules must be able to control the system when they are applicable, thus preventing the default rules from making mistakes. This paper presents results that show the standard bucket brigade algorithm does not lead to correct exception rules always winning the competition with the default rules they protect. A simple modification to the bucket brigade algorithm is suggested, and results are presented that show this modification works as desired: default hierarchies can be made to achieve payoff rates as near to optimal as desired.
Sequences of coupled classifiers are a basic component of higher level knowledge and control structures in classifier systems that follow the ``Michigan'' approach. This paper explores mechanisms for promoting the emergence of coupled sequences in task domains that provide payoff only after a series of coordinated actions. The results indicate that useful coupled chains do emerge once some minor changes are made to the basic classifier system.
Default hierarchies have been proposed as a way for classifier systems to categorize events efficiently and as accurately as necessary. This paper studies the emergence of default hierarchies for tasks that favor use of default hierarchies over homomorphic models. The results indicate that default hierarchies do rapidly emerge, but that they also tend to be replaced by homomorphic models.
Classifier systems (CSs) have been used to simulate and describe the behavior of adaptive organisms, animats, and robots. However, classifier system implementations to date have all been reactive systems, which use simple S-R rules and which base their learning algorithms on trial-and-error reinforcement techniques similar to the Hullian Law of Effect. While these systems have exhibited interesting behavior and good adaptive capacity, they cannot do other types of learning which require having explicit internal models of the external world, e.g., using complex plans as humans do, or doing latent learning of the type observed in rats. This paper describes a classifier system that is able to learn and use internal models both to greatly decrease the time to learn general sequential decision tasks and to enable the system to exhibit latent learning.
A classifier system is used to model human performance on a simple deterministic discrimination task in which subjects must acquire categories based on their experience. The performance of the classifier system is compared to data from experiments with humans and to the performance of an adaptive neural net model described by Gluck and Bower. The classifier system is able to replicate data on human performance, including one observation not replicated by the neural net model. The classifier system also misses one prediction the neural net makes correctly. Three keys to the classifier system's performance are: (1) default hierarchies in which exceptions usually overrule more general rules; (2) a bucket brigade algorithm in which each classifier pays the average bid made by co-winning rules that produce the same message it does (rather than just paying its own bid) and receives an equal share of any payoff; and (3) the use of a bid tax.
Paper is an extended abstract
This paper suggests that in the context of autonomous agents and the generation of intelligent behavior for such agents, more attention should be paid to the symbolic context that forms the basis of computer programs. Basically, software agents are symbolic entities living in a symbolic world, and this has an effect on how one should think about designing frameworks for their evolution or learning. We relate the problem of symbol grounding to that of the sensory information available to agents. We then introduce an experimental environment based on virtual worlds called EMuds, where both human and artificial agents can interact. Next, we show how it can be applied in the framework of multi-agent systems to address emergence-based problems and report preliminary results. We conclude with some ongoing and future work.
In the short history of Genetic Algorithms, there have been a plethora of techniques used. Even within the subfield of classifier systems, many differing implementations exist. It becomes difficult to compare one's results with others', and to determine the cause of actual performance differences. To that end I have attempted a rational reconstruction encompassing two systems described in the literature: Wilson's Animat and the CS-1 system of Holland and Reitman. The results obtained differ, sometimes markedly, from the published versions they attempt to duplicate.
Classifier systems were developed as a scheme for applying genetic algorithms to problems where they would otherwise be difficult to utilize. The bucket brigade algorithm has been used to handle the credit assignment problem associated with classifier systems. This research demonstrates an alternative to bucket brigade based upon dynamic planning and Q-learning. The advantages of the new system, Dyna-Q-CS, include the construction of an explicit world model that speeds learning in the absence of plentiful reward information and is amenable to sophisticated planning techniques. The experimental results show that Dyna-Q-CS learns as fast as, or faster than, the particular bucket brigade system used (based upon Wilson's Animat), but that the asymptotic performance of Dyna-Q-CS needs improvement. Some form of annealing could solve this asymptotic performance problem.
This paper describes two classifier systems that learn. These are rule-based systems that use genetic algorithms, which are based on an analogy with natural selection and genetics, as their principal learning mechanism, and an economic model as their principal mechanism for apportioning credit. CFS-C is a domain-independent learning system that has been widely tested on serial computers. *CFS is a parallel implementation of CFS-C that makes full use of the inherent parallelism of classifier systems and genetic algorithms, and that allows the exploration of large-scale tasks that were formerly impractical. As with other approaches to learning, classifier systems in their current form work well for moderately-sized tasks but break down for larger tasks. In order to shed light on this issue, we present several empirical studies of known issues in classifier systems, including the effects of population size, the actual contribution of genetic algorithms, the use of rule chaining in solving higher-order tasks, and issues of task representation and dynamic population convergence. We conclude with a discussion of some major unresolved issues in learning classifier systems and some possible approaches to making them more effective on complex tasks.
A new evolutionary search algorithm, called BGP, to be used for classification tasks in data mining, is introduced. It is different from existing evolutionary techniques in that it does not use indirect representations of a solution, such as bit strings or grammars. The algorithm uses decision trees of various sizes as individuals in the population, and operators such as crossover are performed directly on the trees. When compared to C4.5 and CN2 on a benchmark of problems, BGP shows very good results.
Over the last decade Learning Classifier Systems (LCS) have received increasing attention from researchers motivated to develop flexible machine learning devices. However, they do not scale well: when used to solve real-world problems, LCS are outperformed, since one has to deal with vast sets of classifiers with large bit lengths. Learning in distributed environments, on the other hand, constitutes another potential application area for LCS, provided conceptual models for distributing them are available. One way to by-pass these problems is to implement LCS on parallel hardware, improving system performance, and to use efficient structural organisation models to avoid complexity. Working in this direction, we studied an agent-oriented architecture based on the blackboard paradigm, a model that brings parallelism to LCS functionalities and caters for the distribution of classifiers and associated processes. The resulting system is a distributed processing system called DICE (A DIstributed Computational Environment for Genetic-Based Classifier Systems). To achieve these goals, various domains of knowledge were exploited, including distributed processing systems, agent-based systems, blackboard-based systems, logic programming and parallel genetic algorithms. The DICE system is the consolidation of such knowledge: a computation medium that improves LCS performance and allows for the parallelisation and distribution of classifiers in a modular and flexible way, augmenting the scope of application of LCS.
In the field of interfaces for virtual reality devices, dialogue peripherals disturb the senses and add to the difficulty of human/machine communication. In order to minimise this disturbance and to provide effective assistance to each user, we propose the use of a learning system based on artificial life tools.
This document has no abstract
This paper presents a learning system based on artificial life that uses short-term memory and knowledge sharing. Inspired by classifier systems, the model generates behaviors for agents integrated in a multi-task environment. The user, immersed in the scene, interacts through his clone with autonomous actors and, by his own behavior, influences that of the agents. An agent perceives the world through sensors and acts through effectors in order to produce rules (called classifiers). Rewards from the environment are used to adjust the strength of every rule, which in turn defines the best behavior. A message-sending protocol has been included to increase the performance of the system in complex environments. By combining communication and evolution, we then produce a real-time application (a virtual soccer game) where the user plays with the other agents. After a short period of adaptation, the simulation gives some positive results: a coherent global behavior is built by the teams.
This paper presents a new architecture of a classifier system for learning in virtual environments. The model will be integrated in our multi-user platform to provide interaction between intelligent agents and user clones. An agent is an autonomous entity equipped with sensors and effectors. Its behavior is guided by rewards from the environment, which shape rules called classifiers. Knowledge is shared between agents using a message-sending protocol to increase the global efficiency of the group. The classifier system is specially adapted to a multi-task environment and incorporates a short-term memory to record the recent events of the simulation. These ideas have been implemented and used to develop a virtual soccer game where the user plays with autonomous agents that combine communication and evolution.
In this paper, we present a behavioral system based on artificial life for animating actors in a virtual reality application. Through a virtual soccer game, we show how a set of autonomous players (called agents) can cooperate and communicate to perform common tasks. The user is immersed in the game. He interacts with the other agents and he is integrated in the cooperation and in the communication systems. Every entity reacts in real-time by using a classifier system which is composed of a set of binary rules and a reward system. The originality of this method is its ability to build a behavior (by emergence) without initial knowledge. The analysis of the simulation gives interesting results: after convergence, the global behavior of the teams produces coherent movements. Moreover, the introduction of disturbances does not affect the performance of the classifier system.
It has been known for some time that Learning Classifier Systems (Holland, 1986) have potential for application as Data Mining tools. Parodi and Bonelli (1990) applied the Boole LCS (Wilson, 1985) to a Lymphography data set and reported 82% classification rates. More recent work, such as GA-Miner (Flockhart, 1995) has sought to extend the application of LCS to larger commercial data sets, introducing more complex attribute encoding techniques, static niching, and hybrid genetic operators in order to address the problems presented by large search spaces. Despite these results, the traditional LCS formulation has shown itself to be unreliable in the formation of accurate optimal generalisations, which are vital for the reduction of results to a human readable form. XCS (Wilson, 1995, 1998) has been shown to be capable of generating a complete and optimally accurate mapping of a test environment (Kovacs, 1996) and therefore presents a new opportunity for the application of Learning Classifier Systems to Data Mining. As part of a continuing research effort this paper presents some first results in the application of XCS to a Data Mining task. It demonstrates that XCS is able to produce a classification performance and rule set which exceeds the performance of most current Machine Learning techniques when applied to the Monk's problems (Thrun, 1991).
It has been known for some time that Learning Classifier Systems (LCS) have potential for application as Data Mining tools. Parodi and Bonelli applied the Boole LCS to the Lymphography data set and reported 82% classification rates. More recent work, such as GA-Miner, has sought to extend the application of the GA-based classification system to larger commercial data sets, introducing more complex attribute encoding techniques, static niching, and hybrid genetic operators in order to address the problems presented by large search spaces. Despite these results, the traditional LCS formulation has shown itself to be unreliable in the formation of accurate optimal generalisations, which are vital for the reduction of results to a human readable form. XCS has been shown to be capable of generating a complete and optimally accurate mapping of a test environment and therefore presents a new opportunity for the application of Learning Classifier Systems to the classification task in Data Mining. As part of a continuing research effort this paper presents some first results in the application of XCS to a particular Data Mining task. It demonstrates that XCS is able to produce a classification performance and rule set which exceeds the performance of most current Machine Learning techniques when applied to the Monk's problems.
This text presents a classifier system (CS) which is able to adapt to an environment by adjusting the activation probabilities of its rules and by changing the rules themselves. The operators for changing the rules are incorporated into the CS, thus allowing the rates of change to adapt on-line during the search for better rules. An age is attached to each rule, and rules are removed from the rule set according to age. Experiments show that this approach to adapting the rule set by means of internal genetic operators (GOs) is superior to exogenous genetic operators.
In this paper we describe a simple model of adaptive agents of different types, represented by Learning Classifier Systems (LCS), which make investment decisions about a risk free bond and a risky asset under a well defined stock market environment. Our main aim is to explore the degree of reliability that artificially intelligent agents can have when applied to real life economic problems. We do this by evaluating whether an LCS is able to represent competent traders in a real market scenario in which daily stock prices and dividends are given to the agents exogenously, so permitting us to focus on the dynamics and evolution of the behavior of these evolving traders without having to be concerned about how their actions affect the market. We present results of adaptive and non-adaptive simulations over a period of ten years of real data of a specific stock and show that the artificial agents, by displaying different and rich behaviours evolved throughout the simulations, are able to discover and refine novel and successful sets of market strategies that can outperform baseline strategies such as buy-and-hold or merely keeping money in the bank at a good rate of interest, even though the agents pay commission on every trade.
Paper is an extended abstract
This paper reports on a number of experiments where three different groups of artificial agents learn, forecast and trade their holdings in a real stock market scenario given exogenously in the form of easily-obtained stock statistics such as various price moving averages, first difference in prices, volume ratios, etc. These artificial agent-types trade while learning during -- in most cases -- a ten-year period. They normally start at the beginning of the year 1990 with a fixed initial wealth to trade over two assets (a bond and a stock) and end in the second half of the year 2000. The adaptive agents are represented as Learning Classifier Systems (LCSs), that is, as sets of bit-encoded rules. Each condition bit expresses the truth or falsehood of a certain real market condition. The actual conditions used differ between agents. The forecasting performance is then compared against the performance of the buy-and-hold strategy, against a trend-following strategy, and finally against a bank investment over the same period at a fixed compound interest rate. To make the experiments as real as possible, agents pay commissions on every trade. The results so far suggest that this is an excellent approach to making trading decisions in the stock market.
This paper discusses the use of evolutionary computation to evolve behaviors that exhibit emergent intelligent behavior. Genetic algorithms are used to learn navigation and collision avoidance behaviors for robots. The learning is performed under simulation, and the resulting behaviors are then used to control the actual robot. Some of the emergent behavior is described in detail.
The problem of learning decision rules for sequential tasks is addressed, focusing on the problem of learning tactical plans from a simple flight simulator where a plane must avoid a missile. The learning method relies on the notion of competition and employs genetic algorithms to search the space of decision policies. Experiments are presented that address issues arising from differences between the simulation model on which learning occurs and the target environment on which the decision rules are ultimately tested. Specifically, either the model or the target environment may contain noise. These experiments examine the effect of learning tactical plans without noise and then testing the plans in a noisy environment, and the effect of learning plans in a noisy simulator and then testing the plans in a noise-free environment. Empirical results show that, while best results are obtained when the training model closely matches the target environment, using a training environment that is more noisy than the target environment is better than using a training environment that has less noise than the target environment.
Classifier systems are currently in vogue as a way of using genetic algorithms to demonstrate machine learning. However, there are a number of difficulties with the formalization that can influence how knowledge is represented and the rate at which the system can learn. Some of the problems are inherent in classifier systems, and one must learn to cope with them, while others are pitfalls waiting to catch the unsuspecting implementor. This paper identifies some of these difficulties, suggesting directions for the further evolution of classifier systems.
This research develops and applies a genetic classifier system (CS) to triage patients presenting with symptoms of upper respiratory infections (URIs). The CS searches among 66 dichotomous patient signs and symptoms to evolve classifiers that best explain care provider triage decisions. The system's search is directed by specifying relative costs of false positives and false negatives. The model achieved a sensitivity and specificity of 100% and 42%, respectively, when applied to a triage case base of URI patients. A split-sample validation of the system shows its accuracy is comparable to that achieved with a triage protocol developed by Infectious Disease Specialists.
NEWBOOLE is a simple stimulus-response classifier system that has been used successfully in a number of supervised concept learning and classification problems. In this paper we use NEWBOOLE on a relatively simple categorization problem involving medical diagnosis, and compare its performance to that of human subjects on the same problem. The system is provided with the relevant rules and learning involves generating appropriate strengths for the different rules so as to categorize both seen and unseen examples. Results obtained with the simple classifier system exhibit the same trends as demonstrated by the humans. We present an analysis explaining the working of the system on the given task. Other experiments presented in this paper include experiments to replicate human performance on filtering and condensation tasks as described by Kruschke (1992), and experiments involving learning to attend to relevant dimensions for proper categorization (Shepard et al., 1961). In the latter task, the system is not provided the relevant rule set, and hence has to discover the appropriate rules as well as learn to assign credit to the useful ones.
Paper is an extended abstract
The paper is devoted to the problem of learning decision policies in multi-agent games. This problem is a simple, but appealing computational model of several important real-world problems in such domains as parallel computing, optimization, and control on one hand, and economy, social, and political sciences on the other hand. We describe a general framework for studying games of intelligent agents, extending the basic model of games with limited interactions, and its specific realization based on learning classifier systems. Simulation results are presented that illustrate the convergence properties of the resulting system. Avenues for future work in this area are outlined.
Genetic algorithms have given rise to two new fields of research where (global) optimisation is of crucial importance: `Genetic Programming' and `Genetic-based Machine Learning' (GBML). In this paper the second domain (GBML) will be introduced. An overview of one of the first GBML implementations by Holland, also known as the Learning Classifier System (LCS), will be given. After describing and solving a well-known basic (educational) problem, a more complex application of GBML is presented. The goal of this application is the automatic development of a rule set for an industrial production process. To this end, a case study on generating a rule set for predicting spinnability in the fibre-to-yarn production process will be presented. A largely modified LCS, called the Fuzzy Efficiency based Classifier System (FECS), originally designed by one of the authors, is used to solve this problem successfully.
Evolutionary Learning Classifier Systems (LCSs) combine reinforcement learning or supervised learning with effective genetics-based search techniques. Together these two mechanisms enable LCSs to evolve solutions to decision problems in the form of easy-to-interpret rules called classifiers. Although LCSs have shown excellent performance on some data mining tasks, many enhancements are still needed to tackle features like high dimensionality, huge data sizes, non-uniform distribution of classes, etc. Intrusion detection is a real-world problem where such challenges exist and to which LCSs have not previously been applied. An intrusion detection problem is characterised by huge network traffic volumes, decision boundaries between attacks and normal activities that are difficult to discern, and a highly imbalanced attack class distribution. Moreover, it demands high accuracy, fast processing times and adaptability to a changing environment. We present the results and analysis of two classifier systems (XCS and UCS) on a subset of a publicly available benchmark intrusion detection dataset which features serious class imbalances and two very rare classes. We introduce a better approach for handling the situation when no rules match an input on the test set and recommend this be adopted as a standard part of XCS and UCS. We detect little sign of overfitting in XCS but somewhat more in UCS. However, both systems tend to reach near-best performance in very few passes over the training data. We improve the accuracy of these systems with several modifications and point out aspects that can further enhance their performance. We also compare their performance with other machine learning algorithms and conclude that LCSs are a competitive approach to intrusion detection.
This chapter examines genetic algorithms (GA) and machine learning using the game of tic-tac-toe. After `learning' acceptable strategies for playing the game, the GA-driven computer player is able to play a competent game of tic-tac-toe. Results obtained using a GA are compared to results obtained using alternative AI techniques.
This work has no abstract
Genetic-algorithms-based learning classifier systems suffer from a number of problems that cause system instability, resulting in poor performance. These problems include genetic operation disruptions and difficulties in maintaining good classifiers and classifier structures in the population. A method is proposed in which structural ties are used to achieve coherence, impose cooperation and encourage co-adaptation among classifiers. A hierarchically structured classifier system (HCS) has been implemented to show the effect of this structuring. At the lowest level, classifiers (individuals) are grouped into families. Higher-order structures, such as a community of families, can be introduced if necessary. The experimental results show a significant improvement in system performance and stability. The relationships between the HCS framework and the Michigan and Pittsburgh approaches are discussed.
Genetic-algorithms-based learning classifier systems suffer from a number of problems that cause instability, resulting in poor performance. These problems include genetic operation disruptions and difficulties in maintaining good classifiers and classifier structures in the population. A method is proposed in which structural ties are used to achieve coherence, impose cooperation and encourage co-adaptation among classifiers. A hierarchically structured classifier system (HCS) has been implemented to show the effect of this structuring. At the lowest level, classifiers (individuals) are grouped into families. Higher-order structures, such as communities of families, can be introduced if necessary. The experimental results show a significant improvement in system performance and stability. The relationships between the HCS framework and the Michigan and Pittsburgh approaches are discussed.
Paper is an extended abstract
This paper suggests a simple analogy between learning classifier systems (LCSs) and neural networks (NNs). By clarifying the relationship between LCSs and NNs, the paper indicates how techniques from one can be utilized in the other. The paper points out that the primary distinguishing characteristic of the LCS is its use of a co-adaptive genetic algorithm (GA), where the end product of evolution is a diverse population of individuals that cooperate to perform useful computation. This stands in contrast to typical GA/NN schemes, where a population of networks is employed to evolve a single, optimized network. To fully illustrate the LCS/NN analogy used in this paper, an LCS-like NN is implemented and tested. The test is constructed to run parallel to a similar GA/NN study that did not employ a co-adaptive GA. The test illustrates the LCS/NN analogy and suggests an interesting new method for applying GAs in NNs. Final comments discuss extensions of this work and suggest how LCS and NN studies can further benefit each other.
A learning classifier system (LCS) is a machine learning system that incorporates a production-system framework and a genetic algorithm (GA) for rule discovery (Goldberg, 1989; Holland, 1975). A primary feature of LCSs is their potential to exploit overlapping sets of rules called default hierarchies. Default hierarchies increase rule set parsimony, enlarge the solution set, and lend themselves to graceful refinement by the GA (Holland, Holyoak, Nisbett, & Thagard, 1986). Traditionally, auction-based, specificity-biased credit allocation (CA) and conflict resolution (CR) schemes have been used to encourage default hierarchy formation in an LCS. Analyses presented in this paper suggest that these schemes cannot be expected to perform adequately in arbitrary LCS environments. This paper presents an alternate CA/CR scheme that associates two measures with each classifier in place of the single, traditional strength measure. The first measure is a payoff estimate, which is tuned by the linear-update scheme usually used for strength. The second measure is a priority factor that is tuned to control the outcome of a necessity auction. In the necessity auction the winning classifier pays out the payoff estimate of its nearest competitor, rather than a fraction of its own payoff estimate. Results and analyses are presented that show that this CA/CR scheme can induce variable bid separation that responds to the demands of the LCS environment. Additional analyses show that this scheme allows an LCS to adequately exploit a broader class of default hierarchies than traditional schemes. Several avenues are suggested for further study.
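The necessity auction described in this abstract can be sketched in a few lines. The field names, the bid formula, and the two-classifier example below are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch of a necessity auction for conflict resolution in an LCS.
# Each matched classifier carries the two measures the abstract describes:
# a payoff estimate and a priority factor. The highest bidder wins, but
# pays out the payoff estimate of its *nearest competitor* rather than a
# fraction of its own estimate. All field names and the bid formula
# (estimate x priority) are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Classifier:
    name: str
    payoff_estimate: float  # tuned by the usual linear-update rule
    priority: float         # tuned to control the auction outcome

    def bid(self) -> float:
        return self.payoff_estimate * self.priority

def necessity_auction(matched):
    """Return the winning classifier and the amount it must pay out."""
    ranked = sorted(matched, key=lambda c: c.bid(), reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # Winner pays its nearest competitor's payoff estimate, not its own.
    return winner, runner_up.payoff_estimate

matched = [
    Classifier("general", payoff_estimate=100.0, priority=1.0),
    Classifier("specific", payoff_estimate=90.0, priority=1.5),
]
winner, payment = necessity_auction(matched)
# The specific rule outbids the general one (135 vs 100) yet pays out
# the general rule's payoff estimate of 100.
```

Note the second-price flavour: because the winner pays its nearest competitor's estimate, the priority factor can decide the auction without distorting the payoff estimates themselves.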
Paper is an extended abstract
This paper introduces an autonomous systems strategy that combines two biological inspirations: neural networks and genetic algorithms (GAs). These ideas have been combined in a variety of ways in other systems, but the scheme presented here has several unique features. The system presented is based on an analogy between learning classifier systems (LCSs) and neural networks first presented by Smith and Cribbs [Evolutionary Computation 2(1) (1994) 19-36]. However, Smith and Cribbs focused on supervised learning. The work presented in this paper transfers these ideas to the realm of autonomous systems by considering reinforcement learning. In the new system, a neural network is used to map environmental states to Q values. The neural network structure is based on an LCS. The GA acts to shape neural connectivity and the number of hidden-layer nodes. The GA's action is similar to its action in the LCS. The suggested system is evaluated in a simulated mobile robot test environment. Experimental results suggest that the system is effective in learning and evolving parsimonious strategy representations for autonomous systems. Future directions for investigation of this system are discussed.
This paper introduces a new variety of learning classifier system (LCS), called MILCS, which utilizes mutual information as fitness feedback. Unlike most LCSs, MILCS is specifically designed for supervised learning. We present preliminary results, and contrast them to results from XCS. We discuss the explanatory power of the resulting rule sets and introduce a new technique for visualizing explanatory power. Final comments include future directions of this research, including investigations in neural networks and other systems.
In this paper a set of simultaneous equations is used to examine the development of rule sets in a classifier system using a genetic algorithm (GA) as its primary discovery device. These equations are developed for a stimulus-response classifier system with a single, two-bit condition and a binary action. Computations are presented that show the importance of niching operators in a classifier system's GA. Further experiments compare and contrast the effects of fitness sharing and mating restriction as niching and speciation operators in the classifier system.
There are a number of common difficulties and open issues that pertain to the ``traditional'' LCS model. Many of these topics were central at The First International Workshop on Learning Classifier Systems (Houston, Texas, 1992). Since the first workshop, several significant, theoretically-supported advances in LCS practice have addressed these issues. However, a system employed by the authors to acquire novel fighter aircraft maneuvers from combat simulation is more akin to the traditional LCS model than to more recent systems. Given the difficulties often experienced in LCS research on simple problems, one must ask how a relatively primitive LCS has had consistent success in the complex domain of fighter aircraft maneuvering. This paper overviews the troublesome issues discussed at the first workshop, and recent advances. It then presents the fighter aircraft LCS, in greater detail than in previous publications. Positive results from the system are discussed. The paper then focuses on the primary reasons the fighter aircraft LCS has avoided the difficulties of the traditional LCS. The authors believe the system's success has three primary origins: differences in credit assignment, differences in action encoding, and (possibly most importantly) a difference in system goals. In the fighter aircraft system, the goal has been simply the discovery of innovative, novel tactics, rather than online control. The paper concludes by discussing the most salient features of the fighter aircraft learning system, and how those features may be profitably combined with other LCS developments.
A system employed by the authors to acquire novel fighter aircraft manoeuvres from combat simulation is more akin to the traditional LCS model than to more recent systems. Given the difficulties often experienced in LCS research on simple problems, one must ask how a relatively primitive LCS has had consistent success in the complex domain of fighter aircraft manoeuvring. This paper presents the fighter aircraft LCS, in greater detail than in previous publications. Positive results from the system are discussed. The paper then focuses on the primary reasons the fighter aircraft LCS has avoided the difficulties of the traditional LCS. The authors believe the system's success has three primary origins: differences in credit assignment, differences in action encoding, and (possibly most important) a difference in system goals. In the fighter aircraft system, the goal has been simply the discovery of innovative, novel tactics, rather than online control. The paper concludes by discussing the most salient features of the fighter aircraft system, and how those features may be profitably combined with other LCS developments.
This paper has no abstract
Paper is an extended abstract
Paper is an extended abstract
The design of neural networks and fuzzy systems can involve complex, nonlinear, and ill-conditioned optimization problems. Often, traditional optimization schemes are inadequate or inapplicable for such tasks. Genetic Algorithms (GAs) are a class of optimization procedures whose mechanics are based on those of natural genetics. Mathematical arguments show how GAs bring substantial computational leverage to search problems, without requiring the mathematical characteristics often necessary for traditional optimization schemes (e.g., modality, continuity, availability of derivative information, etc.). GAs have proven effective in a variety of search tasks that arise in neural networks and fuzzy systems. This presentation begins by introducing the mechanism and theoretical underpinnings of GAs. GAs are then related to a class of rule-based machine learning systems called learning classifier systems (LCSs). An LCS implements a low-level production system that uses a GA as its primary rule discovery mechanism. This presentation illustrates how, despite its rule-based framework, an LCS can be thought of as a competitive neural network. Neural network simulator code for an LCS is presented. In this context, the GA is doing more than optimizing an objective function. It is searching for an ecology of hidden nodes with limited connectivity. The GA attempts to evolve this ecology such that effective neural network performance results. The GA is particularly well adapted to this task, given its naturally inspired basis. The LCS/neural network analogy extends itself to other, more traditional neural networks. Conclusions to the presentation discuss the implications of using GAs in ecological search problems that arise in neural and fuzzy systems.
Learning classifier systems (LCSs) offer a unique opportunity to study the adaptive exploitation of memory. Because memory is manipulated in the form of simple internal messages in the LCS, one can easily and carefully examine the development of a system of internal memory symbols. This study examines the LCS applied to a problem whose only performance goal is the effective exploitation of memory. Experimental results show that the genetic algorithm forms a relatively effective set of internal memory symbols, but that this effectiveness is directly limited by the emergence of parasite rules. The results indicate that the emergence of parasites may be an inevitable consequence in a system that must evolve its own set of internal memory symbols. The paper's primary conclusion is that the emergence of parasites is a fundamental obstacle in such problems. To overcome this obstacle, it is suggested that the LCS must form larger, multirule structures. In such structures, parasites can be more accurately evaluated and thus eliminated. This effect is demonstrated through a preliminary evaluation of a classifier corporation scheme. Final comments present future directions for research on memory exploitation in the LCS and similar evolutionary computing systems.
This work has no abstract
We address the complexity of strategies that simulate those used in repeated pairwise social interactions by individuals capable of cooperating (C) or defecting (D). Strategies are composed of some number of interacting rules, each specifying a response (C or D) to the recent history of previous responses by the two individuals. Here we consider the extent of memory and especially the number of rules as components of complexity. Using a classifier system based on small populations playing the Iterated Prisoner's Dilemma game, we show that mutual cooperation (and thus fitness) is maximized for strategies of about 20 rules when the number of rules is fixed; that less memory generally yields more mutual cooperation; and that longer interaction sequences generate more mutual cooperation -- all in accord with previous work. Allowing rule number to evolve with short interaction sequences produces mutual cooperation near or below the low levels associated with random responses. But even with these short sequences, weak selection on rule number can be detected. Selection seems to favor fewer rules (approximately 7) when rule number can evolve than the fixed number of rules that maximizes mutual cooperation. We expect stronger selection on rule number for longer interaction sequences and larger population sizes.
Two applications of Anticipatory Classifier Systems (ACS) in robotics are discussed. The first one is a simulation of an experiment about latent learning in rats with a mobile robot. It shows that an ACS is able to learn latently, i.e. in the absence of environmental reward, and that an ACS can do action planning. The second one is about learning the hand-eye coordination of a robot arm in conjunction with a camera. Goal-directed learning is introduced. This combination of action planning and latent learning leads to a substantial reduction in the number of trials required to learn a complete model of a prototypical environment.
This paper adds a new viewpoint to the Anticipatory Classifier System (ACS). It approaches the system from a psychological perspective and thus provides new insights to the current system. By simulating previously published rat experiments, the paper compares the behavior of the ACS with the behavior of the rats. Two further cognitive mechanisms are introduced to the ACS resulting in an animal-like behavior in the presented simulations. Moreover, the paper gives empirical evidence that the evolving generalized, internal environmental model is usable in the ACS for the mental adaptation of actions and thus enables reinforcement learning by mental simulation.
A classifier system is a machine learning system that learns a collection of rules, called classifiers. Mostly, classifiers can be regarded as simple stimulus-response rules. A first level of learning, called the credit assignment level, consists of reinforcement learning on these classifiers. A classifier is reinforced depending on the result of an interaction between the CS and its environment. A second level, independent of the first, consists of rule discovery. For that, a CS usually uses genetic algorithms, which can only use very indirect information about the interaction between the system and the environment in the form of rule strengths. A frequent problem with CSs is that hierarchical chunks of classifiers are destroyed when rule discovery is applied. Therefore, in some applications CSs don't use the rule discovery level or don't delete classifiers (e.g. Riolo 1991). This paper gives an introduction to a new kind of CS that learns with anticipatory behavioral control. These classifier systems are called anticipatory classifier systems (ACSs). Anticipatory behavioral control is a development of reinforcement learning on stimulus-response units and enables us to learn an internal model of an external environment. The main difference between ACSs and other CSs is that in an ACS the rule discovery level is integrated into the credit assignment level. The rule discovery algorithm of an ACS uses immediate environmental information, i.e. it is a kind of intentional rule discovery. This is a particular feature of ACSs; for example, there are no problems with hierarchical chunks of classifiers. After the introduction we demonstrate the performance of ACSs by comparing them with other CSs. A simulation of an experiment about the latent learning of rats is then discussed, and it is shown that ACSs solve the locality/globality dilemma for reactive classifier systems.
Anticipatory Classifier Systems (ACSs) are classifier systems that learn by means of anticipatory behavioral control. Classifier systems were introduced by J. Holland in 1978 and, alongside artificial neural networks and multi-agent systems, form an important class of learning systems in Artificial Intelligence. Anticipatory behavioral control, as postulated by J. Hoffmann in 1992, is a psychological learning theory in which behavior is a fundamental prerequisite for learning. The present work succeeds in integrating anticipatory behavioral control into classifier systems and thereby developing it into a learning algorithm, with the aim of reflecting the ideas of anticipatory behavioral control as directly as possible in the algorithm. The work is organized into five chapters. Chapter 1 gives a brief introduction to the theory of anticipatory behavioral control. Chapter 2 comprises a detailed discussion of different variants of classifier systems. The core of the work is Chapter 3, where, building on Chapters 1 and 2, ACSs are formally defined. Chapter 4 serves to evaluate ACSs; to this end, two applications of anticipatory classifier systems are discussed: the simulation of an animal experiment from behavioral research, and the simulation of a learning task for a robot. Chapter 5 discusses limitations and possible extensions of ACSs.
Anticipatory classifier systems (ACSs) are a new kind of classifier systems (CSs) that learn by using the cognitive mechanism of anticipatory behavioral control. At first this paper gives a brief introduction to ACSs. Then two applications of ACSs are discussed. The first one is a simulation of an experiment about latent learning that was done by Seward (1949) and first simulated by Riolo (1991). The second one consists of a simulation of a robot that has to learn its eye-hand coordination, starting without any knowledge, that was described and simulated by Birk (1995) using Drescher's schemata (Drescher, 1991 p.9).
Anticipatory Classifier Systems (ACS) are a new kind of classifier system (CS) that learn by using the cognitive mechanism of anticipatory behavioral control, which was introduced in cognitive psychology by Hoffmann (1992). This paper first gives a brief introduction to Hoffmann's learning mechanism. Then ACS are introduced. To demonstrate the performance of ACS, they are compared to Riolo's CFSC2 (1991). In addition to a theoretical comparison, a simulation of an experiment about latent learning in rats, developed by Seward (1949), is discussed.
Seward (1949) developed an experiment about latent learning in rats. During a learning phase, rats learn the topology of a T-maze without getting any reward. This experiment is replicated with a Khepera robot that latently learns by using an Anticipatory Classifier System (ACS). The robot and its environment are simulated with the Open Mobile Robots Simulator Webots. Latent learning is defined as learning in the absence of reinforcement; therefore this experiment cannot be simulated with the usual reinforcement learning techniques. A mobile robot can observe its environment only partially, so the Markov property is not necessarily given. Indeed, the T-maze used here is a Non-Markov environment for a Khepera robot. Non-Markov environments can be learned by adding memory to Learning Classifier Systems (cf. Cliff & Ross 1995, Lanzi 1998). An alternative for ACS is to use classifiers with behavioral sequences. This alternative is discussed. In addition, it must be possible to test whether the topology of the T-maze has been learned or not. For this purpose the robot is told to reach a certain point in the maze. If the robot needs more than one behavioral act to do this, then it is necessary to have a mechanism that enables the robot to do look-ahead planning (cf. Riolo 1991). Such a mechanism is introduced.
Anticipatory Classifier Systems (ACS) are classifier systems that learn by using the cognitive mechanism of anticipatory behavioral control which was introduced in cognitive psychology by Hoffmann. They can learn in deterministic multi-step environments. A stepwise introduction to ACS is given. We start with the basic algorithm and apply it in simple ``woods'' environments. It will be shown that this algorithm can only learn in a special kind of deterministic multi-step environments. Two extensions are discussed. The first one enables an ACS to learn in any deterministic multi-step environment. The second one allows an ACS to deal with a special kind of non-Markov state.
Paper is an extended abstract
Based on the proposals of Wilson and Goldberg we introduce a macro-level evolutionary operator which creates structural links between rules in the ZCS model and thus forms ``corporations'' of rules within the classifier system population. Rule co-dependencies influence both the behaviour of the discovery components of the system and the production system, where a corporation can take control for a number of time-steps. The system is compared to ZCS and also ZCSM in a number of maze environments which include Woods1 and Woods7. The corporate classifier system is shown to be the most suitable design to tackle a range of these types of problems.
Previously we have applied rule linkage to ZCS and shown that the resultant system demonstrates performance improvements over ZCS in a series of sequential tasks, particularly tasks which present ambiguous stimuli to the system. In this paper we show that similar benefits can be gained by applying rule linkage to the more complex XCS. We then show that the benefits of rule-linkage can be increased by further XCS specific modifications to the system's rule-linkage mechanisms.
Our previous implementation of a Corporate Classifier System (CCS), which introduces rule-linkage to a ZCS-based system, has been shown to demonstrate performance improvements over ZCS in a series of sequential tasks, particularly those which present arbitrary sensory ambiguities to the system. In this paper, the functionality of our CCS is enhanced to provide increased benefits regarding the same class of sequential evaluation tasks.
It has long been recognised that increased co-operation amongst the classifiers in a Michigan-style classifier system may resolve some of the established difficulties associated with the design. One approach to this was proposed by Wilson and Goldberg -- the ``corporate'' classifier system. In this paper we implement the ``corporate'' classifier system design, within Wilson's ZCS, in such a way that it complies with their theoretical proposals. In the resultant system, a zeroth-level corporate classifier system, all classifiers initially stand alone but during the course of evolution, a mutation-type operator is used to couple together classifiers by means of structural links. Linked classifiers are considered to represent a corporation, and are treated as a unit by the discovery mechanism of the system. This is achieved by the use of a macro-level evolutionary operator called ``corporate crossover''. In this design the production system remains oblivious to corporations and operates as ZCS. A technique referred to as concept analysis is introduced which is used to clarify the effects of such rule associations, as implemented here, within a Michigan-style classifier system.
Previously we have applied rule linkage to ZCS and shown that the resultant system demonstrates performance improvements over ZCS in a series of sequential tasks, particularly tasks which present ambiguous stimuli to the system. In this paper we show that similar benefits can be gained by applying rule linkage to the more complex XCS. We then show that the benefits of rule-linkage can be increased by further XCS specific modifications to the system's rule-linkage mechanisms.
From a system developer's perspective, designing a spoken dialogue system can be a time-consuming and difficult process. A developer may spend a lot of time anticipating how a potential user might interact with the system and then deciding on the most appropriate system response. These decisions are encoded in a dialogue strategy, essentially a mapping between anticipated user inputs and appropriate system outputs. To reduce the time and effort associated with developing a dialogue strategy, recent work has concentrated on modelling the development of a dialogue strategy as a sequential decision problem. Using this model, reinforcement learning algorithms have been employed to generate dialogue strategies automatically. These algorithms learn strategies by interacting with simulated users. Some progress has been made with this method but a number of important challenges remain. For instance, relatively little success has been achieved with the large state representations that are typical of real-life systems. Another crucial issue is the time and effort associated with the creation of simulated users. In this thesis, I propose an alternative to existing reinforcement learning methods of dialogue strategy development. More specifically, I explore how XCS, an evolutionary reinforcement learning algorithm, can be used to find dialogue strategies that cover large state spaces. Furthermore, I suggest that hand-coded simulated users are sufficient for the learning of useful dialogue strategies. I argue that the use of evolutionary reinforcement learning and hand-coded simulated users is an effective approach to the rapid development of spoken dialogue strategies. Finally, I substantiate this claim by evaluating a learned strategy with real users. Both the learned strategy and a state-of-the-art hand-coded strategy were integrated into an end-to-end spoken dialogue system. 
The dialogue system allowed real users to make flight enquiries using a live database for an Edinburgh-based airline. The performance of the learned and hand-coded strategies were compared. The evaluation results show that the learned strategy performs as well as the hand-coded one (81% and 77% task completion respectively) but takes much less time to design (two days instead of two weeks). Moreover, the learned strategy compares favourably with previous user evaluations of learned strategies.
Multi-state artificial environments such as mazes represent a class of tasks that can be solved by many different multi-step methods. When different rewards are available in different places of the maze, a problem solver is required to evaluate different positions effectively and remember the best one. A new hill-climbing strategy for the Michigan-style classifier system is suggested which is able to find the shortest path while discarding sub-optimal solutions. Knowledge reuse is also shown to be possible.
The Classifier System is a learning mechanism that explores the space of steps leading to a reward. It credits rules leading to a reward through a temporal difference method called a bucket brigade in which each step passes some of its reward to the step that preceded it. The bucket brigade rewards all the steps in a chain, given enough time. However, the classifiers in a population are in competition with each other, so delays in rewarding key steps may result in those steps being crowded out by others. The main hypothesis of this proposal is that the combination of a clustering operator with a classifier system will aid in the formation of long chains. This combination has potential applications for discovering useful variable-length representations for classifier systems, further enhancing their ability to learn, and can be applied to the field of Genetic Programming to discover useful building blocks.
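The bucket brigade described in this abstract can be illustrated for a single fixed chain of rule activations. The bid fraction, the starting strengths, and the single-chain setting are illustrative assumptions:

```python
# Minimal bucket-brigade sketch for a single chain of rule activations.
# Each active classifier pays a fraction of its strength to the classifier
# that was active on the previous step; only the final step receives the
# external reward. The bid fraction and the strength values below are
# illustrative assumptions, not taken from the proposal.

BID_FRACTION = 0.1

def run_episode(strengths, reward):
    """One pass down a fixed chain: strengths[i] fired at step i."""
    n = len(strengths)
    for i in range(n):
        bid = BID_FRACTION * strengths[i]
        strengths[i] -= bid          # classifier pays its bid...
        if i > 0:
            strengths[i - 1] += bid  # ...into the previous step's bucket
    strengths[-1] += reward          # external reward paid at chain's end
    return strengths

chain = [10.0, 10.0, 10.0]
for _ in range(50):
    chain = run_episode(chain, reward=5.0)
# Credit has flowed backwards: later steps are strongest, and even the
# earliest step's strength has risen above its starting value.
```

Run repeatedly, the reward paid at the chain's end propagates backwards one step per episode, which is why early steps in long chains are rewarded slowly and can be crowded out in the meantime, the situation the clustering-operator hypothesis targets.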
The Dynamic Classifier System extends the traditional classifier system by replacing its fixed-width ternary representation with Lisp expressions. Genetic programming applied to the classifiers allows the system to discover building blocks in a flexible, fitness directed manner. In this paper, I describe the prior art of problem decomposition using genetic programming and classifier systems. I then show how the proposed system builds on work in these two areas, extending them in a way that provides for flexible representation and fitness directed discovery of useful building blocks.
Effective reinforcement learning methods are essential for credit assignment in learning classifier systems in order to guide the structural rule-base modifications performed by genetic algorithms. Currently, a number of algorithms have been proposed to fulfill this fundamental need; the most frequently employed being the bucket-brigade algorithm. In this paper, an experimental evaluation of a number of reinforcement learning algorithms for a variety of parameter settings is presented. A simplified learning classifier system is used to minimize system complexity and isolate the behavior of the reinforcement learning algorithms. The problem domain tackled is that of the control of an unstable dynamic system. It was discovered that exploitation of current information is highly favored over exploration for new information and that a hybrid bucket brigade-backward averaging algorithm produced the fastest convergence to a solution.
Machine-based learning will eventually be applied to solve real-world problems. Here, an associative architecture teams with hybrid AI algorithms to solve a letter prediction problem with promising results.
This paper describes a non-generational genetic algorithm for multiobjective optimisation. The fitness of each individual in the population is calculated incrementally based on the degree to which it is dominated in the Pareto sense, or close to other individuals. The closeness of individuals is measured using a sharing function. The performance of the algorithm presented is compared to previous efforts on three multiobjective problems of growing difficulty. The behavior of each algorithm is analyzed with regard to the visited search space, the quality of the final population attained, and the percentage of non-dominated individuals in the population through time. According to all these performance measures, the algorithm presented clearly outperforms previous efforts based on genetic algorithms.
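The two ingredients the abstract names, Pareto dominance and a sharing function, can be sketched as follows. The triangular form of the sharing function and the niche radius `sigma_share` are common textbook choices, assumed here rather than taken from the paper.

```python
# Hedged sketch of Pareto dominance (minimisation convention) and a
# triangular sharing function that penalises crowded individuals.

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives
    no worse, at least one strictly better; minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def sharing(distance, sigma_share=1.0):
    """Triangular sharing: 1 at distance 0, falling to 0 at sigma_share."""
    return max(0.0, 1.0 - distance / sigma_share)
```

An individual's niche count is the sum of `sharing` values over the population; dividing raw fitness by this count spreads the population across the Pareto front instead of letting it collapse onto one region.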
In this paper a new analysis tool for classifier systems is presented: the Boolean analysis of classifier sets. This tool is applied to determine the minimal classifier sets that perform a general learning task in stimulus-response mode. Modifications to classical Boolean minimization techniques to accommodate the minimization of default hierarchies are studied. This Boolean analysis is used to determine the relation between the size of a Boolean function and the minimal number of levels required in its minimal default hierarchy. Finally, the concept of parsimony or savings of rules produced by the formation of default hierarchies is discussed.
This document has no abstract.
This paper presents the fuzzy classifier system which merges the ideas behind classifier systems and fuzzy controllers. The fuzzy classifier system learns by creating fuzzy rules which relate the values of the input variables to internal or output variables. It has credit assignment mechanisms which resemble those of common classifier systems, but with a fuzzy nature. The fuzzy classifier system employs a genetic algorithm to evolve adequate fuzzy rules. Preliminary results show that the fuzzy classifier system can effectively create fuzzy rules that imitate the behavior of simple static systems.
Paper is an extended abstract
Systems which take raw data and categorize them into discrete classes are ubiquitous in computer science, having applications in fields such as vision, expert systems, and game playing. These systems work by extracting features from the data and then combining the values of the features to form a judgement. While much work has been done on ways to automatically combine feature values, the task of automatic discovery of these features is recognized to be much more difficult, and so has become one of the holy grails of machine learning. Classifier systems, an outgrowth of genetic algorithms, seemed a promising approach to automatic feature discovery, but it is difficult to get the full power of the classifier system from existing implementations. This thesis simplifies the classifier system into a variant of the genetic algorithm, called the Population Genetic Algorithm (PGA). PGAs are used to automatically discover features for tic-tac-toe and checkers endgame positions, and these features are automatically combined using Bayesian statistics to classify each position as won, lost, or drawn. The theoretical maximum performance of the PGAs is determined by using an exhaustive enumeration technique to serve as a baseline comparison. The results indicate that while PGAs can be made to perform at near-optimal levels, the optimal solution is insufficient to perfectly classify any of the domains studied.
A model of decentralized trade is simulated with firms that produce a given commodity, and consumers who repeatedly wish to purchase one unit of that commodity. Consumers 'shop around', while firms may attract the attention of potential customers by sending information signals and offering good service. The main objective of this paper is to present an example of a computational approach to address the following question: How do self-organized markets emerge in the economy, and what are their characteristics?
Aiming to clarify the convergence or divergence conditions for Learning Classifier System (LCS), this paper explores: (1) an extreme condition where the reinforcement process of LCS diverges; and (2) methods to avoid such divergence. Based on our previous work that showed equivalence between LCS's reinforcement process and Reinforcement Learning (RL) with a function approximation (FA) method, we present a counter-example for LCS with Q-bucket-brigade based on the 11-state star problem, a counter-example originally proposed to show the divergence of Q-learning with linear FA. Furthermore, the empirical results of applying the counter-example to LCS verified the results predicted from the theory: (1) LCS with Q-bucket-brigade diverged under the prediction problem, where the action selection policy was fixed; and (2) such divergence was avoided by using implicit-bucket-brigade or applying the residual gradient algorithm to Q-bucket-brigade.
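The contrast between the update that can diverge and the residual-gradient fix mentioned above can be sketched for a linear approximator Q(s,a) = w . phi(s,a). The direct update adjusts weights along the current features only, while the residual update follows the true gradient of the squared TD error. All names and constants here are illustrative assumptions.

```python
# Hedged sketch: direct (Q-learning style) vs. residual-gradient weight
# updates for a linear function approximator. Only the direct form is
# subject to the divergence the star-problem counter-example exhibits.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def direct_update(w, phi_sa, phi_next, r, alpha=0.1, gamma=0.9):
    """Semi-gradient update: move weights along phi(s,a) only."""
    td = r + gamma * dot(w, phi_next) - dot(w, phi_sa)
    return [wi + alpha * td * fi for wi, fi in zip(w, phi_sa)]

def residual_update(w, phi_sa, phi_next, r, alpha=0.1, gamma=0.9):
    """Residual gradient: descend the true gradient of the TD error."""
    td = r + gamma * dot(w, phi_next) - dot(w, phi_sa)
    grad = [fi - gamma * fn for fi, fn in zip(phi_sa, phi_next)]
    return [wi + alpha * td * gi for wi, gi in zip(w, grad)]
```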
We present an experimental comparison of the reinforcement process between Learning Classifier System (LCS) and Reinforcement Learning (RL) with a function approximation (FA) method, regarding their generalization mechanisms. To validate our previous theoretical analysis that derived equivalence of reinforcement process between LCS and RL, we introduce a simple test environment named Gridworld, which can be applied to both LCS and RL with three different classes of generalization: (1) tabular representation; (2) state aggregation; and (3) linear approximation. From the simulation experiments comparing LCS with its GA-inactivated and corresponding RL method, all the cases regarding the class of generalization showed identical results under the criteria of performance and temporal difference (TD) error, thereby verifying the equivalence predicted from the theory.
This paper describes ClaDia, a learning classifier system applied to the Wisconsin breast cancer data set, using a fuzzy representation of the rules, a median-based fuzzy combination rule, and separate subpopulations for each class. The system achieves a classification rate of over 90% for many sets of system parameter values.
Classifier systems constitute a general model of low-level rule-based systems that are capable of environmental interaction and learning. A central characteristic and drawback of the traditional approaches to learning in such systems is that they exclusively work on the rule level, without taking into consideration that the individual rules possess a very complex activity behavior. This article investigates an alternative, action-oriented perspective of learning in classifier systems which does not suffer from this drawback. According to this perspective learning is realized on the finer action level instead of the coarser rule level. Comparative theoretical and experimental results are presented that show the advantages of the action-oriented over the traditional perspective.
The bucket brigade does indeed suffer from detrimental biases. Profit sharing can avoid these only if productions are properly penalized for eligibility, and more importantly, if the span over which profit is shared is long enough. Such a long span introduces much sampling noise, with its often unacceptable danger of premature convergence. The bucket brigade is designed to reduce this noise.
The paper examines models that are homomorphic images of the first component of a particular two component cascade decomposition of the environment. The bucket brigade is used to estimate model state values. The discussion is limited to finite automaton environments whose successive input symbols are selected by the system probabilistically, with independent probabilities, according to a probability distribution over the input symbols.
Redundant classifiers waste space. This paper suggests an approach to the problem of classifier redundancy based on the cost of gene replication. The approach stems from the view that reduction and simplification are the essence of the evolutionary creative process, and that the most advanced organisms are Prokaryotes, not Eukaryotes.
Classifier system formalism provides valuable conceptual test beds that help us come to grips with fundamental problems of credit assignment in learning systems. Here we discuss the horizon problem, the sampling noise problem, the problem of removing redundancy to save space, and the freeloader problem, this last being a collection of problems including the problem of ill-formed conditions. Classifier systems can help provide answers. But until we have some answers, experimental systems will remain flawed.
Wilson's error measurement is an important conceptual tool in the study of classifier system reward schemes. It tests classifiers to see where the Markov property detrimentally fails. Incorporating this test directly into a reward scheme involves difficulties. But the use of the error measurement in analysis, and indeed in experimentation, should help advance our understanding of credit assignment issues.
This thesis recasts the debate between Michigan-style and Pitt-style classifier systems to a debate on appropriately sizing organizations within a learning classifier system. Motivated by the economic study of transaction costs, an organizational classifier system (OCS) combining explicit use of multiple reputation values and organization sizing operators better distinguishes parasitic (less than optimal) classifiers than a simple classifier system (SCS). The results show that by building a system that autonomously adjusts the degree of individual to collective behavior, it is possible for it to be both efficient and resilient to problem difficulty.
The current state of classifier system development is examined with emphasis on challenges and unsolved problems. Suggestions related to the bucket-brigade architecture, the mechanics of bidding and payments, and classifier syntax follow a review of past research.
Based on Hubel & Wiesel's physiological findings on the projection from retina to cortex, a schematic model of that stage of visual processing is constructed and its properties investigated. The projection or mapping appears to carry out an automatic ``normalization of description'' for the same object independent of retinal image size. This property suggests new concepts regarding (1) contrast sensitivity, (2) the nature and role of indirect vision, (3) the role of eye movements and (4) the recognition of patterns and the analysis of scenes.
It is shown that a certain model of the primate retino-cortical mapping ``sees'' all centered objects with the same ``object-resolution'', or number of distinct signals, independent of apparent size. In an artificial system, this property would permit recognition of patterns using templates in a cortex-like space. It is suggested that with an adaptive production system such as Holland's classifier system, the recognition process could be made self-organizing.
Results are presented of experiments with a simple artificial animal model acting in a simulated environment containing food and other objects. Procedures within the model that lead to improved performance and perceptual generalization are discussed. The model is designed in the light of an explicit definition of intelligence which appears to apply to all animal life. It is suggested that study of artificial animal models of increasing complexity would contribute to understanding of natural and artificial intelligence.
A simplified classifier system was given the task of learning a relatively difficult boolean function drawn from the machine learning literature. The system solved the problem in times comparable to or less than times required by a network method. Classifiers present in the final populations corresponded closely to terms of an efficient boolean representation of the solution. Achievement of the results depended on a selection regime that emphasized classifiers which were both general and accurate. The theoretically predicted superiority of the crossover genetic operator to the point mutation operator was observed. Most experiments used a fixed crossover rate, but in one series the system itself advantageously controlled the rate based on an environment-independent definition of classifier system entropy.
This paper characterizes and investigates, from the perspective of machine learning and, particularly, classifier systems, the learning problem faced by animals and autonomous robots (here collectively termed animats). We suggest that, to survive in their environments, animats must in effect learn multiple disjunctive concepts incrementally under payoff (needs-satisfying) feedback. A review of machine learning techniques indicates that most relax at least one of these constraints. In theory, classifier systems satisfy the constraints, but tests have been limited. We show how the standard classifier system model applies to the animat learning problem. Then, in the experimental part of the paper, we specialize the model and test it in a problem environment satisfying the constraints and consisting of a difficult, disjunctive Boolean function drawn from the machine learning literature. Results include: learning the function in significantly fewer trials than a neural-network method; learning under payoff regimes that include both noisy payoff and partial reward for suboptimal performance; demonstration, in a classifier system, of a theoretically predicted property of genetic algorithms: the superiority of crossovers to point mutations; and automatic control of variation (search) rate based on system entropy. We conclude that the results support the classifier system approach to the animat problem, but suggest work aimed at the emergence of behavioral hierarchies of classifiers to offset slower learning rates in larger problems.
Learning systems which engage in sequential activity face the problem of properly allocating credit to steps or actions which make possible later steps that result in environmental payoff. In the classifier systems studied by Holland and others, credit is allocated by means of a ``bucket-brigade'' algorithm through which, over time, environmental payoff in effect flows back to classifiers which take early, stage-setting actions. The algorithm has advantages of simplicity and locality, but may not adequately reinforce long action sequences. We suggest an alternative form for the algorithm and the system's operating principles designed to induce behavioral hierarchies in which modularity of the hierarchy would keep all bucket-brigade chains short, thus more reinforceable and more rapidly learned, but overall action sequences could be long.
Classifier systems (Holland, 1986) have a distinct Darwinian flavor, and in this respect contrast sharply with most other learning systems. In this paper we bring out various aspects of the contrast, and provide an example of classifier system learning which illustrates its quasi-Darwinian operation.
Experiments were conducted with respect to two classifier system mechanisms: the bid competition and the use of classifier specificity in bidding and payments. The experiments employed a simplified classifier system and so may not accurately reflect the behavior of the standard system. Nevertheless, the results indicated that, in general, (1) specificity should not be factored into amounts deducted from a classifier's strength, (2) the bid competition does not improve performance and does not encourage default hierarchies, and (3) default hierarchies will form under a somewhat different algorithm than the standard one.
Learning systems which engage in sequential activity face the problem of properly allocating credit to steps or actions which make possible later steps that result in environmental payoff. In the classifier systems studied by Holland and others, credit is allocated by means of a ``bucket-brigade'' algorithm through which, over time, environmental payoff in effect flows back to classifiers which take early, stage-setting actions. The algorithm has advantages of simplicity and locality, but may not adequately reinforce long action sequences. We suggest an alternative form for the algorithm and the system's operating principles designed to induce behavioral hierarchies in which modularity of the hierarchy would keep all bucket-brigade chains short, thus more reinforceable and more rapidly learned, but overall action sequences could be long.
A scheme is described for simulating the evolution of multicellular systems. The scheme is based on a representation for biological development in which the genotypes are sets of production-like growth rules that are executed to produce cell aggregates-the phenotypes. Evolution of populations, through phenotype selection and genotype variation, occurs according to the method of the genetic algorithm. Some examples of the development representation in 1-dimensional creatures are given.
Perceptrons were evolved that computed a rather difficult nonlinear Boolean function. The results with this early and basic form of emergent computation suggested that when genetic search is applied to its structure, a perceptron can learn more complex tasks than is sometimes supposed. The results also suggested, in the light of recent work on classifier systems, that to hasten the emergence of an emergent computation it is desirable to provide evaluative feedback at a level as close as possible to that of the constituent local computations.
A research methodology is proposed for understanding intelligence through simulation of artificial animals (``animats'') in progressively more challenging environments while retaining characteristics of holism, pragmatism, perception, categorization, and adaptation that are often underrepresented in standard AI approaches to intelligence. It is suggested that basic elements of the methodology should include a theory/taxonomy of environments by which they can be ordered in difficulty-one is offered-and a theory of animat efficiency. It is also suggested that the methodology offers a new approach to the problem of perception.
Paper is an extended abstract
Paper is an extended abstract
A basic classifier system, ZCS, is presented that keeps much of Holland's original framework but simplifies it to increase understandability and performance. ZCS's relation to Q-learning is brought out, and their performances compared in environments of two difficulty levels. Extensions to ZCS are proposed for temporary memory, better action selection, more efficient use of the genetic algorithm, and more general classifier representation.
In many classifier systems, the classifier strength parameter serves as a predictor of future payoff and as the classifier's fitness for the genetic algorithm. We investigate a classifier system, XCS, in which each classifier maintains a prediction of expected payoff, but the classifier's fitness is given by a measure of the prediction's accuracy. The system executes the genetic algorithm in niches defined by the match sets, instead of panmictically. These aspects of XCS result in its population tending to form a complete and accurate mapping X x A -> P from inputs and actions to payoff predictions. Further, XCS tends to evolve classifiers that are maximally general, subject to an accuracy criterion. Besides introducing a new direction for classifier system research, these properties of XCS make it suitable for a wide range of reinforcement learning situations where generalization over states is desirable.
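The accuracy-based fitness idea described above can be sketched as a simple function of a classifier's prediction error: accuracy is maximal while the error stays below a tolerance threshold and falls off as a power of the error beyond it. The functional form and constant names follow the commonly published XCS formulation, but treat the exact values here as illustrative assumptions.

```python
# Hedged sketch of XCS-style accuracy kappa as a function of a
# classifier's mean absolute prediction error.

def accuracy(error, eps0=10.0, alpha=0.1, nu=5.0):
    """kappa = 1 if error < eps0, else alpha * (error / eps0) ** -nu.
    eps0: error tolerance; alpha, nu: fall-off shape parameters."""
    if error < eps0:
        return 1.0
    return alpha * (error / eps0) ** -nu
```

Fitness is then derived from accuracy relative to the other classifiers in the same match set, which is what pushes the population toward accurate, maximally general rules rather than merely high-payoff ones.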
Within a reinforcement learning framework, ten strategies for autonomous control of the explore/exploit decision are reviewed, with observations from initial experiments on four of them. Control based on prediction error or its rate of change appears promising. Connections are made with explore/exploit work by Holland (1975), Thrun (1992), and Schmidhuber (1995).
This paper studies two changes to XCS, a classifier system in which fitness is based on prediction accuracy and the genetic algorithm takes place in environmental niches. The changes were aimed at increasing XCS's tendency to evolve accurate, maximally general classifiers and were tested on previously employed ``woods'' and multiplexer tasks. Together the changes bring XCS close to evolving populations whose high-fitness classifiers form a near-minimal, accurate, maximally general cover of the input and action product space. In addition, results on the multiplexer, a difficult categorization task, suggest that XCS's learning complexity is polynomial in the input length and thus may avoid the ``curse of dimensionality'', a notorious barrier to scale-up. A comparison between XCS and genetic programming in solving the 6-multiplexer suggests that XCS's learning rate is about three orders of magnitude faster in terms of the number of input instances processed.
Classifier systems have traditionally taken binary strings as inputs, yet in many real problems such as data inference, the inputs have real components. A modified XCS classifier system is described that learns a non-linear real-vector classification task.
XCS is a new kind of learning classifier system that differs from the traditional one primarily in its definition of classifier fitness and its relation to contemporary reinforcement learning. Advantages of XCS include improved performance and an ability to form accurate maximal generalizations. This paper reviews recent research on XCS with respect to representation, predictive modelling, internal state, noise, and underlying theory and technique. A notation for environmental regularities is introduced.
Classifier systems have traditionally taken binary strings as inputs, yet in many real problems such as data inference, the inputs have real components. A modified XCS classifier system is described that learns a non-linear real-vector classification task.
Paper is an extended abstract
The classifier system XCS was investigated for data mining applications where the dataset discrimination surface (DS) is generally oblique to the attribute axes. Despite the classifiers' hyper-rectangular predicates, XCS reached 100% performance on synthetic problems with diagonal DS's and, in a train/test experiment, competitive performance on the Wisconsin Breast Cancer dataset. Final classifiers in an extended WBC learning run were interpretable to suggest dependencies on one or a few attributes. For data mining of numeric datasets with partially oblique discrimination surfaces, XCS shows promise from both performance and pattern discovery viewpoints.
XCS is a new kind of learning classifier system that differs from the traditional kind primarily in its definition of classifier fitness and its relation to contemporary reinforcement learning. Advantages of XCS include improved performance and an ability to form accurate maximal generalizations. This paper reviews recent research on XCS with respect to representation, internal state, predictive modelling, noise, and underlying theory and technique. A notation for environmental regularities is introduced.
The classifier system XCSF was modified to use gene expression programming for the evolution and functioning of the classifier conditions. The aim was to fit environmental regularities better than is typically possible with conventional rectilinear conditions. An initial experiment approximating a nonlinear oblique environment showed excellent fit to the regularities.
Emotional states, such as happiness or sadness, pose particular problems for information processing theories of mind. Hedonic components of states, unlike cognitive components, lack representational content. Research within Artificial Life, in particular the investigation of adaptive agent architectures, provides insights into the dynamic relationship between motivation, the ability of control sub-states to gain access to limited processing resources, and prototype emotional states. Holland's learning classifier system provides a concrete example of this relationship, demonstrating simple `emotion-like' states, much as a thermostat demonstrates simple `belief-like' and `desire-like' states. This leads to the conclusion that valency, a particular form of pleasure or displeasure, is a self-monitored process of credit-assignment. The importance of the movement of a domain-independent representation of utility within adaptive architectures is stressed. Existing information processing theories of emotion can be enriched by a `circulation of value' design hypothesis. Implications for the development of emotional animats are considered.
Emotional states, such as happiness or sadness, pose particular problems for information processing theories of mind. Hedonic components of states, unlike cognitive components, lack representational content. Research within Artificial Life, in particular the investigation of adaptive agent architectures, provides insights into the dynamic relationship between motivation, the ability of control sub-states to gain access to limited processing resources, and prototype emotional states. Holland's learning classifier system provides a concrete example of this relationship, demonstrating simple `emotion-like' states, much as a thermostat demonstrates simple `belief-like' and `desire-like' states. This leads to the conclusion that valency, a particular form of pleasure or displeasure, is a self-monitored process of credit-assignment. The importance of the movement of a domain-independent representation of utility within adaptive architectures is stressed. Existing information processing theories of emotion can be enriched by a `circulation of value' design hypothesis. Implications for the development of emotional animats are considered.
In this paper, the relatively new branch of mathematics known as Evolutionary Game Theory is proposed as a potentially useful tool when seeking to resolve certain of the more global, unanswered questions related to classifier systems. In particular, it is proved that, under certain mild assumptions, the performance of a classifier system's plans will, if the Bucket Brigade Algorithm is adopted, conform to what is referred to as `an evolutionary stable state'. A simple example is also provided to confirm the theoretical findings.
In an environment where the input information to machine learning (ML) systems using production rules has many ``properties'' and is very large, the authors aim for an ML system that performs effectively in finding solutions. Not all of the properties of the input information are always required when finding solutions. Therefore, the authors assume that a mechanism which can select specific properties to focus on will contribute to this purpose. For the realization and discussion of this mechanism, the authors have focused on the Classifier System (CS), which has more advantages than other ML systems. From the authors' point of view, the operation processes of the CS are thought to involve this mechanism. However, the CS also exhibits a ``duality'': the optimization of rules for solution finding and the abstraction of input information take place in a single process, which may lead to problems. In this paper, the authors propose a computational model in which these two processes are explicitly separated. The key concept of the proposed model is the Viewpoint-Forming Process, which uses rules for selecting the properties to be focused on, separate from the standard rules for finding solutions. A computer system is developed to evaluate the utility of this model. The results acquired by applying the model to an example problem are reported here.
First we introduce new metrics for classifying the complexity of mazes based on agent-independent and agent-dependent characteristics of maze environments. We analyze 50 mazes used in the literature by the metrics and then introduce 351 new maze environments, including 271 aliasing mazes of increased difficulty. The purpose of preparing the extensive set of maze environments is to provide a suitable evaluation environment for alternative learning agent architectures. To fulfil our second goal we analyze the major learning theories, design the psychological model of Associative Perception Learning, integrate it into the Reinforcement Learning framework and define a new Learning Classifier System (LCS), AgentP, that utilizes explicitly imprinted images of the environment states. AgentP is designed specifically to find the shortest route through aliasing mazes with rewards only on transitions to terminal states. Such mazes contain areas that look alike for a learning agent but may be associated with different optimal actions. The mazes represent a form of Partially Observable Markov Decision Processes (POMDP). Unlike many other LCS, AgentP does not generalize over the states. It learns a one-step transition model of the environment and uses two deterministic heuristics. AgentP has a rule structure similar to Anticipatory Classifier Systems. However, unlike them, AgentP perceives consecutive environmental states not only as a cause-effect time vector, but also as a single perceptive image, which is compared with previously memorized images for differentiation purposes. As a result AgentP is able to recognize aliasing in both the initial and resulting environment states, while ACS is meant to recognize aliasing in the initial state only. Each classifier in AgentP is supplemented with an ID system for a refined differentiation of aliasing squares.
AgentP uses a distance-based reinforcement process where the expected difference between two successive learning coefficients remains the same with increased distance to food. It eliminates the disadvantages associated with the behaviour of the Q-learning based reinforcement procedure, commonly employed by LCS, in long-distance mazes. The distance-based reinforcement procedure introduces certain limitations, as AgentP is only able to handle specific kinds of reward function. The environment should be discrete and the agent is not able to operate on multi-motivational tasks. However, the reinforcement procedure in its present form provides simple and reliable test facilities for our main purpose, development of the operators for refined differentiation of aliasing squares, while the limitation can be overcome in future versions of AgentP when it is necessary. While experimenting with two versions of AgentP, we discover the phenomenon of aliasing clones, i.e. aliasing conglomerates of a similar graphic pattern that include more than two aliasing states and are located in different areas of the maze. We investigate the impact that the presence of aliasing clones in a maze has on the ability of AgentP to solve mazes. We find that AgentP is able to optimally solve extensive mazes with dozens of aliasing squares and numerous aliasing conglomerates, provided they are free from aliasing clones. At the same time, for a maze containing at least three aliasing states that are grouped into an aliasing clone, the risk of an error for both versions of AgentP becomes non-zero. We analyze the performance of AgentP in detail and show that it is able to optimally solve the majority of aliasing mazes used in the experiments and may perform better than other LCS agents. We then discuss the potential of the learning model, possible improvements to the agent's structure and the most promising approaches to future research.
This paper presents an extension to the classifier system model that provides mechanisms to store solutions to learned tasks in a long term memory, to match the stored units against new problems as they arise, and to use the stored knowledge to learn new tasks in an increasingly efficient manner. The extended model has been implemented in a system called CSM (Classifier System with Memory). Experimental results with CSM demonstrate the benefits of learning by analogy in a robot navigation task domain and show significant improvements compared with the current classifier system model.