Evolving Ensembles

Next: Evolving Neural Networks Up: GBML Areas Previous: Genetic Programming Contents

Subsections

3.3 Evolving Ensembles

Ensembles, also called Multiple Classifier Systems and Committee Machines, is the field which studies how to combine predictions from multiple sources. Ensemble methods are widely applicable to evolutionary systems where a population provides multiple predictors. Ensemble techniques can be used with any learning method, although they are most useful for unstable learners, whose hypotheses are sensitive to small changes in their training. Ensembles can be heterogeneous (composed of different types of predictors) in which case they are called hybrid ensembles. Relatively few studies of hybrid systems exist [37] but see e.g. [313,72,69] for examples. Ensembles some enjoy good theoretical foundations [37,279], perform very well in practice [64] and were identified by Dietterich as one of four current directions for machine learning in 1998 [80]. While the key advantage of using ensembles is better test set generalisation there are others: ensembles can perform more complex tasks than individual members, the overall system can be easier to understand and modify and ensembles are more robust / degrade more gracefully than individual predictors [249].

Working with an ensemble raises a number of issues. How to create or select ensemble members? How many members are needed? When to remove ensemble members? How to combine their predictions? How to encourage diversity in members? There are many approaches to these issues, among the best known of which are bagging [33,34] and boosting [97,98,202].

Creating a good ensemble is an inherently multiobjective problem [281]. In addition to maximising accuracy we also want to maximise diversity in errors; after all, having multiple identical predictors provides no advantage On the other hand, an ensemble of predictors which make different errors is very useful since we can combine their predictions so that the ensemble output is at least as good on the training set as the average predictor [165]. Hence we want to create accurate predictors with diverse errors [80,117,165,216,215] In addition we may want to minimise ensemble size in order to reduce run time and to make the ensemble easier to understand. Finally, evolving variable-length chromosomes without pressure towards parsimony results in bloat [231], in which case we have a reason to minimise the size of individual members.

3.3.1 Evolutionary Ensembles

Although most ensembles are non-evolutionary, evolution has many applications within ensembles. i) Classifier creation and adaptation: providing the ensemble with a set of candidate members. ii) Voting: [169,272,273,71] evolve weights for the votes of ensemble members. iii) Classifier selection: the winners of evolutionary competition are added to the ensemble. iv) Feature selection: generating diverse classifiers by training them on different features (see §3.1 and [167] §8.1.4). v) Data selection: generating diverse classifiers by training on different data (see §3.1). All these approaches have non-evolutionary alternatives. We now go into more detail on two of the above applications.

Classifier Creation and Adaptation

Single-objective evolution is common in evolving ensembles. For example, [191] combines accuracy and diversity into a single objective. In comparison, multi-objective evolutionary ensembles are rare [69] but they are starting to appear e.g. [2,69,68]. In addition to upgrading GBML to multi-objective GBML other measures can be taken to evolve diversity, for example fitness sharing e.g. [192] and the co-evolutionary fitness method we describe next. Gagné et al. [100] compare boosting and co-evolution of learners and problems - both gradually focus on cases which are harder to learn - and argue that co-evolution is less likely to overfit noise. Their co-evolution inspired fitness works as follows: Let Q be a set of reference classifiers. The hardness of a training input

is based on how many members of Q misclassify it. The fitness of a classifier is the sum of hardnesses of the inputs

it classifies correctly. This method results in accurate yet error-diverse classifiers since both are required to obtain high fitness. Gagné et al. exploit the population of classifiers to provide Q. They also introduce a greedy margin-based scheme for selection of ensemble members. They find that a simpler off-line version of their EEL (Evolving Ensemble Learning) approach dominates their on-line version as the latter lacks a way to remove bad classifiers. Good results were obtained compared to Adaboost on 6 UCI [9] datasets.

Evolutionary Selection of Members

There are two extremes. Usually each run produces one member of the ensemble and many runs are needed. Sometimes, however, the entire population is eligible to join the ensemble, in which case only one run is needed. The latter does not resolve the ensemble selection problem: which candidates to use? There are many combinations possible from just a small pool of candidates, and, as with selecting a solution set from a Michigan populations, the set of best individuals may not be the best set of individuals (that is, the best ensemble). The selection problem is formally equivalent to the feature selection problem [100] §3.2. See e.g. [251,241] for evolutionary approaches.

3.3.2 Conclusions

Research directions for evolutionary ensembles include multi-objective evolution [69], hybrid ensembles [69] and minimising ensemble complexity [192].

Reading

Key works include Opitz and Shavlik's classic 1996 paper on evolving NN ensembles [216], Kuncheva's 2004 book on ensembles [167], Chandra and Yao's 2006 discussion of multi-objective evolution of ensembles [68], Yao and Islam's 2008 review of evolving NN ensembles [317] and Brown's 2005 and 2010 surveys of ensembles [37,35]. We cover evolving NN ensembles in § 3.4.

Next: Evolving Neural Networks Up: GBML Areas Previous: Genetic Programming Contents

T Kovacs 2011-03-12