README.txt for a UCS implementation in Java
7 April 2010

This version is 07-04-10c. If you update it, please update the version number.

------------------------------------------------------------------------------
ADEPT project members: This code has been committed to the ADEPT dropbox account,
which is the official and possibly only copy. Please only replace that copy with
changes that are ready for everyone to see.
------------------------------------------------------------------------------

This code implements the UCS supervised learning classifier system in
Java. It also implements a variant called UCSpv. For specifications of
both algorithms see:

Gavin Brown, Tim Kovacs, James Marshall, UCSpv: principled voting in UCS rule populations. 
Proceedings of the 2007 Genetic and Evolutionary Computation Conference (GECCO 2007). ISBN 
978-1-59593-697-4, pp. 1774–1781. July 2007. 

This code should be considered a beta release: it is not fully developed or debugged.

The code was produced as part of the ADEPT project run by Gavin Brown
at the University of Manchester and Tim Kovacs at the University of
Bristol.

The code was written by Gavin Brown in 2006-7 and extended by Tim
Kovacs in January 2010.

The code should be available from
http://www.cs.manchester.ac.uk/~gbrown/ucs/download.php


Organisation of the code
------------------------

/data    -- where data files are stored
/ucs     -- the UCS code
/demos   -- code which defines and starts UCS experiments
/dataprocessing -- utility classes used by the UCS code
/doc     -- javadoc for the above
/notes   -- other documentation
/scripts -- some shell scripts for extracting data from standard output

Compiling and running the code
------------------------------

This system compiles without the installation of any other code. To compile
from the prompt you must be in the folder which contains the file
you are reading (README.txt).

To compile UCS itself:
       javac ucs/UCS.java

This will report:
"Note: ucs/UCS.java uses unchecked or unsafe operations.
 Note: Recompile with -Xlint:unchecked for details."

Recompiling with -Xlint:unchecked shows that this refers to calls to ArrayList.addAll and
ArrayList.add, presumably because the ArrayLists are untyped (raw). So it can be ignored.
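
The warning can be reproduced outside UCS. The following is a minimal sketch,
not code from this distribution: the class UncheckedDemo and its methods are
made up for illustration. It contrasts a raw ArrayList with a typed one. The
diamond syntax used here needs Java 7 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the UCS code base.
class UncheckedDemo {

    // Raw (untyped) list. Without the @SuppressWarnings annotation, javac
    // prints the "unchecked or unsafe operations" note for the add() call.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int rawSize() {
        ArrayList rules = new ArrayList();
        rules.add("rule-1");
        return rules.size();
    }

    // Typed list: the element type is declared, so the compiler can check
    // add() and no note is printed.
    static int typedSize() {
        List<String> rules = new ArrayList<>();
        rules.add("rule-1");
        return rules.size();
    }

    public static void main(String[] args) {
        System.out.println(rawSize() + " " + typedSize());
    }
}
```

Both methods behave identically at run time; the difference is only in what
the compiler is able to check.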

To compile the file which starts the basic 11-mux experiment:
       javac demos/BasicUCS.java

Note that the two compilations above are independent. If you modify the contents
of /ucs and the contents of /demos you need to recompile them both independently.

To run the BasicUCS.java experiment:
       java demos/BasicUCS


Features
--------

Here we list some of the things the system can do along with demo
files which demonstrate the features. Note that some demos demonstrate
many features and are more complex than they need to be to demonstrate
one of their features. Note also that the same feature often appears in
multiple demos. Finally, note that all experiments are started by
files in /demos (not just those intended for demonstrations). In fact
the system makes no distinction between the purpose of the experiments
and calls all experiments "demos".

Algorithmic variations
- Implements standard UCS. See /demos/BasicUCS.java
- Implements UCSpv. See /demos/BasicUCSpv.java
- Can set the v parameter separately for the GA and for prediction. See /demos/IndependentV.java

Data handling
- Data must be specified in csv files. 
- Can run in either online or offline mode
  - online uses all the data in the file for both train and test. Testing is done intermittently
    on the last 50 inputs trained on. See demos/BasicUCS.java
  - offline splits the data into train and test sets. Separate train and test files can be specified. 
    See /demos/TrainTestDemo1.java. Alternatively the system will split a single csv file into
    train and test sets of equal size (sampling uniform randomly). See /demos/TrainTestDemo2.java
    Finally, crossvalidation can be used. See /demos/CrossValidateDemo1.java
- Multiple runs with different parameter settings can be specified in the startup file.
  See /demos/IndependentV.java
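
As a rough picture of the crossvalidation option listed above, here is a
minimal sketch of how rows could be divided into folds. It is an illustration
only: the class FoldSketch and its methods are hypothetical, and the round-robin
assignment is an assumption, not necessarily how this code assigns folds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the UCS implementation.
class FoldSketch {

    // Assigns row indices 0..n-1 to `folds` groups round-robin, so fold
    // sizes differ by at most one.
    static List<List<Integer>> makeFolds(int n, int folds) {
        List<List<Integer>> result = new ArrayList<List<Integer>>();
        for (int f = 0; f < folds; f++) {
            result.add(new ArrayList<Integer>());
        }
        for (int i = 0; i < n; i++) {
            result.get(i % folds).add(i);
        }
        return result;
    }

    // Training indices for a given test fold: every index outside that fold.
    static List<Integer> trainingIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<Integer>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = makeFolds(10, 3);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("test fold " + f + ": " + folds.get(f)
                    + "  train: " + trainingIndices(folds, f));
        }
    }
}
```

Each fold serves as the test set exactly once while the remaining folds are
used for training.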

Statistics
- Computes voting margins. See /demos/Margin.java
- Can generate statistics from multiple runs e.g. average accuracy, average area under the
  accuracy curve. See /demos/IndependentV.java
- Scripts are provided to extract statistics like accuracy over time from a results file
- The avgCol.pl script can be used to average statistics from multiple runs e.g. average
  accuracy over time. (Open question: is this still needed, or has internal statistics
  generation replaced it?)


Known bugs 
----------

- as of 7/4/2010 the system enters an endless loop (freezes) after a random number
  of iterations if the data file contains examples of only one class
- as of 18/1/2010 systemPrediction in SystemPredictor.java is often empty, which
  shouldn't happen (due to covering). Actually we should arguably not cover on the test set,
  but we could use a nearest neighbour classification. This empty system prediction
  messes up the margin calculation.
- as of 18/1/2010 the normalisation of the supports only causes the two highest to
  sum to one. If there are more than 2, should they all sum to one?
- I don't know whether this implements macroclassifiers or just simulates them.
  UCS.java.invokeGA() checks for subsumption and if it occurs it generates a clone
  of the subsuming classifier and inserts it into the population.
- If we run a test on every time step (or even every 50) with IndV6mux the accuracy
  starts very high (about .95!!), drops suddenly and then rises. Why does it start so high?
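
On the normalisation question above, the alternative being asked about
(all supports summing to one) could look like the sketch below. This is not
taken from SystemPredictor.java: the class SupportSketch and the method
normaliseAll are hypothetical, and representing the supports as a plain
array of doubles is an assumption.

```java
// Hypothetical helper, not taken from SystemPredictor.java.
class SupportSketch {

    // Returns a copy of `supports` scaled so that all entries sum to one.
    // If the total is zero (e.g. an empty system prediction) the copy is
    // returned unscaled to avoid dividing by zero.
    static double[] normaliseAll(double[] supports) {
        double total = 0.0;
        for (double s : supports) {
            total += s;
        }
        double[] out = supports.clone();
        if (total == 0.0) {
            return out;
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= total;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] p = normaliseAll(new double[] {2.0, 1.0, 1.0});
        System.out.println(p[0] + " " + p[1] + " " + p[2]);
    }
}
```

Whether this is the right behaviour for the margin calculation is the open
question; the sketch only shows what "all sum to one" would mean.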


With 15% noise the system prediction array is often empty for v >= 20. I guess this
is because v is killing off the low-fitness rules in the GA, which are sometimes
the only ones which cover an input. I.e. v is forcing holes in the covering map.


Discussion of design
====================

How to load data

There are several ways to specify the test data for an experiment (see
UCS.java's loadData()):

- set onlinelearning to true (and folds to 1). These are the defaults
  in ucs/UCSconfig.java.  This uses all the data in the loaded file as
  both train and test. Testing is done intermittently on the last 50
  inputs trained on. See demos/BasicUCS.java

- set onlinelearning to false and folds to > 1. This implements n-fold
  crossvalidation.

- set onlinelearning to false and specify a separate test file. See
  demos/TrainTestDemo1.java

- set onlinelearning to false and don't specify a separate test file.
  This results in the input file being split 50/50 into train and
  test. This ratio could easily be modified but at the moment there's
  no mechanism to do that. See demos/TrainTestDemo2.java and
  datasource/getTrainingData().
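
The last option, a random split of one file into train and test, can be
sketched as follows. This is an illustration under assumptions, not the code
in this distribution: the class SplitSketch, the trainFraction parameter and
the seed parameter are all hypothetical. It shows what a configurable ratio
might look like; the existing code is fixed at 50/50.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not the code used by UCS.
class SplitSketch {

    // Shuffles a copy of `rows` uniformly at random and cuts it in two.
    // Returns a list holding {train, test}. trainFraction = 0.5 gives the
    // 50/50 split described above.
    static <T> List<List<T>> split(List<T> rows, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<T>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<List<T>>();
        parts.add(new ArrayList<T>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<T>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<Integer>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        List<List<Integer>> parts = split(rows, 0.5, 42L);
        System.out.println("train size: " + parts.get(0).size()
                + ", test size: " + parts.get(1).size());
    }
}
```

Passing a fixed seed makes the split repeatable, which helps when comparing
runs with different parameter settings.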


How to extract statistics

If onlinelearning is true the system runs a test every 50 time steps.
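
One way to picture the online test set is as a sliding window over the most
recent training inputs. The sketch below is an assumption about the idea, not
the data structure used in UCS.java: the class RecentWindow and the method
isTestStep are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of "the last 50 inputs trained on".
class RecentWindow<T> {
    private final int capacity;
    private final Deque<T> window = new ArrayDeque<T>();

    RecentWindow(int capacity) {
        this.capacity = capacity;
    }

    // Adds an input, discarding the oldest one once the window is full.
    void add(T item) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        window.addLast(item);
    }

    int size() {
        return window.size();
    }

    // Oldest input still in the window, or null if the window is empty.
    T oldest() {
        return window.peekFirst();
    }

    // True on iterations where a test would be run, e.g. every 50 steps.
    static boolean isTestStep(int iteration, int interval) {
        return iteration % interval == 0;
    }

    public static void main(String[] args) {
        RecentWindow<Integer> w = new RecentWindow<Integer>(50);
        for (int i = 1; i <= 120; i++) {
            w.add(i);
            if (isTestStep(i, 50)) {
                System.out.println("test at step " + i + " on " + w.size() + " inputs");
            }
        }
    }
}
```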

If onlinelearning is false then when starting the experiment we pass
the run method of the UCS object the number of iterations to run for
and the test interval. If UCSconfig.verbosity is 1 or more the system
prints out the results of the test, using "RESULTS" as the header of
the string in case you want to extract such lines out of the results
file.
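
If you prefer to extract those lines in Java rather than with the shell
scripts, something like the following would do. It is a minimal sketch: the
class ResultsFilter is hypothetical, it assumes each test result is a single
line that starts with the text RESULTS, and the sample lines in main are
made-up placeholders, not real UCS output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for pulling test results out of captured output.
class ResultsFilter {

    // Keeps only the lines that begin with the "RESULTS" header.
    static List<String> resultsLines(List<String> lines) {
        List<String> kept = new ArrayList<String>();
        for (String line : lines) {
            if (line.startsWith("RESULTS")) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Placeholder lines standing in for a captured results file.
        List<String> captured = Arrays.asList(
                "RESULTS placeholder-1",
                "some other output",
                "RESULTS placeholder-2");
        for (String line : resultsLines(captured)) {
            System.out.println(line);
        }
    }
}
```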

Note that setting onlinelearning to false doesn't imply that you can't
get statistics over time. To do that all you need to do is set the
test interval to a small fraction of the time steps of the experiment
when calling UCS.run(). 'onlinelearning' just implies that the last 50
inputs will be used as a test set every 50 time steps.

When running multiple runs (e.g. using multiple folds in
crossvalidation) they come out one after another on the standard
output. We could split each fold out of the stream and then
e.g. average them together. However, the approach taken is to collect
statistics behind the scenes as the system is running and print out
averages at the very end. This way no post-processing of the standard
output is needed. See demos/IndependentV.java for a sophisticated
example.
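
The idea of collecting statistics while the runs execute can be sketched as a
small accumulator that averages accuracy over time across runs. This is an
illustration only: the class CurveAverager is hypothetical and is not how
demos/IndependentV.java is written. It assumes every run produces the same
number of test results.

```java
// Hypothetical accumulator, not part of the UCS code base.
class CurveAverager {
    private final double[] sums;  // running total per test point
    private int runs = 0;         // number of runs recorded so far

    CurveAverager(int testPoints) {
        sums = new double[testPoints];
    }

    // Records one run's accuracy at each test point.
    void addRun(double[] accuracyPerTest) {
        if (accuracyPerTest.length != sums.length) {
            throw new IllegalArgumentException("runs must have equal length");
        }
        for (int i = 0; i < sums.length; i++) {
            sums[i] += accuracyPerTest[i];
        }
        runs++;
    }

    // Mean accuracy at each test point; NaN if no runs were recorded.
    double[] mean() {
        double[] out = new double[sums.length];
        for (int i = 0; i < sums.length; i++) {
            out[i] = runs == 0 ? Double.NaN : sums[i] / runs;
        }
        return out;
    }

    public static void main(String[] args) {
        CurveAverager avg = new CurveAverager(2);
        avg.addRun(new double[] {0.5, 1.0});
        avg.addRun(new double[] {1.0, 1.0});
        double[] m = avg.mean();
        System.out.println(m[0] + " " + m[1]);
    }
}
```

Because only the running sums are kept, nothing has to be parsed back out of
the standard output afterwards.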



History / version numbers
-------------------------

Version numbers consist of the date the version was last modified,
plus a letter for any subsequent minor edits.

07-04-10c -- the /results/ folder was removed. It contained results of some
experiments which are destined for a GECCO'11 paper but which do not need to
be distributed with this UCS implementation.

I also added a little code in Sept. 2010 to count the number of GA invocations
and report it if params.verbosity >= 2. (See ucs/UCS.java)

07-04-10b -- minor fix to documentation (name of root folder in .zip file)

07-04-10 was developed by Tim in August 2007 and January 2010. It was tidied up in 
April 2010. There is a small chance there is some code in Gavin's May 21 2007 
version which did not make it into Tim's version but that's unlikely.

Changes to Gavin's version of 21/05/2007 found in 07-04-10:

Bug fixes:
- Covering added to test phase so that correct set is never empty
- Covering now adds covered rule to match and correct sets
- deleteIfNecessary() now removes from match and correct sets, not just pop
- currentIteration initialised to 1 so that test phase is not
  triggered on first iteration (unless STEP == 1)
- Some rules generated by the GA were not being initialised to have
  accuracy = experience = numCorrect = 1.  This was fixed by moving
  these initialisations into the Indiv constructor.
- Covering inserts a new rule, then if the pop size limit is exceeded
  it deletes it. This sometimes happens to be the same rule, which
  undoes the covering. Now after deletion, covering checks to see if
  the action set is empty and if so it calls itself to perform covering
  again.
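
The last fix above can be pictured with a toy model. This is a simplified
sketch, not the code in UCS.java: the class CoveringSketch is hypothetical,
rules are modelled as plain strings, a rule "covers" an input when it equals
it, and deletion picks a uniformly random rule. All of these are assumptions
made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Toy model of "cover, delete if over the limit, cover again if undone".
class CoveringSketch {
    static final int MAX_POP = 3;  // population size limit for the toy model

    static void cover(List<String> population, String input, Random rng) {
        population.add(input);                    // insert the covering rule
        if (population.size() > MAX_POP) {        // enforce the size limit
            population.remove(rng.nextInt(population.size()));
        }
        if (!population.contains(input)) {        // the new rule was deleted
            cover(population, input, rng);        // so perform covering again
        }
    }

    public static void main(String[] args) {
        List<String> pop = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        cover(pop, "x", new Random(1L));
        System.out.println(pop.contains("x") + " " + pop.size());
    }
}
```

In the toy model the check for an empty action set is replaced by checking
whether any rule still matches the input.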

Upgrades:
- Stores accuracy, AUCAccuracy and margin from the most recent test phase
- demos/Margin.java shows how UCS can execute multiple runs and average their outputs,
  and how a series of runs with different parameters can be made
- UCSconfig.java now has a verbosity parameter
- demos/IndependentV.java shows how to set v separately for the GA and prediction
- Crossvalidation added

Design changes:
- The UCS constructor no longer clones the UCSconfig object. This was done to implement
  crossvalidation as it allows different UCS objects to share the same UCSconfig (which
  is useful for crossvalidation as we get UCSconfig to remember which fold is
  currently being used as the test fold). No demos made use of the fact that the UCS
  constructor cloned its UCSconfig parameter. If we do want UCS objects to have different
  UCSconfigs then we can just create different ones before passing them to the constructor.
- data files are now stored in /data/ instead of in the root folder
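
The effect of not cloning the configuration can be shown with a toy example.
The classes below are hypothetical stand-ins, not UCS and UCSconfig
themselves, and the testFold field is an assumed name for "which fold is
currently the test fold".

```java
// Toy stand-ins for a shared configuration object and its users.
class ConfigSketch {

    static class Config {
        int testFold = 0;  // assumed name for the current test fold
    }

    static class Learner {
        final Config config;

        // Stores the reference it is given; no clone is made.
        Learner(Config config) {
            this.config = config;
        }

        int currentTestFold() {
            return config.testFold;
        }
    }

    public static void main(String[] args) {
        Config shared = new Config();
        Learner a = new Learner(shared);
        Learner b = new Learner(shared);
        shared.testFold = 2;  // one update is seen by both learners
        System.out.println(a.currentTestFold() + " " + b.currentTestFold());
    }
}
```

To give two learners independent settings in this model, create two Config
objects and pass one to each constructor.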
