Epicurus

Epicurus is a freely available machine learning prediction library. It predicts both nominal and numeric attributes.

In its current version it supports k-nearest neighbours (nominal and numeric) and Naive Bayes (nominal). A few small sample applications can be downloaded separately from the library.

Epicurus was written by Len Trigg and Stuart Inglis. It’s not yet at version 1.0 so it may stop with assertion failures instead of nice error messages. All failures should indicate problems with the input files.

All of the sample applications are for coding demonstration purposes only, so please don’t complain about them :-) Just donate more examples. For help using a sample, run it with the “-h” option.

Example: Sequence

To investigate the Epicurus machine learning library, download and install the Epicurus tarball as well as the Epicurus-apps tarball. Then compile the library and the applications.

Run ./Sequence and you will see:

    Welcome to the 'Sequence' Epicurus test
    This application tests how well you can generate random
    letters. You have three choices, a, b or c. Type in lots
    of a's, b's and c's, and end input with ctrl-d. You may
    be surprised to see how well you can be predicited.

    If your input was completely random, Epicurus would get
    a 33,3% correct rate (66.6% errors)

    Enter next value:

If you enter:

    abcabcbabcbbacbacbacbcbacbacbacbacbabacacbacbac <enter>
    <control-d>

You will get output that looks like:

    Total Number of Instances              47
    Correctly Classified Instances         30     63.83 %
    Incorrectly Classified Instances       13     27.66 %
    UnClassified Instances                  0      0.00 %
    Multiply Classified Instances           4      8.51 %

This shows that the Epicurus library correctly predicted the next character 63.8% of the time. You can see that my letters were not very random at all! Epicurus had learnt the sequence. If the input were a purely repetitive sequence such as ‘abcabc’, the correct rate would approach 100% in the limit.

Example: NaiveBayes

The ‘Sequence’ example predicted each new character based on a history or “context” that was generated by the two preceding characters.
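
To make the idea of a two-character context concrete, here is a minimal Python sketch of a frequency-based predictor. It is not Epicurus code and does not claim to be the algorithm ‘Sequence’ uses; the function name evaluate_sequence and the counting scheme are assumptions made for illustration. It guesses each character from the two before it, scores the guess, and only then learns from the true character.

```python
from collections import Counter, defaultdict


def evaluate_sequence(text, order=2):
    """Predict each character from the `order` characters before it.

    Returns (correct, total). Each guess is made before the true
    character is seen; the counts are updated afterwards.
    """
    counts = defaultdict(Counter)  # context -> counts of the characters that followed it
    correct = 0
    total = 0
    for i in range(order, len(text)):
        context = text[i - order:i]
        actual = text[i]
        seen = counts[context]
        if seen:
            # Guess the character seen most often after this context.
            guess = seen.most_common(1)[0][0]
            if guess == actual:
                correct += 1
        # A context never seen before counts as a miss.
        total += 1
        seen[actual] += 1
    return correct, total


if __name__ == "__main__":
    correct, total = evaluate_sequence("abc" * 100)
    print(f"{correct}/{total} correct ({100 * correct / total:.2f} %)")
```

On the repetitive input "abc" * 100 the sketch misses only the first time it meets each of the three contexts, giving 295 correct guesses out of 298. Ties between equally frequent letters are broken by insertion order in this sketch.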

For a more realistic (and complicated) example, examine the file arff/iris.arff that is distributed with Epicurus-apps. If you open the iris dataset you will see that there are five columns (or attributes). The classification column (or class) is the last column. To generate a model that attempts to capture the information in the dataset, run:

    ./NaiveBayes -s model -t arff/iris.arff

The -s option saves the description of the data (or model) as a file called model. To load and evaluate the model, use the -l option:

    ./NaiveBayes -l model -T arff/iris.arff

The -t option means train on the data; the -T option means test on the data.

Example: Stock market prediction

As another, more contrived example, let’s take the stock market as the domain.

If you have a collection of measured indicators (call them Closing, Opening, Volume, CC100, SM10) and a value to predict (PriceGain), you could create an arff3 file that looks like:

    %
    % Comments
    %
    @ARFF 3
    @RELATION stock_prediction
    @ATTRIBUTE Closing REAL
    @ATTRIBUTE Opening REAL
    @ATTRIBUTE Volume REAL
    @ATTRIBUTE CC100 REAL
    @ATTRIBUTE SM10 REAL
    @ATTRIBUTE PriceGain REAL
    @DATA

    5.43, 5.48, 130000, 5.55, 1,    0.01
    5.48, 5.49, 190000, 5.58, 1.2,  0.00
    5.49, 5.49, 120000, 5.57, 1.1, -0.02
    .
    .
    .

We treat this file as the training file (train.arff). To actually make predictions you need another file (test.arff) with the five indicators, but this time the PriceGain attribute is unknown and is written as _. Your file (assuming the same header) may look like:

    .
    .
    5.51, 5.50, 19000,  5.05, 1,    _
    5.50, 5.50, 290000, 5.06, 1.2,  _
    5.50, 5.49, 120000, 5.03, 1,    _
    .
    .
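
Both files share the same header, so they are easy to produce from a script. The following Python sketch writes rows in the layout shown above, using _ for an unknown value. The helper name build_arff is hypothetical, and the sketch assumes Epicurus accepts exactly this header and comma-separated layout; check it against the arff files shipped with Epicurus-apps before relying on it.

```python
ATTRIBUTES = ["Closing", "Opening", "Volume", "CC100", "SM10", "PriceGain"]


def build_arff(rows, relation="stock_prediction"):
    """Return the text of an arff3 file for `rows`.

    Each row holds one value per attribute; None marks an unknown
    value and is written as "_".
    """
    lines = ["@ARFF 3", f"@RELATION {relation}"]
    lines += [f"@ATTRIBUTE {name} REAL" for name in ATTRIBUTES]
    lines += ["@DATA", ""]
    for row in rows:
        if len(row) != len(ATTRIBUTES):
            raise ValueError(f"expected {len(ATTRIBUTES)} values, got {len(row)}")
        lines.append(", ".join("_" if value is None else str(value) for value in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Test rows: the PriceGain column is unknown.
    test_rows = [
        (5.51, 5.50, 19000, 5.05, 1, None),
        (5.50, 5.50, 290000, 5.06, 1.2, None),
    ]
    with open("test.arff", "w") as handle:
        handle.write(build_arff(test_rows))
```

Passing rows with a numeric last value instead of None produces the training file in the same way.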

You make the predictions by running KNN (or NaiveBayes if the prediction is nominal):

    ./KNN -t train.arff -T test.arff -V PREDICTIONS

This will output a new arff file that is the test file with the predictions inserted in place of the _ placeholders.
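
For readers unfamiliar with k-nearest neighbours, the sketch below shows the general technique on the three training rows from this example: find the k training rows closest to the query and average their PriceGain. It is a plain Python illustration, not the Epicurus implementation; the function name knn_predict, the Euclidean distance and the unweighted average are all assumptions.

```python
import math


def knn_predict(train_rows, query, k=3):
    """Average the target of the k training rows nearest to `query`.

    Each training row is (features..., target). Distance is plain
    Euclidean distance on the raw, unscaled feature values.
    """
    if not train_rows:
        raise ValueError("need at least one training row")
    by_distance = sorted(train_rows, key=lambda row: math.dist(row[:-1], query))
    nearest = by_distance[:k]
    return sum(row[-1] for row in nearest) / len(nearest)


# Closing, Opening, Volume, CC100, SM10, PriceGain (rows from train.arff above)
TRAIN = [
    (5.43, 5.48, 130000, 5.55, 1, 0.01),
    (5.48, 5.49, 190000, 5.58, 1.2, 0.00),
    (5.49, 5.49, 120000, 5.57, 1.1, -0.02),
]

if __name__ == "__main__":
    query = (5.50, 5.49, 120000, 5.03, 1)  # a test row without its PriceGain
    print(knn_predict(TRAIN, query, k=1))
```

Because the values are unscaled, Volume dominates the distance in this sketch, so with k=1 the query simply takes the PriceGain of the training row with the closest Volume. A practical version would normalise each attribute first; whether and how Epicurus does so is not covered here.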