This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

Classification

mindnumb edited this page May 1, 2015 · 35 revisions

Standard classification problems

In a classification problem, we typically have some input vectors x and some desired output labels y. Let's consider, then, a simple classification problem called the yin-yang problem. In this problem, we have two classes of elements: elements belonging to the positive class, shown in blue, and elements belonging to the negative class, shown in red.

This data can be downloaded in Excel format here. To load this data into an application, let's use the ExcelReader class together with some extension methods from the Accord.Math namespace. Add the following using directives at the top of your source file:

using Accord.Controls;
using Accord.IO;
using Accord.Math;

Then, let's write the following code:

// Read the Excel worksheet into a DataTable
DataTable table = new ExcelReader("examples.xls").GetWorksheet("Sheet1");

// Convert the DataTable to input and output vectors
double[][] inputs = table.ToArray<double>("X", "Y");
int[] outputs = table.Columns["G"].ToArray<int>();

// Plot the data
ScatterplotBox.Show("Yin-Yang", inputs, outputs).Hold();

After we run this code, the following scatter plot will be shown on the screen:

Models

Naive Bayes

// In our problem, we have 2 classes (samples can be either
// positive or negative), and 2 inputs (x and y coordinates).

var nb = new NaiveBayes<NormalDistribution>(classes: 2,
    inputs: 2, prior: new NormalDistribution());

// The Naive Bayes expects the class labels to 
// range from 0 to k-1, so we convert -1 to be 0:
//
outputs = outputs.Apply(x => x < 0 ? 0 : x);

// Estimate the Naive Bayes
double error = nb.Estimate(inputs, outputs);

// Classify the samples using the model
int[] answers = inputs.Apply(nb.Compute);

// Plot the results
ScatterplotBox.Show("Expected results", inputs, outputs);
ScatterplotBox.Show("Naive Bayes results", inputs, answers)
    .Hold();

Support Vector Machines

Linear

// Create a linear binary machine with 2 inputs
var svm = new SupportVectorMachine(inputs: 2);

// Create a L2-regularized L2-loss optimization algorithm for
// the dual form of the learning problem. This is *exactly* the
// same method used by LIBLINEAR when specifying -s 1 in the 
// command line (i.e. L2R_L2LOSS_SVC_DUAL).
//
var teacher = new LinearCoordinateDescent(svm, inputs, outputs);

// Teach the vector machine
double error = teacher.Run();

// Classify the samples using the model
int[] answers = inputs.Apply(svm.Compute).Apply(System.Math.Sign);

// Plot the results
ScatterplotBox.Show("Expected results", inputs, outputs);
ScatterplotBox.Show("LinearSVM results", inputs, answers);

// Grab the index of multipliers higher than 0
int[] idx = teacher.Lagrange.Find(x => x > 0); 

// Select the input vectors for those
double[][] sv = inputs.Submatrix(idx);

// Plot the support vectors selected by the machine
ScatterplotBox.Show("Support vectors", sv).Hold();

Kernel

// Estimate the kernel from the data
var gaussian = Gaussian.Estimate(inputs);

// Create a Gaussian binary support machine with 2 inputs
var svm = new KernelSupportVectorMachine(gaussian, inputs: 2);

// Create a new Sequential Minimal Optimization (SMO) learning 
// algorithm and estimate the complexity parameter C from data
var teacher = new SequentialMinimalOptimization(svm, inputs, outputs)
{
    UseComplexityHeuristic = true
};

// Teach the vector machine
double error = teacher.Run();

// Classify the samples using the model
int[] answers = inputs.Apply(svm.Compute).Apply(System.Math.Sign);

// Plot the results
ScatterplotBox.Show("Expected results", inputs, outputs);
ScatterplotBox.Show("GaussianSVM results", inputs, answers);

// Grab the index of multipliers higher than 0
int[] idx = teacher.Lagrange.Find(x => x > 0);

// Select the input vectors for those
double[][] sv = inputs.Submatrix(idx);

// Plot the support vectors selected by the machine
ScatterplotBox.Show("Support vectors", sv).Hold();

Decision Trees

// In our problem, we have 2 classes (samples can be either
// positive or negative), and 2 continuous-valued inputs.
DecisionTree tree = new DecisionTree(attributes: new[] 
{
    DecisionVariable.Continuous("X"),
    DecisionVariable.Continuous("Y")
}, outputClasses: 2);

C45Learning teacher = new C45Learning(tree);

// The C4.5 algorithm expects the class labels to
// range from 0 to k-1, so we convert -1 to be zero:
//
outputs = outputs.Apply(x => x < 0 ? 0 : x);

double error = teacher.Run(inputs, outputs);

// Classify the samples using the model
int[] answers = inputs.Apply(tree.Compute);

// Plot the results
ScatterplotBox.Show("Expected results", inputs, outputs);
ScatterplotBox.Show("Decision Tree results", inputs, answers)
    .Hold();

Neural Networks

See Resilient Backpropagation.

Logistic Regression

See Logistic Regression.

Variations

Multi-label problems

In some problems, samples can belong to more than one class at a time. These problems are known as multi-label classification problems and can be solved in different ways. One way to attack a multi-label problem is with a 1-vs-all support vector machine.
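The 1-vs-all idea can be sketched using the same SupportVectorMachine and SequentialMinimalOptimization classes shown in the SVM examples above. This is only a minimal illustration, not the framework's built-in multi-label machinery; the numberOfClasses, labels, inputs, and sample variables are assumed to exist, and exact signatures may differ between framework versions.

```csharp
// Sketch: train one binary machine per class. For each class c,
// relabel the samples as +1 (belongs to c) or -1 (does not),
// then train a standard binary support vector machine.

var machines = new SupportVectorMachine[numberOfClasses];

for (int c = 0; c < numberOfClasses; c++)
{
    // +1 if the sample has label c, -1 otherwise
    int currentClass = c;
    int[] binary = labels.Apply(y => y == currentClass ? +1 : -1);

    var machine = new SupportVectorMachine(inputs: 2);
    var smo = new SequentialMinimalOptimization(machine, inputs, binary)
    {
        UseComplexityHeuristic = true
    };
    smo.Run();

    machines[c] = machine;
}

// A sample is then assigned every class whose machine answers positive:
var assigned = new bool[numberOfClasses];
for (int c = 0; c < numberOfClasses; c++)
    assigned[c] = machines[c].Compute(sample) > 0;
```

Because each machine answers independently, a sample may end up assigned to zero, one, or several classes, which is exactly the multi-label behavior.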

See Multi-label SVM.

Sequence classification

A sequence classification problem is a classification problem where input vectors can have varying lengths. These problems can be attacked in multiple ways. One is to use a classifier specifically designed to work with sequences. Another is to extract a fixed number of features from those variable-length vectors, and then use them with any standard classification algorithm, such as support vector machines.
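As a toy illustration of the second approach, the function below (illustrative only, not part of the framework) summarizes each variable-length sequence with a few fixed statistics, producing equal-length vectors that any standard classifier can consume:

```csharp
using System;
using System.Linq;

static class SequenceFeatures
{
    // Map a variable-length sequence to a fixed-length feature
    // vector: mean, minimum, maximum, and sequence length.
    public static double[] Extract(double[] sequence)
    {
        return new double[]
        {
            sequence.Average(),
            sequence.Min(),
            sequence.Max(),
            sequence.Length
        };
    }
}

// Usage: sequences of different lengths all become 4-vectors
// double[] a = SequenceFeatures.Extract(new double[] { 1, 2, 3 });       // { 2, 1, 3, 3 }
// double[] b = SequenceFeatures.Extract(new double[] { 5, 5, 5, 5, 5 }); // { 5, 5, 5, 5 }
```

Real applications would of course use richer features (histograms, spectral coefficients, and so on), but the principle is the same: once every sequence maps to the same number of features, the resulting vectors can be fed to any of the classifiers above.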

For an example on how to transform sequences into fixed length vectors, see Dynamic Time Warp Support Vector Machine.

For examples of sequence classifiers, see Hidden Markov Classifier Learning and Hidden Conditional Random Field Learning.
