Friday, February 18, 2011

Multilayer Perceptron - ANN (BackPropagation)

According to the OpenCV book, the MLP (Multi-Layer Perceptron) is slow to train but makes quick predictions due to its simplicity. It is also well suited to text recognition.

The API documentation also gives a good introduction. What follows is basically paraphrased from there:

There are two training implementations in OpenCV (a parameter sketch follows this list):
  1. classical random-sequential back-propagation
  2. batch RPROP (the default)
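
A minimal sketch of how the two methods are selected, assuming the OpenCV 2.x C++ API (CvANN_MLP_TrainParams); the helper name and the termination/step values here are illustrative, not the sample's actual settings:

    #include <opencv2/ml/ml.hpp>

    // Hypothetical helper; values are illustrative only.
    CvANN_MLP_TrainParams makeParams(bool useRprop)
    {
        CvANN_MLP_TrainParams params;
        // Stop after 300 iterations or when the change drops below 0.01.
        params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS,
                                          300, 0.01);
        if (useRprop) {
            params.train_method = CvANN_MLP_TrainParams::RPROP;    // batch (default)
            params.rp_dw0 = 0.1;          // initial update-value for each weight
        } else {
            params.train_method = CvANN_MLP_TrainParams::BACKPROP; // random-sequential
            params.bp_dw_scale = 0.001;   // gradient step scale (learning rate)
            params.bp_moment_scale = 0.1; // momentum term
        }
        return params;
    }
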
There are three choices of activation function, used for all layers of neurons (see the snippet below):
  1. Identity - simply the weighted sum
  2. Symmetrical sigmoid (the default) - squashes the result into the range -1 to 1, crossing zero at x = 0
  3. Gaussian (not completely supported?)
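
The symmetrical sigmoid is f(x) = beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x)). A minimal sketch of choosing the activation when creating the network (OpenCV 2.x C++ API; the 16-100-100-26 layer sizes match the letter_recog sample discussed below, the alpha/beta values are illustrative):

    // One row listing the node count of each layer: input, hidden..., output.
    cv::Mat layerSizes = (cv::Mat_<int>(1, 4) << 16, 100, 100, 26);

    CvANN_MLP mlp;
    // The activation function applies to all layers; alpha and beta shape it.
    mlp.create(layerSizes, CvANN_MLP::SIGMOID_SYM, /*alpha*/ 1.0, /*beta*/ 1.0);
    // Alternatives: CvANN_MLP::IDENTITY, CvANN_MLP::GAUSSIAN.
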
Things to consider when deciding the network topology (number of layers, number of hidden nodes):
  • A network that is too big causes over-fitting
  • A bigger network also takes longer to train
The documentation recommends pre-processing the inputs with CalcPCA (Principal Component Analysis) to reduce the dimension of the feature vector, which speeds up training.
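
A sketch of that pre-processing step using the C++ PCA class (cv::PCA wraps the same functionality as the C-style CalcPCA); 'data' is an assumed name, and the choice of 10 components is arbitrary here:

    // 'data' holds one training sample per row (CV_32F).
    cv::PCA pca(data, cv::Mat(), CV_PCA_DATA_AS_ROW, 10); // keep 10 components
    cv::Mat reduced = pca.project(data);  // N x 10 matrix, fed to train() instead
    // New samples must be projected with the same PCA before predict().
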

An ANN is designed for numerical data; the API documentation suggests a work-around to handle categorical data (the response unroll shown in the sample notes below).

Parameters
  • Training Method
  • Activation function
  • Network topology: Number of nodes at each layer and number of layers
  • Free parameters: the activation function's alpha and beta, which shape the sigmoid
Sample (letter_recog.cpp, for comparison with Random Trees and AdaBoost)
  • 80% sample data used for training
  • Topology: input layer with 16 nodes, 2 hidden layers with 100 nodes each, output layer with 26 nodes
  • Unroll the categorical response data (A...Z) into numerical form. Each response becomes a 26-element vector, conceptually a bit-vector: if the response is 'C', the 3rd element is set to 1 and the others are kept at 0. A 26-element vector is used because it corresponds to the 26 output-layer nodes (see the sketch after this list).
  • Is there any easy way in OpenCV to create a new matrix out of an arbitrary set of columns from an existing matrix (without copying)?
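
A sketch of the unroll, assuming the responses sit in a matrix named 'responses', one label per row, stored as floats of the character codes (as in letter_recog.cpp):

    // Turn an N x 1 column of labels into an N x 26 matrix of 0s and 1s.
    cv::Mat unrolled = cv::Mat::zeros(responses.rows, 26, CV_32F);
    for (int i = 0; i < responses.rows; i++)
    {
        int cls = cvRound(responses.at<float>(i)) - 'A';  // 'C' -> column 2
        unrolled.at<float>(i, cls) = 1.f;
    }
    // 'unrolled' becomes the outputs argument of CvANN_MLP::train().
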
Observations (16000 training samples, 16-variable feature vector, 26 classes; a prediction sketch follows this list)
  • Classical random-sequential BP (bp_dw_scale=0.001): train 95.2%, test 93.5% (build time 3367.66 secs!)
  • RPROP (rp_dw0=0.05): train 75.6%, test 74.3% (build time 1350.67 secs)
  • RPROP (rp_dw0=0.1): train 78.9%, test 78% (build time 1333.09 secs)
  • RPROP (rp_dw0=1.0): train 15.2%, test 14.3% (build time 1329.59 secs)
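
The accuracies above are per-sample: run predict() and take the strongest of the 26 outputs as the answer. A minimal sketch, assuming a trained network 'mlp' and a 1 x 16 feature row 'sample' (both names are placeholders):

    cv::Mat output(1, 26, CV_32F);
    mlp.predict(sample, output);             // fills one activation per class
    cv::Point maxLoc;
    cv::minMaxLoc(output, 0, 0, 0, &maxLoc); // index of the largest activation
    char predicted = (char)('A' + maxLoc.x); // map back to a letter
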
Code
The RPROP parameter rp_dw0 is initialized to different values (0.1, 1.0) in different places in the code. Why?

Readings
  • http://en.wikipedia.org/wiki/Backpropagation (Wikipedia article on the back-propagation algorithm)
  • Y. LeCun, L. Bottou, G. B. Orr and K.-R. Müller, "Efficient BackProp", in Neural Networks: Tricks of the Trade, Springer Lecture Notes in Computer Science 1524, pp. 5-50, 1998.
  • M. Riedmiller and H. Braun, "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm", Proc. ICNN, San Francisco (1993).

4 comments:

  1. Great - but would you mind sharing some code?

  2. @Oleg I am not sure which code you are referring to. I was using the 'letter_recog.cpp' sample from OpenCV 2.2.

  3. Hi,
    I have 2 questions:
    1. Where do your samples come from, and what do they look like?
    2. Could you tell me more about the principal component analysis you performed on your data?
    I work on a guide robot project doing computer vision, and I will probably use a NN to recognize classroom numbers.

    Thx!!

  4. I don't have real-life samples for this. I simply used the feature vectors and responses from letter-recognition.data, which is located at OpenCV/samples/cpp/.
    I did not perform PCA either, because I don't have the samples or a feature-extraction method.
    I just looked it up in the OpenCV book, and I think you can find more information here:
    http://yann.lecun.com/exdb/mnist/
    And in the paper by LeCun et al.:
    "Gradient-Based Learning Applied to Document Recognition"
    Hope this helps with your robot project!
