Sunday, March 6, 2011

Calonder Descriptor, Generic Trees and Randomized Trees

Summary of both papers (see Reading)

The first paper proposes to use ML classification techniques to do fast keypoint matching. Time will be spent in offline training and resulting in a shorter matching time. They found that Randomized Trees is a good candidate. It supports multi-class classification and by experiment they give good matching results. If the node-split-criteria is chosen randomly also (Extreme Randomize Trees?), not only the training time be reduced, but also better matching results. The forest will classify an input key-point from the set of training key-points. So given a key-point from input image, the forest is able to characterize whether it matches one of the original keypoints (or none-of-the-above). The training key-points are chosen by its ability to be recovered from a set of distorted views. Additionally, they found that using simple key-point detector (that inspired the BRIEF descriptor?) is good enough to achieve illumination invariance. The paper devises a way to achieve invariants with this simple descriptor by 'reproducing' each training image multiple times into a 'view set'. Each original image is randomly rotated and scaled into 100 separate images for the 'view set'. All images from the view-set will be used to train the classifier so that it will be able to detect the same keypoint patch from various viewpoints at run-time. Scale invariants is improved by building image pyramids from which key-points are extracted also. (Small) Position invariants is enhanced by injected random 'noise' (in terms of positions?) to the view-set. Placing the training patches on random backgrounds so that the classifier could pick out those trained key-points from cluttered background. Such Binary Intensity Descriptor like this together with the view-set performs very well against sophisticated SIFT descriptor, provided that the random trees is able to divide the keypoint space up to a certain granularity.

The second paper is a continuation of first paper. The focus this time is try recognizing objects by key-point matching fast enough to use in real time video. An important application is SLAM where offline learning is not practical as objects cannot be learned ahead of time. The authors propose a Generic Tree Algorithm. First, a randomized tree classifier is trained with a set of key-points called 'base-set'. The key points are selected from only a small number of training images. And similarly, the images are warped in many ways for training to achieve invariants. At run time, a query key-point will be go down the classifier, resulting in a set of probability for n-classes. This set is treated as a Signature (descriptor-vector) for this key-point. The Signature would have n-elements (corresponding to the n-classes). Each element is a thresholded value of the class output. The matching between key-point signatures is done using Euclidean Distance.. The theory is that any new correspondence keypoint-pair will have similar classifier output, even though they do not belong to the base-set.


OpenCV defines CalonderDescriptor class that could produce the Signature of a query point-of-interest from a given Randomized Trees. RTreeClassifier class represents the forest and it is trained with a given base-set and a patch-generator. The base-set is basically collection of key-point locations from a training image. The size of the base-set is the number of classes trained to classify. PatchGenerator objects are used to warp an image using the specified ranges - angles for rotation, intensities for background colors, deltas for position-noises, lambda for scales.

Demo code (find_obj_calonder.cpp)

Dataset - Dundee University set.

The demo code trains Randomized Tree Classifier of 48-trees of 9 levels deep. Took more than 30 minutes to train 176 keypoints (selected out of 1423) from a single image. PatchGenerator creates 100 views for each key-point. The classifier will be saved to a file after training. At run-time, it uses SURF to pick interest points from reference and query images, extracts Calonder Descriptor with the classifier and performs Brute-Force(L2) matching. By default the input image from command-line argument will be used as a reference image. The query image is a warped version of itself. All images are converted to gray-scale for training and tests.

Wrote another test function so that instead of warping the input image, user supplies another image for matching.

Results and Observations
  • Trained (one-at-a-time) with upright object-only image: book1, book2, ball
  • Finding the object image from the object-in-a-bigger-picture images did not do well. Many false matches.
  • Most time spent on loading the classifier data file (~16MB).
Used one of the trained classifiers. Run the default test-case - matching between a query image and its warped version. The testing images are not related to the training image. The warping is not too severe. The results are satisfactory.

Site for image databases and associated 3D model (stereo-vision):

  • Keypoint Recognition with Randomized Trees, Lepetit & Fua
  • Keypoint Signatures for Fast Learning and Recognition, Calonder, Lepetit & Fua

No comments:

Post a Comment