Tuesday, February 8, 2011

BRIEF descriptor

According to the paper, BRIEF is designed to speed up keypoint matching. The descriptor is simple to create and to compare. The trade-off is that the current design is not invariant to scale or orientation.

The descriptor is a binary string. Each bit represents a simple intensity comparison between two points inside a 'patch' centered on the keypoint. Following the paper's definition, a bit is set to '1' if the first point has lower intensity than the second, and '0' otherwise. The paper suggests several ways of selecting the point pairs. Empirically, the best recognition rates come from pairs drawn from a Gaussian distribution centered on the patch.
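To make the bit test concrete, here is a minimal sketch in plain Python, not the OpenCV code. The function names, the list-of-rows image layout and the choice of sigma (patch size / 5) are my assumptions for illustration. The bit convention used here is the paper's: 1 if the first point is darker than the second. The image is assumed to be already smoothed and the keypoint far enough from the border.

```python
import random

def make_test_pairs(n_bits, patch_size, seed=0):
    """Draw n_bits point pairs from a Gaussian centred on the patch."""
    rng = random.Random(seed)
    half = patch_size // 2
    sigma = patch_size / 5.0  # assumption: spread chosen for illustration

    def sample():
        # Clamp so every offset stays inside the patch.
        v = int(round(rng.gauss(0.0, sigma)))
        return max(-half, min(half, v))

    return [((sample(), sample()), (sample(), sample()))
            for _ in range(n_bits)]

def brief_descriptor(image, keypoint, pairs):
    """Return the descriptor as an int; bit i is 1 if I(p_i) < I(q_i).

    image is a list of rows (image[y][x]); pairs hold (dx, dy) offsets
    relative to the keypoint.
    """
    kx, ky = keypoint
    desc = 0
    for i, ((x1, y1), (x2, y2)) in enumerate(pairs):
        if image[ky + y1][kx + x1] < image[ky + y2][kx + x2]:
            desc |= 1 << i
    return desc
```

The same list of pairs has to be reused for every keypoint, otherwise the bits of two descriptors would not be comparable.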

To reduce sensitivity to noise, the image needs to be smoothed before the pixels are compared.

Hamming distance is used for matching. Computing it takes advantage of the XOR and bit-counting CPU instructions (SSE).
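As a rough illustration of why matching is cheap, the distance between two descriptors is just the number of set bits in their XOR. The sketch below stores each descriptor as a Python int; the function names are mine, and the brute-force loop only mimics the idea of a brute-force matcher, not its OpenCV implementation.

```python
def hamming_distance(a, b):
    """Number of bit positions in which descriptors a and b differ."""
    return bin(a ^ b).count("1")

def match_brute_force(query, train):
    """For each query descriptor, find the closest train descriptor.

    Returns a list of (query_index, train_index, distance) tuples.
    """
    matches = []
    for qi, q in enumerate(query):
        best = min(range(len(train)),
                   key=lambda ti: hamming_distance(q, train[ti]))
        matches.append((qi, best, hamming_distance(q, train[best])))
    return matches
```

On real hardware the XOR and the bit count each map to a handful of instructions per machine word, which is where the speed-up over floating-point descriptors comes from.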

OpenCV Implementation

A typical BRIEF descriptor is 16, 32 or 64 bytes long, i.e. 128, 256 or 512 comparisons. The patch is 48 pixels on a side. The spatial distribution of the compared pixels is defined in test-pairs.txt. The integral image is computed once, and each subsequent comparison is made between sums over a 9x9 kernel rather than between single pixels. This implies that only the KERNEL_SIZE parameter could be adjusted.
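The integral-image trick can be sketched in a few lines. This is a plain-Python illustration of the general technique, not the OpenCV source; the function names are my own, and the caller is assumed to keep the box inside the image.

```python
def integral_image(image):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of all
    pixels above and to the left of (x, y), exclusive."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, cx, cy, kernel_size=9):
    """Sum of the kernel_size x kernel_size box centred on (cx, cy),
    computed with four table lookups."""
    r = kernel_size // 2
    x0, y0, x1, y1 = cx - r, cy - r, cx + r + 1, cy + r + 1
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

Comparing two box sums is equivalent to comparing two box-filtered pixels, so the smoothing step costs four lookups per point no matter how large the kernel is.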

Sample (brief_match_test.cpp)

The tests use the data sets provided by the Visual Geometry Group at Oxford, UK, the same data used in the paper. The results are as expected: BRIEF tolerates small rotations but not scaling or changes in perspective.
brief_match_test uses the matches to estimate a homography and warp the second image onto the first. It then shows a difference picture obtained by subtracting the warped image from the first. The more good-quality matches there are, the more the difference looks like a Canny edge image.

Sample (video_homography.cpp)

This sample uses the FAST detector, the BRIEF descriptor and BruteForce (Hamming distance) matching to correlate keypoints between video frames. I tested it with video clips. It is able to track larger movements when the camera zooms in on an object, or when the camera pans while the objects stay relatively stationary. By default, it is not able to correlate the points on a car moving through the view of a road-side camera. The tracks do show up after disabling the match_mask passed to drawMatchesRelative(). A new homography matrix is computed from the matches of every new frame, and the match_mask marks the keypoints that fit the transform. But since there is no perspective change from the fixed road-side camera, the mask becomes over-constraining.

The sample introduces an OpenCV class, GridAdaptedFeatureDetector. The caller specifies an arbitrary grid size (default 4x4) and a maximum number of feature points. The detection process divides the maximum number of feature points among the grid cells, so the grid spreads the feature points around the frame area. Within each cell, the per-cell limit keeps the strongest feature points, those with the highest 'response' value in the KeyPoint class.
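The per-cell selection can be approximated as follows. This is a simplified sketch of the idea, not the OpenCV class itself: it bins keypoints that have already been detected, whereas the real detector may run detection per cell. Keypoints are assumed to be (x, y, response) tuples and grid_select is a hypothetical name.

```python
def grid_select(keypoints, width, height, max_total, rows=4, cols=4):
    """Keep at most max_total // (rows * cols) strongest keypoints
    in each grid cell of a width x height frame."""
    per_cell = max_total // (rows * cols)
    cells = {}
    for kp in keypoints:
        x, y, _ = kp
        # Clamp so points on the right/bottom edge fall in the last cell.
        c = min(int(x * cols / width), cols - 1)
        r = min(int(y * rows / height), rows - 1)
        cells.setdefault((r, c), []).append(kp)
    selected = []
    for key in sorted(cells):
        strongest = sorted(cells[key], key=lambda k: k[2], reverse=True)
        selected.extend(strongest[:per_cell])
    return selected
```

Without the grid, a single highly textured region could claim the entire feature budget and leave the rest of the frame with no points to track.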

The sample uses a technique to track only the points that appeared in previous frames. This makes sense because when a new scene cuts in, the old points become irrelevant. It does so by back-projecting the keypoints of the current frame with the previous perspective-transform matrix H, that is, by applying the inverse of H. A 2D mask is then created with cv::windowedMatchingMask(). A mask element is set to 1 if the back-projected location is within a certain distance of a KeyPoint in the reference frame. The reference frame can be the last frame or a user-selected frame.
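The back-projection and windowing steps can be sketched in plain Python. This only mirrors the idea behind the mask and is not the OpenCV function: the names, the 25-pixel window and the assumption that H maps reference-frame coordinates to current-frame coordinates are mine.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix given as nested lists (adjugate method)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def project(h, point):
    """Apply homography h to a 2D point (homogeneous divide included)."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def windowed_mask(current_pts, reference_pts, h_prev, max_dx=25, max_dy=25):
    """mask[i][j] is 1 if current point i, back-projected with the
    inverse of h_prev, lands within the window around reference point j."""
    h_inv = invert_3x3(h_prev)
    mask = []
    for p in current_pts:
        bx, by = project(h_inv, p)
        mask.append([1 if abs(bx - rx) < max_dx and abs(by - ry) < max_dy
                     else 0
                     for rx, ry in reference_pts])
    return mask
```

The matcher then only considers pairs whose mask entry is 1, which both prunes implausible matches and cuts the number of descriptor comparisons.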

The homography matrix also resets itself to the identity matrix if too few points between the reference (previous) frame and the current frame fit the newly computed homography matrix.
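The reset rule amounts to a small guard around the homography update. In this sketch the function name and the threshold of 15 inliers are hypothetical, chosen only for illustration; inlier_mask stands for the per-match flags that say whether a match fits the new transform.

```python
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def next_reference_homography(h_new, inlier_mask, min_inliers=15):
    """Keep h_new if enough matches fit it, otherwise fall back to
    the identity matrix (min_inliers is an assumed threshold)."""
    if sum(1 for m in inlier_mask if m) > min_inliers:
        return h_new
    # Return a fresh copy so callers cannot mutate the constant.
    return [row[:] for row in IDENTITY]
```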


Readings
BRIEF: Binary Robust Independent Elementary Features, Calonder et al.
