Tuesday, January 18, 2011

SIFT

Attempt to summarize the feature point detection method:
  • Locate key-points
    • Find candidate key-points as extrema in a pyramid of Difference-of-Gaussians (DoG) images built over several octaves (the scale sigma doubles from one octave to the next); see the DoG sketch after this list.
    • Refine each candidate to a precise (sub-pixel, sub-scale) location.
    • Remove candidates that lie on edges or in low-contrast areas.
    • Assign an orientation to each key-point. Where the orientation histogram has other significant peaks (within about 80% of the highest), create additional key-points at the same location and scale with those orientations.
    • Each key-point now carries location, scale, and orientation information.
  • Define a descriptor for each key-point
    • Each sample point in the key-point's neighborhood has a gradient magnitude and orientation computed in the previous steps.
    • For each key-point, build an orientation histogram for each of the 4x4 regions covering its neighborhood.
    • With r histogram bins, each bin covers 360 / r degrees; for r = 8, each bin covers 45 degrees.
    • The value of each bin is the sum of the gradient magnitudes falling into that orientation, each weighted by a Gaussian window centered on the key-point, so samples farther from the key-point contribute less.
    • Orientations are measured relative to the key-point's own orientation, which makes the descriptor rotation-invariant.
    • The n-by-n grid of regions is 4x4 as used in [Lowe2004].
    • The descriptor therefore has r x n x n elements: 8 x 4 x 4 = 128 (see the descriptor sketch after this list).
** Works on single-channel (grayscale) images only **
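A minimal sketch of the DoG-pyramid step above, using OpenCV (the helper name dogOctave and its parameters are made up for illustration; the level count and base sigma are common choices, not something specific to this post): blur the image with progressively larger sigma and subtract adjacent levels; candidate key-points are the local extrema of the resulting DoG layers.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <cmath>
#include <vector>

// One octave of a DoG pyramid: Gaussian-blur at increasing sigma, subtract neighbors.
std::vector<cv::Mat> dogOctave(const cv::Mat& gray, int levels = 6, double sigma0 = 1.6)
{
    const double k = std::pow(2.0, 1.0 / (levels - 3));   // scale step between levels
    std::vector<cv::Mat> gauss(levels), dog(levels - 1);
    cv::Mat base;
    gray.convertTo(base, CV_32F);
    for (int i = 0; i < levels; ++i)
        cv::GaussianBlur(base, gauss[i], cv::Size(), sigma0 * std::pow(k, i));
    for (int i = 0; i + 1 < levels; ++i)
        dog[i] = gauss[i + 1] - gauss[i];                  // candidate extrema live here
    return dog;
}
// The next octave repeats this on the image downsampled by 2, so sigma doubles.
```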
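A simplified sketch of the 4x4-region, 8-bin descriptor layout described above. This is not Lowe's full method: the Gaussian weighting, the interpolation across bins and regions, and the final normalization from [Lowe2004] are omitted, and the 16x16 patch is assumed to be already rotated to the key-point orientation.

```cpp
#include <array>
#include <cmath>

// Accumulate a 4x4-region, 8-bin orientation histogram: 4 * 4 * 8 = 128 elements.
// mag/ori hold a 16x16 patch around the key-point at its scale; ori is in degrees,
// already expressed relative to the key-point orientation.
std::array<float, 128> siftLikeDescriptor(const float mag[16][16], const float ori[16][16])
{
    constexpr int n = 4, r = 8;
    constexpr float binWidth = 360.0f / r;                // 45 degrees per bin
    std::array<float, 128> desc{};                        // zero-initialized
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x) {
            const int region = (y / 4) * n + (x / 4);     // which of the 4x4 regions
            float a = std::fmod(ori[y][x], 360.0f);
            if (a < 0.0f) a += 360.0f;
            const int bin = static_cast<int>(a / binWidth) % r;
            desc[region * r + bin] += mag[y][x];          // sum of magnitudes
                                                          // (Gaussian weighting omitted)
        }
    return desc;                                          // r x n x n = 128 values
}
```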

Code
  • Part of the VLFeat implementation by Andrea Vedaldi: http://www.vlfeat.org/api/index.html
  • KeyPoint::size is a function of sigma, where sigma is the scale of that key-point. With the DrawMatchesFlags::DRAW_RICH_KEYPOINTS flag, drawKeypoints() draws a circle centered at each feature point's location, with KeyPoint::size as its diameter (see the sketch after this list).
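A minimal sketch of that drawing behavior with OpenCV (assuming a recent build where SIFT lives in the main features2d module; older versions exposed it via the nonfree/xfeatures2d modules, and the input file name is made up):

```cpp
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main()
{
    cv::Mat gray = cv::imread("face.png", cv::IMREAD_GRAYSCALE);   // hypothetical input
    auto sift = cv::SIFT::create();
    std::vector<cv::KeyPoint> keypoints;
    sift->detect(gray, keypoints);

    // DRAW_RICH_KEYPOINTS draws one circle per key-point: KeyPoint::size sets the
    // diameter and KeyPoint::angle the orientation tick inside the circle.
    cv::Mat vis;
    cv::drawKeypoints(gray, keypoints, vis, cv::Scalar::all(-1),
                      cv::DrawMatchesFlags::DRAW_RICH_KEYPOINTS);
    cv::imshow("SIFT key-points", vis);
    cv::waitKey();
    return 0;
}
```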
Sample
  • Fewer results (key-points) by increasing the contrast threshold and decreasing the edge threshold (the allowed ratio between the two principal curvatures at a candidate); a parameter sketch follows this list.
  • Background aside, feature points are mostly located around the eyes, eyebrows, the hairline where it meets the face, the hair (brunette), lips, teeth, cheeks, and the tip of the nose when looking straight ahead.
  • Fewer points are found when the face turns slightly to one side and half of the face is in shadow; in that case the default thresholds give better results.
  • Works fine with the Obama portrait, although the original-size picture (1916x2608) causes a memory failure at Sift::prepareBuffers().
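A sketch of that threshold tuning using OpenCV's SIFT parameters (the values 0.08 and 5 are only illustrative; OpenCV's defaults are contrastThreshold = 0.04 and edgeThreshold = 10):

```cpp
#include <opencv2/features2d.hpp>
#include <vector>

std::vector<cv::KeyPoint> detectStrict(const cv::Mat& gray)
{
    // Raising the contrast threshold drops low-contrast candidates; lowering the
    // edge threshold drops candidates whose principal-curvature ratio marks them
    // as edge responses, so both changes yield fewer key-points.
    auto sift = cv::SIFT::create(/*nfeatures=*/0, /*nOctaveLayers=*/3,
                                 /*contrastThreshold=*/0.08, /*edgeThreshold=*/5.0);
    std::vector<cv::KeyPoint> keypoints;
    sift->detect(gray, keypoints);
    return keypoints;
}
```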
Readings
  • Local Features Tutorial, Nov. 8, 2004, F. Estrada, A. Jepson & D. Fleet
  • CS 664 Lecture #21 "SIFT, object recognition, dynamic programming", Cornell University
  • Distinctive Image Features from Scale-Invariant Keypoints, David Lowe, 2004
  • Implementing the Scale Invariant Feature Transform (SIFT) Method, Yu Meng and Dr. Bernard Tiddeman
