Key Point Detection
- Build Integral Image
- Uses Pyramid of Filter (not image) to approximate Laplace of Gaussian (LoG), supposedly run faster than SIFT, which uses DoG for approximation.
- Filters will span several octaves and with fixed number of scales in each, similar to SIFT. This type of process is Scale-Space Analysis.
- Integral image helps to keep the running speed constant as it is insensitive to increasing filter sizes.
- Uses discrete Box Filter to calculate Hessian determinant. Box size in multiples of its current scale.
- All filters run on the the original image instead of iterative like SIFT, allowing parallel execution.
- Feature points are maxima of the determinants in the adjacent scale and points (3x3x3), similar to SIFT.
Orientation Assignment
- The responses from 2 first-order Haar wavelet filters (1, -1), in dx and dy orientations, are collected on each feature point. The responses are put on a 2D plane as vectors [dx, dy].
- Again, the Integral Image would help with this box filter.
- An orientation-window of 60-degree-angle is slided around the origin at x-y plane. Each angle will have a corresponding sum of magnitudes for every vectors inside that window.
- The dominant orientation would be the angle of the largest sum.
Descriptor Extraction
- The descriptor is 64-element vector, half the size of SIFT. Each represents the intensity property of a 4x4 square within the "interest region". The intensity property is 4-element tuple [ Sum(dx), Sum(abs(dx)), Sum(dy), Sum(abs(dy)) ]. The Sum is calculated from the Haar wavelet response of a 5x5 sample window.
- 4x4x4 = 64
- Implying that there are more than 1 4x4 square at each key-point location. I suppose the size of the region is related to the associated scale.
Matching
- The sign (+ive/-ive) of the Trace of Laplacian (Hessian Matrix) is used in matching phase, speeding up the process.
Code
- It is credited to Liu Liu with modifications by Ian Mahon. The authors state the current gotchas and gives a good overview of the code at the file (surf.cpp) header.
- The code did try to exploit parallelism using OpenMP (cv::parallel_for).
- Define some parameter structure and default values similar SIFT in the future (start with cvSURFParams?).
- The 'extended' parameter changes the descriptor size from 64 to 128. It is default to '1' in the code, but it has default argument value of 'false' at the SURF::SURF(...) constructor.
- Why not integrate OpenSURF by Chris Evans? (With hyperlink to CMU CV site with Image Databases and other CV resources).
Sample
- With default parameters, SIFT takes 3-4 seconds and detect 872 points on Kobe Bryants picture. SURF takes less than 1 second to detect 413 points.
- SURF is able to handle the Obama picture at its original resolution, unlike SIFT. Takes 11 seconds to detect 2690 points.
- Decrease/Increase the number of detected points by increasing/lowering the threshold argument. Only the points with Hessian response higher than thresholds are considered.
Commercial Application (academic spin-off)
- Kooaba - visual search
Readings
- Speeded-Up Robust Features (SURF). Herbert Bay et al.
- CPSC 643 Presentation of SURF, Herbert Bay et al.
- Non-Maxima Suppression Demo
- Wavelets in Multiresolution Analysis by Tom Germano
- A Tutorial on Wavelets and their Applications by Martin J. Mohlenkamp at the University of Colorado, Boulder
- WAVELETS FOR KIDS - A Tutorial Introduction by B. Vidakovic, P. Mueller at Duke University.
- Wavelets: A Tutorial by Zhuming Lam (Apr-2002)
- The term 'repeatability' (seen on SIFT and SURF papers) is a measure of the ability to detect the same set of key-point from various viewpoints. For example, the rotation of Van Gogh image (Fig 3 of the SURF paper by Bay et al.).
- Origin of this term: See "Evaluation of Interest Point Detectors" paper by Shmid et al
No comments:
Post a Comment