OpenCV Adventure: Algorithms used in FLANN

Data-Structures

K-D Tree
K-D stands for K-Dimension. It is a type of binary tree for multi-dimension vectors. It is balanced when it is built by splitting the nodes at the median values. Choose the dimension that could divide the samples in half (largest variance) at each level. The tree is built with a training set of feature vectors. It is used to find query specific value or value ranges, nearest neighbors.
Randomized K-D Tree
Improve the approximation of nearest neighbors from the query point by searching simultaneously across a number of randomized trees. The tree are built from the same set of samples. The author argues that in practice the variances among dimensions do not vary much. Randomly picking from the highest few will be enough to make the different trees. (RKD) It could be further improved by performing PCA. Re-align the data to the highest principle components. (PKD)
Hierarchical K-Means Tree
It is a tree of which every inner node is split in K-ways. K-means clustering is used to classify the data subset at each node. A L level tree would have approximately K^L leaf nodes. They are all at level L.
The papers describe how to apply text-retrieval methodologies to object-retrieval from video. The feature descriptors are 'visual vocabulary'. The descriptors are clustered (quantized). Uses hierarchical tree to improve efficiency. A inverted-book (index) is built (offline) to keep track of where the vocabulary appears (on frame(s) of a video).

ANN Search

On K-D Tree
Start backtracking after reaching the terminal node from the initial depth first search. Examine, update-closest node list or eliminate neighboring sub-trees by checking the 'distances' from the query point. For approximation, only a fixed number of leaf nodes will be looked at during back-tracking. A priority queue of nodes to be examined is maintained. High priority is given to the closer 'cells'.
On RKD Tree
The priority queue will be shared across all trees. In other words, the next cell to examine will be based on the closest cell-to-query-point distance among ALL trees.
On Hierarchical K-Means Tree
The "Vocabulary Tree" paper presents methods in looking up image from an image database. The primary object in the image will have its features looked up from the tree. Assuming that similar images has the similar set of feature points, the path on which the features arrives at the leaf nodes should be 'close-enough'. Each leaf node (vocabulary) is characterized the list of inner nodes it passed through. It is a vector of weighted nodes. The weighting method is based on ID-ITF used by text retrieval. Simply put, a 'similarity' will rank highly if it happen frequently in this frame and the feature does not occur often in the video. The candidate images from database could be looked up from the inverted files. The images are ranked according to a score based on ID-ITF. I suppose each candidate image will have a score that is based on the differences from the query-image of how the feature traverse the tree. Other techniques are used to filter for more precise results: stop-lists, spatial-consistency.

Other note

The Google Video paper uses both MSER to detect blobs and shape detectors to find corners. They complement each other. Uses Mahalanobis distance to measure distance between SIFT descriptors.

Readings

[ FLANN ] http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN
There is link to both User Manual of FLANN library and Academic Paper behind it.
[ K-D Tree ] Multidimensional Binary Search Trees Used for Associative Searching, Jon Louis Bentley
[ NN search on K-D Tree ] AN ALGORITHM FOR FINDING BEST MATCHES IN LOGARITHMIC EXPECTED TIME, Friedman et al.
[ ANN search on K-D Tree ] Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces, Beis and Lowe
[ Randomized K-D Trees ] Optimised KD-trees for fast image descriptor matching, Silpa-Anan and Hartley
[ Hierarchical K-means Trees ] Scalable Recognition with a Vocabulary Tree, Nister and Stewenius
[ K-means clustering on Video Vocabulary ] Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman.

OpenCV Adventure

Thursday, January 27, 2011

Algorithms used in FLANN

No comments:

Post a Comment