当前位置:网站首页>Geospatial search: implementation principle of KD tree

Geospatial search: implementation principle of KD tree

2022-06-25 12:45:00 Wenxiaowu

Previous review :

Suppose we now have a data set , When given a new instance , The simplest and crudest way is to calculate its distance from all points , And then find k The nearest neighbor , Finally, judge the class of this instance according to the rule of majority voting .

But if there are many and intensive training examples in this dataset , And an instance has many features , Then you have to calculate thousands of distances , The amount of computation is extremely huge .

At this time , We can use a faster calculation method ——kd Trees .

What is? kd Trees

kd Trees look fundamentally , It's a binary tree structure , According to this structure k Continuous division of dimensional space , Each node represents k Dimension super rectangle area .

A two-dimensional (k=2) The rectangular area of :

The three dimensional (k=3) The rectangular area of :

Be careful ,kd The tree here k Represents the number of features , That is, the dimension of data , and k In the neighborhood k It refers to the closest to the new instance point k A neighbor .

How to construct kd Trees

principle

Input : k Dimensional data sets :

xi=(xi(1),xi(2),...,xi(k))T,xi(l) The superscript of represents the th l Eigenvalues

Output : kd Trees

1. Start : Construct the root node .

Select an optimal feature at the root node for cutting , It is generally determined by comparing the variance of each feature , The feature with the largest variance is what we want to choose Axis .

If we choose x(1), Take it as the coordinate axis , Find the syncopation point , Cut the super rectangular area into two sub areas . Usually selected x(1) The median of the data in the direction is used as the cut-off point , The depth generated from the root node is 1 Left and right child nodes of , The coordinates of the left node are smaller than the tangent point , The coordinate of the right node is greater than the tangent point .

2. repeat : Selection and cutting of residual features

Continue to the depth of j The node of , choice x(1) Is the tangent axis ,l=j(mod k)+1, Take all instances in the node area x(1) The median of the coordinates is used as the tangent point , The region is continuously divided into two sub regions .

3. stop it : obtain kd Trees

Stop segmentation until there are no instances in the two sub regions , That is to get a tree kd Trees .

Example explanation

Input :

Output : kd Trees

For convenience , Mark the data points in the data set for visual display :

1. For the first time

Because the dimension of the training data set is 2, Then select any feature . Choose x(1) For the axis , take x(1) The data in is sorted from small to large , Namely :2,4,5,7,8,9

The median might as well choose 7, namely With (7,2) Is the root node , Slice the whole area .

On the left is less than 7 The child node of , The one on the right is larger than 7 The child node of .

2. Second slice

Divide the area again :

Add... To the first feature 1, With x(2) For the axis .

For the left area after the first segmentation , take x(2) The data in is sorted from small to large , Namely :2,3,4,7

The median is 4, The coordinates of the syncopation point are  (5,4), Draw another horizontal line for the second segmentation .

similarly , For the right area after the first segmentation , take x(2) The data in is sorted from small to large , Namely :1,6

Because there are only two points , Choose 6 As a syncopation point , The coordinates of the tangent point in the right area are  (9,6) , Draw a horizontal line for the second segmentation .

3. Continue to cut

The second feature adds 1, With x(3) For the axis , And only here x(1) and x(2), So only for x(1) division . Or directly calculate 2(mod 2)+1=1 You can get the selected feature .

It's obvious that A(2,3),D(4,7),E(8,1)  The three instance points have not been segmented yet , Then take these points as root nodes and draw a line perpendicular to x(1) Cut the line of the axis .

4. draw kd Trees

The root nodes in these divisions are :

first stage :(7,2)

Level second :(5,4),(9,6)

Level third :(2,3),(4,7),(8,1)

Draw... According to this level kd Trees :

How to search kd Trees

principle

Input :  Constructed kd Trees , The target point is x

Output : x The nearest neighbor of

seek “ The nearest point at the moment ”:  Starting from the root node , Recursive access kd Trees , Find out include x Leaf nodes of , This leaf node is “ The nearest point at the moment ”

to flash back :  Take the target point and ” The nearest point at the moment “ Backtracking and iteration along the root of the tree , The current nearest point must exist in the region corresponding to a child node of the node , Check whether there is a closer point in the area corresponding to another child node of the parent node of the child node .

When you go back to the root node , End of the search , final ” The nearest point at the moment “ That is to say x The nearest neighbor of .

Example explanation

The tree generated above kd Trees, for example :

Case 1

Input : kd Trees , The target point x=(2.1,3.1)

Output :  The nearest neighbor of the target

Find the nearest point :  Start at root ,x=(2.1,3.1) At the root node (7,2) In the left sub region of , Continue to (5,4) In the determined left sub region , Continue to (2,3) In the right sub region of ,(2,3) Is the nearest neighbor .

to flash back :  With (2.1,3.1) For the center of a circle , Draw a circle with its distance from the determined nearest neighbor as the radius , There are no other points in this area , That proves (2,3) yes (2.1,3.1) The nearest neighbor of .

Case 2

Input : kd Trees , The target point x=(2,4.5)

Output :  The nearest neighbor of the target point

Find the nearest point :  Start at root ,x=(2,4.5) At the root node (7,2) In the left sub region of , Continue to (5,4) Within the determined upper sub region , Continue to (4,7) In the left sub region of ,(4,7) Is the nearest neighbor .

to flash back :  We use (2,4.5) For the center of a circle , With (2,4.5) To (4,7) The distance between two points draws a circle for the radius , There are two nodes in this area , Namely (2,3) and (5,4), By calculation (2,4.5) The distance to these two points , Come to (2,3) The closest , that (2,3) Is the nearest point . Then with (2,4.5) For the center of a circle , With (2,4.5) To (2,3) The distance between two points draws a circle for the radius , There are no other nodes in the circle , The description can confirm (2,3) Namely (2,4.5) The nearest neighbor of .

原网站

版权声明
本文为[Wenxiaowu]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/176/202206251209319324.html