
Deep learning - bounding box prediction

2022-06-30 07:44:00 【Hair will grow again without it】

Bounding Box

Why predict?

The previous post covered the convolutional implementation of the sliding-window method. That algorithm is more efficient, but it still has a problem: it cannot output the most accurate bounding boxes. With sliding windows, you take a discrete set of window positions and run the classifier on each of them, and in that situation none of those boxes may match the car's position perfectly.

One algorithm that gives more accurate bounding boxes is YOLO. YOLO stands for "You Only Look Once", and it was proposed by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.
Here is what it does. Suppose your input image is 100×100. You place a grid over the image. To keep the explanation simple, I use a 3×3 grid here; an actual implementation would use a finer grid, perhaps 19×19. The basic idea is to take the image classification and localization algorithm and apply it to each of the 9 grid cells. More specifically, you define the training labels like this: each of the 9 cells gets a label 𝑦, where 𝑦 is the 8-dimensional vector you saw before.

Look at the upper-left cell first. There is nothing in it, so its label vector is 𝑦 = [0, ?, ?, ?, ?, ?, ?, ?]: 𝑝𝑐 = 0, and the remaining seven entries are "don't care" values. The label for cell 3 is the same, and so are the labels for the other cells that contain nothing.
More specifically, this image contains two objects. YOLO takes the midpoint of each object and assigns the object to the grid cell that contains that midpoint. So even though the central cell (number 5) contains parts of both cars, we treat it as if it had no object of interest, and its label is the same no-object vector, 𝑦 = [0, ?, ?, ?, ?, ?, ?, ?].
The cells outlined in green and in orange do contain object midpoints. Their label vectors (written in green and blue pen on the right of the figure) have 𝑝𝑐 = 1, followed by 𝑏𝑥, 𝑏𝑦, 𝑏ℎ, and 𝑏𝑤 to specify the bounding box position. Then come the classes: class 1 is pedestrian, so 𝑐1 = 0; class 2 is car, so 𝑐2 = 1; class 3 is motorcycle, so 𝑐3 = 0.
So for each of the 9 cells you get an 8-dimensional output vector. Because the grid is 3×3, there are 9 cells, and the target output has size 3×3×8.
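The label construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name make_targets is hypothetical, boxes are assumed to be given as (center x, center y, width, height) in image fractions, 𝑏𝑥 and 𝑏𝑦 are assumed to be stored relative to the cell's top-left corner, and 𝑏ℎ and 𝑏𝑤 are assumed to be in units of the cell size. The text does not specify any of these details. The "don't care" entries are stored as 0.0 here.

```python
def make_targets(objects, grid=3, num_classes=3):
    """Build a grid x grid x (5 + num_classes) target as nested lists.

    objects: list of (cx, cy, w, h, class_index), with coordinates given
    as fractions of the image size (an assumed input format).
    Each cell holds [p_c, b_x, b_y, b_h, b_w, c1, c2, c3].
    """
    depth = 5 + num_classes
    y = [[[0.0] * depth for _ in range(grid)] for _ in range(grid)]
    for cx, cy, w, h, cls in objects:
        # The object belongs to the one cell that contains its midpoint.
        col = min(int(cx * grid), grid - 1)
        row = min(int(cy * grid), grid - 1)
        cell = y[row][col]
        cell[0] = 1.0                # p_c: an object is present
        cell[1] = cx * grid - col    # b_x, relative to the cell (assumption)
        cell[2] = cy * grid - row    # b_y, relative to the cell (assumption)
        cell[3] = h * grid           # b_h, in cell units (assumption)
        cell[4] = w * grid           # b_w, in cell units (assumption)
        cell[5 + cls] = 1.0          # one-hot class: 0 pedestrian, 1 car, 2 motorcycle
    return y


# Two cars whose midpoints fall in the left-middle and right-middle cells.
cars = [(0.2, 0.5, 0.25, 0.2, 1), (0.8, 0.55, 0.3, 0.25, 1)]
targets = make_targets(cars)
print(len(targets), len(targets[0]), len(targets[0][0]))  # the 3 x 3 x 8 shape
print(targets[1][0])  # label of the left-middle cell
print(targets[1][1])  # central cell: no midpoint, so p_c stays 0
```

Note that the central cell stays a no-object cell even if both boxes overlap it, because only the midpoint decides the assignment.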
The advantage of this algorithm is that the neural network can output precise bounding boxes. At test time, you feed in the input image 𝑥 and run forward propagation until you get the output 𝑦.
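As a rough sketch of what reading that output could look like, the snippet below walks over a 3×3×8 output and turns every cell with a high 𝑝𝑐 into a box in image coordinates. The function name decode_predictions, the 0.5 threshold, and the box encoding (𝑏𝑥, 𝑏𝑦 relative to the cell, 𝑏ℎ, 𝑏𝑤 in cell units) are assumptions made for illustration, not details given in the text. The output tensor is filled in by hand instead of coming from a real network.

```python
CLASS_NAMES = ("pedestrian", "car", "motorcycle")  # class order from the text


def decode_predictions(y, threshold=0.5):
    """Turn a grid x grid x 8 output into a list of detections.

    Each cell is [p_c, b_x, b_y, b_h, b_w, c1, c2, c3]. The threshold
    and the cell-relative box encoding are assumptions of this sketch.
    """
    grid = len(y)
    detections = []
    for row in range(grid):
        for col in range(grid):
            p_c, b_x, b_y, b_h, b_w, *class_scores = y[row][col]
            if p_c < threshold:
                continue  # treat the cell as empty
            best = max(range(len(class_scores)), key=class_scores.__getitem__)
            detections.append({
                "class": CLASS_NAMES[best],
                "cx": (col + b_x) / grid,  # back to image fractions
                "cy": (row + b_y) / grid,
                "h": b_h / grid,
                "w": b_w / grid,
                "score": p_c,
            })
    return detections


# Hand-written stand-in for a network output: one confident car in the left-middle cell.
y_hat = [[[0.0] * 8 for _ in range(3)] for _ in range(3)]
y_hat[1][0] = [0.9, 0.6, 0.5, 0.6, 0.75, 0.1, 0.8, 0.1]
for det in decode_predictions(y_hat):
    print(det)
```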

Note how objects are assigned to grid cells
To assign an object to a cell, you look at the object's midpoint and assign the object to the cell containing that midpoint. So even if an object spans several cells, it is assigned to only one of the 9 cells of the 3×3 grid, or one of the cells of a 19×19 grid. With a 19×19 grid, the chance that the midpoints of two objects (the blue dots in the figure) fall in the same cell is lower.
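A small Python example makes the effect of the grid size concrete. The helper cell_of and the two midpoints are made up for illustration; midpoints are assumed to be given as fractions of the image width and height.

```python
def cell_of(cx, cy, grid):
    """Return the (row, col) of the grid cell containing the midpoint (cx, cy)."""
    col = min(int(cx * grid), grid - 1)
    row = min(int(cy * grid), grid - 1)
    return row, col


# Two hypothetical object midpoints that lie fairly close together.
a = (0.40, 0.45)
b = (0.60, 0.55)

for grid in (3, 19):
    cell_a = cell_of(*a, grid)
    cell_b = cell_of(*b, grid)
    print(grid, cell_a, cell_b, "same cell:", cell_a == cell_b)
```

With the coarse 3×3 grid both midpoints land in the central cell, so a single 8-dimensional label could describe only one of them. With the 19×19 grid they land in different cells.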

Advantages
It outputs bounding box coordinates explicitly, so the neural network can output boxes of any aspect ratio and with more precise coordinates, without being limited by the stride of a sliding-window classifier.
It is a convolutional implementation. You do not run the algorithm 9 times on a 3×3 grid, or 361 times (19 squared) on a 19×19 grid. Instead, a single convolutional network processes the whole image in one pass, and many computation steps are shared across the 3×3 (or 19×19) grid cells, so the algorithm is very efficient.
Because it is a convolutional implementation, it runs very fast and can do real-time detection.
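The idea of shared computation can be illustrated with a toy one-dimensional example. This is not YOLO's architecture: the two-layer "network", the kernels, and the signal below are invented for the sketch, and nonlinearities are left out. It compares running the network separately on every window with running it once over the whole signal, and counts multiplications in both cases.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation with stride 1; returns (outputs, multiply count)."""
    k = len(kernel)
    out = [sum(signal[i + j] * kernel[j] for j in range(k))
           for i in range(len(signal) - k + 1)]
    return out, len(out) * k


def tiny_net(signal, k1, k2):
    """Two stacked convolutions standing in for a small conv net."""
    hidden, mults1 = conv1d(signal, k1)
    out, mults2 = conv1d(hidden, k2)
    return out, mults1 + mults2


signal = [float(i % 5) for i in range(16)]
k1 = [1.0, 0.0, -1.0]
k2 = [0.5, 0.25, 0.25]
window = len(k1) + len(k2) - 1  # input span that yields exactly one output

# Approach A: crop every window and run the net on each crop separately.
per_window_out, per_window_mults = [], 0
for start in range(len(signal) - window + 1):
    out, mults = tiny_net(signal[start:start + window], k1, k2)
    per_window_out.append(out[0])
    per_window_mults += mults

# Approach B: one pass over the whole signal; overlapping work is shared.
shared_out, shared_mults = tiny_net(signal, k1, k2)

print("same outputs:", per_window_out == shared_out)
print("multiplications, per window:", per_window_mults)
print("multiplications, single pass:", shared_mults)
```

Both approaches produce the same outputs, but the single pass needs fewer multiplications because the first-layer activations of overlapping windows are computed only once.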


Copyright notice
This article was written by [Hair will grow again without it]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/181/202206300722174525.html
