
Deep learning -- Convolutional implementation of sliding windows

2022-06-30 07:44:00 【Hair will grow again without it】

Convolutional sliding windows

Turning fully connected layers into convolutional layers

To build a convolutional implementation of the sliding window, we first need to know how to turn the fully connected layers of a neural network into convolutional layers.
Suppose the object detection algorithm takes a 14×14×3 image as input. The first layer applies 16 filters of size 5×5, which maps the 14×14×3 image to 10×10×16. A 2×2 max-pooling operation then reduces it to 5×5×16. Next comes a fully connected layer with 400 units, then a second fully connected layer, and finally a softmax unit outputs 𝑦.
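To double-check these sizes, here is a small shape trace of the network. It is a sketch that assumes unpadded ("valid") convolution with stride 1 and 2×2 max pooling with stride 2, as the numbers above imply. The helper names `conv_out` and `fc_params` are hypothetical. The 400-unit second layer and the 4-way output are taken from the 1×1×400 and 1×1×4 volumes used later in this post.

```python
# Shape bookkeeping for the 14x14x3 classifier described above.
# Assumptions: valid (unpadded) 5x5 convolution with stride 1,
# 2x2 max pooling with stride 2, FC layers of 400 units, 4 output classes.

def conv_out(n: int, f: int, stride: int = 1) -> int:
    """Spatial size after a valid convolution or pooling window of size f."""
    return (n - f) // stride + 1

def trace_fc_network(n: int = 14):
    h = conv_out(n, 5)            # 5x5 conv, 16 filters: 14 -> 10
    conv = (h, h, 16)
    p = conv_out(h, 2, stride=2)  # 2x2 max pool, stride 2: 10 -> 5
    pool = (p, p, 16)
    flat = p * p * 16             # flattened input to the first FC layer
    return conv, pool, flat

def fc_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

conv, pool, flat = trace_fc_network()
print(conv, pool, flat)       # (10, 10, 16) (5, 5, 16) 400
print(fc_params(flat, 400))   # 160400  (first FC layer)
print(fc_params(400, 400))    # 160400  (second FC layer)
print(fc_params(400, 4))      # 1604    (softmax layer)
```

The flattened 5×5×16 volume has exactly 400 values. The next section relies on this when it replaces the first fully connected layer with a convolution.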

Next, we show how to convert these fully connected layers into convolutional layers.
Draw the convolutional network so that its first few layers are the same as before. The next layer, which was the first fully connected layer, can be implemented with 400 filters of size 5×5. The input to this layer is 5×5×16, so each 5×5 filter is really 5×5×16: a convolution filter spans all input channels, so the filter and its input must have the same number of channels (16 here). Each filter produces a 1×1 output, and applying 400 of these 5×5×16 filters gives an output of dimension 1×1×400. Instead of viewing this as a set of 400 nodes, we treat it as a 1×1×400 output volume. Mathematically it is identical to the fully connected layer: each of the 400 nodes has its own 5×5×16 filter, so each output value is an arbitrary linear function of the 5×5×16 activations from the previous layer.
We then add another convolutional layer, this time using 1×1 convolutions. With 400 filters of size 1×1, the output is again 1×1×400, and this layer plays the role of the second fully connected layer in the original network. Finally, another layer of 1×1 filters followed by a softmax activation gives a 1×1×4 output volume, in place of the 4 numbers the original network produced.
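The claim that a bank of 5×5×16 filters is the same as a fully connected layer can be checked numerically. The minimal pure-Python sketch below uses random stand-in activations and weights (assumptions, not trained values). It compares a valid convolution with 400 filters of 5×5×16 against a dense layer whose weight vectors are those same filters, flattened. The helpers `rand_volume`, `flatten` and `conv_valid` are illustrative names.

```python
import random

random.seed(0)
H, W, C, K = 5, 5, 16, 400   # 5x5x16 input volume, 400 filters / FC units

def rand_volume(h, w, c, scale):
    """Random h x w x c nested list (stand-in data, not trained values)."""
    return [[[random.gauss(0, scale) for _ in range(c)] for _ in range(w)] for _ in range(h)]

def flatten(v):
    """Flatten an h x w x c nested list in (row, col, channel) order."""
    return [val for row in v for px in row for val in px]

def conv_valid(x, filters, bias):
    """Valid convolution, stride 1: x[row][col][c], filters[k][i][j][c] -> out[row][col][k]."""
    f = len(filters[0])
    out_h, out_w = len(x) - f + 1, len(x[0]) - f + 1
    out = [[[0.0] * len(filters) for _ in range(out_w)] for _ in range(out_h)]
    for r in range(out_h):
        for s in range(out_w):
            for k, filt in enumerate(filters):
                acc = bias[k]
                for i in range(f):
                    for j in range(f):
                        for w, v in zip(filt[i][j], x[r + i][s + j]):
                            acc += w * v
                out[r][s][k] = acc
    return out

x = rand_volume(H, W, C, 1.0)                            # pooled activations
filters = [rand_volume(5, 5, C, 0.1) for _ in range(K)]  # hypothetical weights
bias = [random.gauss(0, 0.1) for _ in range(K)]

# Convolutional view: 400 filters of 5x5x16 give a 1x1x400 volume.
conv = conv_valid(x, filters, bias)

# Fully connected view: each unit's weight vector is its filter, flattened.
x_flat = flatten(x)
dense = [b + sum(w * v for w, v in zip(flatten(filt), x_flat))
         for filt, b in zip(filters, bias)]

print(len(conv), len(conv[0]), len(conv[0][0]))             # 1 1 400
print(max(abs(a - b) for a, b in zip(conv[0][0], dense)))  # only floating-point rounding
print(K * 5 * 5 * C)   # 160000 weights, the same count as a 400 -> 400 dense layer
```

The two results differ only by rounding, because the additions happen in a slightly different order. The parameter count also matches the 400×400 weight matrix of the fully connected layer.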
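A 1×1 convolution applies the same linear map to the channel vector at every spatial position. On a 1×1×400 volume, it is therefore exactly a fully connected layer. This sketch assumes random weights, zero biases and a ReLU between the two layers, since the post does not specify activations. The names `conv1x1` and `softmax_channels` are illustrative.

```python
import math
import random

def conv1x1(x, weights, bias):
    """1x1 convolution: apply the same dense map (C_in -> C_out) at every (row, col)."""
    return [[[b + sum(w * v for w, v in zip(w_row, pixel))
              for w_row, b in zip(weights, bias)]
             for pixel in row]
            for row in x]

def softmax_channels(x):
    """Softmax over the channel axis at every spatial position."""
    out = []
    for row in x:
        new_row = []
        for pixel in row:
            m = max(pixel)                      # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in pixel]
            total = sum(exps)
            new_row.append([e / total for e in exps])
        out.append(new_row)
    return out

random.seed(1)

def rand_layer(c_in, c_out):
    """Hypothetical random weights (c_out rows of c_in values) and zero biases."""
    w = [[random.gauss(0, 0.05) for _ in range(c_in)] for _ in range(c_out)]
    return w, [0.0] * c_out

w2, b2 = rand_layer(400, 400)   # second "FC" layer as 400 filters of 1x1x400
w3, b3 = rand_layer(400, 4)     # output layer as 4 filters of 1x1x400

x = [[[random.gauss(0, 1) for _ in range(400)]]]   # a 1x1x400 input volume
h = conv1x1(x, w2, b2)
h = [[[max(0.0, v) for v in pixel] for pixel in row] for row in h]  # ReLU (assumed)
y = softmax_channels(conv1x1(h, w3, b3))

print(len(y), len(y[0]), len(y[0][0]))   # 1 1 4
print(sum(y[0][0]))                      # approximately 1.0
```

Because `conv1x1` works position by position, the same code keeps working when the input volume is larger than 1×1. The sliding-window trick below depends on exactly this property.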

Implementing sliding-window object detection with convolution

Suppose we feed a 14×14×3 image to the convolutional sliding-window network. As before, the last output layer of the network (the output of the softmax unit) is 1×1×4.

Suppose the network was built for 14×14×3 inputs, and the test-set image is 16×16×3 (the yellow strip added to the input image in the figure). In the original sliding-window algorithm, you would feed the blue region (marked in red pen) into the convolutional network to get a 0 or 1 classification. You would then slide the window 2 pixels to the right (stride 2), feed the green box region into the network, and run the entire network again to get another 0 or 1 label. Next you would feed the orange region to the network and get another label, and finally run the network one last time on the purple region at the bottom right. Sliding the window over this small 16×16×3 image therefore runs the convolutional network 4 times and produces 4 labels.
With the convolutional implementation, the output layer becomes 2×2×4, that is, four 1×1×4 sub-cubes. The upper-left (blue) block is the output for the upper-left 14×14 region of the image (red arrow). The upper-right block is the output for the upper-right region (green arrow). The lower-left block is the result of the network processing the lower-left 14×14 region of the input (orange arrow). Likewise, the lower-right block is the result for the lower-right 14×14 region (purple arrow).
So the point of the convolutional implementation is that we do not need to split the input image into four subsets and run forward propagation on each one separately. Instead, we feed the whole image to the convolutional network at once. The four 14×14 windows overlap heavily, so the regions they have in common can share a lot of the computation.
Now consider a larger image. If we apply the sliding window to a 28×28×3 image and run forward propagation in the same way, we get an 8×8×4 output. Because the max pooling size is 2, this is equivalent to running the original 14×14 network on the image with a window stride of 2.
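The naive loop only needs to know where each window starts. The sketch below assumes images are stored as nested lists indexed `[row][col][channel]`. The helpers `window_origins` and `crop` are hypothetical names. With a 14×14 window and stride 2, the loop visits these four positions on a 16×16 image:

```python
def window_origins(image_size, window=14, stride=2):
    """Top-left (row, col) of every window the naive sliding-window loop visits."""
    starts = range(0, image_size - window + 1, stride)
    return [(r, c) for r in starts for c in starts]

def crop(image, r, c, size=14):
    """Cut a size x size region out of an image stored as image[row][col][channel]."""
    return [row[c:c + size] for row in image[r:r + size]]

image = [[[0.0] * 3 for _ in range(16)] for _ in range(16)]   # placeholder 16x16x3 image
crops = [crop(image, r, c) for r, c in window_origins(16)]

print(window_origins(16))   # [(0, 0), (0, 2), (2, 0), (2, 2)]
print(len(crops), len(crops[0]), len(crops[0][0]), len(crops[0][0][0]))  # 4 14 14 3
```

In the naive algorithm, each of these crops would go through the 14×14 network separately: that means four full forward passes.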
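The equivalence can be checked end to end. The pure-Python sketch below builds a scaled-down version of the network. To keep it fast, the channel counts are reduced to 4 and 6 instead of 16 and 400. The random weights and the ReLU activations are also assumptions. The sketch runs the network once on a 16×16×3 image and checks that each cell of the 2×2×4 output matches a separate forward pass on the corresponding 14×14 crop taken with stride 2.

```python
import math
import random

random.seed(0)

def make_layer(k, f, c):
    """k hypothetical random filters of shape f x f x c, plus zero biases."""
    w = [[[[random.gauss(0, 0.2) for _ in range(c)] for _ in range(f)]
          for _ in range(f)] for _ in range(k)]
    return w, [0.0] * k

def conv(x, filters, bias):
    """Valid convolution, stride 1: x[row][col][c] -> out[row][col][k]."""
    f = len(filters[0])
    out = []
    for r in range(len(x) - f + 1):
        row = []
        for s in range(len(x[0]) - f + 1):
            pixel = []
            for filt, b in zip(filters, bias):
                acc = b
                for i in range(f):
                    for j in range(f):
                        for w, v in zip(filt[i][j], x[r + i][s + j]):
                            acc += w * v
                pixel.append(acc)
            row.append(pixel)
        out.append(row)
    return out

def relu(x):
    return [[[max(0.0, v) for v in px] for px in row] for row in x]

def maxpool2(x):
    """2x2 max pooling with stride 2."""
    return [[[max(x[2*r][2*s][c], x[2*r][2*s+1][c], x[2*r+1][2*s][c], x[2*r+1][2*s+1][c])
              for c in range(len(x[0][0]))]
             for s in range(len(x[0]) // 2)]
            for r in range(len(x) // 2)]

def softmax(x):
    """Softmax over the channel axis at every spatial position."""
    out = []
    for row in x:
        new_row = []
        for px in row:
            m = max(px)
            e = [math.exp(v - m) for v in px]
            t = sum(e)
            new_row.append([v / t for v in e])
        out.append(new_row)
    return out

layer1 = make_layer(4, 5, 3)   # 5x5 conv (4 filters here instead of 16)
layer2 = make_layer(6, 5, 4)   # first "FC" layer as a 5x5 conv (6 instead of 400)
layer3 = make_layer(6, 1, 6)   # second "FC" layer as a 1x1 conv
layer4 = make_layer(4, 1, 6)   # output layer as a 1x1 conv, 4 classes

def forward(x):
    h = maxpool2(relu(conv(x, *layer1)))
    h = relu(conv(h, *layer2))
    h = relu(conv(h, *layer3))
    return softmax(conv(h, *layer4))

image = [[[random.random() for _ in range(3)] for _ in range(16)] for _ in range(16)]
full = forward(image)                            # one pass over the whole 16x16x3 image
print(len(full), len(full[0]), len(full[0][0]))  # 2 2 4

# Naive approach: four separate passes over 14x14 crops taken with stride 2.
for i in range(2):
    for j in range(2):
        crop = [row[2*j:2*j + 14] for row in image[2*i:2*i + 14]]
        single = forward(crop)[0][0]
        assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(full[i][j], single))
print("each output cell matches its 14x14 crop")
```

The check passes because the crop offsets are multiples of 2. The 2×2 pooling windows therefore line up the same way in the full-image pass and in each separate crop pass.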
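A little arithmetic gives both the output size and the amount of computation that is shared. This sketch assumes the same layer sizes as above (valid 5×5 convolutions, 2×2 pooling with stride 2). It counts only the multiplications in the first 5×5×3 convolution with 16 filters. `fcn_output_size` is a hypothetical helper.

```python
def conv_out(n, f, stride=1):
    """Spatial size after a valid convolution or pooling window of size f."""
    return (n - f) // stride + 1

def fcn_output_size(n):
    """Side length of the final output map for an n x n input (layers as above)."""
    n = conv_out(n, 5)            # 5x5 conv
    n = conv_out(n, 2, stride=2)  # 2x2 max pool
    n = conv_out(n, 5)            # first "FC" layer as a 5x5 conv
    return n                      # the 1x1 convs do not change the size

for size in (14, 16, 28):
    print(size, fcn_output_size(size))   # prints 14 1, then 16 2, then 28 8

# Output cell (i, j) corresponds to the 14x14 input window whose top-left is (2i, 2j).
# Multiplications in the first layer (16 filters of 5x5x3) for a 28x28x3 image:
per_window = 10 * 10 * 16 * 5 * 5 * 3        # one 14x14 crop
naive = fcn_output_size(28) ** 2 * per_window
shared = 24 * 24 * 16 * 5 * 5 * 3            # one pass over the whole image
print(naive, shared)                          # 7680000 691200
```

For the 28×28 image, running 64 separate passes would repeat the first layer's work roughly 11 times over, because neighboring 14×14 windows overlap almost completely.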

Summary:
The naive approach cuts a 14×14 region out of the picture and feeds it into the convolutional network. It then feeds in the next 14×14 region, and repeats until some region is recognized as a car. As shown above, however, we do not need to run the network sequentially over the picture to find the car. We can run the convolution on the entire 28×28 image and get all the predictions in a single pass, and if all goes well, the neural network will identify the location of the car.

That is the convolutional implementation of the sliding-window algorithm, and it makes the whole algorithm much more efficient. However, it still has a drawback: the position of the bounding box may not be very accurate.