
Deep learning - object localization

2022-06-30 07:43:00 Hair will grow again without it

Object localization

We are already familiar with image classification: the algorithm looks at an image and decides whether it contains, say, a car. In this lesson we look at another problem you can build a neural network for: classification with localization. Here the algorithm must not only decide that the image shows a car but also mark where the car is, by drawing a bounding box (the red box) around it. "Localization" means determining the car's specific position in the image.

Image classification should already be familiar. For example, you feed an image into a multilayer convolutional neural network, which outputs a feature vector that is passed to a softmax unit to predict the image's class. If you are building a self-driving car system, the categories might be pedestrian, car, motorcycle and background, where background means that none of the first three objects (no pedestrian, car or motorcycle) appears in the image. These four categories are the possible outputs of the softmax.
That is the standard classification pipeline. How do you also localize the car in the image? You can give the neural network a few more output units that produce a bounding box. Specifically, the network outputs four additional numbers, denoted $b_x$, $b_y$, $b_h$ and $b_w$, which parameterize the bounding box of the detected object.
First, the notation used in this week's course: the upper-left corner of the image has coordinates (0, 0) and the lower-right corner has coordinates (1, 1). To specify the bounding box, you give the center point of the red box, $(b_x, b_y)$, its height $b_h$ and its width $b_w$. The training set therefore contains not only the class label the network should predict but also the four numbers that describe the bounding box. A supervised learning algorithm is then trained to output a class label together with these four parameters, which give the position of the detected object's box.
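A minimal sketch of such an output layer in plain Python, assuming the convolutional layers have already produced a feature vector. The feature size, the random weights and the function names (`linear`, `localization_head`) are hypothetical and for illustration only: one linear map produces four class scores that go through softmax, and a second linear map produces the four bounding-box numbers.

```python
import math
import random

CLASSES = ["pedestrian", "car", "motorcycle", "background"]

def softmax(z):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def linear(W, b, x):
    # Computes W x + b, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def localization_head(features, W_cls, b_cls, W_box, b_box):
    """Hypothetical head: 4 class probabilities plus 4 box numbers (bx, by, bh, bw)."""
    class_probs = softmax(linear(W_cls, b_cls, features))
    box = linear(W_box, b_box, features)
    return class_probs, box

random.seed(0)
feat_dim = 16  # assumed length of the feature vector from the conv layers
features = [random.gauss(0, 1) for _ in range(feat_dim)]
W_cls = [[random.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(4)]
W_box = [[random.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(4)]
b_cls, b_box = [0.0] * 4, [0.0] * 4

probs, box = localization_head(features, W_cls, b_cls, W_box, b_box)
print(CLASSES[probs.index(max(probs))], box)
```

The weights here are untrained, so the predicted class and box are arbitrary; the point is only the shape of the output: a probability distribution over four classes plus four unconstrained box numbers.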
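To make the coordinate convention concrete, here is a small sketch that converts a box given by pixel corners into $(b_x, b_y, b_h, b_w)$ relative to an image whose upper-left corner is (0, 0) and lower-right corner is (1, 1), and back again. The corner-based input format and the function names are assumptions for illustration, not part of the course material.

```python
def corners_to_b(x_min, y_min, x_max, y_max, img_w, img_h):
    """Pixel corners -> (bx, by, bh, bw) as fractions of the image size."""
    bx = (x_min + x_max) / 2 / img_w   # center x, fraction of image width
    by = (y_min + y_max) / 2 / img_h   # center y, fraction of image height
    bh = (y_max - y_min) / img_h       # box height, fraction of image height
    bw = (x_max - x_min) / img_w       # box width, fraction of image width
    return bx, by, bh, bw

def b_to_corners(bx, by, bh, bw, img_w, img_h):
    """Inverse: recover pixel corners, e.g. to draw the red box."""
    x_min = (bx - bw / 2) * img_w
    y_min = (by - bh / 2) * img_h
    x_max = (bx + bw / 2) * img_w
    y_max = (by + bh / 2) * img_h
    return x_min, y_min, x_max, y_max

# A 160 x 144 pixel box in a 640 x 480 image.
print(corners_to_b(320, 240, 480, 384, 640, 480))
# → (0.625, 0.65, 0.3, 0.25)
```

Because all four numbers are fractions of the image size, the same label stays valid if the image is resized.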

Q: How do you define the target label for this supervised learning task?
A: Recall that there are four classes, and the network outputs the four box numbers plus a class label (or the probabilities of the class labels). The target label $y$ is defined as follows:
$$y = \begin{bmatrix} p_c & b_x & b_y & b_h & b_w & c_1 & c_2 & c_3 \end{bmatrix}^T$$
$y$ is a vector. Its first component, $p_c$, indicates whether the image contains an object: if the object belongs to one of the first three classes (pedestrian, car, motorcycle), $p_c = 1$; if the image is background, there is no object to detect and $p_c = 0$. You can think of $p_c$ as the probability that the image contains an object of one of the classes other than background. If an object is detected, the next components $b_x$, $b_y$, $b_h$ and $b_w$ give its bounding box. Finally, when there is an object ($p_c = 1$), $c_1$, $c_2$ and $c_3$ indicate which of classes 1 to 3 it belongs to: pedestrian, car or motorcycle.
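The layout of $y$ can be sketched as a small helper that builds the 8-element label. The function name `make_target` and the use of `None` for the "don't care" entries are illustrative assumptions; in practice those entries are simply ignored by the loss.

```python
CLASSES = ["pedestrian", "car", "motorcycle"]  # c1, c2, c3; background has no slot

def make_target(label, box=None):
    """Build y = [pc, bx, by, bh, bw, c1, c2, c3].

    label: one of CLASSES, or None for a background image.
    box:   (bx, by, bh, bw) in image coordinates; required when label is not None.
    For background images every entry after pc is a don't-care, stored as None.
    """
    if label is None:
        return [0] + [None] * 7
    if label not in CLASSES:
        raise ValueError(f"unknown class: {label!r}")
    if box is None:
        raise ValueError("an object label needs a bounding box")
    one_hot = [1 if name == label else 0 for name in CLASSES]
    return [1, *box] + one_hot

print(make_target("car", (0.5, 0.7, 0.3, 0.4)))
# → [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
print(make_target(None))
# → [0, None, None, None, None, None, None, None]
```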

Example
We assume each image contains at most one object, so in this classification-with-localization problem at most one of the objects appears in any image.
Suppose this training image $x$ is the car picture above. In $y$, the first element is $p_c = 1$ because the image contains a car, and $b_x$, $b_y$, $b_h$ and $b_w$ give the position of its bounding box, so the training labels must include annotated bounding boxes. The object is a car rather than a pedestrian or a motorcycle, so it belongs to class 2: $c_1 = 0$, $c_2 = 1$, $c_3 = 0$. At most one of $c_1$, $c_2$ and $c_3$ equals 1.
That covers an image with a single object to detect. What if the training image contains no object at all?
In that case $p_c = 0$ and the remaining elements of $y$ are meaningless, so they are written as question marks ("don't care"). Since there is no object in the image, neither the network's bounding-box outputs nor which of $c_1$, $c_2$ and $c_3$ the object would belong to matter.
For every labeled training example, whether or not the image contains an object to localize, the input image $x$ and the target label $y$ are built in the same way. These $(x, y)$ pairs make up the training set.
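Written out side by side, the two training labels described above look as follows, where $b_x, b_y, b_h, b_w$ stand for the annotated box of the car and "?" marks a don't-care entry:

$$
y_{\text{car}} = \begin{bmatrix} 1 & b_x & b_y & b_h & b_w & 0 & 1 & 0 \end{bmatrix}^T,
\qquad
y_{\text{background}} = \begin{bmatrix} 0 & ? & ? & ? & ? & ? & ? & ? \end{bmatrix}^T
$$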

Loss function of the neural network
The loss takes the label $y$ and the network output $\hat{y}$ as arguments. With the squared-error loss,
$$\mathcal{L}(\hat{y}, y) = (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + \cdots + (\hat{y}_8 - y_8)^2,$$
i.e. the loss is the sum of the squared differences between corresponding elements.
Here $y_1 = p_c$. If the image contains an object to localize, $y_1 = p_c = 1$ and the loss is the sum of the squared differences over all eight elements.
In the other case, $y_1 = p_c = 0$ and the loss is just $(\hat{y}_1 - y_1)^2$: the other elements are don't-cares, so we only care how accurately the network outputs $p_c$.
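A minimal sketch of this two-case squared-error loss for a single example, assuming the 8-element layout $[p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$ and `None` as the don't-care placeholder; the function name and the sample numbers are made up for illustration.

```python
def localization_loss(y_hat, y):
    """Squared-error loss for y = [pc, bx, by, bh, bw, c1, c2, c3].

    If y[0] (pc) == 1, sum the squared error over all 8 components.
    If y[0] == 0, only the pc term counts; the other entries are don't-cares.
    """
    if len(y_hat) != 8 or len(y) != 8:
        raise ValueError("expected 8-element vectors")
    if y[0] == 1:
        return sum((a - b) ** 2 for a, b in zip(y_hat, y))
    return (y_hat[0] - y[0]) ** 2

y_car = [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
y_bg = [0] + [None] * 7  # don't-care entries are never read
y_hat = [0.9, 0.5, 0.6, 0.3, 0.4, 0.1, 0.8, 0.1]

loss_car = localization_loss(y_hat, y_car)  # ≈ 0.08
loss_bg = localization_loss([0.2] + y_hat[1:], y_bg)  # ≈ 0.04
print(loss_car, loss_bg)
```

Returning early when $p_c = 0$ is what lets the don't-care entries stay undefined: the box and class outputs are never compared against anything for background images.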

Copyright notice
This article was written by [Hair will grow again without it]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/181/202206300722174848.html