Deep learning - object localization
2022-06-30 07:43:00 【Hair will grow again without it】
Object localization
We are already familiar with image classification: the algorithm looks at an image and decides whether the object in it is a car. In this lesson we look at another problem for which we build neural networks: classification with localization. Here the algorithm must not only decide whether the picture contains a car but also mark the car's position, drawing a bounding box (for example a red box) around it. "Localization" means determining exactly where in the picture the car is.
You already know the image classification problem: a picture is fed into a multilayer convolutional neural network, which outputs a feature vector that is passed to a softmax unit to predict the picture's class. If you are building a self-driving car system, the classes might be pedestrian, car, motorcycle and background. Background means that none of the first three objects appears in the picture: if there is no pedestrian, car or motorcycle, the output is background. These four classes are the possible outputs of the softmax function.
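The classification step can be sketched in a few lines of plain Python. This is only an illustration: the class order in `CLASSES`, the function names and the example scores are assumptions made here, and the raw scores stand in for the output of the last layer of a real convolutional network.

```python
import math

# Assumed class order for this sketch; the post only names the four classes.
CLASSES = ["pedestrian", "car", "motorcycle", "background"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits):
    """Return the most probable class name and the full distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

# Hypothetical scores for one picture (made up for illustration).
label, probs = predict_class([0.5, 2.0, 0.1, -1.0])
print(label, [round(p, 3) for p in probs])
```

The largest score wins, so this example picture would be classified as a car.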
That is the standard classification pipeline. To also locate the car in the picture, the neural network can be given a few more output units that describe a bounding box. Specifically, it outputs four more numbers, denoted 𝑏𝑥, 𝑏𝑦, 𝑏ℎ and 𝑏𝑤, which parameterize the bounding box of the detected object.
Let's first agree on the notation used in this week's course: the upper-left corner of the picture has coordinates (0,0) and the lower-right corner (1,1). To pin down the bounding box, specify the center point of the red box, written (𝑏𝑥,𝑏𝑦), together with its height 𝑏ℎ and width 𝑏𝑤. The training set therefore contains, for each picture, not only the class label the network should predict but also the four numbers describing the bounding box. A supervised learning algorithm can then output a class label plus the four parameter values, which give the position of the detected object's box.
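To make the coordinate convention concrete, here is a minimal sketch that converts the normalized center/size parameters into pixel corners for drawing. The function name, the image size and the example box values are assumptions chosen for illustration, not values taken from the course.

```python
def box_to_pixel_corners(bx, by, bh, bw, img_w, img_h):
    """Convert a normalized (center, height, width) box to pixel corners.

    (0, 0) is the upper-left corner of the picture and (1, 1) the lower-right,
    so x grows to the right and y grows downward.
    """
    x_min = (bx - bw / 2) * img_w
    x_max = (bx + bw / 2) * img_w
    y_min = (by - bh / 2) * img_h
    y_max = (by + bh / 2) * img_h
    return x_min, y_min, x_max, y_max

# Hypothetical box: center at (0.5, 0.7), height 0.3, width 0.4,
# on an assumed 640x480 picture.
corners = box_to_pixel_corners(0.5, 0.7, 0.3, 0.4, img_w=640, img_h=480)
print(corners)
```

Keeping the box in normalized coordinates makes the label independent of the picture's resolution; the conversion to pixels is only needed when drawing.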
Q: How do we define the target label for this supervised learning task?
A: Recall that there are four classes, and that the neural network outputs the four bounding-box numbers together with a class label (or the probabilities of the class labels). The target label 𝑦 is defined as 𝑦 = [𝑝𝑐, 𝑏𝑥, 𝑏𝑦, 𝑏ℎ, 𝑏𝑤, 𝑐1, 𝑐2, 𝑐3]ᵀ.
𝑦 is a vector. Its first component 𝑝𝑐 indicates whether the picture contains an object: if the object belongs to one of the first three classes (pedestrian, car, motorcycle), 𝑝𝑐 = 1; if the picture is background, with no object to detect, 𝑝𝑐 = 0. You can think of 𝑝𝑐 as the probability that the picture contains an object of some class other than background. If an object is detected, the vector also gives its bounding-box parameters 𝑏𝑥, 𝑏𝑦, 𝑏ℎ and 𝑏𝑤. Finally, when 𝑝𝑐 = 1, the components 𝑐1, 𝑐2 and 𝑐3 indicate which of classes 1 to 3 the object belongs to: pedestrian, car or motorcycle.
Example
We assume here that each picture contains at most one object, so in this classification-with-localization problem at most one of these objects appears in a picture.
Suppose a training-set picture 𝑥 shows a car. In 𝑦, the first element is 𝑝𝑐 = 1 because there is a car in the picture, and 𝑏𝑥, 𝑏𝑦, 𝑏ℎ and 𝑏𝑤 give the location of the bounding box, so labeling the training set means annotating bounding boxes as well. The object is a car, not a pedestrian or a motorcycle, so it belongs to class 2: 𝑐1 = 0, 𝑐2 = 1, 𝑐3 = 0. At most one of 𝑐1, 𝑐2 and 𝑐3 equals 1.
That is the case when the picture contains one object to detect. What if it contains none? What does the label look like for such a training sample?
In this case 𝑝𝑐 = 0 and the other components of 𝑦 become meaningless. Here they are all written as question marks, meaning "don't care": since there is no object in the picture, it does not matter what bounding box the network outputs, nor which of 𝑐1, 𝑐2 and 𝑐3 it indicates.
For a given labeled training sample, whether or not the picture contains an object to localize, the input picture 𝑥 and the label 𝑦 are constructed in the same way. Together these pairs define the training set.
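The two labeling cases can be sketched as a small helper that builds the eight-component vector 𝑦. The names `make_label` and `CLASS_INDEX`, the box values, and the use of `None` to stand for the "don't care" question marks are all assumptions of this sketch.

```python
# Assumed mapping of the three object classes to c1, c2, c3.
CLASS_INDEX = {"pedestrian": 0, "car": 1, "motorcycle": 2}

def make_label(obj_class=None, box=None):
    """Build y = [pc, bx, by, bh, bw, c1, c2, c3] for one picture.

    obj_class: "pedestrian", "car", "motorcycle", or None for background.
    box: (bx, by, bh, bw) in normalized coordinates, required if an object exists.
    """
    if obj_class is None:
        # Background: pc = 0, the remaining seven entries are "don't care".
        return [0.0] + [None] * 7
    bx, by, bh, bw = box
    one_hot = [0.0, 0.0, 0.0]
    one_hot[CLASS_INDEX[obj_class]] = 1.0
    return [1.0, bx, by, bh, bw] + one_hot

# A picture with a car (hypothetical box values) and a background picture.
y_car = make_label("car", (0.5, 0.7, 0.3, 0.4))
y_background = make_label()
print(y_car)
print(y_background)
```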
Loss function of the neural network
The loss takes the true label 𝑦 and the network output ŷ as arguments. With a squared-error loss, $L(\hat{y}, y) = (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + \dots + (\hat{y}_8 - y_8)^2$, that is, the sum of the squared differences of the corresponding elements.
If the picture contains an object to localize, then $y_1 = p_c = 1$ and the loss is the sum of the squared differences over all eight elements.
In the other case $y_1 = p_c = 0$ and the loss is just $(\hat{y}_1 - y_1)^2$: the other elements are ignored, and only the accuracy of the network's output for $p_c$ matters.
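The two cases of the squared-error loss can be written as one short function. This is a sketch under the assumption that labels and predictions are plain lists of eight numbers ordered as 𝑝𝑐, 𝑏𝑥, 𝑏𝑦, 𝑏ℎ, 𝑏𝑤, 𝑐1, 𝑐2, 𝑐3; the function name and the example values are made up for illustration.

```python
def localization_loss(y_hat, y):
    """Squared-error loss for classification with localization.

    If an object is present (y[0] == 1), all eight components contribute.
    If the picture is background (y[0] == 0), only the pc component counts,
    so the "don't care" entries of y are never read.
    """
    if y[0] == 1:
        return sum((p - t) ** 2 for p, t in zip(y_hat, y))
    return (y_hat[0] - y[0]) ** 2

# Object present: only the pc prediction is off, by 0.1.
y_obj = [1.0, 0.5, 0.7, 0.3, 0.4, 0.0, 1.0, 0.0]
y_hat_obj = [0.9, 0.5, 0.7, 0.3, 0.4, 0.0, 1.0, 0.0]
loss_obj = localization_loss(y_hat_obj, y_obj)

# Background: the network's box and class outputs are ignored.
y_bg = [0.0] + [None] * 7
y_hat_bg = [0.2, 0.9, 0.9, 0.9, 0.9, 0.3, 0.3, 0.3]
loss_bg = localization_loss(y_hat_bg, y_bg)
print(loss_obj, loss_bg)
```

Branching on the true 𝑝𝑐 keeps arbitrary box and class outputs on background pictures from being penalized.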