Deep learning - object localization
2022-06-30 07:43:00 【Hair will grow again without it】
Object localization
We are already familiar with image classification: the algorithm looks at an image and decides whether the object in it is a car. In this lesson we look at another problem to solve with neural networks, classification with localization. Here the algorithm must not only decide whether the picture shows a car, but also mark the car's position in the picture, for example by drawing a border or red box around it. "Localization" means determining the exact position of the car in the picture.
Image classification should be familiar by now. A picture is fed into a multilayer convolutional neural network, which outputs a feature vector; that vector is passed to a softmax unit that predicts the picture's class. If you are building a self-driving car system, the classes might be pedestrian, car, motorcycle, and background. Background means that none of the first three objects appears in the picture, that is, it contains no pedestrians, cars, or motorcycles. These four classes are the possible outputs of the softmax function.
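As a rough illustration of the softmax step, the sketch below turns four raw scores into class probabilities. The class order, the function names, and the example scores are assumptions made for this sketch; they are not taken from the course material.

```python
import math

# Class order is an assumption made for this sketch.
CLASSES = ["pedestrian", "car", "motorcycle", "background"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return the most probable class name and the full probability list."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical scores produced by the last layer of a network.
label, probs = predict_class([0.5, 2.0, 0.1, -1.0])
print(label, [round(p, 3) for p in probs])
```

The largest score belongs to the second entry, so this example predicts "car".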
That is the standard classification pipeline. To also locate the car in the picture, the neural network can be given a few more output units that describe a bounding box. Specifically, the network outputs 4 additional numbers, denoted 𝑏𝑥, 𝑏𝑦, 𝑏ℎ and 𝑏𝑤, which parameterize the bounding box of the detected object.
First, the notation used in this week's course: the upper-left corner of the picture has coordinates (0,0) and the lower-right corner has coordinates (1,1). To pin down the bounding box (the red box), specify its center point (𝑏𝑥,𝑏𝑦), its height 𝑏ℎ, and its width 𝑏𝑤. The training set therefore contains, for each picture, not only the class label the network should predict but also the four numbers that describe the bounding box. A supervised learning algorithm then learns to output a class label together with the four parameter values, which give the position of the detected object's box.
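To make the coordinate convention concrete, here is a minimal sketch that converts a normalized box (center, height, width) into pixel corners. The function name, the image size, and the example box values are hypothetical and chosen only for illustration.

```python
def box_to_pixel_corners(bx, by, bh, bw, img_width, img_height):
    """Convert a normalized (center, height, width) box to pixel corners.

    Coordinates follow the convention above: (0, 0) is the upper-left
    corner of the picture and (1, 1) is the lower-right corner.
    Returns (left, top, right, bottom) in pixels.
    """
    half_w = bw / 2
    half_h = bh / 2
    left = (bx - half_w) * img_width
    top = (by - half_h) * img_height
    right = (bx + half_w) * img_width
    bottom = (by + half_h) * img_height
    return left, top, right, bottom


# Illustrative values: a box centered slightly below the middle of a
# 640x480 picture, 30% of the picture tall and 40% of the picture wide.
corners = box_to_pixel_corners(0.5, 0.7, 0.3, 0.4, img_width=640, img_height=480)
print(corners)
```

For these inputs the corners come out at about (192, 264, 448, 408), up to floating-point rounding.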
Q: How is the target label defined for this supervised learning task?
A: There are four classes, and the neural network outputs the four bounding-box numbers plus a class label, or the probabilities of the class labels. The target label 𝑦 is defined as the vector 𝑦 = (𝑝𝑐, 𝑏𝑥, 𝑏𝑦, 𝑏ℎ, 𝑏𝑤, 𝑐1, 𝑐2, 𝑐3).
The first component 𝑝𝑐 indicates whether the picture contains an object. If the object belongs to one of the first three classes (pedestrian, car, motorcycle), then 𝑝𝑐 = 1. If the picture is background, it contains no object to detect and 𝑝𝑐 = 0. You can think of 𝑝𝑐 as the probability that the picture contains an object of some class other than background. If an object is detected, the network also outputs its bounding-box parameters 𝑏𝑥, 𝑏𝑦, 𝑏ℎ and 𝑏𝑤. Finally, if there is an object, so that 𝑝𝑐 = 1, the network also outputs 𝑐1, 𝑐2 and 𝑐3, which indicate which of classes 1-3 the object belongs to: pedestrian, car, or motorcycle.
Example
Assume the picture contains at most one object, so in this classification-with-localization problem at most one of these objects appears in the picture.
Suppose 𝑥 is a training-set picture of a car. In 𝑦, the first element is 𝑝𝑐 = 1 because there is a car in the picture, and 𝑏𝑥, 𝑏𝑦, 𝑏ℎ and 𝑏𝑤 give the location of the bounding box, so labeling the training set requires annotating bounding boxes. The object is a car, not a pedestrian or a motorcycle, so it belongs to class 2: 𝑐1 = 0, 𝑐2 = 1, 𝑐3 = 0. At most one of 𝑐1, 𝑐2 and 𝑐3 equals 1.
That covers a picture with exactly one object to detect. What if the training sample is a picture with no object to detect?
In that case 𝑝𝑐 = 0 and the other components of 𝑦 carry no meaning. They are written as question marks ("don't care" values): since the picture contains no object, it does not matter what bounding box the network outputs, nor which of 𝑐1, 𝑐2 and 𝑐3 it assigns.
Whether or not the picture contains an object to localize, the input picture 𝑥 and the label 𝑦 are built in the same way for every labeled training sample. Together these pairs define the training set.
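The two labeling cases can be sketched in a few lines of Python. The helper name `make_label`, the use of `None` for the question-mark entries, and the example box values are assumptions of this sketch rather than part of the original notes.

```python
CLASS_NAMES = ("pedestrian", "car", "motorcycle")  # classes 1-3


def make_label(class_name=None, box=None):
    """Build the 8-element target [pc, bx, by, bh, bw, c1, c2, c3].

    With no class_name the picture is treated as background: pc = 0 and
    the remaining seven entries are None, standing in for the question
    marks ("don't care" values).
    """
    if class_name is None:
        return [0.0] + [None] * 7
    if class_name not in CLASS_NAMES:
        raise ValueError(f"unknown class: {class_name!r}")
    if box is None or len(box) != 4:
        raise ValueError("box must be (bx, by, bh, bw)")
    bx, by, bh, bw = box
    # One-hot encoding: at most one of c1, c2, c3 equals 1.
    one_hot = [1.0 if name == class_name else 0.0 for name in CLASS_NAMES]
    return [1.0, bx, by, bh, bw] + one_hot


# Illustrative box values; a real label would come from annotation.
print(make_label("car", (0.5, 0.7, 0.3, 0.4)))
print(make_label())  # background picture
```

Both cases return a list of the same length, which mirrors the point that the label is built the same way whether or not the picture contains an object.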
Loss function of the neural network
The loss takes the true label 𝑦 and the network output ŷ as arguments. With a squared-error loss, 𝐿(ŷ, 𝑦) = (ŷ1 − 𝑦1)² + (ŷ2 − 𝑦2)² + ⋯ + (ŷ8 − 𝑦8)², that is, the sum of the squared differences of the corresponding elements.
This form applies when the picture contains an object, so that 𝑦1 = 𝑝𝑐 = 1: the loss is the sum of squared differences over all eight elements.
In the other case, 𝑦1 = 𝑝𝑐 = 0, and the loss is just (ŷ1 − 𝑦1)². The other elements are ignored, and only the accuracy of the network's 𝑝𝑐 output matters.
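A minimal sketch of this two-case squared-error loss is shown below. The function name and the example vectors are hypothetical, and plain Python lists are used instead of a tensor library to keep the example self-contained.

```python
def localization_loss(y_hat, y):
    """Squared-error loss for an 8-element label [pc, bx, by, bh, bw, c1, c2, c3].

    If y[0] (pc) is 1, all eight squared differences are summed.
    If y[0] is 0, only the pc term counts; the other entries of y are
    "don't care" values and are never read.
    """
    if len(y_hat) != 8 or len(y) != 8:
        raise ValueError("both vectors must have 8 elements")
    if y[0] == 1:
        return sum((p - t) ** 2 for p, t in zip(y_hat, y))
    return (y_hat[0] - y[0]) ** 2


# Object present: only the pc prediction is off, by 0.1.
y_car = [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
pred_car = [0.9, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
print(localization_loss(pred_car, y_car))

# Background: the box and class predictions are ignored entirely.
y_bg = [0, None, None, None, None, None, None, None]
pred_bg = [0.2, 0.9, 0.9, 0.9, 0.9, 0.3, 0.3, 0.3]
print(localization_loss(pred_bg, y_bg))
```

The first call gives roughly 0.01 and the second roughly 0.04. The second result depends only on the predicted 𝑝𝑐, however far off the other seven outputs are.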