Deep learning - object localization
2022-06-30 07:43:00 【Hair will grow again without it】
Object localization
We are already familiar with image classification: the algorithm looks at an image and decides, for example, whether it shows a car. In this lesson we turn to another problem that neural networks can be built for, classification with localization. Here the algorithm must not only decide whether the picture contains a car but also mark where the car is, by drawing a bounding box (for example a red rectangle) around it. "Localization" means determining the specific position of the car in the picture.
Image classification should look familiar. A picture is fed into a multilayer convolutional neural network, which outputs a feature vector; that vector is passed to a softmax unit that predicts the class of the picture. If you are building a self-driving car system, the classes might be pedestrian, car, motorcycle, and background. Background means that none of the first three objects appears in the picture: there are no pedestrians, cars, or motorcycles. These four classes are the possible outputs of the softmax function.
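As a minimal sketch of the softmax step described above, the following plain-Python snippet turns four raw scores into class probabilities. The class order in `CLASSES`, the helper names, and the example scores are assumptions made for illustration; they are not taken from the course.

```python
import math

# Class order is an assumption made for this sketch.
CLASSES = ["pedestrian", "car", "motorcycle", "background"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return the most probable class name and the full distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical scores from the last layer of a network.
label, probs = predict_class([0.2, 3.1, 0.4, -1.0])
print(label)  # car
```

In a real network the scores would come from the final layer that follows the convolutional feature extractor; here they are hard-coded only to show the mapping from scores to a class.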
This is the standard classification pipeline. What if you also want to locate the car in the picture? The neural network can be given a few more output units that describe a bounding box. Specifically, the network outputs four more numbers, denoted $b_x$, $b_y$, $b_h$, and $b_w$, which parameterize the bounding box of the detected object.
Let's first fix the notation used in this week's course. The upper-left corner of the picture has coordinates (0,0) and the lower-right corner has coordinates (1,1). To pin down the bounding box, specify its center point $(b_x, b_y)$, its height $b_h$, and its width $b_w$. The training set therefore contains not only the class label that the network should predict but also the four numbers that describe the bounding box. A supervised learning algorithm can then be trained to output a class label together with the four parameter values, which gives the position of the detected object's box.
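To make the coordinate convention concrete, here is a small sketch that converts a normalized center/size box into pixel corner coordinates. The function name, the image size, and the example box are hypothetical and chosen only for illustration.

```python
def box_to_pixel_corners(bx, by, bh, bw, image_width, image_height):
    """Convert a normalized center/size box to pixel corner coordinates.

    (bx, by) is the box center, bh its height and bw its width, all given
    as fractions of the image, with (0, 0) at the upper-left corner and
    (1, 1) at the lower-right corner.
    """
    left = (bx - bw / 2) * image_width
    right = (bx + bw / 2) * image_width
    top = (by - bh / 2) * image_height
    bottom = (by + bh / 2) * image_height
    return left, top, right, bottom


# Hypothetical box centered in a 640x480 image.
print(box_to_pixel_corners(0.5, 0.5, 0.5, 0.25, 640, 480))
# (240.0, 120.0, 400.0, 360.0)
```

Keeping the label in normalized coordinates means the same four numbers stay valid if the picture is resized before it is fed to the network.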
Q: How is the target label $y$ defined for this supervised learning task?
A: Recall that there are four classes and that the network outputs the four bounding-box numbers plus a class label, or rather the probabilities of the class labels. The target label $y$ is defined as follows:
$$y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]^T$$
It is a vector whose first component $p_c$ indicates whether the picture contains an object. If the object belongs to one of the first three classes (pedestrian, car, motorcycle), then $p_c = 1$; if the picture is background, with no object to detect, then $p_c = 0$. You can think of $p_c$ as the probability that the picture contains an object of one of the classes other than background. If an object is detected, the network also outputs the bounding-box parameters $b_x$, $b_y$, $b_h$, and $b_w$. Finally, when there is an object ($p_c = 1$), it outputs $c_1$, $c_2$, and $c_3$, which indicate which of classes 1 to 3 the object belongs to: pedestrian, car, or motorcycle.
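The layout of the eight components (pc, then bx, by, bh, bw, then c1, c2, c3) can be sketched as a small helper that splits a flat vector into named parts. The `Target` type, the `unpack_target` function, and the example values are hypothetical names and numbers introduced only for this illustration.

```python
from typing import NamedTuple, Sequence, Tuple


class Target(NamedTuple):
    pc: float                               # 1 if an object is present, else 0
    box: Tuple[float, float, float, float]  # (bx, by, bh, bw)
    classes: Tuple[float, float, float]     # (c1, c2, c3)


def unpack_target(y: Sequence[float]) -> Target:
    """Split a flat 8-component vector into its named parts."""
    if len(y) != 8:
        raise ValueError("expected 8 components: pc, 4 box values, 3 classes")
    return Target(pc=y[0], box=tuple(y[1:5]), classes=tuple(y[5:8]))


# Hypothetical vector for a picture that contains a car (class 2).
t = unpack_target([1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0])
print(t.pc, t.box, t.classes)
```

The same helper works for a network prediction, since the prediction has the same eight-component layout as the label.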
Example
We assume that each picture contains at most one object, so in this classification with localization problem at most one of these objects appears in a picture.
Suppose a training-set picture $x$ shows a car. In its label $y$, the first element is $p_c = 1$ because the picture contains a car, and $b_x$, $b_y$, $b_h$, and $b_w$ give the location of the bounding box, so labeling the training set requires annotating bounding boxes. The object is a car, so it belongs to class 2: it is neither a pedestrian nor a motorcycle, hence $c_1 = 0$, $c_2 = 1$, $c_3 = 0$. At most one of $c_1$, $c_2$, and $c_3$ equals 1.
That is the case where the picture contains one object. What if the training sample is a picture with no object to detect?
In that case $p_c = 0$ and the remaining components of $y$ are meaningless. They are written as question marks to mark them as "don't care" entries: because the picture contains no object, it does not matter what bounding box the network outputs, or which of $c_1$, $c_2$, and $c_3$ it assigns.
For any labeled training sample, whether or not the picture contains an object to localize, the input picture $x$ and the label $y$ are constructed in this same way. Together these pairs define the training set.
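The two labeling cases above can be sketched as one encoding function. This is only an illustration under stated assumptions: the function name `encode_label` is hypothetical, `None` stands in for the question marks of the "don't care" entries, and the box values in the usage lines are made up.

```python
def encode_label(class_index=None, box=None):
    """Build the 8-component label [pc, bx, by, bh, bw, c1, c2, c3].

    class_index: 1 = pedestrian, 2 = car, 3 = motorcycle,
                 None = background (no object in the picture).
    box:         (bx, by, bh, bw) in normalized image coordinates.
    """
    if class_index is None:
        # Background: pc = 0, every other entry is a "don't care" value.
        return [0] + [None] * 7
    if class_index not in (1, 2, 3):
        raise ValueError("class_index must be 1, 2, 3, or None")
    if box is None or len(box) != 4:
        raise ValueError("an object label needs a (bx, by, bh, bw) box")
    # One-hot class indicator: exactly one of c1, c2, c3 equals 1.
    one_hot = [1 if k == class_index else 0 for k in (1, 2, 3)]
    return [1, *box, *one_hot]


# Hypothetical car picture and hypothetical background picture.
print(encode_label(2, (0.5, 0.7, 0.3, 0.4)))  # [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
print(encode_label())  # [0, None, None, None, None, None, None, None]
```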
Loss function of the neural network
The loss takes the ground-truth label $y$ and the network output $\hat{y}$. With a squared-error loss, $L(\hat{y}, y) = (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + \cdots + (\hat{y}_8 - y_8)^2$, that is, the sum of the squared differences over all eight components.
If the picture contains an object, then $y_1 = 1$ (recall that $y_1 = p_c$), and the loss is the sum of the squared differences over all eight components.
The other case is $y_1 = 0$, that is, $p_c = 0$. The loss is then $(\hat{y}_1 - y_1)^2$ alone: the other components do not matter, and only the accuracy of the network's output for $p_c$ counts.
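The two-case squared-error loss can be sketched in a few lines of plain Python. The function name `localization_loss` and the example vectors are assumptions for illustration; the label's "don't care" entries are never read when its first component is 0, so any placeholder (here `None`) may be used for them.

```python
def localization_loss(y_hat, y):
    """Squared-error loss for the 8-component target.

    y_hat: network output  [pc, bx, by, bh, bw, c1, c2, c3]
    y:     ground-truth label with the same layout
    """
    if len(y_hat) != 8 or len(y) != 8:
        raise ValueError("both vectors must have 8 components")
    if y[0] == 1:
        # Object present: sum the squared differences over all components.
        return sum((p - t) ** 2 for p, t in zip(y_hat, y))
    # No object: only the pc component contributes to the loss.
    return (y_hat[0] - y[0]) ** 2


# Hypothetical prediction for a background picture.
background = [0] + [None] * 7
print(localization_loss([0.5, 0.1, 0.9, 0.3, 0.2, 0.7, 0.1, 0.2], background))  # 0.25

# Hypothetical prediction that gets everything right except c2.
car = [1, 0.5, 0.5, 0.5, 0.5, 0, 1, 0]
print(localization_loss([1, 0.5, 0.5, 0.5, 0.5, 0, 0, 0], car))  # 1.0
```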