Positive and negative sample division and architecture understanding in image classification and target detection
2022-07-03 10:01:00 【Star soul is not a dream】
The key to understanding supervised deep learning is to separate the inference phase from the training phase. A model can be understood by understanding how the various deep neural network architectures behave at inference and what happens during training.
The inference phase treats the model as something like a black-box nonlinear function: for example, various convolution modules are combined into a backbone, a tensor of the desired shape is output, and post-processing is applied.
The training phase requires dividing positive and negative samples, then designing a loss function according to the task, and using an optimization algorithm such as SGD to iteratively update the neurons' weights and biases. The optimization goal is to minimize the loss function, so that the trained model fits the training set.
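As a minimal sketch of this loop (toy data and plain logistic regression in NumPy as a stand-in for a deep network; all data and hyperparameters below are made up), iterating SGD updates on the weights and bias drives the loss down:

```python
import numpy as np

# Toy training loop: full-batch SGD on logistic regression.
# Data, learning rate, and epoch count are illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # labeled training set

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(200):
    # forward pass: the model as a nonlinear function of the input
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    # backward pass: gradient of the loss w.r.t. weights and bias
    g = (p - y) / len(y)
    w -= lr * (X.T @ g)
    b -= lr * g.sum()

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # final predictions
accuracy = ((p > 0.5) == (y > 0.5)).mean()
print(loss, accuracy)  # loss shrinks as the model fits the training set
```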
We can usually understand all neural networks under an encoder-decoder architecture.
Image classification:
- Inference phase: the input is an image, which an encoder (such as a CNN) encodes into a tensor. Typically the width W and height H are reduced by a factor of x while the number of channels C is multiplied by y, giving a new tensor of shape (W/x, H/x, yC). A decoder then follows, adding FC layers, softmax, and so on. Of course, everything before the softmax can also be understood as the encoder, with the softmax itself as the decoder.
- Training phase: same as the inference phase, except that the softmax output vector is compared with the annotated label using a cross-entropy loss (the common choice), and backpropagation updates all the weights and biases before the softmax.
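A tiny NumPy sketch of that training signal (the 3-class setup and logit values are illustrative): the decoder's softmax output is compared with the one-hot label via cross-entropy:

```python
import numpy as np

# Illustrative 3-class example: decoder logits -> softmax -> cross-entropy
# against the one-hot label. All values are arbitrary.
def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])  # decoder output for 3 classes
label = np.array([1.0, 0.0, 0.0])    # one-hot annotation: class 0

p = softmax(logits)
cross_entropy = -np.sum(label * np.log(p))
print(p, cross_entropy)
```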
Object detection:
- Inference phase: object detection is more complex. Generally speaking, a detector's architecture is Backbone + Neck + Detection head. The names are apt: the backbone is followed by the neck, and finally the detection head makes the decision. The backbone is often a model pre-trained on a large image classification dataset (i.e., an image classification encoder), because classification annotations are cheaper to obtain while the features the network extracts transfer between the two tasks; this is an idea from transfer learning. The neck applies feature fusion operations to some of the backbone's output tensors, producing better combined features suited to detecting objects of different sizes. The detection head operates on the fused tensors from the neck and outputs tensors of the desired shape. Finally comes post-processing: predictions below a score threshold are deleted, and then NMS removes redundant boxes.
Of course, we can regard Backbone + Neck as the encoder and the detection head as the decoder. Note: some architectures have no neck, such as YOLOv1, which brings a performance loss.
The Backbone + Neck + Detection head architecture lets us design each module separately, and different object detection models can then be constructed by swapping modules.
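The post-processing step mentioned above can be sketched as follows (box coordinates, score threshold, and IoU threshold are illustrative values, not from any particular detector):

```python
import numpy as np

# Score thresholding + NMS over (x1, y1, x2, y2) boxes.
def iou_one_to_many(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_one = (box[2] - box[0]) * (box[3] - box[1])
    area_many = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_one + area_many - inter)

def postprocess(boxes, scores, score_thr=0.3, iou_thr=0.5):
    keep_mask = scores > score_thr              # delete low-score predictions
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = scores.argsort()[::-1]              # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # drop boxes that overlap the kept box too much (redundant borders)
        order = rest[iou_one_to_many(boxes[i], boxes[rest]) <= iou_thr]
    return boxes[keep], scores[keep]

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
kept_boxes, kept_scores = postprocess(boxes, scores)
print(kept_boxes, kept_scores)  # the second box is suppressed by the first
```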
- Training phase:
The core of the training phase is the design of the loss function: a loss is computed between the detection head's output tensor and the label annotations, and is used to update the network. This part therefore does not involve the post-processing described above. The key here is the selection of positive and negative samples with which to compute the loss.
In an image classification task, the positive samples of a class are all images labeled with that class, and the negative samples are all images of the other categories. When the network is fed a positive image, the loss is computed between the predicted value and the 1 in the label vector; minimizing the loss pushes that predicted value up, and because of the softmax constraint, the other entries of the prediction vector shrink. When the network is fed an image that is a negative for the current class, the predicted value for the class the image actually belongs to grows, and the other values shrink. So for image classification, we do not need to worry about dividing positive and negative samples: the labels' one-hot encoding naturally distinguishes them.
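This softmax behavior can be checked numerically. In the sketch below (arbitrary logits, class 0 labeled positive), one SGD step on the cross-entropy raises the labeled class's probability while the softmax constraint pushes the others down:

```python
import numpy as np

# One gradient step on cross-entropy: the labeled class's softmax probability
# rises, the other classes' probabilities fall. Values are arbitrary.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([0.2, 1.0, -0.5])
label = np.array([1.0, 0.0, 0.0])      # one-hot: this image is a class-0 positive

p_before = softmax(logits)
grad = p_before - label                # d(cross-entropy)/d(logits)
logits_after = logits - 0.5 * grad     # one SGD step, lr = 0.5
p_after = softmax(logits_after)
print(p_before, p_after)  # class-0 probability goes up, the rest go down
```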
In an object detection task, the input is an image, but unlike image classification, the unit of a positive or negative sample is no longer a whole image but a region within the image, so a single image contains many positive and negative samples. Although these regions are smaller than the images used in classification, their number is huge, so detection is slower than classification. How are these regions (samples) obtained, and how are so many regions divided into positives and negatives? These are two important questions. For the former, a common practice is the anchor-based approach: prior boxes (anchors) generated over small patches of the image serve as the samples. For the latter, a common approach divides positives and negatives by their IoU with the ground-truth box; different algorithms use different strategies. If an anchor is assigned as a positive sample, regressing that positive sample yields a prediction box, which then enters the localization term of the loss function: a distance computed between the prediction box and the ground-truth box.
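A minimal sketch of the IoU-based division (the 0.7/0.3 thresholds are RPN-style examples; real detectors each use their own strategy, and the boxes here are toy values):

```python
import numpy as np

# Assign anchors as positive/negative by IoU with one ground-truth box.
# Thresholds 0.7 / 0.3 and all coordinates are illustrative.
def iou(anchors, gt):
    x1 = np.maximum(anchors[:, 0], gt[0])
    y1 = np.maximum(anchors[:, 1], gt[1])
    x2 = np.minimum(anchors[:, 2], gt[2])
    y2 = np.minimum(anchors[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_g - inter)

anchors = np.array([[0, 0, 10, 10], [1, 1, 12, 12], [30, 30, 40, 40]], float)
gt_box = np.array([1, 1, 11, 11], float)   # one ground-truth (real) box

ious = iou(anchors, gt_box)
positive = ious >= 0.7   # regressed toward the gt box, enter the localization loss
negative = ious < 0.3    # contribute only to the classification (background) loss
# anchors in between are often ignored during training
print(ious, positive, negative)
```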
Notice that there are three kinds of boxes:
- Ground-truth (real) box
- Prior box (anchor)
- Prediction box
To sum up: in object detection, the positive samples are not the ground-truth boxes. The ground-truth annotation boxes are the optimization target, just like the one-hot encoded label vector in image classification. The positive samples are selected prior boxes (anchors), just like the images of a given class in classification. The model transforms a prior box (anchor) into a prediction box, just like the prediction vector in classification, so the loss is computed between the prediction box and the ground-truth box. Of course, models like YOLOv1 have no anchors, so there are some differences.
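The relation between the three boxes can be sketched as follows, assuming the common R-CNN-style (dx, dy, dw, dh) offset encoding; all numbers are toy values:

```python
import numpy as np

# The model predicts offsets that transform a prior (anchor) box into a
# prediction box; the prediction box is then compared with the real
# (ground-truth) box in the localization loss. Numbers are toy values.
def decode(anchor, deltas):
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cx = ax + deltas[0] * aw          # shift the center
    cy = ay + deltas[1] * ah
    w = aw * np.exp(deltas[2])        # rescale width/height
    h = ah * np.exp(deltas[3])
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

anchor = np.array([0.0, 0.0, 10.0, 10.0])   # prior box
deltas = np.array([0.1, 0.1, 0.0, 0.0])     # model output for this anchor
pred_box = decode(anchor, deltas)            # prediction box
gt_box = np.array([1.0, 1.0, 11.0, 11.0])   # real box
l1_loss = np.abs(pred_box - gt_box).sum()    # e.g. an L1 localization loss
print(pred_box, l1_loss)
```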
Modular choices for Backbone + Neck + Detection head:
- Input: Image, Patches, Image Pyramid
- Backbones: VGG16, ResNet-50, SpineNet, EfficientNet-B0/B7, CSPResNeXt50, CSPDarknet53, Swin Transformer
- Neck:
  - Additional blocks: SPP, ASPP, RFB, SAM
  - Path-aggregation blocks: FPN, PAN, NAS-FPN, Fully-connected FPN, BiFPN, ASFF, SFAM
- Heads:
  - Dense Prediction (one-stage):
    - RPN, SSD, YOLO (v2-v5), RetinaNet (anchor-based)
    - YOLOv1, CornerNet, CenterNet, MatrixNet, FCOS (anchor-free)
  - Sparse Prediction (two-stage):
    - Faster R-CNN, R-FCN, Mask R-CNN (anchor-based)
    - RepPoints (anchor-free)
Note: this module list comes from the YOLOv4 paper.
For the positive/negative sample division strategies of specific detectors, see: Object Detection Anchor and Loss Calculation | Scavenging Records
For anchor generation methods, see: Understanding and Implementing Anchor Boxes (anchor box) - Zhihu