[Core Concepts] Positive and negative samples in image classification and object detection, and how to understand their architectures

2022-07-31 19:36:00 [qq_43479892]

1. Introduction

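The key to understanding supervised deep learning is to treat the inference stage and the training stage separately. Once you understand what a given deep neural network architecture does in each of the two stages, you understand the model.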

The model we define is equivalent to a large family of complex nonlinear functions. A supervised optimization method such as SGD picks out one particular complex nonlinear function from this family. For classification, that function turns input features that are not linearly separable into features that are. For regression, it is a learned mapping from high-dimensional input features to the output.
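Here is a minimal sketch of the classification claim, built on the classic XOR toy problem. No straight line in (x1, x2) can separate the four XOR points. After a hand-picked nonlinear map that adds the product x1*x2 as a third feature, a single linear rule separates them. In a real network this mapping is learned, not written by hand. `feature_map` and the weights inside `linear_classifier` are illustrative assumptions.

```python
# XOR: not linearly separable in the raw inputs (x1, x2).
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def feature_map(x1, x2):
    # Hypothetical fixed nonlinearity standing in for what a network learns:
    # append the product term x1*x2 as a third feature.
    return (x1, x2, x1 * x2)

def linear_classifier(features):
    # A purely linear decision rule on the mapped features (weights chosen by hand):
    # score = x1 + x2 - 2*x1*x2 - 0.5, predict class 1 when score > 0.
    x1, x2, x12 = features
    return 1 if x1 + x2 - 2 * x12 - 0.5 > 0 else 0

predictions = [linear_classifier(feature_map(*x)) for x, _ in xor_data]
print(predictions)  # [0, 1, 1, 0]
```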

In the inference stage, the model is treated as a black-box nonlinear function. For example, a backbone built by stacking convolution modules outputs a tensor of the desired shape, and that tensor is then post-processed.

The training stage proceeds in three steps:

1. Divide the samples into positives and negatives.
2. Design a loss function for the task.
3. Use an optimization algorithm such as SGD to update the neurons' weights and biases iteratively, with the goal of minimizing that loss.

The trained model then fits the training set. This raises the problem of overfitting, also called the generalization problem. A great deal of deep learning research addresses it. It is well worth studying, and I will cover it fully in a later post.
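The training loop can be made concrete with a minimal SGD sketch on an assumed toy problem. A single "neuron" y = w*x + b is fit to noiseless points on y = 2x + 1 by minimizing squared error, one sample per update. The data, learning rate and epoch count are illustrative choices, not values from any particular model.

```python
import random

# Assumed toy training set: 21 points on the line y = 2x + 1.
data = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]

w, b = 0.0, 0.0   # the weight and bias being learned
lr = 0.1          # learning rate (assumed)
random.seed(0)

for epoch in range(200):
    random.shuffle(data)
    for x, y in data:             # stochastic: update after every single sample
        err = (w * x + b) - y     # derivative of 0.5 * err**2 w.r.t. the prediction
        w -= lr * err * x         # gradient step for the weight
        b -= lr * err             # gradient step for the bias

# Mean squared-error loss on the training set after training.
loss = sum(0.5 * (w * x + b - y) ** 2 for x, y in data) / len(data)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Because the toy data has no noise, the loss can be driven to essentially zero. On real data, reaching a near-zero training loss is exactly where the overfitting concern above comes in.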

In general, any neural network can be understood as an encoder-decoder architecture.

2. Image Classification

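Inference stage: the input is a mini-batch of images. An encoder (for example, a CNN) encodes it into a tensor. Typically W and H shrink by a factor of x and the channel count C grows by a factor of y, giving a new tensor of shape (bs, W/x, H/x, yC). A decoder made of layers such as FC and softmax follows. Alternatively, you can regard everything before the softmax as the encoder and the softmax alone as the decoder.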
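Training stage: the forward pass is the same as in inference. The difference is that the softmax output vector is compared with the annotated label, usually through a cross-entropy loss. That loss is backpropagated to update all the weights and biases before the softmax.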
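The two stages can be traced end to end at the level of shapes and numbers. This is a hedged illustration, not a real CNN, and it rests on three assumptions:

- `encoder_output_shape` assumes each of `num_stages` hypothetical stages halves W and H and doubles C.
- The decoder is assumed to be global pooling plus an FC layer that produces one logit per class.
- The logits and label are made up.

The sketch keeps the (bs, W, H, C) layout used in the text. PyTorch, for instance, uses (bs, C, H, W) instead.

```python
import math

def encoder_output_shape(bs, w, h, c, num_stages, stride=2, channel_mult=2):
    # After num_stages stages, W and H shrink by x = stride**num_stages
    # and C grows by y = channel_mult**num_stages -> (bs, W/x, H/x, yC).
    x = stride ** num_stages
    y = channel_mult ** num_stages
    return (bs, w // x, h // x, y * c)

def decoder_output_shape(enc_shape, num_classes):
    # Assumed decoder: global average pooling over W and H, then an FC layer
    # mapping C features to num_classes logits (softmax keeps this shape).
    return (enc_shape[0], num_classes)

def softmax(logits):
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    # L = -sum_k y_k * log(p_k); with a one-hot label only the true class term remains.
    return -sum(y * math.log(p) for p, y in zip(probs, one_hot) if y > 0)

# Inference: shapes only (input channels taken as 64, e.g. after a stem layer).
enc = encoder_output_shape(bs=8, w=224, h=224, c=64, num_stages=5)
dec = decoder_output_shape(enc, num_classes=1000)
print(enc, dec)  # (8, 7, 7, 2048) (8, 1000)

# Training: one image's made-up logits against its one-hot label (class 0).
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, [1, 0, 0])
```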

Positive and negative samples

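In image classification, the positive samples for a class are all images of that class, and the negative samples are all images of the other classes.

When a positive image is fed to the network, the loss is computed at the position where the label vector is 1. Minimizing the loss therefore pushes the predicted value at that position up. Because softmax outputs must sum to 1, the other entries of the prediction vector go down.

Likewise, images of other categories are negative samples for the current class, and their output probability for this class is optimized toward 0.

Image classification therefore needs no explicit positive/negative split: the one-hot encoding of the labels already separates positives from negatives.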
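The "up for the labeled class, down for the rest" effect follows from the gradient of softmax cross-entropy with respect to the logits, which is p - y. That gradient is negative at the one-hot position and positive everywhere else. The sketch below takes one plain gradient step on made-up logits. The step size and numbers are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.5, 0.2, -0.3]   # made-up scores for a 3-class problem
one_hot = [0, 1, 0]         # this image is a positive sample for class 1
lr = 1.0                    # assumed step size

before = softmax(logits)
# d(cross-entropy)/d(logits) = p - y:
# negative for the labeled class (its logit is pushed up),
# positive for every other class (their logits are pushed down).
grad = [p - y for p, y in zip(before, one_hot)]
logits = [z - lr * g for z, g in zip(logits, grad)]
after = softmax(logits)

print([round(p, 3) for p in before])  # [0.457, 0.338, 0.205]
print([round(p, 3) for p in after])   # [0.26, 0.59, 0.15]
```

In this example the probabilities of both negative classes fall. The labeled class's probability always rises relative to each other class after such a step.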

However, this brings a serious problem. The method above is trained on a closed training set, so every input can only be classified into one of the predefined categories. Results can be filtered by thresholding the output probability. Even so, some negative inputs that look nothing like any defined category may still receive a high probability. This is known as the open-set (open-domain) recognition problem.
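The thresholding workaround, and why it is not enough, can be shown with a small sketch. The class list, the 0.9 cut-off and all logits are hypothetical. The point is that softmax must distribute all probability over the known classes, so an out-of-distribution input with one dominant logit still passes the threshold.

```python
import math

CLASSES = ["cat", "dog", "bird"]   # hypothetical closed label set
THRESHOLD = 0.9                    # assumed confidence cut-off

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_threshold(logits, threshold=THRESHOLD):
    # Take the arg-max class, but report "unknown" if its probability is too low.
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda k: probs[k])
    if probs[best] < threshold:
        return "unknown", probs[best]
    return CLASSES[best], probs[best]

# Made-up logits for a photo of a car, which is not in CLASSES:
# one large logit gap still yields a confident, wrong answer.
car = classify_with_threshold([6.0, 1.0, 0.5])
# Made-up, nearly flat logits: these are rejected by the threshold.
vague = classify_with_threshold([1.0, 0.9, 0.8])
print(car[0], vague[0])  # cat unknown
```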

3. Object Detection

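Object detection is a somewhat more complicated problem. It usually combines three parts:

- object localization (a regression task);
- object classification (a classification task);
- a confidence score (regression).

This makes detection architectures more complex. A detector is generally structured as Backbone + Neck + Detection head. The naming is amusing: a torso, then a neck, and finally a head that makes the decisions.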

Inference stage:

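The Backbone is usually a model pre-trained on a large image classification dataset such as ImageNet, that is, the encoder of an image classifier. Classification labels are cheaper to annotate, and the features a network extracts carry over between the two tasks, so this is a form of transfer learning.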
The Neck applies feature-fusion operations to the tensors output by the Backbone. This produces better feature combinations for detecting objects of different sizes.
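The Detection head operates on the fused tensors from the Neck and outputs a tensor of the desired shape. Post-processing comes last: predictions below a confidence threshold are discarded, and NMS (non-maximum suppression) removes redundant boxes.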
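The two post-processing steps, thresholding followed by NMS, can be sketched in a few lines. This is class-agnostic greedy NMS with assumed thresholds (0.5 for score, 0.5 for IoU). Many detectors run NMS per class, and libraries provide optimized versions. Boxes are assumed to be (x1, y1, x2, y2) with positive area.

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def postprocess(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    # 1) Confidence threshold: drop low-scoring predictions.
    cand = [i for i, s in enumerate(scores) if s >= score_thresh]
    # 2) Greedy NMS: keep the best box, suppress boxes overlapping it too much, repeat.
    cand.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while cand:
        best = cand.pop(0)
        keep.append(best)
        cand = [i for i in cand if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (11, 9, 49, 51)]
scores = [0.9, 0.8, 0.7, 0.3]
print(postprocess(boxes, scores))  # [0, 2]
```

In this example, box 3 falls below the score threshold. Box 1 is suppressed because its IoU with box 0 is about 0.82. Box 2 survives because it does not overlap box 0.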

Backbone + Neck can also be regarded as the encoder and the Detection head as the decoder. Note that some architectures, such as YOLOv1, have no Neck, which costs them some performance.

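The Backbone + Neck + Detection head architecture lets us design each module separately. Swapping modules then yields different detectors. (Admittedly, many papers are written exactly this way; they are jokingly called "Frankenstein" models.)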
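The plug-and-play structure can be sketched at the level of tensor shapes, with the Neck left optional to model designs such as YOLOv1. Everything here is hypothetical:

- The backbone's strides and channel counts are assumed, loosely Darknet-like.
- The "neck" only unifies channel counts instead of really fusing features.
- The head uses the YOLO-style per-anchor layout of 4 box values, 1 confidence and the class scores.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Shape = Tuple[int, int, int, int]                  # (bs, H, W, C)
Stage = Callable[[List[Shape]], List[Shape]]

def toy_backbone(image: Shape) -> List[Shape]:
    # Hypothetical backbone: three feature maps at strides 8/16/32 (assumed channels).
    bs, h, w, _ = image
    return [(bs, h // s, w // s, c) for s, c in ((8, 256), (16, 512), (32, 1024))]

def toy_neck(feats: List[Shape]) -> List[Shape]:
    # Stand-in for FPN/PAN-style fusion: here it only projects every map to 256 channels.
    return [(bs, h, w, 256) for bs, h, w, _ in feats]

def yolo_style_head(feats: List[Shape], num_anchors: int = 3,
                    num_classes: int = 80) -> List[Shape]:
    # Per cell and per anchor: 4 box values + 1 confidence + num_classes scores.
    return [(bs, h, w, num_anchors * (5 + num_classes)) for bs, h, w, _ in feats]

@dataclass
class Detector:
    backbone: Callable[[Shape], List[Shape]]
    neck: Optional[Stage]   # None models architectures without a Neck (e.g. YOLOv1)
    head: Stage

    def __call__(self, image: Shape) -> List[Shape]:
        feats = self.backbone(image)      # encoder, part 1
        if self.neck is not None:
            feats = self.neck(feats)      # encoder, part 2 (feature fusion)
        return self.head(feats)           # decoder

with_neck = Detector(toy_backbone, toy_neck, yolo_style_head)
no_neck = Detector(toy_backbone, None, yolo_style_head)
print(with_neck((1, 608, 608, 3)))  # [(1, 76, 76, 255), (1, 38, 38, 255), (1, 19, 19, 255)]
```

Swapping any of the three callables produces a "new" detector without touching the others. That is the convenience, and also the temptation, described above.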

Training stage:

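The core of the training stage is the design of the loss function. The loss is computed between the tensor output by the Detection head and the annotated labels, and it is used to update the network. The post-processing described above is therefore not involved. The key question here is how positive and negative samples are selected for computing the loss.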

In object detection the input is still a single image. Unlike image classification, however, a positive or negative sample is no longer a whole image but a region within the image, so one image contains many positive and negative samples. These regions are smaller than the images used in classification, but there are a huge number of them, so object detection is much slower than image classification.
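A quick count shows how many candidate regions one image produces. The configuration is YOLOv3-style and assumed here: feature maps at strides 8, 16 and 32 with 3 anchors per cell. A classifier sees one sample per image. This detector sees over twenty thousand candidate boxes per 608×608 image.

```python
def num_anchors(img_size, strides=(8, 16, 32), anchors_per_cell=3):
    # A stride-s feature map has (img_size // s) ** 2 cells, and every cell
    # carries `anchors_per_cell` prior boxes, each one a candidate sample.
    return sum((img_size // s) ** 2 * anchors_per_cell for s in strides)

print(num_anchors(608))  # 22743
print(num_anchors(416))  # 10647
```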

So how do we obtain these regions (samples)? And how do we divide so many regions into positives and negatives?

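These are two important questions.

- Obtaining the regions: a common approach is the anchor-based one. A set of prior boxes (anchors) is generated at every location (grid cell) of the image, and these anchors are the samples.
- Splitting them: positives and negatives are commonly assigned from each anchor's IoU with the ground-truth boxes. The exact strategy differs between algorithms.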
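Both steps can be sketched together: generate anchors on a grid, then label each one by its best IoU with any ground-truth box.

- The thresholds follow the RetinaNet-style choice of positive at IoU ≥ 0.5 and negative below 0.4, with anchors in between ignored. Faster R-CNN's RPN uses 0.7/0.3 instead.
- Real assigners usually add further rules, for example also marking the best-matching anchor of each ground-truth box as positive.
- The grid size, anchor size and ground-truth box below are made up.

```python
def make_anchors(feat_h, feat_w, stride, sizes):
    # One prior box per (w, h) in `sizes`, centred on every feature-map cell;
    # boxes are (x1, y1, x2, y2) in input-image pixels, in row-major cell order.
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for w, h in sizes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def assign(anchors, gt_boxes, pos_thresh=0.5, neg_thresh=0.4):
    # 1 = positive, 0 = negative, -1 = ignored (best IoU between the two thresholds)
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels

# 64x64 image, 4x4 feature map (stride 16), one 32x32 anchor per cell -> 16 anchors.
anchors = make_anchors(4, 4, stride=16, sizes=[(32, 32)])
labels = assign(anchors, gt_boxes=[(12, 8, 44, 40)])
print(labels.count(1), labels.count(0), labels.count(-1))  # 1 14 1
```

Here the anchor centred at (24, 24) overlaps the ground-truth box with IoU ≈ 0.78 and becomes the single positive. Its right-hand neighbour (IoU ≈ 0.45) is ignored, and every other anchor is a negative.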

So what are the positive and negative samples used for once they are divided?

During training, the positive/negative split determines which loss terms are computed.

- Positive anchor: the model regresses it into a predicted box, which then takes part in the localization loss. A positive sample contributes three terms: the box-regression loss, the confidence loss and the classification loss.
- Negative anchor: only its confidence loss is computed.

In this way the model learns to output a confidence for every anchor, and regions that contain an object generally receive higher probabilities. In other words, the model learns which regions are objects.
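The masking logic can be sketched as a loop over assigned anchors. The three loss forms are stand-ins: an L1 box loss, binary cross-entropy for confidence and cross-entropy for the class. YOLOv4 itself, for example, uses a CIoU loss for boxes. The layout of `preds` and `targets` is hypothetical. The point is only which terms each label triggers.

```python
import math

def bce(p, target, eps=1e-7):
    # Binary cross-entropy for a confidence score in (0, 1).
    p = min(max(p, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def detection_loss(preds, targets, labels):
    """preds[i] = (box, conf, class_probs); targets[i] = (gt_box, class_id) or (None, None)."""
    box_loss = conf_loss = cls_loss = 0.0
    for (box, conf, class_probs), (gt_box, class_id), label in zip(preds, targets, labels):
        if label == 1:
            # Positive anchor: localization + confidence (toward 1) + classification.
            box_loss += sum(abs(p - t) for p, t in zip(box, gt_box))  # L1 stand-in
            conf_loss += bce(conf, 1.0)
            cls_loss += -math.log(max(class_probs[class_id], 1e-7))
        elif label == 0:
            # Negative anchor: only the confidence term, pushed toward 0.
            conf_loss += bce(conf, 0.0)
        # label == -1 (ignored): contributes nothing.
    return box_loss, conf_loss, cls_loss

preds = [((0.1, 0.2, 0.3, 0.4), 0.8, [0.7, 0.3]),  # positive anchor
         ((5.0, 5.0, 5.0, 5.0), 0.1, [0.5, 0.5]),  # negative: its box is never used
         ((9.0, 9.0, 9.0, 9.0), 0.6, [0.5, 0.5])]  # ignored anchor
targets = [((0.1, 0.2, 0.3, 0.5), 0), (None, None), (None, None)]
box_l, conf_l, cls_l = detection_loss(preds, targets, labels=[1, 0, -1])
```

Only the positive anchor's box enters `box_l`, and the negative anchor adds only -log(1 - 0.1) to the confidence term.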

Note that three kinds of boxes are involved here:

Ground-truth (annotated) boxes
Prior boxes (anchors)
Predicted boxes

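In summary, the positive samples in object detection are not the ground-truth boxes. Each of the three boxes has a counterpart in image classification:

- Ground-truth boxes are the optimization target, like the one-hot label vector.
- Positive samples are the selected subset of anchors, like the images of a given class.
- Predicted boxes are what the model produces by regressing an anchor, like the prediction vector.

The loss is therefore computed between the predicted boxes and the ground-truth boxes. YOLOv1 has no anchors, so it differs in some respects.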
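The three boxes can be tied together with the (tx, ty, tw, th) parameterization used by Faster R-CNN and several other anchor-based detectors. It is one common choice, not the only one; YOLOv2 onward, for example, predicts sigmoid offsets inside a grid cell.

- `encode` turns a ground-truth box into regression targets relative to an anchor.
- `decode` turns the head's outputs back into a predicted box.

Detectors differ in whether the loss compares offsets with encoded targets or decoded boxes with ground-truth boxes, as IoU-style losses do. All box values below are made up, in (cx, cy, w, h) format.

```python
import math

def encode(anchor, gt):
    # Regression targets of a ground-truth box relative to an anchor (both cx, cy, w, h).
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))

def decode(anchor, deltas):
    # Apply predicted offsets to an anchor to obtain the predicted box.
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = deltas
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))

anchor = (50.0, 50.0, 32.0, 64.0)      # prior box (a positive sample)
gt = (54.0, 46.0, 40.0, 60.0)          # ground-truth box (the target)
targets = encode(anchor, gt)           # what the head should learn to output
pred_deltas = (0.1, -0.05, 0.2, -0.1)  # pretend Detection-head output
pred_box = decode(anchor, pred_deltas) # the predicted box
```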

The overall architecture, adapted from the summary in the YOLOv4 paper:

Backbone + Neck + Detection head modules:

Input: Image, Patches, Image Pyramid
Backbones: VGG16, ResNet-50, SpineNet, EfficientNet-B0/B7, CSPResNeXt50, CSPDarknet53, Swin Transformer
Neck:
- Additional blocks: SPP, ASPP, RFB, SAM
- Path-aggregation blocks: FPN, PAN, NAS-FPN, Fully-connected FPN, BiFPN, ASFF, SFAM
Heads:
- Dense Prediction (one-stage):
  - RPN, SSD, YOLO (v2-v5), RetinaNet (anchor based)
  - YOLOv1, CornerNet, CenterNet, MatrixNet, FCOS (anchor free)
- Sparse Prediction (two-stage):
  - Faster R-CNN, R-FCN, Mask R-CNN (anchor based)
  - RepPoints (anchor free)