"Core concepts": positive and negative samples in image classification and object detection, and how to understand the architectures
2022-07-31 19:36:00 【qq_43479892】
1. Preface
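The key to understanding supervised deep learning is to separate the inference stage from the training stage. Once you understand what a given deep neural network architecture does during inference and during training, you understand the model.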
The model we define is equivalent to a complex set of nonlinear functions. With a supervised optimization method (such as SGD), we select one good nonlinear function from that set. For classification, this function takes linearly inseparable input features and transforms them into linearly separable ones. For regression, the function is a learned mapping from high-dimensional input features to outputs.
In the inference stage, the model is treated as a black-box nonlinear function. For example, a backbone built from a combination of convolution modules outputs a tensor of the desired shape, which is then post-processed.
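In the training stage, we need to divide the data into positive and negative samples, then design a loss function for the task and use an optimization algorithm such as SGD to iteratively update the neurons' weights and biases. The optimization objective is to minimize the loss function, so the trained model fits the training set. This raises the problem of overfitting, that is, the generalization problem. Much work in deep learning addresses it; the question is well worth studying, and I will cover it in full in a later post.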
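To make the "iteratively update weight and bias to minimize the loss" idea concrete, here is a minimal sketch of SGD on the simplest possible model, a single weight and bias fitted to a line. The function name `sgd_fit`, the learning rate, and the toy data are all assumptions chosen for illustration; a real network applies the same update rule to many more parameters.

```python
import random


def sgd_fit(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by SGD on the squared error, one sample at a time."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not shuffled in place
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y   # prediction error on this sample
            w -= lr * 2 * err * x   # gradient of err**2 with respect to w
            b -= lr * 2 * err       # gradient of err**2 with respect to b
    return w, b


# Toy training set generated from y = 3x + 1 (hypothetical data).
samples = [(k / 10, 3.0 * (k / 10) + 1.0) for k in range(-10, 11)]
w, b = sgd_fit(samples)
print(round(w, 3), round(b, 3))
```

Because the toy data is noise-free, the loss can be driven almost to zero; on real data the same procedure fits the training set and may overfit it.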
We can usually understand any neural network in terms of an encoder-decoder architecture.
2. Image classification
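- Inference stage: the input is a mini-batch of images. An encoder (such as a CNN) encodes it into a tensor; typically W and H shrink by a factor of x while the channel count C grows by a factor of y, giving a new tensor of shape (bs, W/x, H/x, yC). Then comes the decoder, which adds FC layers, softmax, and so on. Of course, you can also regard everything before the softmax as the encoder and the softmax itself as the decoder.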
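- Training stage: the same as the inference stage, except that the vector output by softmax is compared with the annotated label to compute a cross-entropy loss (the common choice), which is then backpropagated to update all the weights and biases before the softmax.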
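The softmax and cross-entropy steps described above can be sketched in a few lines of plain Python. The logits and label below are made-up values standing in for the encoder output of one image, not results from any real model.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, one_hot):
    """Cross-entropy between a predicted distribution and a one-hot label."""
    return -sum(y * math.log(p) for p, y in zip(probs, one_hot) if y > 0)


logits = [2.0, 0.5, -1.0]  # hypothetical network output for one image
label = [1, 0, 0]          # one-hot label: the image belongs to class 0
probs = softmax(logits)
loss = cross_entropy(probs, label)
print(probs, loss)
```

With a one-hot label, only the probability at the position of the 1 enters the loss, so the loss reduces to the negative log of that single entry.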
Positive and negative samples
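In image classification, the positive samples for a class are all images of that class, and the negative samples are all images of other classes. When the network is fed a positive-sample image, the loss is computed from the predicted value at the position where the label vector equals 1, so that predicted value grows and the loss falls; because of the softmax constraint, the other entries of the prediction vector shrink. Likewise, for the current class, images of other classes are negative samples, and the output probability for the current class is optimized towards 0. So in image classification we do not need to worry about dividing positive and negative samples: the one-hot encoding of the label effectively distinguishes them already.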
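The push-up and push-down effect can be checked numerically. For softmax followed by cross-entropy, the gradient of the loss with respect to the logits is the predicted probabilities minus the one-hot label. The sketch below takes one gradient step directly on the logits, which is a simplification (a real network updates its weights, not the logits), and `step_on_logits` is a hypothetical helper name.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step_on_logits(logits, one_hot, lr=1.0):
    """One gradient-descent step on the logits for softmax + cross-entropy."""
    probs = softmax(logits)
    grad = [p - y for p, y in zip(probs, one_hot)]  # dLoss/dlogit = p - y
    return [z - lr * g for z, g in zip(logits, grad)]


logits = [0.0, 0.0, 0.0]  # the model starts out undecided
label = [0, 1, 0]         # the image is a positive sample for class 1
before = softmax(logits)
after = softmax(step_on_logits(logits, label))
print(before, after)
```

After the step, the probability of class 1 rises above one third while the other two fall below it: the same image acts as a positive sample for its own class and a negative sample for every other class.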
However, this also brings a fatal problem: the method above is trained on a closed training set, so an input can only be classified into one of the predefined categories. Results can be filtered with a confidence threshold, but some negative samples that look nothing like any defined category may still receive a high probability. This is known as the open-set recognition problem.
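A minimal sketch of the threshold filter mentioned above is shown here. The function name `classify_with_threshold`, the class names, the threshold of 0.8, and the probability vectors are all assumptions made up for illustration.

```python
def classify_with_threshold(probs, class_names, threshold=0.8):
    """Return the top class name, or None if the model is not confident enough."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # rejected as "none of the known classes"
    return class_names[best]


names = ["cat", "dog", "bird"]
uncertain = classify_with_threshold([0.40, 0.35, 0.25], names)
confident = classify_with_threshold([0.95, 0.03, 0.02], names)
print(uncertain, confident)
```

The filter only helps when unknown inputs produce a flat distribution. If an image of something that is not a cat, dog, or bird happens to yield a vector like the second one, it passes the threshold and is labeled "cat", which is exactly the failure described above.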
3. Object detection
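Object detection is a somewhat more complicated problem, because it generally involves object localization (a regression task), object classification (a classification task), and confidence (regression). This makes object detection architectures more complex. In general, the architecture is Backbone + Neck + Detection head. The naming is amusing: a torso, then a neck, and finally the detection head that makes the decisions.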
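- Inference stage:
  - The Backbone is usually a model pretrained on a large image classification dataset (such as ImageNet), that is, the encoder from image classification. Classification annotations are cheaper, while the features the network extracts are shared between the two tasks, so this is a form of transfer learning.
  - The Neck applies feature-fusion operations to the tensors output by the Backbone, producing better feature combinations for detecting objects of different sizes.
  - The Detection head operates on the tensor fused by the Neck and outputs a tensor of the desired shape. Finally comes post-processing: some of the vectors are discarded according to a threshold, and then NMS removes redundant boxes.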
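The post-processing step (threshold filtering followed by NMS) can be sketched as follows. This is a simplified, class-agnostic version under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, the function name `postprocess` and both thresholds are illustrative, and real detectors often run NMS per class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, scores, score_thr=0.5, iou_thr=0.5):
    """Drop low-score boxes, then greedily suppress overlapping ones (NMS)."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
kept = postprocess(boxes, scores)
print(kept)  # → [0, 2]
```

Box 3 is removed by the score threshold, and box 1 is suppressed because it overlaps the higher-scoring box 0 with an IoU of about 0.82.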
Of course, we can regard Backbone + Neck as the encoder and the Detection head as the decoder. Note: some architectures, such as YOLO v1, have no Neck, which costs some performance.
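The Backbone + Neck + Detection head architecture lets us design each module separately and then swap modules to build different object detection models. (Of course, plenty of people write papers in exactly this way, and the results are nicknamed "stitched-together monsters"!)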
- Training stage:
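The core of the training stage is the design of the loss function. The loss is computed between the tensor output by the Detection head and the annotated labels, and is used to update the network. This part therefore does not involve the post-processing described above. The key here is the selection of positive and negative samples, which determines how the loss is computed.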
In object detection, an image is fed in, but unlike image classification, the unit of a positive or negative sample is no longer an image but a region within an image, so one image contains many positive and negative samples. Each region is smaller than an image in classification, but there are so many of them that object detection is much slower than image classification.
So how do we obtain these regions (samples)? And how do we divide so many regions into positive and negative samples?
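These are two important questions. For the former, a common approach is the anchor-based method: prior boxes (anchors) are generated on each patch of the image, and these anchors are the samples. For the latter, the usual approach divides the samples into positive and negative by their IoU with the ground-truth boxes; the exact strategy differs between algorithms.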
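An IoU-based division of anchors can be sketched as below. The two thresholds and the "ignore" band between them are assumptions for illustration; as noted, each algorithm defines its own strategy, and `assign_anchors` is a hypothetical name.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_anchors(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Label each anchor 1 (positive), 0 (negative) or -1 (ignored)."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # ambiguous overlap, excluded from the loss
    return labels


gt_boxes = [(10, 10, 50, 50)]
anchors = [(12, 12, 52, 52), (25, 10, 65, 50), (30, 30, 70, 70)]
labels = assign_anchors(anchors, gt_boxes)
print(labels)  # → [1, -1, 0]
```

The three anchors overlap the ground-truth box with IoU of roughly 0.82, 0.45 and 0.14, so they come out positive, ignored and negative respectively.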
So what are the divided positive and negative samples used for?
During training, the division into positive and negative samples determines which losses are computed. If an anchor is assigned as a positive sample, regressing from it gives a predicted box, and that predicted box takes part in the localization term of the loss function. Positive samples contribute a box regression loss, a confidence loss, and a classification loss. If an anchor is assigned as a negative sample, only the confidence loss is computed for it. In this way the model can assign a confidence to each anchor, and anchors that contain an object generally receive a higher one. The model thus learns which regions are objects.
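How the sample labels select the loss terms can be sketched as follows. Everything concrete here is an assumption for illustration: the dictionary layout, the L1 box loss, the binary cross-entropy confidence loss, and the name `detection_loss`. Real detectors use their own loss functions and weightings, and regress offsets relative to the anchor rather than comparing decoded boxes directly.

```python
import math


def bce(p, target):
    """Binary cross-entropy for one probability p and a 0/1 target."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def detection_loss(samples):
    """Sum box, confidence and class losses according to each sample's label."""
    box_loss = conf_loss = cls_loss = 0.0
    for s in samples:
        if s["label"] == 1:  # positive: all three losses
            box_loss += sum(abs(p - g) for p, g in zip(s["pred_box"], s["gt_box"]))
            conf_loss += bce(s["conf"], 1.0)
            cls_loss += -math.log(s["cls_probs"][s["gt_class"]])
        elif s["label"] == 0:  # negative: confidence loss only
            conf_loss += bce(s["conf"], 0.0)
        # label -1 (ignored) contributes nothing
    return box_loss, conf_loss, cls_loss


samples = [
    {"label": 1, "pred_box": (11, 11, 50, 52), "gt_box": (10, 10, 50, 50),
     "conf": 0.9, "cls_probs": [0.1, 0.9], "gt_class": 1},
    {"label": 0, "conf": 0.2},
    {"label": -1, "conf": 0.5},
]
box_loss, conf_loss, cls_loss = detection_loss(samples)
print(box_loss, conf_loss, cls_loss)
```

Only the positive sample carries a predicted box and class probabilities into the loss; the negative sample is pushed towards confidence 0, and the ignored sample is skipped entirely.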
Note that three kinds of boxes are involved:
- Ground-truth boxes
- Prior boxes (anchors)
- Predicted boxes
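In summary, the positive samples in object detection are not the ground-truth boxes. The ground-truth boxes are the optimization target, like the one-hot vector in image classification. The positive samples are the selected subset of prior boxes (anchors), like the images of a given class in image classification. The predicted boxes are what the model obtains by regressing from the anchors, like the prediction vector in image classification. So the loss is computed between the predicted boxes and the ground-truth boxes. Of course, a model like YOLOv1 has no anchors, so it differs in some respects.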
The YOLOv4 paper summarizes the whole architecture as follows:
Backbone + Neck + Detection head modules:
- Input: Image, Patches, Image Pyramid
- Backbones: VGG16, ResNet-50, SpineNet, EfficientNet-B0/B7, CSPResNeXt50, CSPDarknet53, Swin Transformer
- Neck:
  - Additional blocks: SPP, ASPP, RFB, SAM
  - Path-aggregation blocks: FPN, PAN, NAS-FPN, Fully-connected FPN, BiFPN, ASFF, SFAM
- Heads:
  - Dense Prediction (one-stage):
    - RPN, SSD, YOLO (v2-v5), RetinaNet (anchor based)
    - YOLOv1, CornerNet, CenterNet, MatrixNet, FCOS (anchor free)
  - Sparse Prediction (two-stage):
    - Faster R-CNN, R-FCN, Mask R-CNN (anchor based)
    - RepPoints (anchor free)
References:
Some strategies for dividing positive and negative samples: https://murphypei.github.io/blog/2020/10/anchor-loss.html
Anchor generation methods: https://zhuanlan.zhihu.com/p/450451509
YOLOv4 paper: https://arxiv.org/abs/2004.10934