[Paper close reading] A detailed look at bounding box regression in R-CNN
2022-08-05 05:42:00 【takedachia】
This post, part of a close reading of the R-CNN paper, discusses the paper's bounding box regression in detail.
R-CNN extracts a feature vector from each candidate box and then runs classification and offset prediction on it in parallel.
Offset prediction is the bounding box regression problem: the position and size of each generated candidate box need to be fine-tuned.
(Figure taken from the R-CNN paper walkthrough by Bilibili uploader “同济子豪兄”)
The question to think about is: why does adding this regression correct inaccurate localization?
Predicting the candidate box offset (bounding box regression)
The candidate boxes generated by selective search inevitably contain localization errors, so we can apply an offset correction to them.
Bounding box regression continues the design used in DPM (which shares an author with R-CNN). It trains a linear regression model that, given a set of features (the features extracted by the CNN), predicts a new detection box. The offset of this new box is the target the regression predicts.
The offset is measured relative to the correct position (for example, a ground-truth box that frames the whole object). It is computed from a set of offset coefficients, and those coefficients are learned.
The rest of this post describes what bounding box regression does, combining the paper with my own thinking.
① Why can a regression fix inaccurate localization?
Personally, I think predicting the offset of a candidate box with a model is a very good idea. When I first read the paper, I kept wondering: why does adding this regression solve the problem of inaccurate localization?
Imagine the training setup. The positive sample is the ground-truth (GT) box, which describes the features of a complete cat. One of our candidate boxes partially overlaps the GT box and describes the features of an incomplete cat. The incomplete-cat box therefore needs an offset correction relative to the complete-cat box. In the figure below, for example, the red box needs to be corrected to the position of the blue box.

We assume that the feature vector of a candidate box carries enough information to infer the box's position relative to a target. Take the red box in the figure above: the cat's tail and limbs are not fully enclosed, so the red box's feature vector should indicate that the tail and limb features are incomplete. The box then has to expand toward where those features lie in order to capture them completely. In this example the red box has to extend to the upper right to enclose the whole tail, and also downward to enclose the limbs.
Note that this expansion is a single offset prediction based only on the features of the current red box. Once the box has been shifted to the blue position, the algorithm does not check again whether the blue box needs a further offset. The authors also point out that one adjustment is enough; repeated adjustment does not improve performance.
② Modeling the offset
Based on the reasoning above, we can map the feature vector extracted from a candidate box to an offset that tells the box how to adjust its size and position.
Locating a candidate box generally takes 4 parameters. A common choice is the coordinates of the box center $(x, y)$ together with the box width and height $(w, h)$. The offset is then $\Delta x, \Delta y, \Delta w, \Delta h$.
That is: $feature$ (the CNN feature vector of the box) $\rightarrow \Delta x, \Delta y, \Delta w, \Delta h$
The paper uses relative offsets. Specifically:
Let the position of the ground-truth box $G$ and the position of the candidate box $P$ be:
$$\begin{gathered} G=\left(G_{x}, G_{y}, G_{w}, G_{h}\right) \\ P=\left(P_{x}, P_{y}, P_{w}, P_{h}\right) \end{gathered}$$
Our goal is to establish the relationship between $G$ and $P$.
The paper defines 4 offset coefficients $t_{x}, t_{y}, t_{w}, t_{h}$ as relative offsets:
$$\begin{aligned} G_{x} &= P_{w} t_{x} + P_{x} \\ G_{y} &= P_{h} t_{y} + P_{y} \\ G_{w} &= P_{w} \exp\left(t_{w}\right) \\ G_{h} &= P_{h} \exp\left(t_{h}\right) \end{aligned}$$
The center shift is expressed in units of the candidate box's width and height, and the size change as a log-scale ratio, so the coefficients do not depend on the absolute size of the box.
At prediction time, the coefficients $t_{x}, t_{y}, t_{w}, t_{h}$ have to be inferred from the candidate box $P$ alone, so what we need to model is the relationship between the features extracted from $P$, written $feature(P)$, and $(t_{x}, t_{y}, t_{w}, t_{h})$.
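The four equations above can be inverted to get the coefficients from a known pair of boxes, and applied forward to move a box. A minimal NumPy sketch of both directions follows; the function names `encode_offsets` and `decode_offsets` and the example box values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def encode_offsets(P, G):
    """Return (t_x, t_y, t_w, t_h) that map candidate box P onto box G.

    Both boxes are given as (center_x, center_y, width, height).
    """
    Px, Py, Pw, Ph = P
    Gx, Gy, Gw, Gh = G
    tx = (Gx - Px) / Pw      # center shift in units of the candidate width
    ty = (Gy - Py) / Ph      # center shift in units of the candidate height
    tw = np.log(Gw / Pw)     # log of the width ratio
    th = np.log(Gh / Ph)     # log of the height ratio
    return np.array([tx, ty, tw, th])

def decode_offsets(P, t):
    """Apply offset coefficients t to candidate box P (the equations above)."""
    Px, Py, Pw, Ph = P
    tx, ty, tw, th = t
    return np.array([Pw * tx + Px, Ph * ty + Py,
                     Pw * np.exp(tw), Ph * np.exp(th)])

# Hypothetical boxes, for illustration only.
P = np.array([100.0, 80.0, 50.0, 40.0])
G = np.array([110.0, 90.0, 60.0, 50.0])
t = encode_offsets(P, G)
G_rec = decode_offsets(P, t)   # round trip recovers G
```

Encoding followed by decoding returns the original ground-truth box, which is a quick sanity check that the two functions are inverses of each other.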
③ Building the regression model
We assume a linear relationship between $feature(P)$ and $(t_{x}, t_{y}, t_{w}, t_{h})$. Introducing a weight $w$ gives the regression equation:
$$(\hat{t}_{x}, \hat{t}_{y}, \hat{t}_{w}, \hat{t}_{h}) = w * feature(P)$$
About this regression equation:
- The vector $(\hat{t}_{x}, \hat{t}_{y}, \hat{t}_{w}, \hat{t}_{h})$ holds the 4 offset coefficients we want to predict, and the weight $w$ is the parameter we want to learn.
- Using the offset model from ②, for any known pair $(G, P)$ of GT box and candidate box we can solve those equations for $(t_{x}, t_{y}, t_{w}, t_{h})$, which the paper writes as $t_{\star}$. This gives us many training pairs of $t_{\star}$ and $feature(P)$, from which the weight $w$ can be fitted.
- In the paper, $feature(P)$ is the output of the CNN's last pooling layer, written $\phi_{5}(P)$.
- The paper writes $\hat{t}_{x}, \hat{t}_{y}, \hat{t}_{w}, \hat{t}_{h}$ as $d_{x}(P), d_{y}(P), d_{w}(P), d_{h}(P)$, indicating that the regression output is a linear function of the feature vector of $P$.
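Written out, solving the four equations from ② for the coefficients gives the regression targets explicitly, and the linear model predicts each one from the pooled features. Here $\mathbf{w}_{\star}$ denotes the weight vector for one coordinate, the same notation used in the objective in ④:

$$\begin{aligned} t_{x} &= (G_{x} - P_{x}) / P_{w}, & t_{y} &= (G_{y} - P_{y}) / P_{h}, \\ t_{w} &= \log(G_{w} / P_{w}), & t_{h} &= \log(G_{h} / P_{h}), \\ d_{\star}(P) &= \mathbf{w}_{\star}^{\mathrm{T}} \boldsymbol{\phi}_{5}(P), & \star &\in \{x, y, w, h\}. \end{aligned}$$

Each of the four coordinates therefore has its own weight vector, and the four regressions can be fitted independently.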
④ Training the regression model
For a linear regression model, the loss function is the sum of squared errors (SSE), and the optimal $w$ has an analytical solution.
In our case, the optimization problem can be written as:
$$\mathbf{w}_{\star}=\underset{\hat{\mathbf{w}}_{\star}}{\operatorname{argmin}} \sum_{i}^{N}\left(t_{\star}^{i}-\hat{\mathbf{w}}_{\star}^{\mathrm{T}} \boldsymbol{\phi}_{5}\left(P^{i}\right)\right)^{2}$$
where:
- $i$ indexes the training samples, and $t_{\star}^{i}$ is the offset coefficient computed from the $i$-th pair $(G, P)$ of GT box and candidate box of the corresponding class.
- $\phi_{5}(P)$, as mentioned above, is the feature of candidate box $P$ extracted by the CNN's last pooling layer (pool5; in the paper's AlexNet this is a 6×6×256 = 9216-dimensional vector, while 4096 is the size of the fully connected layers that follow it).
- One regression model is trained per class, 20 in total.
The authors add an L2 regularization term to the loss to prevent overfitting and improve performance. They point out that this term is important and set $\lambda$ to 1000.
(In statistics, this is a ridge regression problem.)
The final objective is:
$$\mathbf{w}_{\star}=\underset{\hat{\mathbf{w}}_{\star}}{\operatorname{argmin}} \sum_{i}^{N}\left(t_{\star}^{i}-\hat{\mathbf{w}}_{\star}^{\mathrm{T}} \boldsymbol{\phi}_{5}\left(P^{i}\right)\right)^{2}+\lambda\left\|\hat{\mathbf{w}}_{\star}\right\|^{2}$$
We can solve this regularized least squares problem for the regression coefficients $w$ analytically.
Once $w$ is trained, it can be used to predict offsets.
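The analytical solution of the regularized objective is the standard ridge regression formula $W = (\Phi^{\mathrm{T}}\Phi + \lambda I)^{-1}\Phi^{\mathrm{T}}T$, where the rows of $\Phi$ are the feature vectors and the rows of $T$ are the 4 target coefficients. A minimal NumPy sketch under stated assumptions: the function name `fit_ridge`, the small feature dimension, and the synthetic random data are all illustrative stand-ins for real pooled features, and this is not the paper's training code.

```python
import numpy as np

def fit_ridge(Phi, T, lam=1000.0):
    """Closed-form ridge regression.

    Phi: (N, D) matrix, one feature vector per training box.
    T:   (N, 4) matrix, one row of (t_x, t_y, t_w, t_h) per training box.
    Returns W of shape (D, 4); column k is the weight vector for coordinate k.
    """
    D = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(D)      # regularized normal equations
    return np.linalg.solve(A, Phi.T @ T)   # solves all 4 regressions at once

# Synthetic data, for illustration only (real features would come from the CNN).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 16))
W_true = rng.normal(size=(16, 4))
T = Phi @ W_true

W_small = fit_ridge(Phi, T, lam=1e-8)   # almost no regularization
W_reg = fit_ridge(Phi, T, lam=1000.0)   # the lambda value quoted from the paper
```

With almost no regularization the fit recovers the generating weights, while a large $\lambda$ shrinks the weights toward zero, which is the overfitting control described above.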
⑤ When bounding box regression applies, and caveats
The regression model above rests on the assumption that the feature vector and the offset coefficients are linearly related. A linear model is a simple one and has a limited range of validity.
Moreover, the regression only fine-tunes a candidate box toward the GT box. If the candidate box misses too many of the object's features, it is hard to adjust.
This means that in actual training a candidate box must not be too far from its GT box.
The authors state that only candidate boxes whose IoU with a GT box is greater than 0.6 are used to train the regressor.
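The IoU filter can be sketched as follows. This is a minimal illustration, assuming boxes in (center_x, center_y, width, height) form and assuming each candidate is paired with the GT box it overlaps most; the function names and example boxes are hypothetical.

```python
import numpy as np

def iou_cxcywh(a, b):
    """Intersection over union of two (center_x, center_y, width, height) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def select_training_pairs(proposals, gt_boxes, threshold=0.6):
    """Keep (P, G) pairs whose best IoU exceeds the threshold."""
    pairs = []
    for P in proposals:
        ious = [iou_cxcywh(P, G) for G in gt_boxes]
        best = int(np.argmax(ious))        # GT box with the largest overlap
        if ious[best] > threshold:
            pairs.append((P, gt_boxes[best]))
    return pairs

# Hypothetical boxes: the first proposal is close to the GT box, the second is far.
gt_boxes = [(50.0, 50.0, 40.0, 40.0)]
proposals = [(52.0, 50.0, 40.0, 40.0), (80.0, 50.0, 40.0, 40.0)]
pairs = select_training_pairs(proposals, gt_boxes)
```

Only the nearby proposal survives the filter, so the regressor is trained only on boxes for which a small linear correction is plausible.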
In addition, as mentioned in ①, each candidate box is regressed only once. After the adjustment, no further regression is run on the adjusted box, because the authors found that repeated fine-tuning did not improve accuracy.
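The single refinement step amounts to one linear prediction followed by one application of the offset equations from ②. A minimal sketch, assuming a learned weight matrix `W` of shape (D, 4) and a feature vector `phi` of length D; the name `refine_box` and the tiny hand-picked values are illustrative only.

```python
import numpy as np

def refine_box(P, phi, W):
    """Refine candidate box P once using the linear regressor.

    P:   (center_x, center_y, width, height)
    phi: (D,) feature vector of the box
    W:   (D, 4) weights, columns ordered as (x, y, w, h)
    """
    dx, dy, dw, dh = phi @ W               # predicted offset coefficients
    Px, Py, Pw, Ph = P
    return np.array([Pw * dx + Px, Ph * dy + Py,
                     Pw * np.exp(dw), Ph * np.exp(dh)])

# Hand-picked toy values so the result is easy to check by eye.
P = np.array([100.0, 80.0, 50.0, 40.0])
phi = np.array([1.0, 0.0])
W = np.array([[0.2, 0.25, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
refined = refine_box(P, phi, W)            # called once; the output is not fed back in
```

The refined box is the final localization; the function is deliberately not applied again to its own output.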
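Finally, as the figure below shows, at inference time this regression head runs in parallel with the SVM head described earlier. The two are computed together to give the class and the (fine-tuned) location of the final detection box.
(Figure taken from the R-CNN paper walkthrough by Bilibili uploader “同济子豪兄”)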