YOLOv5 Improvement 17: CNN + Transformer -- Integrating Bottleneck Transformers
2022-07-28 22:49:00 【Artificial Intelligence Algorithm Research Institute】
Foreword: YOLOv5, currently one of the more advanced deep learning object detection algorithms, already incorporates a large number of tricks, but there is still room for improvement. Different detection difficulties in specific application scenarios call for different improvements. The following articles in this series describe in detail how to improve YOLOv5, in the hope of offering some modest help and reference to those who need innovations for research or better results in engineering projects.
Problem addressed: The YOLOv5 backbone feature extraction network is a CNN. CNNs have translation invariance and locality, but they lack the ability to model global and long-range dependencies. Introducing the Transformer, an architecture from natural language processing, to form a CNN + Transformer architecture combines the strengths of both and improves detection performance. In my experiments, it brought some improvement on small objects and dense prediction tasks.
Principle:
Author affiliations: UC Berkeley, Google
Paper: https://arxiv.org/abs/2101.11605
GitHub: https://github.com/leaderj1001/BottleneckTransformers
BoTNet is a simple but powerful backbone architecture that incorporates self-attention for a variety of computer vision tasks, including image classification, object detection, and instance segmentation. By replacing the spatial convolutions with global self-attention in only the last three bottleneck blocks of a ResNet, and making no other changes, it improves significantly on the baselines in object detection while also reducing the number of parameters, with minimal overhead in latency.
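One likely reason for restricting self-attention to the last, lowest-resolution blocks is cost: global self-attention compares every spatial position with every other, so the attention matrices grow with the square of the number of positions. The sketch below counts attention-matrix entries per image at different strides. The 640-pixel input and the helper name `attention_entries` are assumptions made for illustration; `heads=4` follows the CTR3 code in this article.

```python
def attention_entries(input_size, stride, heads=4):
    """Entries in the (tokens x tokens) attention matrices for one image, summed over heads."""
    side = input_size // stride      # feature-map width and height
    tokens = side * side             # one token per spatial position
    return heads * tokens * tokens   # quadratic in the number of positions


for stride in (8, 16, 32):
    print(f"stride {stride:2d}: {attention_entries(640, stride):,} entries")
# stride  8: 163,840,000 entries
# stride 16: 10,240,000 entries
# stride 32: 640,000 entries
```

Under these assumptions, attention at stride 32 is 256 times cheaper in attention-matrix size than at stride 8.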
Differences between MHSA in the Transformer and MHSA in BoTNet:
Normalization: the Transformer uses Layer Normalization, whereas BoTNet uses Batch Normalization.
Nonlinear activations: the Transformer uses only one nonlinear activation, in the FFN block, whereas a BoTNet block uses three.
Output projection: MHSA in the Transformer contains an output projection; MHSA in BoTNet does not.
Optimizer: the Transformer is usually trained with the Adam optimizer, whereas BoTNet is trained with SGD with momentum.
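To make the list above concrete, here is a minimal PyTorch sketch of a BoTNet-style MHSA layer over a 2D feature map. It is an illustration under stated assumptions, not the code behind the CTR3 module in this article: the class name `MHSA2d`, its argument names, and the exact form of the relative position terms are hypothetical. It shows two of the points above: there is no output projection after the attention, and the layer contains no normalization of its own (in a BoT block, Batch Normalization would be applied around it).

```python
import torch
import torch.nn as nn


class MHSA2d(nn.Module):
    """Sketch of multi-head self-attention over a (B, C, H, W) feature map."""

    def __init__(self, channels, width, height, heads=4):
        super().__init__()
        assert channels % heads == 0, "channels must be divisible by heads"
        self.heads = heads
        # 1x1 convolutions produce the queries, keys and values.
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learned relative position terms, factorised over height and width.
        # Their shapes tie the layer to one feature-map size.
        d = channels // heads
        self.rel_h = nn.Parameter(torch.randn(1, heads, d, height, 1))
        self.rel_w = nn.Parameter(torch.randn(1, heads, d, 1, width))

    def forward(self, x):
        b, c, h, w = x.shape
        d = c // self.heads
        q = self.query(x).reshape(b, self.heads, d, h * w)
        k = self.key(x).reshape(b, self.heads, d, h * w)
        v = self.value(x).reshape(b, self.heads, d, h * w)
        # Content-content logits q_i . k_j, shape (B, heads, HW, HW).
        content = torch.matmul(q.transpose(-2, -1), k)
        # Content-position logits q_i . r_j, same shape.
        pos = (self.rel_h + self.rel_w).reshape(1, self.heads, d, h * w)
        position = torch.matmul(q.transpose(-2, -1), pos)
        attn = torch.softmax(content + position, dim=-1)
        # Weighted sum of values; note that no output projection follows.
        out = torch.matmul(v, attn.transpose(-2, -1))
        return out.reshape(b, c, h, w)


x = torch.randn(1, 64, 20, 20)
print(MHSA2d(64, width=20, height=20)(x).shape)  # torch.Size([1, 64, 20, 20])
```

Because the position parameters are sized at construction time, a layer written this way only accepts feature maps of the height and width it was built for.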
Method:
Step 1: Modify common.py to add the CTR3 module.
class CTR3(nn.Module):
    # CSP bottleneck with 3 convolutions; the inner blocks are Bottleneck Transformers
    # NOTE: BottleneckTransformer is not listed in this article and must also be defined in common.py
    def __init__(self, c1, c2, n=1, e=0.5, e2=1, w=20, h=20):  # ch_in, ch_out, number, expansion, transformer expansion, width, height
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)
        # n stacked Bottleneck Transformer blocks; resolution is the expected feature-map size
        self.m = nn.Sequential(
            *[BottleneckTransformer(c_, c_, stride=1, heads=4, mhsa=True, resolution=(w, h), expansion=e2)
              for _ in range(n)])

    def forward(self, x):
        # Transformer branch (cv1 -> m) is concatenated with the shortcut branch (cv2), then fused by cv3
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
Step 2: Register the CTR3 module in yolo.py.
# in parse_model(), append CTR3 to the existing module list
if m in [Conv, MobileNetV3_InvertedResidual, ShuffleNetV2_InvertedResidual, ghostc3, DepthSepConv, CTR3]:
Step 3: Modify the yaml file.
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, CTR3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]
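The CTR3 defaults w=20 and h=20 appear to correspond to the P5/32 feature map of a 640x640 input. The following sketch walks the backbone above and tracks the feature-map size after each layer. The 640-pixel input, the `BACKBONE` list, and the helper `feature_map_sizes` are assumptions made for illustration and are not part of YOLOv5.

```python
# (module, args) pairs copied from the backbone above; for Conv, args = [c_out, kernel, stride, ...]
BACKBONE = [
    ("Conv", [64, 6, 2, 2]),   # 0
    ("Conv", [128, 3, 2]),     # 1
    ("C3", [128]),             # 2
    ("Conv", [256, 3, 2]),     # 3
    ("C3", [256]),             # 4
    ("Conv", [512, 3, 2]),     # 5
    ("C3", [512]),             # 6
    ("Conv", [1024, 3, 2]),    # 7
    ("CTR3", [1024]),          # 8
    ("SPPF", [1024, 5]),       # 9
]


def feature_map_sizes(backbone, input_size=640):
    """Return the square feature-map side length after each backbone layer."""
    if input_size % 32:
        raise ValueError("this sketch assumes an input size divisible by 32")
    size, sizes = input_size, []
    for module, args in backbone:
        if module == "Conv":
            size //= args[2]  # only the strided Conv layers downsample
        sizes.append(size)
    return sizes


sizes = feature_map_sizes(BACKBONE, 640)
print(sizes[8])  # 20, matching the CTR3 defaults w=20, h=20
```

If the BottleneckTransformer implementation uses position embeddings of a fixed size, training or inference at another input size (for example 1280, which gives 40x40 at this layer) would need matching w and h values.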
Results: I have run a large number of experiments on multiple datasets. The effect differs from dataset to dataset, and it also varies somewhat depending on where the module is added.
Preview: The next article will continue with the integration of other Transformer modules. If you are interested, follow me; if you have questions, leave a comment or send me a private message.
PS: The Transformer can be used to improve not only YOLOv5 but also other YOLO networks and object detection networks, such as YOLOv3, v4, v6, and v7.
Finally, I hope we can follow each other, become friends, and learn and exchange ideas together.