Improvement 17 of yolov5: cnn+transformer -- integrating bottleneck transformers

2022-07-28 22:49:00 【Artificial Intelligence Algorithm Research Institute】

Preface: YOLOv5 is currently one of the more advanced deep-learning object detection algorithms and already incorporates a large number of tricks, but there is still room for improvement, and the detection difficulties of specific application scenarios call for different improvements. The following articles in this series describe in detail how to improve YOLOv5, in the hope of offering some modest help and reference to readers who need new ideas for research or better results in engineering projects.

Problem addressed: YOLOv5's backbone feature-extraction network is a CNN. CNNs have translation invariance and locality, but they lack global and long-range modeling ability. Introducing the Transformer, an architecture from natural language processing, yields a CNN+Transformer architecture that combines the strengths of both and improves object detection. In my experiments, it gives some improvement on small objects and dense prediction tasks.

Principle:

Author affiliations: UC Berkeley, Google
Paper: https://arxiv.org/abs/2101.11605

GitHub：https://github.com/leaderj1001/BottleneckTransformers

BoTNet is a simple but powerful backbone that incorporates self-attention into a range of computer vision tasks, including image classification, object detection, and instance segmentation. By replacing the spatial convolutions with global self-attention in only the last three bottleneck blocks of a ResNet, with no other changes, it significantly improves on the baselines for object detection while also reducing the number of parameters, with minimal overhead in latency.
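The parameter reduction mentioned above can be checked with simple arithmetic. The sketch below compares a 3x3 convolution with a self-attention layer at the same channel width. It assumes a layout in which queries, keys, and values each come from a 1x1 convolution, relative position embeddings are learned per axis, and there is no output projection. The function names are hypothetical, and biases are ignored.

```python
def conv3x3_weights(channels):
    """Weight count of a 3x3 convolution with equal input and output channels."""
    return 9 * channels * channels


def mhsa_weights(channels, height, width):
    """Weight count of the assumed self-attention layout.

    Three 1x1 convolutions (query, key, value) plus one relative position
    embedding per axis; no output projection.
    """
    qkv = 3 * channels * channels
    rel_pos = channels * (height + width)
    return qkv + rel_pos


# Example: 512 channels on a 20x20 feature map.
print(conv3x3_weights(512))       # 2359296
print(mhsa_weights(512, 20, 20))  # 806912
```

Under these assumptions, the attention layer needs roughly a third of the weights of the 3x3 convolution it replaces.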

Differences between the MHSA in the Transformer and the MHSA in BoTNet:

Normalization: the Transformer uses Layer Normalization, while BoTNet uses Batch Normalization.
Nonlinear activation: the Transformer uses only one nonlinear activation, in the FFN block, while the BoTNet block uses three.
Output projection: the MHSA in the Transformer contains an output projection; the one in BoTNet does not.
Optimizer: the Transformer is trained with the Adam optimizer, while BoTNet uses SGD with momentum.
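To make these differences concrete, here is a minimal PyTorch sketch of a BoT-style block: Batch Normalization, three ReLU activations, and attention without an output projection. It is an illustration written for this article, not the code from the linked repository. The argument names follow the call used in CTR3; treating `resolution` as (width, height) and `expansion` as an output-channel multiplier are assumptions.

```python
import torch
import torch.nn as nn


class MHSA(nn.Module):
    """Multi-head self-attention over a 2D feature map, without output projection."""

    def __init__(self, channels, width, height, heads=4):
        super().__init__()
        assert channels % heads == 0, "channels must be divisible by heads"
        self.heads = heads
        # 1x1 convolutions produce queries, keys and values.
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learned relative position embeddings, one per spatial axis.
        d = channels // heads
        self.rel_h = nn.Parameter(torch.randn(1, heads, d, height, 1))
        self.rel_w = nn.Parameter(torch.randn(1, heads, d, 1, width))

    def forward(self, x):
        b, c, h, w = x.shape
        # The embeddings are sized at construction, so the input size is fixed.
        assert (h, w) == (self.rel_h.shape[3], self.rel_w.shape[4])
        d = c // self.heads
        q = self.query(x).view(b, self.heads, d, h * w)
        k = self.key(x).view(b, self.heads, d, h * w)
        v = self.value(x).view(b, self.heads, d, h * w)
        q_t = q.transpose(-2, -1)                      # (b, heads, hw, d)
        content = torch.matmul(q_t, k)                 # content-content logits
        rel = (self.rel_h + self.rel_w).view(1, self.heads, d, h * w)
        position = torch.matmul(q_t, rel)              # content-position logits
        attn = torch.softmax(content + position, dim=-1)
        out = torch.matmul(v, attn.transpose(-2, -1))  # (b, heads, d, hw)
        return out.view(b, c, h, w)


class BottleneckTransformer(nn.Module):
    """ResNet-style bottleneck whose 3x3 convolution is replaced by MHSA (sketch)."""

    def __init__(self, c1, c2, stride=1, heads=4, mhsa=True, resolution=(20, 20), expansion=1):
        super().__init__()
        c_out = int(c2 * expansion)  # assumption: expansion scales the output channels
        self.cv1 = nn.Sequential(nn.Conv2d(c1, c2, 1, bias=False), nn.BatchNorm2d(c2), nn.ReLU())
        if mhsa:
            layers = [MHSA(c2, width=resolution[0], height=resolution[1], heads=heads)]
            if stride == 2:
                layers.append(nn.AvgPool2d(2, 2))
        else:
            layers = [nn.Conv2d(c2, c2, 3, stride, 1, bias=False)]
        self.cv2 = nn.Sequential(*layers, nn.BatchNorm2d(c2), nn.ReLU())
        self.cv3 = nn.Sequential(nn.Conv2d(c2, c_out, 1, bias=False), nn.BatchNorm2d(c_out))
        if stride != 1 or c1 != c_out:
            self.shortcut = nn.Sequential(nn.Conv2d(c1, c_out, 1, stride, bias=False), nn.BatchNorm2d(c_out))
        else:
            self.shortcut = nn.Identity()
        self.act = nn.ReLU()

    def forward(self, x):
        # Third nonlinearity is applied after the residual addition.
        return self.act(self.cv3(self.cv2(self.cv1(x))) + self.shortcut(x))


x = torch.randn(2, 16, 8, 8)
block = BottleneckTransformer(16, 16, resolution=(8, 8))
y = block(x)
print(y.shape)  # torch.Size([2, 16, 8, 8])
```

Because the position embeddings are created for one fixed width and height, a block built this way only accepts feature maps of exactly that size.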

Method:

Step 1: modify common.py to add the CTR3 module.

class CTR3(nn.Module):
    # CSP bottleneck with 3 convolutions; the inner blocks are BottleneckTransformer blocks
    # (BottleneckTransformer is not shown here and must also be defined in common.py)
    def __init__(self, c1, c2, n=1, e=0.5, e2=1, w=20, h=20):
        # c1: ch_in, c2: ch_out, n: number of blocks, e: hidden-channel ratio,
        # e2: expansion passed to each block, w/h: feature-map width/height
        super(CTR3, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)
        self.m = nn.Sequential(
            *[BottleneckTransformer(c_, c_, stride=1, heads=4, mhsa=True, resolution=(w, h), expansion=e2)
              for _ in range(n)])

    def forward(self, x):
        # transformer branch (cv1 -> m) concatenated with the shortcut branch (cv2), then fused by cv3
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
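The defaults `w=20, h=20` match the P5/32 feature map of a 640x640 input, since 640 / 32 = 20. If the block sizes its position embeddings from `resolution`, as the signature suggests, then other input sizes need different values. The helper below is a hypothetical convenience for computing them, not part of YOLOv5.

```python
def feature_map_size(img_size, stride=32):
    """Side length of the feature map at a given stride for a square input."""
    if img_size % stride != 0:
        raise ValueError(f"img_size {img_size} is not a multiple of stride {stride}")
    return img_size // stride


print(feature_map_size(640))   # 20
print(feature_map_size(1280))  # 40
```

For example, training at 1280x1280 with CTR3 at the P5 level would need `w=40, h=40` under this assumption.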

Step 2: register the CTR3 module in yolo.py.

if m in [Conv, MobileNetV3_InvertedResidual, ShuffleNetV2_InvertedResidual, ghostc3, DepthSepConv, CTR3]:

Step 3: modify the yaml file.

backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, CTR3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]
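The numbers in the yaml are not used as written. In the stock YOLOv5 model parser, the repeat count is scaled by `depth_multiple` and the output channels by `width_multiple`, for modules registered as in Step 2. The sketch below mirrors that scaling so you can see what CTR3 actually receives. `scale_layer` is a hypothetical name; `make_divisible` follows the YOLOv5 helper of the same name.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(number, out_channels, depth_multiple, width_multiple):
    """Return the effective (repeat count, output channels) for one yaml row."""
    n = max(round(number * depth_multiple), 1) if number > 1 else number
    c2 = make_divisible(out_channels * width_multiple, 8)
    return n, c2


# The row [-1, 3, CTR3, [1024]] under different model sizes.
print(scale_layer(3, 1024, 0.33, 0.50))  # (1, 512)   yolov5s multiples
print(scale_layer(3, 1024, 1.00, 1.00))  # (3, 1024)  yolov5l multiples
```

With the yolov5s multiples, CTR3 is built with 512 output channels, so its hidden width `c_` is 256 and each of the 4 attention heads works on 64 channels.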

Results: I ran many experiments on multiple datasets. The effect differs from dataset to dataset, and it also varies with where the module is inserted.

Preview: the next article will continue with the integration of other Transformer modules. If you are interested, follow me; if you have questions, leave a comment or send me a private message.

PS: the Transformer can be used to improve not only YOLOv5 but also other YOLO networks and object detection networks, such as YOLOv3, v4, v6, and v7.

Finally, I hope we can follow each other, become friends, and learn and exchange ideas together.

Copyright notice
This article was written by [Artificial Intelligence Algorithm Research Institute]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/196/202207130600560772.html
