PyTorch Structural Reparameterization: RepVGGBlock
2022-07-25 00:40:00 【Hebi tongzj】
ShuffleNet v2 proposed four design guidelines for lightweight networks (a worked example of the first guideline follows the list):

- When the input and output channel counts are equal, MAC (memory access cost) is minimized
- With FLOPs held constant, an excessive number of groups in group convolution increases MAC
- Fragmented operators (multi-branch structures) are unfriendly to parallel acceleration
- The memory and time cost of element-wise operations cannot be ignored
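
As a worked example of the first guideline (my addition, following the derivation in the ShuffleNet v2 paper): a 1×1 convolution on an $h \times w$ feature map with $c_1$ input and $c_2$ output channels costs $B = hwc_1c_2$ FLOPs, and by the AM-GM inequality its memory access cost satisfies

$$\mathrm{MAC} = hw(c_1 + c_2) + c_1 c_2 \;\ge\; 2\sqrt{hw \cdot B} + \frac{B}{hw}$$

with equality exactly when $c_1 = c_2$.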
In recent years, convolutional neural network architectures have grown increasingly complex; thanks to their good convergence behavior, multi-branch structures have become more and more popular.
However, multi-branch structures cannot fully exploit parallel acceleration, and they also increase MAC.

To let a simple structure reach the same accuracy as a multi-branch one, RepVGG uses a multi-branch structure during training (3×3 convolution + 1×1 convolution + identity mapping) to benefit from its good convergence; at inference and deployment time, the multi-branch structure is converted into a single-path structure through reparameterization, obtaining maximum speed from the simple structure.

Reparameterization
In the multi-branch structure used during training, every branch contains a BN layer.
At inference time a BN layer uses four parameters: mean, var, weight, and bias, and applies the following transformation to the input $x$:

$$y = \frac{x - \mathrm{mean}}{\sqrt{\mathrm{var} + \epsilon}} \cdot \mathrm{weight} + \mathrm{bias}$$

which can be rewritten in the linear form:

$$y = W_{eq} \cdot x + b_{eq}, \qquad W_{eq} = \frac{\mathrm{weight}}{\sqrt{\mathrm{var} + \epsilon}}, \quad b_{eq} = \mathrm{bias} - \frac{\mathrm{weight} \cdot \mathrm{mean}}{\sqrt{\mathrm{var} + \epsilon}}$$
```python
import torch
from torch import nn


class BatchNorm(nn.BatchNorm2d):

    def unpack(self):
        # Express the BN layer in linear form: y = eq_weight * x + eq_bias
        mean, weight, bias = self.running_mean, self.weight, self.bias
        std = (self.running_var + self.eps).sqrt()
        eq_weight = weight / std
        eq_bias = bias - weight * mean / std
        return eq_weight, eq_bias


bn = BatchNorm(8).eval()
# Initialize random parameters
bn.running_mean.data, bn.running_var.data, bn.weight.data, bn.bias.data = torch.rand([4, 8])

image = torch.rand([1, 8, 1, 1])
print(bn(image).view(-1))

# Convert the BN parameters into w, b form
weight, bias = bn.unpack()
print(image.view(-1) * weight + bias)
```

Because the BN layer already fits a per-channel offset, when a convolution layer is followed by a BN layer, the convolution layer does not need a bias. Their composition can be expressed as:

$$y = W_{eq} \cdot (W_{conv} * x) + b_{eq} = (W_{eq} \cdot W_{conv}) * x + b_{eq}$$

where $W_{eq}$ scales each output channel of the kernel. Thus, a convolution layer followed by a BN layer is equivalent to a single convolution layer with bias.
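
As a quick numerical check (my addition, not part of the original post), the following sketch fuses a bias-free convolution with a BN layer into one biased convolution, reusing the `BatchNorm.unpack` and imports defined above:

```python
# Hedged sketch: verify Conv (no bias) + BN == one Conv with bias.
conv = nn.Conv2d(8, 16, 3, padding=1, bias=False).eval()
bn = BatchNorm(16).eval()
bn.running_mean.data, bn.running_var.data, bn.weight.data, bn.bias.data = torch.rand([4, 16])

eq_weight, eq_bias = bn.unpack()
fused = nn.Conv2d(8, 16, 3, padding=1, bias=True).eval()
# Scale each output channel's kernel by eq_weight; the new bias is eq_bias
fused.weight.data = conv.weight * eq_weight.view(-1, 1, 1, 1)
fused.bias.data = eq_bias

x = torch.rand(1, 8, 32, 32)
print(torch.allclose(bn(conv(x)), fused(x), atol=1e-6))  # expected: True
```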

An identity mapping can likewise be expressed as a 1×1 convolution (see the sketch after this list):
- For nn.Conv2d(c1, c2, kernel_size=1), the weight has shape [c2, c1, 1, 1], which can be viewed as a [c2, c1] linear layer that performs a channel transformation at every pixel (reference: PyTorch two-dimensional multi-channel convolution operation)
- When c1 = c2 and this linear layer is the identity matrix, it is equivalent to an identity mapping
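
A minimal sketch (my addition, reusing the imports above) of the second point: a 1×1 convolution whose weight is the identity matrix reproduces its input.

```python
# Hedged sketch: a 1x1 conv with an identity-matrix weight is an identity mapping.
c = 8
conv = nn.Conv2d(c, c, kernel_size=1, bias=False)
conv.weight.data = torch.eye(c).view(c, c, 1, 1)  # [c2, c1] identity as a 1x1 kernel

x = torch.rand(1, c, 4, 4)
print(torch.allclose(conv(x), x))  # expected: True
```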
A 1×1 convolution can be expressed as a 3×3 convolution by zero-padding its kernel, so the computation of the multi-branch structure (with each branch's BN already fused into a weight and bias as above) can be expressed as:

$$y = (W_{3\times 3} * x + b_{3\times 3}) + (W_{1\times 1} * x + b_{1\times 1}) + (W_{id} * x + b_{id})$$

$$\phantom{y} = \big(W_{3\times 3} + \mathrm{pad}(W_{1\times 1}) + \mathrm{pad}(W_{id})\big) * x + (b_{3\times 3} + b_{1\times 1} + b_{id})$$
The three branches are therefore equivalent to a single new 3×3 convolution (this conclusion also extends to group convolution and 5×5 convolution); a sketch of the kernel padding follows.
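
A minimal sketch (my addition) of the zero-padding step, using `F.pad` to place a 1×1 kernel at the center of a 3×3 kernel:

```python
# Hedged sketch: zero-pad a 1x1 kernel to 3x3 and check both convolutions agree.
import torch.nn.functional as F

conv_1x1 = nn.Conv2d(8, 16, kernel_size=1, bias=False)
conv_3x3 = nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False)
# Pad the last two dims: [16, 8, 1, 1] -> [16, 8, 3, 3], kernel centered
conv_3x3.weight.data = F.pad(conv_1x1.weight, [1, 1, 1, 1])

x = torch.rand(1, 8, 32, 32)
print(torch.allclose(conv_3x3(x), conv_1x1(x), atol=1e-6))  # expected: True
```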
In a speed test on an NVIDIA 1080Ti, feeding a [32, 2048, 56, 56] input through convolution kernels that keep the channel count and spatial size unchanged, the 3×3 convolution achieved the highest floating-point operations per second.
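
I did not reproduce the author's benchmark, but a throughput comparison might look like the sketch below (my addition; shapes are scaled down from [32, 2048, 56, 56] so it runs on modest hardware, and the imports from the earlier blocks are reused):

```python
# Hedged sketch: compare effective FLOPS of k x k convolutions on one device.
import time

device = 'cuda' if torch.cuda.is_available() else 'cpu'
n, c, h, w = 8, 512, 56, 56
x = torch.rand(n, c, h, w, device=device)

for k in (1, 3, 5):
    conv = nn.Conv2d(c, c, k, padding=k // 2).to(device).eval()
    flops = 2 * n * c * c * k * k * h * w  # multiply-adds per forward pass
    with torch.no_grad():
        conv(x)  # warm-up
        if device == 'cuda':
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(20):
            conv(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    cost = (time.time() - start) / 20
    print(f'{k}x{k}: {flops / cost / 1e12:.2f} TFLOPS')
```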

Reimplementation
Reference code: https://github.com/DingXiaoH/RepVGG
I refactored the source code from the paper to improve its readability and ease of use (so that it can be ported into a YOLO project; the custom L2-norm computation was left out).
I also wrote the reparameterization logic as a static method of the class, so that a fully assembled model can be reparameterized in one call.
```python
from collections import OrderedDict

import torch
import torch.nn.functional as F
from torch import nn


class BatchNorm(nn.BatchNorm2d):

    def unpack(self):
        # Express the BN layer in linear form: y = eq_weight * x + eq_bias
        mean, weight, bias = self.running_mean, self.weight, self.bias
        std = (self.running_var + self.eps).sqrt()
        eq_weight = weight / std
        eq_bias = bias - weight * mean / std
        return eq_weight, eq_bias


class RepVGGBlock(nn.Module):

    def __init__(self, c1, c2, k=3, s=1, g=1, deploy=False):
        super(RepVGGBlock, self).__init__()
        self.deploy = deploy
        # Check the convolution kernel size
        assert k & 1, 'The convolution kernel size must be odd'
        # Parameters of the main-branch convolution
        self.conv_main_config = dict(
            in_channels=c1, out_channels=c2, kernel_size=k,
            stride=s, padding=k // 2, groups=g
        )
        if deploy:
            self.conv_main = nn.Conv2d(**self.conv_main_config, bias=True)
            # In deploy mode the extra branches do not exist
            self.conv_1x1 = self.identity = None
        else:
            # Main branch
            self.conv_main = nn.Sequential(OrderedDict(
                conv=nn.Conv2d(**self.conv_main_config, bias=False),
                bn=BatchNorm(c2)
            ))
            # 1×1 convolution branch
            self.conv_1x1 = nn.Sequential(OrderedDict(
                conv=nn.Conv2d(c1, c2, 1, s, padding=0, groups=g, bias=False),
                bn=BatchNorm(c2)
            )) if k != 1 else None
            # Identity-mapping branch
            self.identity = BatchNorm(c2) if c1 == c2 and s == 1 else None

    def forward(self, x, act=F.silu):
        y = self.conv_main(x)
        if self.conv_1x1:
            y += self.conv_1x1(x)
        if self.identity:
            y += self.identity(x)
        # Apply the activation function
        y = act(y) if act else y
        return y

    @staticmethod
    def merge(model: nn.Module):
        # Walk all submodules of the model and merge every RepVGGBlock
        for m in model.modules():
            if isinstance(m, RepVGGBlock) and not m.deploy:
                # Main-branch information
                kernel = m.conv_main.conv.weight
                (c2, c1_per_group, k, _), g = kernel.shape, m.conv_main.conv.groups
                center_pos = k // 2
                # Fuse the main branch's BN into its kernel
                bn_weight, bn_bias = m.conv_main.bn.unpack()
                kernel_weight, kernel_bias = kernel * bn_weight.view(-1, 1, 1, 1), bn_bias
                # Fuse the 1×1 convolution branch into the kernel center
                if m.conv_1x1:
                    kernel_1x1 = m.conv_1x1.conv.weight[..., 0, 0]
                    bn_weight, bn_bias = m.conv_1x1.bn.unpack()
                    kernel_weight[..., center_pos, center_pos] += kernel_1x1 * bn_weight.view(-1, 1)
                    kernel_bias += bn_bias
                # Fuse the identity-mapping branch into the kernel center
                if m.identity:
                    kernel_id = torch.cat([torch.eye(c1_per_group)] * g, dim=0).to(kernel.device)
                    bn_weight, bn_bias = m.identity.unpack()
                    kernel_weight[..., center_pos, center_pos] += kernel_id * bn_weight.view(-1, 1)
                    kernel_bias += bn_bias
                # Create the merged convolution
                m.conv_main = nn.Conv2d(**m.conv_main_config, bias=True)
                m.conv_main.weight.data, m.conv_main.bias.data = kernel_weight, kernel_bias
                # Delete the merged branches
                m.deploy = True
                delattr(m, 'conv_1x1')
                delattr(m, 'identity')
                m.conv_1x1, m.identity = None, None
```

Next, design an integrated model to verify:
- whether the merge function changes the network structure
- whether the model produces identical outputs before and after reparameterization
- whether inference speed improves after reparameterization
```python
if __name__ == '__main__':

    class RepVGG(nn.Module):

        def __init__(self, num_blocks, num_classes=1000, width_multiplier=None, deploy=False):
            super(RepVGG, self).__init__()
            assert len(width_multiplier) == 4
            self.deploy = deploy
            # Number of input channels
            self.in_planes = min(64, int(64 * width_multiplier[0]))
            self.stage0 = RepVGGBlock(3, self.in_planes, k=3, s=2, deploy=self.deploy)
            # The backbone has four stages, each a cascade of RepVGGBlocks
            self.stage1 = self._make_stage(int(64 * width_multiplier[0]), num_blocks[0], stride=2)
            self.stage2 = self._make_stage(int(128 * width_multiplier[1]), num_blocks[1], stride=2)
            self.stage3 = self._make_stage(int(256 * width_multiplier[2]), num_blocks[2], stride=2)
            self.stage4 = self._make_stage(int(512 * width_multiplier[3]), num_blocks[3], stride=2)
            self.gap = nn.AdaptiveAvgPool2d(output_size=1)
            self.linear = nn.Linear(int(512 * width_multiplier[3]), num_classes)

        def _make_stage(self, planes, num_blocks, stride):
            strides = [stride] + [1] * (num_blocks - 1)
            blocks = []
            for stride in strides:
                blocks.append(RepVGGBlock(self.in_planes, planes, k=3, s=stride, deploy=self.deploy))
                self.in_planes = planes
            return nn.Sequential(*blocks)

        def forward(self, x):
            out = self.stage0(x)
            out = self.stage1(out)
            out = self.stage2(out)
            out = self.stage3(out)
            out = self.stage4(out)
            out = self.gap(out)
            out = out.view(out.size(0), -1)
            out = self.linear(out)
            return out

    vgg = RepVGG(num_blocks=[1, 1, 1, 1], num_classes=20,
                 width_multiplier=[1, 1, 1, 1]).eval()
    print(vgg)

    # Initialize random parameters for every BatchNorm
    for m in vgg.modules():
        if isinstance(m, BatchNorm):
            m.running_mean.data, m.running_var.data, \
                m.weight.data, m.bias.data = torch.rand([4, m.num_features])

    image = torch.rand([1, 3, 224, 224])

    class Timer:
        prefix = 'Cost: '

        def __init__(self, fun, *args, **kwargs):
            import time
            start = time.time()
            fun(*args, **kwargs)
            cost = (time.time() - start) * 1e3
            print(self.prefix + f'{cost:.0f} ms')

    # Test the VGG in its training structure
    print(vgg(image))
    Timer(vgg, image)

    # Call the static method of RepVGGBlock to merge its branches
    RepVGGBlock.merge(vgg)
    print(vgg)

    # Test the VGG in its inference structure
    print(vgg(image))
    Timer(vgg, image)
```

The output:

```
RepVGG(
  (stage0): RepVGGBlock(
    (conv_main): Sequential(
      (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (conv_1x1): Sequential(
      (conv): Conv2d(3, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (stage1): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stage2): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stage3): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stage4): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (gap): AdaptiveAvgPool2d(output_size=1)
  (linear): Linear(in_features=512, out_features=20, bias=True)
)
tensor([[-0.1108,  0.0824,  0.5547, -0.1671,  0.7442, -0.1164, -0.2825,  0.4088,
          0.1239, -0.3792,  0.1152, -0.4021,  0.4034,  0.2350,  0.2601, -0.1197,
          0.2462, -0.2451,  0.0439, -0.2507]], grad_fn=<AddmmBackward>)
Cost: 22 ms
RepVGG(
  (stage0): RepVGGBlock(
    (conv_main): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  )
  (stage1): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (stage2): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (stage3): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (stage4): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (gap): AdaptiveAvgPool2d(output_size=1)
  (linear): Linear(in_features=512, out_features=20, bias=True)
)
tensor([[-0.1108,  0.0824,  0.5547, -0.1671,  0.7442, -0.1164, -0.2825,  0.4088,
          0.1239, -0.3792,  0.1152, -0.4021,  0.4034,  0.2350,  0.2601, -0.1197,
          0.2462, -0.2451,  0.0439, -0.2507]], grad_fn=<AddmmBackward>)
Cost: 14 ms
```