PyTorch Structural Reparameterization: RepVGGBlock
2022-07-25 00:40:00 【Hebi tongzj】
ShuffleNet v2 proposed four design guidelines for lightweight networks (a worked example of the first guideline follows the list):

- When the input and output channel counts are equal, MAC (memory access cost) is minimized
- With FLOPs held constant, an excessive number of groups in group convolution increases MAC
- Fragmented operators (multi-branch structures) are unfriendly to parallel acceleration
- The memory and time cost of element-wise operations cannot be ignored
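
As a worked example of the first guideline (my addition, following the derivation in the ShuffleNet v2 paper): a 1×1 convolution on an $h \times w$ feature map with $c_1$ input and $c_2$ output channels costs $B = hwc_1c_2$ FLOPs, and by the AM-GM inequality its memory access cost satisfies

$$\mathrm{MAC} = hw(c_1 + c_2) + c_1 c_2 \;\ge\; 2\sqrt{hw \cdot B} + \frac{B}{hw}$$

with equality exactly when $c_1 = c_2$.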
In recent years, convolutional neural network architectures have grown increasingly complex; thanks to their good convergence behavior, multi-branch structures have become more and more popular.
However, multi-branch structures cannot fully exploit parallel acceleration, and they also increase MAC.

To let a simple structure reach the same accuracy as a multi-branch one, RepVGG uses a multi-branch structure during training (3×3 convolution + 1×1 convolution + identity mapping) to benefit from its good convergence; at inference and deployment time, the multi-branch structure is converted into a single-path structure through reparameterization, obtaining maximum speed from the simple structure.

Reparameterization
In the multi-branch structure used during training, every branch contains a BN layer.
At inference time a BN layer uses four parameters: mean, var, weight, and bias, and applies the following transformation to the input $x$:

$$y = \frac{x - \mathrm{mean}}{\sqrt{\mathrm{var} + \epsilon}} \cdot \mathrm{weight} + \mathrm{bias}$$

which can be rewritten in the linear form:

$$y = W_{eq} \cdot x + b_{eq}, \qquad W_{eq} = \frac{\mathrm{weight}}{\sqrt{\mathrm{var} + \epsilon}}, \quad b_{eq} = \mathrm{bias} - \frac{\mathrm{weight} \cdot \mathrm{mean}}{\sqrt{\mathrm{var} + \epsilon}}$$
```python
import torch
from torch import nn


class BatchNorm(nn.BatchNorm2d):

    def unpack(self):
        # Express the BN layer in linear form: y = eq_weight * x + eq_bias
        mean, weight, bias = self.running_mean, self.weight, self.bias
        std = (self.running_var + self.eps).sqrt()
        eq_weight = weight / std
        eq_bias = bias - weight * mean / std
        return eq_weight, eq_bias


bn = BatchNorm(8).eval()
# Initialize random parameters
bn.running_mean.data, bn.running_var.data, bn.weight.data, bn.bias.data = torch.rand([4, 8])

image = torch.rand([1, 8, 1, 1])
print(bn(image).view(-1))

# Convert the BN parameters into w, b form
weight, bias = bn.unpack()
print(image.view(-1) * weight + bias)
```

Because the BN layer already fits a per-channel offset, when a convolution layer is followed by a BN layer, the convolution layer does not need a bias. Their composition can be expressed as:

$$y = W_{eq} \cdot (W_{conv} * x) + b_{eq} = (W_{eq} \cdot W_{conv}) * x + b_{eq}$$

where $W_{eq}$ scales each output channel of the kernel. Thus, a convolution layer followed by a BN layer is equivalent to a single convolution layer with bias.
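
As a quick numerical check (my addition, not part of the original post), the following sketch fuses a bias-free convolution with a BN layer into one biased convolution, reusing the `BatchNorm.unpack` and imports defined above:

```python
# Hedged sketch: verify Conv (no bias) + BN == one Conv with bias.
conv = nn.Conv2d(8, 16, 3, padding=1, bias=False).eval()
bn = BatchNorm(16).eval()
bn.running_mean.data, bn.running_var.data, bn.weight.data, bn.bias.data = torch.rand([4, 16])

eq_weight, eq_bias = bn.unpack()
fused = nn.Conv2d(8, 16, 3, padding=1, bias=True).eval()
# Scale each output channel's kernel by eq_weight; the new bias is eq_bias
fused.weight.data = conv.weight * eq_weight.view(-1, 1, 1, 1)
fused.bias.data = eq_bias

x = torch.rand(1, 8, 32, 32)
print(torch.allclose(bn(conv(x)), fused(x), atol=1e-6))  # expected: True
```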

An identity mapping can likewise be expressed as a 1×1 convolution (see the sketch after this list):
- For nn.Conv2d(c1, c2, kernel_size=1), the weight has shape [c2, c1, 1, 1], which can be viewed as a [c2, c1] linear layer that performs a channel transformation at every pixel (reference: PyTorch two-dimensional multi-channel convolution operation)
- When c1 = c2 and this linear layer is the identity matrix, it is equivalent to an identity mapping
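
A minimal sketch (my addition, reusing the imports above) of the second point: a 1×1 convolution whose weight is the identity matrix reproduces its input.

```python
# Hedged sketch: a 1x1 conv with an identity-matrix weight is an identity mapping.
c = 8
conv = nn.Conv2d(c, c, kernel_size=1, bias=False)
conv.weight.data = torch.eye(c).view(c, c, 1, 1)  # [c2, c1] identity as a 1x1 kernel

x = torch.rand(1, c, 4, 4)
print(torch.allclose(conv(x), x))  # expected: True
```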
A 1×1 convolution can be expressed as a 3×3 convolution by zero-padding its kernel, so the computation of the multi-branch structure (with each branch's BN already fused into a weight and bias as above) can be expressed as:

$$y = (W_{3\times 3} * x + b_{3\times 3}) + (W_{1\times 1} * x + b_{1\times 1}) + (W_{id} * x + b_{id})$$

$$\phantom{y} = \big(W_{3\times 3} + \mathrm{pad}(W_{1\times 1}) + \mathrm{pad}(W_{id})\big) * x + (b_{3\times 3} + b_{1\times 1} + b_{id})$$
The three branches are therefore equivalent to a single new 3×3 convolution (this conclusion also extends to group convolution and 5×5 convolution); a sketch of the kernel padding follows.
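
A minimal sketch (my addition) of the zero-padding step, using `F.pad` to place a 1×1 kernel at the center of a 3×3 kernel:

```python
# Hedged sketch: zero-pad a 1x1 kernel to 3x3 and check both convolutions agree.
import torch.nn.functional as F

conv_1x1 = nn.Conv2d(8, 16, kernel_size=1, bias=False)
conv_3x3 = nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False)
# Pad the last two dims: [16, 8, 1, 1] -> [16, 8, 3, 3], kernel centered
conv_3x3.weight.data = F.pad(conv_1x1.weight, [1, 1, 1, 1])

x = torch.rand(1, 8, 32, 32)
print(torch.allclose(conv_3x3(x), conv_1x1(x), atol=1e-6))  # expected: True
```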
In a speed test on an NVIDIA 1080Ti, feeding a [32, 2048, 56, 56] input through convolution kernels that keep the channel count and spatial size unchanged, the 3×3 convolution achieved the highest floating-point operations per second.
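
I did not reproduce the author's benchmark, but a throughput comparison might look like the sketch below (my addition; shapes are scaled down from [32, 2048, 56, 56] so it runs on modest hardware, and the imports from the earlier blocks are reused):

```python
# Hedged sketch: compare effective FLOPS of k x k convolutions on one device.
import time

device = 'cuda' if torch.cuda.is_available() else 'cpu'
n, c, h, w = 8, 512, 56, 56
x = torch.rand(n, c, h, w, device=device)

for k in (1, 3, 5):
    conv = nn.Conv2d(c, c, k, padding=k // 2).to(device).eval()
    flops = 2 * n * c * c * k * k * h * w  # multiply-adds per forward pass
    with torch.no_grad():
        conv(x)  # warm-up
        if device == 'cuda':
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(20):
            conv(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    cost = (time.time() - start) / 20
    print(f'{k}x{k}: {flops / cost / 1e12:.2f} TFLOPS')
```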

Reimplementation
Reference code: https://github.com/DingXiaoH/RepVGG
I refactored the source code from the paper to improve its readability and ease of use (so that it can be ported into a YOLO project; the custom L2-norm computation was left out).
I also wrote the reparameterization logic as a static method of the class, so that a fully assembled model can be reparameterized in one call.
```python
from collections import OrderedDict

import torch
import torch.nn.functional as F
from torch import nn


class BatchNorm(nn.BatchNorm2d):

    def unpack(self):
        # Express the BN layer in linear form: y = eq_weight * x + eq_bias
        mean, weight, bias = self.running_mean, self.weight, self.bias
        std = (self.running_var + self.eps).sqrt()
        eq_weight = weight / std
        eq_bias = bias - weight * mean / std
        return eq_weight, eq_bias


class RepVGGBlock(nn.Module):

    def __init__(self, c1, c2, k=3, s=1, g=1, deploy=False):
        super(RepVGGBlock, self).__init__()
        self.deploy = deploy
        # Check the convolution kernel size
        assert k & 1, 'The convolution kernel size must be odd'
        # Parameters of the main-branch convolution
        self.conv_main_config = dict(
            in_channels=c1, out_channels=c2, kernel_size=k,
            stride=s, padding=k // 2, groups=g
        )
        if deploy:
            self.conv_main = nn.Conv2d(**self.conv_main_config, bias=True)
            # In deploy mode the extra branches do not exist
            self.conv_1x1 = self.identity = None
        else:
            # Main branch
            self.conv_main = nn.Sequential(OrderedDict(
                conv=nn.Conv2d(**self.conv_main_config, bias=False),
                bn=BatchNorm(c2)
            ))
            # 1×1 convolution branch
            self.conv_1x1 = nn.Sequential(OrderedDict(
                conv=nn.Conv2d(c1, c2, 1, s, padding=0, groups=g, bias=False),
                bn=BatchNorm(c2)
            )) if k != 1 else None
            # Identity-mapping branch
            self.identity = BatchNorm(c2) if c1 == c2 and s == 1 else None

    def forward(self, x, act=F.silu):
        y = self.conv_main(x)
        if self.conv_1x1:
            y += self.conv_1x1(x)
        if self.identity:
            y += self.identity(x)
        # Apply the activation function
        y = act(y) if act else y
        return y

    @staticmethod
    def merge(model: nn.Module):
        # Walk all submodules of the model and merge every RepVGGBlock
        for m in model.modules():
            if isinstance(m, RepVGGBlock) and not m.deploy:
                # Main-branch information
                kernel = m.conv_main.conv.weight
                (c2, c1_per_group, k, _), g = kernel.shape, m.conv_main.conv.groups
                center_pos = k // 2
                # Fuse the main branch's BN into its kernel
                bn_weight, bn_bias = m.conv_main.bn.unpack()
                kernel_weight, kernel_bias = kernel * bn_weight.view(-1, 1, 1, 1), bn_bias
                # Fuse the 1×1 convolution branch into the kernel center
                if m.conv_1x1:
                    kernel_1x1 = m.conv_1x1.conv.weight[..., 0, 0]
                    bn_weight, bn_bias = m.conv_1x1.bn.unpack()
                    kernel_weight[..., center_pos, center_pos] += kernel_1x1 * bn_weight.view(-1, 1)
                    kernel_bias += bn_bias
                # Fuse the identity-mapping branch into the kernel center
                if m.identity:
                    kernel_id = torch.cat([torch.eye(c1_per_group)] * g, dim=0).to(kernel.device)
                    bn_weight, bn_bias = m.identity.unpack()
                    kernel_weight[..., center_pos, center_pos] += kernel_id * bn_weight.view(-1, 1)
                    kernel_bias += bn_bias
                # Create the merged convolution
                m.conv_main = nn.Conv2d(**m.conv_main_config, bias=True)
                m.conv_main.weight.data, m.conv_main.bias.data = kernel_weight, kernel_bias
                # Delete the merged branches
                m.deploy = True
                delattr(m, 'conv_1x1')
                delattr(m, 'identity')
                m.conv_1x1, m.identity = None, None
```

Next, design an integrated model to verify:
- whether the merge function changes the network structure
- whether the model produces identical outputs before and after reparameterization
- whether inference speed improves after reparameterization
```python
if __name__ == '__main__':

    class RepVGG(nn.Module):

        def __init__(self, num_blocks, num_classes=1000, width_multiplier=None, deploy=False):
            super(RepVGG, self).__init__()
            assert len(width_multiplier) == 4
            self.deploy = deploy
            # Number of input channels
            self.in_planes = min(64, int(64 * width_multiplier[0]))
            self.stage0 = RepVGGBlock(3, self.in_planes, k=3, s=2, deploy=self.deploy)
            # The backbone has four stages, each a cascade of RepVGGBlocks
            self.stage1 = self._make_stage(int(64 * width_multiplier[0]), num_blocks[0], stride=2)
            self.stage2 = self._make_stage(int(128 * width_multiplier[1]), num_blocks[1], stride=2)
            self.stage3 = self._make_stage(int(256 * width_multiplier[2]), num_blocks[2], stride=2)
            self.stage4 = self._make_stage(int(512 * width_multiplier[3]), num_blocks[3], stride=2)
            self.gap = nn.AdaptiveAvgPool2d(output_size=1)
            self.linear = nn.Linear(int(512 * width_multiplier[3]), num_classes)

        def _make_stage(self, planes, num_blocks, stride):
            strides = [stride] + [1] * (num_blocks - 1)
            blocks = []
            for stride in strides:
                blocks.append(RepVGGBlock(self.in_planes, planes, k=3, s=stride, deploy=self.deploy))
                self.in_planes = planes
            return nn.Sequential(*blocks)

        def forward(self, x):
            out = self.stage0(x)
            out = self.stage1(out)
            out = self.stage2(out)
            out = self.stage3(out)
            out = self.stage4(out)
            out = self.gap(out)
            out = out.view(out.size(0), -1)
            out = self.linear(out)
            return out

    vgg = RepVGG(num_blocks=[1, 1, 1, 1], num_classes=20,
                 width_multiplier=[1, 1, 1, 1]).eval()
    print(vgg)

    # Initialize random parameters for every BatchNorm
    for m in vgg.modules():
        if isinstance(m, BatchNorm):
            m.running_mean.data, m.running_var.data, \
                m.weight.data, m.bias.data = torch.rand([4, m.num_features])

    image = torch.rand([1, 3, 224, 224])

    class Timer:
        prefix = 'Cost: '

        def __init__(self, fun, *args, **kwargs):
            import time
            start = time.time()
            fun(*args, **kwargs)
            cost = (time.time() - start) * 1e3
            print(self.prefix + f'{cost:.0f} ms')

    # Test the VGG in its training structure
    print(vgg(image))
    Timer(vgg, image)

    # Call the static method of RepVGGBlock to merge its branches
    RepVGGBlock.merge(vgg)
    print(vgg)

    # Test the VGG in its inference structure
    print(vgg(image))
    Timer(vgg, image)
```

The output:

```
RepVGG(
  (stage0): RepVGGBlock(
    (conv_main): Sequential(
      (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (conv_1x1): Sequential(
      (conv): Conv2d(3, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (stage1): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stage2): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stage3): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stage4): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (gap): AdaptiveAvgPool2d(output_size=1)
  (linear): Linear(in_features=512, out_features=20, bias=True)
)
tensor([[-0.1108,  0.0824,  0.5547, -0.1671,  0.7442, -0.1164, -0.2825,  0.4088,
          0.1239, -0.3792,  0.1152, -0.4021,  0.4034,  0.2350,  0.2601, -0.1197,
          0.2462, -0.2451,  0.0439, -0.2507]], grad_fn=<AddmmBackward>)
Cost: 22 ms
RepVGG(
  (stage0): RepVGGBlock(
    (conv_main): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  )
  (stage1): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (stage2): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (stage3): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (stage4): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (gap): AdaptiveAvgPool2d(output_size=1)
  (linear): Linear(in_features=512, out_features=20, bias=True)
)
tensor([[-0.1108,  0.0824,  0.5547, -0.1671,  0.7442, -0.1164, -0.2825,  0.4088,
          0.1239, -0.3792,  0.1152, -0.4021,  0.4034,  0.2350,  0.2601, -0.1197,
          0.2462, -0.2451,  0.0439, -0.2507]], grad_fn=<AddmmBackward>)
Cost: 14 ms
```