Huawei's lightweight neural network architecture GhostNet is upgraded again: G-GhostNet (IJCV 2022) shows its strength on GPUs
2022-08-05 10:04:00 【AI Vision Network】
The test code is given at the end of this article.
GitHub: training in progress.
To address the limited memory and compute resources available when deploying networks, the paper proposes two different Ghost modules that generate Ghost feature maps with low-cost linear operations. The C-Ghost module targets CPUs and similar devices, and C-GhostNet is built by simply stacking this module. The G-Ghost module, suited to GPUs and similar devices, is built on feature redundancy across the layers of a stage. The experiments show that each module achieves the best balance between accuracy and latency on its target devices.
Before GhostNet (CVPR 2020), redundant features were generally viewed with a fixed prejudice. Yet both excessively suppressing redundancy and excessively increasing it hurt the performance of an architecture (what exists is reasonable). In GhostNet, after careful experimental observation, the authors proposed the idea of "embracing redundant feature maps in a cost-efficient way": part of the original feature generation process is replaced with cheaper linear operations, which keeps performance while making the network lightweight.
As shown in the figure below, once the intermediate feature maps are visualized, some of them are clearly similar to one another (boxes of the same color). The authors therefore propose that the similar feature maps can be obtained through cheap operations.
Figure 1: Visualization of some feature maps from the first residual group of ResNet50.
On this basis the authors proposed GhostNet, which the present paper refers to as C-GhostNet. This article therefore focuses on G-Ghost and only briefly reviews C-Ghost.
To be lightweight, C-GhostNet uses operators with low arithmetic intensity. Low arithmetic intensity prevents the parallel computing capacity of a GPU from being fully used, so C-GhostNet has poor latency on GPUs and similar devices. A Ghost module suited to GPUs therefore has to be designed.
The authors observe that in most existing CNN architectures a stage usually consists of several convolutional layers/blocks, and that the feature maps of the different layers/blocks within a stage have the same size. This leads to a conjecture: feature similarity and redundancy exist not only within a single layer but also across the multiple layers of a stage. The visualization below supports this idea (for example, on the right, the feature map in the third row, second column is similar to the one in the seventh row, third column).
Figure 2: Left: all convolution blocks in the second stage of ResNet34. Right: feature maps of the first and the last block of that stage.
Exploiting the observed redundancy across a stage, the authors design the G-Ghost module for GPUs and similar devices and obtain a lightweight CNN with SOTA performance on GPUs.
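As shown in Figure 3, given an input $X \in \mathbb{R}^{c \times h \times w}$, an ordinary convolution can be expressed as:
$$Y = X * f + b$$
where $*$ denotes convolution, $b$ is the bias, $f \in \mathbb{R}^{c \times k \times k \times n}$ are the convolution filters and $Y \in \mathbb{R}^{h' \times w' \times n}$; that is, a feature map of size $c \times h \times w$ generates a feature map of size $h' \times w' \times n$. The visualization shows that the output feature maps contain redundancy, so the redundant part can be generated with cheaper operations.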
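The authors treat the output feature maps as a combination of intrinsic features and Ghost features, where the Ghost features are obtained by applying cheap operations to the intrinsic features. The procedure is as follows:
For an output of size $h' \times w' \times n$, an ordinary convolution first generates $m$ intrinsic feature maps, i.e. $Y' = X * f'$, where $f' \in \mathbb{R}^{c \times k \times k \times m}$ and $m \le n$. Cheap operations are then applied to each intrinsic feature map to generate $s$ Ghost features ($n = m \times s$), which can be written as:
$$y_{ij} = \Phi_{i,j}(y'_i), \quad \forall\, i = 1, \dots, m, \; j = 1, \dots, s$$
Here $y'_i$ denotes the $i$-th intrinsic feature map and $\Phi_{i,j}$ is the $j$-th cheap operation (except the last one), which generates the $j$-th Ghost feature map; the last one, $\Phi_{i,s}$, performs no cheap operation and is an identity mapping that preserves the intrinsic features.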
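To make the saving concrete, here is a rough cost comparison between an ordinary convolution and a C-Ghost module. It is only a sketch under stated assumptions: bias terms are ignored, the cheap operations are taken to be d x d depth-wise convolutions, n is assumed divisible by s, and the function names `conv_flops` and `ghost_flops` are made up for this illustration.

```python
def conv_flops(c_in, n_out, h_out, w_out, k):
    """Multiply-accumulates of an ordinary k x k convolution (bias ignored)."""
    return n_out * h_out * w_out * c_in * k * k


def ghost_flops(c_in, n_out, h_out, w_out, k, s, d):
    """Cost of a C-Ghost module that produces n_out channels.

    m = n_out / s intrinsic maps come from an ordinary convolution; each one is
    expanded by (s - 1) cheap d x d depth-wise operations, and the s-th
    operation is the identity mapping, which costs nothing.
    Assumption: n_out is divisible by s.
    """
    m = n_out // s
    primary = conv_flops(c_in, m, h_out, w_out, k)
    cheap = (s - 1) * m * h_out * w_out * d * d
    return primary + cheap


if __name__ == "__main__":
    c, n, h, w, k, d = 64, 96, 56, 56, 3, 3
    for s in (2, 3, 4):
        ratio = conv_flops(c, n, h, w, k) / ghost_flops(c, n, h, w, k, s, d)
        print(f"s={s}: theoretical speed-up of about {ratio:.2f}x")
```

When d is close to k and s is much smaller than c, the ratio approaches s, which is why the number of Ghost features per intrinsic map acts as the compression knob.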
A simple code example of the C-Ghost module is shown below:
import math

import torch
import torch.nn as nn


class GhostModule(nn.Module):
    def __init__(self, in_channel, out_channel, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super(GhostModule, self).__init__()
        self.out_channel = out_channel
        init_channels = math.ceil(out_channel / ratio)
        new_channels = init_channels * (ratio - 1)
        # Generate the intrinsic feature maps with an ordinary convolution
        self.primary_conv = nn.Sequential(
            nn.Conv2d(in_channel, init_channels, kernel_size, stride, kernel_size // 2),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )
        # Generate the Ghost features from the intrinsic feature maps (depth-wise convolution)
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size // 2, groups=init_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = torch.cat([x1, x2], dim=1)
        # Drop the surplus channels when out_channel is not a multiple of ratio
        return out[:, :self.out_channel, :, :]
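The final slice in `forward` is easy to overlook. The short sketch below replays only the channel arithmetic of the module, in plain Python and without PyTorch, to show when surplus channels appear; `ghost_channels` is a helper name invented for this illustration.

```python
import math


def ghost_channels(out_channel, ratio=2):
    """Replay the channel bookkeeping of the GhostModule listing.

    Returns (intrinsic channels, Ghost channels, surplus channels that the
    final slice out[:, :out_channel] throws away).
    """
    init_channels = math.ceil(out_channel / ratio)
    new_channels = init_channels * (ratio - 1)
    surplus = init_channels + new_channels - out_channel
    return init_channels, new_channels, surplus


if __name__ == "__main__":
    for out_channel, ratio in [(64, 2), (100, 3), (30, 4)]:
        init_c, new_c, surplus = ghost_channels(out_channel, ratio)
        print(f"out={out_channel}, ratio={ratio}: "
              f"intrinsic={init_c}, ghost={new_c}, discarded={surplus}")
```

Whenever `out_channel` is not a multiple of `ratio`, the concatenation is slightly too wide and the slice trims it back to the requested width.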
The sample code uses depth-wise convolution, which has low arithmetic intensity, as the cheap operation that generates the Ghost features; on GPUs and similar devices this cannot make full use of the parallel computing capacity. On the other hand, if some feature maps can be removed and the activations reduced, latency on the GPU can very likely be reduced.
Radosavovic et al. introduced activations (the total size of the output tensors of all convolutional layers) to measure network complexity; for GPU latency, activations are more relevant than FLOPs.
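A small calculation illustrates why FLOPs alone can mislead. The sketch below compares a dense 3 x 3 convolution with a depth-wise one on the same tensor shape; the helper name `conv_cost` and the example shape are assumptions chosen for illustration, and bias terms are ignored.

```python
def conv_cost(c_in, c_out, h_out, w_out, k, groups=1):
    """Return (multiply-accumulates, activations) of one convolution layer.

    Activations follow the definition quoted above: the number of elements
    in the output tensor of the layer.
    """
    flops = c_out * h_out * w_out * (c_in // groups) * k * k
    activations = c_out * h_out * w_out
    return flops, activations


if __name__ == "__main__":
    dense = conv_cost(128, 128, 28, 28, 3)
    depthwise = conv_cost(128, 128, 28, 28, 3, groups=128)
    print("dense      flops=%d activations=%d" % dense)
    print("depth-wise flops=%d activations=%d" % depthwise)
    print("flops ratio:", dense[0] // depthwise[0])
```

The depth-wise layer needs far fewer multiply-accumulates but writes exactly as many output elements, so a metric based on activations sees no saving at all.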
How can this goal be achieved? This is where the conjecture mentioned earlier comes in: "feature similarity and redundancy exist not only within a single layer but also across the multiple layers of a stage." In today's popular CNN architectures, the output feature maps of the different layers in one stage keep the same size, so this property allows a cheap substitution across layers. The procedure is as follows:
Within a stage of a CNN, the deep features are divided into Ghost features (which can be obtained from shallow features by cheap operations) and complex features (which cannot), taking Figure 2 as an example:
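Denote the eight layers of the second stage, from shallow to deep, as $L_1, \cdots, L_8$, and assume the output of $L_8$ is $Y_8 \in \mathbb{R}^{c \times h \times w}$. Given a partition ratio $\lambda$, the Ghost features of the output are $Y_8^g \in \mathbb{R}^{\lambda c \times h \times w}$ and the complex features are $Y_8^c \in \mathbb{R}^{(1-\lambda) c \times h \times w}$.
The complex features are obtained by passing through the 8 convolutional layers in turn and carry more abstract semantic information, whereas the Ghost features are obtained by applying a cheap operation directly to the output of $L_1$. The final output aggregates the two by channel concatenation, as shown in the figure below ($\lambda = 0.5$):
Figure 4: A G-Ghost stage with λ = 0.5.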
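The effect of the partition ratio on the compute of a stage can be estimated with a few lines of arithmetic. This is a deliberately simplified sketch and not the paper's exact accounting: every layer is modelled as a single k x k convolution, the first layer keeps the full width, the remaining layers run only on the complex channels, and the cheap operation is assumed to be a 1 x 1 convolution on the Ghost channels. All function names are hypothetical.

```python
def conv_flops(c_in, c_out, h, w, k):
    """Multiply-accumulates of one k x k convolution at resolution h x w."""
    return c_in * c_out * k * k * h * w


def vanilla_stage_flops(n, c, h, w, k=3):
    """n plain layers, each mapping c channels to c channels."""
    return n * conv_flops(c, c, h, w, k)


def g_ghost_stage_flops(n, c, h, w, lam, k=3):
    """Simplified G-Ghost stage: one full-width layer, (n - 1) thin layers,
    plus an assumed 1 x 1 cheap operation for the Ghost channels."""
    ghost = int(round(lam * c))        # channels produced by the cheap operation
    complex_c = c - ghost              # channels that go through every layer
    first = conv_flops(c, c, h, w, k)
    thin = (n - 1) * conv_flops(complex_c, complex_c, h, w, k)
    cheap = conv_flops(ghost, ghost, h, w, 1)
    return first + thin + cheap


if __name__ == "__main__":
    n, c, h, w = 8, 128, 28, 28
    base = vanilla_stage_flops(n, c, h, w)
    for lam in (0.0, 0.25, 0.5, 0.75):
        cost = g_ghost_stage_flops(n, c, h, w, lam)
        print(f"lambda={lam}: {cost / base:.2f} of the vanilla stage cost")
```

Because the thin layers shrink in both input and output width, their cost falls roughly with the square of (1 - λ), while the activations of those layers fall linearly.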
However, concatenating the features directly has an obvious drawback. The complex features are extracted step by step and contain richer semantic information, whereas the Ghost features come from shallow features through a cheap operation and may lack deep information. An information compensation mechanism is therefore necessary, and the authors use the following design to improve the representational ability of the cheap operation:
Figure 5: A G-Ghost stage with the mix operation, λ = 0.5.
As shown in Figure 5, the complex features are generated by n consecutive convolution blocks, and the Ghost features are obtained from the first convolution block through a cheap operation. The mix module improves the representational ability of the cheap operation: the intermediate features of layers 2 to n in the complex branch are first concatenated, a transformation function then maps them into the same domain as the output of the cheap operation, and finally the features are fused (for example by simple element-wise addition).
Figure 6 is used below for a more detailed explanation. As in the figure, consider the outputs of the $n$ layers of one stage: $Y_n^c$ is the output of the last layer, i.e. the complex features, and $Y_1$ is the output of the first layer, to which the cheap operation is applied. The features $Y_2^c$ to $Y_n^c$ are concatenated to obtain $Z$, and a function $\tau$ transforms $Z$ into a domain consistent with the output of the cheap operation branch. The function is:
$$\tau(Z) = W \cdot \mathrm{Pooling}(Z) + b$$
$\mathrm{Pooling}(Z)$ denotes global average pooling of $Z$, and $W$ and $b$ denote the weight and the bias; that is, after $Z$ is pooled, a fully connected layer performs the domain transformation.
In Figure 6, the outputs of the two branches are added element-wise and a non-linearity is applied to obtain the Ghost features.
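The mix step can be sketched in a few lines of plain Python on nested lists, with no deep learning framework. This follows the formula in the text (one fully connected layer after global average pooling, then element-wise addition and ReLU); the function names and the toy numbers are made up for illustration, and the official code additionally uses batch normalization, which is left out here.

```python
def global_avg_pool(z):
    """Average every h x w map of z (shape [c'][h][w]) down to one scalar."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0])) for fmap in z]


def tau(z, weight, bias):
    """Fully connected transform of the pooled vector: W * Pooling(Z) + b."""
    pooled = global_avg_pool(z)
    return [sum(w * p for w, p in zip(w_row, pooled)) + b
            for w_row, b in zip(weight, bias)]


def mix(cheap_out, z, weight, bias):
    """Add tau(Z) to each Ghost channel (broadcast over h x w), then apply ReLU."""
    t = tau(z, weight, bias)
    return [[[max(0.0, v + t_j) for v in row] for row in fmap]
            for fmap, t_j in zip(cheap_out, t)]


if __name__ == "__main__":
    # Two concatenated intermediate maps (c' = 2) at 2 x 2 resolution.
    z = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
    weight = [[1.0, -1.0]]   # maps c' = 2 pooled values to 1 Ghost channel
    bias = [0.5]
    cheap_out = [[[-3.0, 0.0], [1.0, -2.5]]]
    print(mix(cheap_out, z, weight, bias))
```

Since pooling collapses the spatial dimensions, the correction is one scalar per Ghost channel, so the extra cost of mixing is small compared with the convolution blocks.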
In the officially released g_ghost_regnet file, the code for the cheap operation is:
self.cheap = nn.Sequential(
    nn.Conv2d(cheap_planes, cheap_planes,
              kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(cheap_planes),
    # nn.ReLU(inplace=True),
)
Besides a convolution kernel of size 1, the cheap operation can also use a 3×3 or 5×5 kernel, or simply be computed as an identity mapping.
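To get a feel for what each choice costs, the sketch below counts parameters and multiply-accumulates for the options just listed. It assumes a dense convolution without bias that keeps the channel count, as in the 1 x 1 snippet above, and ignores batch normalization; `cheap_op_cost` and the example shape are hypothetical.

```python
def cheap_op_cost(channels, h, w, kernel_size=None):
    """Return (parameters, multiply-accumulates) of a candidate cheap operation.

    kernel_size=None stands for the identity mapping, which has no cost.
    Assumption: dense convolution, channels -> channels, no bias.
    """
    if kernel_size is None:
        return 0, 0
    params = channels * channels * kernel_size * kernel_size
    return params, params * h * w


if __name__ == "__main__":
    channels, h, w = 120, 16, 16
    for name, k in [("identity", None), ("1x1", 1), ("3x3", 3), ("5x5", 5)]:
        params, flops = cheap_op_cost(channels, h, w, k)
        print(f"{name:>8}: params={params}, flops={flops}")
```

Every option produces the same number of activations, so the kernel size mainly trades extra compute for representational ability.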
Table 1: Comparison of methods on ImageNet.
Table 2: Comparison of methods on ImageNet (GhostX-RegNet).
Drawing on visual observations and extensive experiments, the paper puts forward the idea of Ghost features. Using the conjecture that "feature similarity and redundancy exist not only within a single layer but also across the multiple layers of a stage", it designs G-Ghost, which suits GPUs and similar devices better than C-Ghost, and achieves a good trade-off between real latency and performance.
In G-Ghost, a well-designed module for aggregating intermediate features effectively alleviates the information loss of the Ghost features. However, the partition ratio between Ghost features and complex features has to be tuned manually, and different ratios affect accuracy and latency differently; in the authors' tests, a good balance between accuracy and latency was maintained.
How the C-Ghost and G-Ghost modules are used to build the complete GhostNet networks is described in the paper and the code; interested readers can consult them.
Measured latency, input 1x3x128x128: GPU (1060) 5 ms, CPU 30 ms
skipnet, input 1x3x128x128: GPU (1060) 6 ms, CPU 18 ms
# 2022.07.17-Changed for building GhostNet
# Huawei Technologies Co., Ltd.
"""
Creates a G-Ghost RegNet Model as defined in paper:
GhostNets on Heterogeneous Devices via Cheap Operations.
Modified from https://github.com/d-li14/regnet.pytorch
"""
import time

import numpy as np
import torch
import torch.nn as nn

__all__ = ['regnetx_032']


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class Bottleneck(nn.Module):
    expansion = 1
    __constants__ = ['downsample']

    def __init__(self, inplanes, planes, stride=1, downsample=None, group_width=1, dilation=1, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = planes * self.expansion
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, width // min(width, group_width), dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes)
        self.bn3 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out


class LambdaLayer(nn.Module):
    def __init__(self, lambd):
        super(LambdaLayer, self).__init__()
        self.lambd = lambd

    def forward(self, x):
        return self.lambd(x)


class Stage(nn.Module):
    def __init__(self, block, inplanes, planes, group_width, blocks, stride=1, dilate=False, cheap_ratio=0.5):
        super(Stage, self).__init__()
        norm_layer = nn.BatchNorm2d
        downsample = None
        self.dilation = 1
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        # Stage has no self.inplanes attribute, so compare against the inplanes argument
        if stride != 1 or inplanes != planes:
            downsample = nn.Sequential(conv1x1(inplanes, planes, stride), norm_layer(planes))

        # First (full-width) block and last (full-width) block of the stage
        self.base = block(inplanes, planes, stride, downsample, group_width, previous_dilation, norm_layer)
        self.end = block(planes, planes, group_width=group_width, dilation=self.dilation, norm_layer=norm_layer)

        # Split the channels into a complex (raw) part and a Ghost (cheap) part
        group_width = int(group_width * 0.75)
        raw_planes = int(planes * (1 - cheap_ratio) / group_width) * group_width
        cheap_planes = planes - raw_planes
        self.cheap_planes = cheap_planes
        self.raw_planes = raw_planes

        # Mix branch: global pooling followed by 1x1 convolutions on the concatenated intermediate features
        self.merge = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(planes + raw_planes * (blocks - 2), cheap_planes, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(cheap_planes),
            nn.ReLU(inplace=True),
            nn.Conv2d(cheap_planes, cheap_planes, kernel_size=1, bias=False),
            nn.BatchNorm2d(cheap_planes),
            # nn.ReLU(inplace=True),
        )
        # Cheap operation that generates the Ghost features
        self.cheap = nn.Sequential(
            nn.Conv2d(cheap_planes, cheap_planes, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(cheap_planes),
            # nn.ReLU(inplace=True),
        )
        self.cheap_relu = nn.ReLU(inplace=True)

        downsample = nn.Sequential(LambdaLayer(lambda x: x[:, :raw_planes]))
        layers = []
        layers.append(block(raw_planes, raw_planes, 1, downsample, group_width, self.dilation, norm_layer))
        inplanes = raw_planes
        for _ in range(2, blocks - 1):
            layers.append(block(inplanes, raw_planes, group_width=group_width, dilation=self.dilation, norm_layer=norm_layer))
        self.layers = nn.Sequential(*layers)

    def forward(self, input):
        x0 = self.base(input)

        # Complex branch: collect every intermediate output for the mix operation
        m_list = [x0]
        e = x0[:, :self.raw_planes]
        for l in self.layers:
            e = l(e)
            m_list.append(e)  # required so that the channel count matches self.merge
        m = torch.cat(m_list, 1)
        m = self.merge(m)

        # Ghost branch: cheap operation plus the mixed information
        c = x0[:, self.raw_planes:]
        c = self.cheap_relu(self.cheap(c) + m)

        x = torch.cat((e, c), 1)
        x = self.end(x)
        return x


class RegNet(nn.Module):
    def __init__(self, block, layers, widths, num_classes=1000, zero_init_residual=True,
                 group_width=1, replace_stride_with_dilation=None, norm_layer=None):
        super(RegNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 32
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False, False]
        if len(replace_stride_with_dilation) != 4:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 4-element tuple, got {}".format(replace_stride_with_dilation))
        self.group_width = group_width
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)

        self.layer1 = self._make_layer(block, widths[0], layers[0], stride=2,
                                       dilate=replace_stride_with_dilation[0])

        # Stages with more than two blocks become G-Ghost stages; shorter ones stay plain
        self.inplanes = widths[0]
        if layers[1] > 2:
            self.layer2 = Stage(block, self.inplanes, widths[1], group_width, layers[1], stride=2,
                                dilate=replace_stride_with_dilation[1], cheap_ratio=0.5)
        else:
            self.layer2 = self._make_layer(block, widths[1], layers[1], stride=2,
                                           dilate=replace_stride_with_dilation[1])

        self.inplanes = widths[1]
        self.layer3 = Stage(block, self.inplanes, widths[2], group_width, layers[2], stride=2,
                            dilate=replace_stride_with_dilation[2], cheap_ratio=0.5)

        self.inplanes = widths[2]
        if layers[3] > 2:
            self.layer4 = Stage(block, self.inplanes, widths[3], group_width, layers[3], stride=2,
                                dilate=replace_stride_with_dilation[3], cheap_ratio=0.5)
        else:
            self.layer4 = self._make_layer(block, widths[3], layers[3], stride=2,
                                           dilate=replace_stride_with_dilation[3])

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.2)
        self.fc = nn.Linear(widths[-1] * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        # if zero_init_residual:
        #     for m in self.modules():
        #         if isinstance(m, Bottleneck):
        #             nn.init.constant_(m.bn3.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes:
            downsample = nn.Sequential(conv1x1(self.inplanes, planes, stride), norm_layer(planes))

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.group_width,
                            previous_dilation, norm_layer))
        self.inplanes = planes
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, group_width=self.group_width,
                                dilation=self.dilation, norm_layer=norm_layer))
        return nn.Sequential(*layers)

    def _forward_impl(self, x):
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = self.fc(x)
        return x

    def forward(self, x):
        return self._forward_impl(x)
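

def regnetx_032(**kwargs):
    return RegNet(Bottleneck, [2, 6, 15, 2], [96, 192, 432, 1008], group_width=48, **kwargs)


if __name__ == '__main__':
    num_classes = 1000  # not defined in the original listing; the ImageNet-1k class count is assumed here
    model = RegNet(Bottleneck, [2, 6, 15, 2], [96, 192, 432, 1008], group_width=48, num_classes=num_classes)
    model.eval()  # BatchNorm after global pooling cannot run in training mode with batch size 1
    with torch.no_grad():
        for i in range(20):
            data = torch.randn(1, 3, 128, 128)  # .cuda()
            start = time.time()
            out = model(data)
            print('time', time.time() - start, out.size())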
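The channel split inside `Stage` is not exactly `cheap_ratio`, because the number of complex channels is rounded down to a multiple of the reduced group width. The sketch below replays only that arithmetic for the two stages of `regnetx_032` that have more than two blocks; `stage_split` is a helper name invented for this illustration.

```python
def stage_split(planes, blocks, group_width=48, cheap_ratio=0.5):
    """Replay the channel bookkeeping of Stage.__init__ from the listing above.

    Returns (raw_planes, cheap_planes, input channels of the first merge conv).
    """
    gw = int(group_width * 0.75)
    raw_planes = int(planes * (1 - cheap_ratio) / gw) * gw
    cheap_planes = planes - raw_planes
    merge_in = planes + raw_planes * (blocks - 2)
    return raw_planes, cheap_planes, merge_in


if __name__ == "__main__":
    # (name, width, number of blocks) taken from regnetx_032 in the listing
    for name, planes, blocks in [("layer2", 192, 6), ("layer3", 432, 15)]:
        raw, cheap, merge_in = stage_split(planes, blocks)
        print(f"{name}: raw={raw}, cheap={cheap}, "
              f"effective ghost ratio={cheap / planes:.3f}, merge input={merge_in}")
```

The merge input width also shows why every intermediate output of the complex branch has to be collected in `forward`: the first 1 x 1 convolution of `merge` expects the base output plus one `raw_planes`-wide tensor per thin block.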