Yolov5-6.0 series | yolov5 model network construction
2022-06-09 05:30:00 【Clichong】
If there is a mistake, please point it out.
To study the yolov5 code, we start from the overall process of building the yolov5 model, taking version yolov5-6.0 as the example. This note analyzes how the model is constructed: all of the model-building code lives under the models directory, and yolo.py is the file that assembles the whole model (calling the other modules). For the implementation of the individual modules, see my other articles; this note focuses on yolo.py.
In the yolov5-6.0 version of yolo.py there are mainly three things: the Detect class, the Model class, and the parse_model function. The details follow.
1. Calling relationship
First, you need to know how these functions call one another. Generally speaking, to use a model you must first instantiate one, something like:
# Create model
model = Model(opt.cfg).to(device)
model.train()
In the model's initialization (the __init__ function), parse_model is called. Model receives the path to a yaml file; the yaml is loaded into a dict-like structure and passed into parse_model, which parses it and builds the network structure.
class Model(nn.Module):
def __init__(self, cfg='yolov5s.yaml', ch=3, nc=None, anchors=None): # model, input channels, number of classes
super().__init__()
...
with open(cfg, errors='ignore') as f:
self.yaml = yaml.safe_load(f) # model dict
        # build the network model
        # self.model: the entire network model (including the Detect layer)
        # self.save: indices of all layers whose 'from' is not -1, sorted: [4, 6, 10, 14, 17, 20, 23]
self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
...
Taking yolov5s.yaml as an example, the yaml file reads as follows:
# YOLOv5 by Ultralytics, GPL-3.0 license
# Parameters
nc: 20 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
# YOLOv5 v6.0 backbone
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
[-1, 1, Conv, [128, 3, 2]], # 1-P2/4
[-1, 3, C3, [128]],
[-1, 1, Conv, [256, 3, 2]], # 3-P3/8
[-1, 6, C3, [256]],
[-1, 1, Conv, [512, 3, 2]], # 5-P4/16
[-1, 9, C3, [512]],
[-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
[-1, 3, C3, [1024]],
[-1, 1, SPPF, [1024, 5]], # 9
]
# YOLOv5 v6.0 head
head:
[[-1, 1, Conv, [512, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 6], 1, Concat, [1]], # cat backbone P4
[-1, 3, C3, [512, False]], # 13
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 4], 1, Concat, [1]], # cat backbone P3
[-1, 3, C3, [256, False]], # 17 (P3/8-small)
[-1, 1, Conv, [256, 3, 2]],
[[-1, 14], 1, Concat, [1]], # cat head P4
[-1, 3, C3, [512, False]], # 20 (P4/16-medium)
[-1, 1, Conv, [512, 3, 2]],
[[-1, 10], 1, Concat, [1]], # cat head P5
[-1, 3, C3, [1024, False]], # 23 (P5/32-large)
[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
The parse_model function builds all of this into an nn.Sequential structure. The core trick is eval(): each module name is stored in the yaml as a string, and eval() executes the string as an expression, returning the class of the same name. That is also why the last line of the yaml is the detection head Detect: once eval() resolves it, the Detect class is called to build the final head structure.
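To make the eval() trick concrete, here is a minimal sketch (assuming Conv has already been imported into the namespace, as it is in models/yolo.py):
m = 'Conv'                                  # module name string as read from yolov5s.yaml
m = eval(m) if isinstance(m, str) else m    # -> <class 'models.common.Conv'>
layer = m(3, 32, 6, 2, 2)                   # equivalent to calling Conv(3, 32, 6, 2, 2)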
The calling code is as follows:
def parse_model(d, ch): # model_dict, input_channels(3)
...
for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):
        # eval() executes a string expression and returns its value
m = eval(m) if isinstance(m, str) else m # eval strings
...
        # append the output channels of the three Detect input layers to args: args = [nc, anchors, ch]
        # with these args, m(*args) builds the Detect class, receiving [num classes, anchors, channels]
elif m is Detect:
args.append([ch[x] for x in f])
if isinstance(args[1], int): # number of anchors
args[1] = [list(range(args[1] * 2))] * len(f)
...
        # m_: the current layer module; stack n copies of m if n > 1, otherwise build a single m
m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args) # module
layers.append(m_)
...
return nn.Sequential(*layers), sorted(save)
The sections below walk through the implementation, parsing each module in detail.
2. parse_model Function analysis
As mentioned above, when the Model is initialized, the first step is building the forward-propagation structure of the whole model, with the yaml file passed in to guide the construction. One subtlety: the forward pass of a typical model may splice and fuse feature maps from different layers, and yolov5 implements this by recording which layers' outputs will be needed. So the function does not just return the stacked modules; it also returns a save parameter indicating which layers' outputs will be used later.
With this record, during forward propagation, if the current layer's index is in save, a list y keeps its intermediate feature map, because we know that output is needed later for concat splicing with other features. If it is not in save, y stores nothing for it, which saves memory and speeds things up. Together, the saved indices (save) and the intermediate feature maps (the y list) let the layer structure specified in the yaml fuse and splice feature maps. This lives in the Model class's _forward_once function; a minimal sketch follows, and the full code is in section 4.
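A simplified sketch of the save/y mechanism (names follow _forward_once, which is listed in full later):
y = []
for m in self.model:                 # the nn.Sequential returned by parse_model
    if m.f != -1:                    # input comes from earlier layer(s), not just the previous one
        x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]
    x = m(x)                         # run the current module
    y.append(x if m.i in self.save else None)   # keep only outputs that a later layer needs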
- The yolov5-6.0 parse_model code is as follows:
def parse_model(d, ch): # model_dict, input_channels(3)
""" Use on it Model Module Parse model file ( The dictionary form ), And build a network structure What this function does is : Update the current layer args( Parameters ), Calculation c2( The output of the current layer channel) => Use the parameters of the current layer to build the current layer => Generate layers + save :params d: model_dict Model file The dictionary form {dict:7} yolov5s.yaml Medium 6 Elements + ch :params ch: Record the output of each layer of the model channel initial ch=[3] Delete... Later :return nn.Sequential(*layers): The layer structure of each layer of the network :return sorted(save): Put all layers in the structure from No -1 Write down the value of And sort [4, 6, 10, 14, 17, 20, 23] """
LOGGER.info('\n%3s%18s%3s%10s %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors # number of anchors
    # output channels of each feature layer: anchors * (xywh, conf, classes)
no = na * (nc + 5) # number of outputs = anchors * (classes + 5)
layers, save, c2 = [], [], ch[-1] # layers, savelist, ch out
    # iterate over the backbone and head lists in turn; each line's 4 parameters map onto the 4 loop variables: from, number, module, args
for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):
        # eval() executes a string expression and returns its value
m = eval(m) if isinstance(m, str) else m # eval strings
        # eval string args from the yaml (e.g. 'nc', 'anchors', None, 'nearest'); unresolvable strings are kept as-is
for j, a in enumerate(args):
try:
args[j] = eval(a) if isinstance(a, str) else a # eval strings
except NameError:
pass
        # depth gain: n is how many times the current module repeats (indirectly controls depth)
n = n_ = max(round(n * gd), 1) if n > 1 else n # depth gain
        # for these common modules, set the input/output channels and update args
if m in [Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
BottleneckCSP, C3, C3TR, C3SPP, C3Ghost]:
c1, c2 = ch[f], args[0]
if c2 != no: # if not output
                c2 = make_divisible(c2 * gw, 8)  # gw controls the width (channels); must be divisible by 8
args = [c1, c2, *args[1:]] # [64,6,2,2] -> [3,32,6,2,2]
            # for BottleneckCSP/C3/C3TR/C3Ghost, the bottleneck count must be inserted into args
if m in [BottleneckCSP, C3, C3TR, C3Ghost]:
                args.insert(2, n)  # number of repeats: insert the bottleneck count n at position 2
n = 1
        # a BN layer only needs the previous layer's output channels
elif m is nn.BatchNorm2d:
args = [ch[f]]
        # a Concat layer's output channels are the sum of the channels of all layers listed in f
elif m is Concat:
c2 = sum([ch[x] for x in f])
        # append the output channels of the three Detect input layers to args: args = [nc, anchors, ch]
        # with these args, m(*args) builds the Detect class, receiving [num classes, anchors, channels]
elif m is Detect:
args.append([ch[x] for x in f])
if isinstance(args[1], int): # number of anchors
args[1] = [list(range(args[1] * 2))] * len(f)
        # dimension change: x(1,64,80,80) -> x(1,256,40,40)
        # w and h are both folded into the channel dim, hence the square (**2)
elif m is Contract:
c2 = ch[f] * args[0] ** 2
        # dimension change: x(1,64,80,80) -> x(1,16,160,160)
        # again w and h are both handled on the channel dim, hence the square (**2)
elif m is Expand:
c2 = ch[f] // args[0] ** 2
        # anything else, e.g. nn.Upsample, just keeps the current output channels
else:
c2 = ch[f]
        # m_: the current layer module; stack n copies of m if n > 1, otherwise build a single m
m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args) # module
        # print the current module's info: 'from', 'n', 'params', 'module', 'arguments'
        # eg: 0  -1  1  3520  models.common.Conv  [3, 32, 6, 2, 2]
t = str(m)[8:-2].replace('__main__.', '') # module type
        np = sum([x.numel() for x in m_.parameters()])  # number of params in this layer
m_.i, m_.f, m_.type, m_.np = i, f, t, np # attach index, 'from' index, type, number params
LOGGER.info('%3s%18s%3s%10.0f %-40s%-30s' % (i, f, n_, np, t, args))
        # record every 'from' value that is not -1; used to build the head: [6, 4, 14, 10, 17, 20, 23]
save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist
layers.append(m_)
        # ch records each layer's output channels; the initial ch=[3] is discarded after layer 0
if i == 0:
ch = []
ch.append(c2)
return nn.Sequential(*layers), sorted(save)
The comments in the code spell everything out, so I will not repeat them here.
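As a quick sanity check of the depth/width scaling, a hedged numeric sketch (this make_divisible mirrors the logic of yolov5's utils.general.make_divisible):
import math

def make_divisible(x, divisor):
    # round x up to the nearest multiple of divisor (same logic as utils.general.make_divisible)
    return math.ceil(x / divisor) * divisor

gd, gw = 0.33, 0.50                     # depth_multiple / width_multiple of yolov5s
n = 9                                   # the 'number' field of a C3 entry in the yaml
print(max(round(n * gd), 1))            # 3 -> yolov5s stacks only 3 bottlenecks here
print(make_divisible(1024 * gw, 8))     # 512 -> output channels after width scaling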
3. Detect Class parsing
The Detect module builds the Detect layer, which is really just a 1x1 convolution. The incoming x contains 3 entries, one per feature-map scale. The same conv raises or lowers each output's channel dimension; the key point is that the per-anchor channel count is consistent, namely (nclass + 5), where the 5 is xywh + conf (the conv itself outputs na * (nclass + 5) channels). After the convolution, the dimensions are rearranged for computing the training loss later.
Meanwhile, note that Detect's output differs between training and inference. During training, the output is a tensor list with three elements of shape [bs, anchor_num, grid_w, grid_h, xywh+c+20classes], namely [1, 3, 80, 80, 25], [1, 3, 40, 40, 25], [1, 3, 20, 20, 25]. During inference, the output additionally stacks the anchor results of all scales: [bs, anchor_num*grid_w*grid_h, xywh+c+20classes] -> [1, 19200+4800+1200, 25]. The reason was covered in the earlier yolov3-spp notes: training needs the per-layer results to match positive/negative samples when constructing the loss, while inference only needs nms post-processing to pick the right anchors for display. For details see the earlier yolov3-spp notes (a sketch of the shape arithmetic follows them):
Yolov3-spp series | yolov3spp Positive and negative sample matching
Yolov3-spp series | yolov3spp Training verification and testing process
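To see where those shapes come from, a small hedged sketch of the arithmetic for a 640x640 input with nc=20:
nc, na = 20, 3                                  # 20 classes, 3 anchors per layer
no = nc + 5                                     # 25 outputs per anchor (xywh + conf + 20 classes)
grids = [(80, 80), (40, 40), (20, 20)]          # feature maps at strides 8 / 16 / 32
train_shapes = [(1, na, h, w, no) for h, w in grids]    # [1,3,80,80,25], [1,3,40,40,25], [1,3,20,20,25]
infer_rows = sum(na * h * w for h, w in grids)          # 19200 + 4800 + 1200 = 25200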
- yolov5 implementation code:
class Detect(nn.Module):
stride = None # strides computed during build
onnx_dynamic = False # ONNX export parameter
def __init__(self, nc=80, anchors=(), ch=(), inplace=True): # detection layer
        # anchors: the 3 groups of anchors from the yaml file, a list containing 3 sublists
        # ch: [128, 256, 512], the channels of the 3 output feature maps
super().__init__()
self.nc = nc # number of classes
self.no = nc + 5 # number of outputs per anchor
self.nl = len(anchors) # number of detection layers: 3
self.na = len(anchors[0]) // 2 # number of anchors: 3
self.grid = [torch.zeros(1)] * self.nl # init grid
self.anchor_grid = [torch.zeros(1)] * self.nl # init anchor grid
        # a model saves two kinds of parameters: ones updated by the optimizer via backprop, called parameters;
        # the others, not updated that way, are called buffers. Buffers change only inside forward, and
        # optim.step only updates nn.Parameter tensors; anchors need no backprop updates, hence a buffer.
        # register_buffer attaches attributes dynamically, much like setattr: self.register_buffer('x', y) -> self.x = y
self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2)) # shape(nl,na,2)
        # a plain 1x1 convolution processes each feature layer
self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch) # output conv
        self.inplace = inplace  # use in-place ops (e.g. slice assignment); usually True, set False only for AWS Inferentia
def forward(self, x):
""" Return: train: One tensor list Store three elements [bs, anchor_num, grid_w, grid_h, xywh+c+20classes] Namely [1, 3, 80, 80, 25] [1, 3, 40, 40, 25] [1, 3, 20, 20, 25] inference: 0 [1, 19200+4800+1200, 25] = [bs, anchor_num*grid_w*grid_h, xywh+c+20classes] 1 One tensor list Store three elements [bs, anchor_num, grid_w, grid_h, xywh+c+20classes] [1, 3, 80, 80, 25] [1, 3, 40, 40, 25] [1, 3, 20, 20, 25] """
        # forward pass (when debugging, you only land here after stepping in a couple of times)
z = [] # inference output
for i in range(self.nl):
            # (b,c,h,w) -> (b,no*na,h,w): width and height unchanged; unify the channels of the 3 feature layers
x[i] = self.m[i](x[i]) # conv
bs, _, ny, nx = x[i].shape
# (b,c,h,w) -> (b,3,nc+5,h,w) -> (b,3,h,w,nc+xywh+conf)
# eg: x(bs,255,20,20) -> x(bs,3,20,20,85)
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
            # self.training is a variable inherited from the parent class nn.Module:
            # model.train() sets self.training = True; model.eval() sets self.training = False
if not self.training: # inference
if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
                # sigmoid constrains the value range; it is applied to everything, xywh included
y = x[i].sigmoid()
                # either substitute in place or re-splice the output; this is the core of yolov5's regression mechanism
if self.inplace:
                    # xy coordinate regression: bx = 2σ(tx) - 0.5 + cx | by = 2σ(ty) - 0.5 + cy
                    # the box center x,y is multiplied by 2 and shifted by -0.5, so the range widens
                    # from yolov3's open interval (0,1) to (-0.5,1.5)
                    # on the surface this lets yolov5 predict across half a grid cell, improving bbox recall;
                    # it also fixes yolov3's issue that sigmoid's open interval keeps the center from reaching the cell boundary
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy
                    # wh regression: bw = pw(2σ(tw))^2 | bh = ph(2σ(th))^2
                    # relative to the anchor width/height, the range shrinks from (0, +∞) to (0, 4);
                    # the sigmoid constraint makes the predicted box scale more reasonable
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh
                else:  # for YOLOv5 on AWS Inferentia, https://github.com/ultralytics/yolov5/pull/2953 (similar result)
                    # each prediction layer has a different scale, so multiply by its stride to map back to the input size
xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy
                    # each prediction layer uses anchors of different sizes for targets of different sizes, so multiply by the per-feature-point anchor size
wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh
                    # splice back together with concat
y = torch.cat((xy, wh, y[..., 4:]), -1)
                # z is a tensor list with three elements: [1, 19200, 25], [1, 4800, 25], [1, 1200, 25]
z.append(y.view(bs, -1, self.no))
# Train: [1, 3, 80, 80, 25] [1, 3, 40, 40, 25] [1, 3, 20, 20, 25]
# Eval: 0: [1, 19200+4800+1200, 25]
# 1: [1, 3, 80, 80, 25] [1, 3, 40, 40, 25] [1, 3, 20, 20, 25]
return x if self.training else (torch.cat(z, 1), x)
    # build the grid
def _make_grid(self, nx=20, ny=20, i=0):
d = self.anchors[i].device
# yv: tensor([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], ...])
# xv: tensor([[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3], ...])
yv, xv = torch.meshgrid([torch.arange(ny).to(d), torch.arange(nx).to(d)])
        # build the grid of feature points:
        # (80,80)&(80,80) -> (80,80,2) -> (1,3,80,80,2), replicated for the 3 anchors
grid = torch.stack((xv, yv), 2).expand((1, self.na, ny, nx, 2)).float()
        # build the anchor grid: each feature map cell (h,w) gets 3 anchors of different scales
        # multiplying by stride restores each prediction layer's anchors to absolute size, since each layer's scale differs
        # (3,2) -> (1,3,1,1,2) -> (1,3,80,80,2)
anchor_grid = (self.anchors[i].clone() * self.stride[i]) \
.view((1, self.na, 1, 1, 2)).expand((1, self.na, ny, nx, 2)).float()
return grid, anchor_grid
Note that yolov5 improves the regression mechanism compared with v3; the core regression prediction code is:
y = x[i].sigmoid()
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh
These lines are explained in the comments above. One more reference: the blog post "YOLOv5 Network Details" lays this out very clearly. This is an improvement yolov5 borrows from yolov4 to eliminate grid sensitivity.
- The bbox regression mechanism of yolov3 is shown in the figure below:
[figure: yolov3 bbox regression]
- The regression mechanism of yolov5 is shown in the figure below:
[figure: yolov5 bbox regression]
Analysis:
- xy regression: when the true target center lies very close to the top-left corner of a grid cell, the network would have to predict negative or positive infinity for the offset to land there, and such extreme values are practically unreachable. To solve this, the author scales the offset from the original (0, 1) to (-0.5, 1.5), so offsets of exactly 0 or 1 become easy to reach. The formula becomes bx = 2σ(tx) - 0.5 + cx | by = 2σ(ty) - 0.5 + cy. With this scale factor, yolov5 can predict across half a grid cell, which also improves bbox recall.
- wh regression: the original yolov3-spp formula places no limit on the predicted width and height, which can cause exploding gradients and unstable training. yolov5's prediction scaling therefore becomes bw = pw(2σ(tw))^2 | bh = ph(2σ(th))^2: relative to the anchor width and height, the range shrinks from (0, +∞) to (0, 4), which makes the predicted box range more accurate; the sigmoid constraint makes the regression box scale more reasonable. A numeric sketch of both ranges follows below.
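A hedged numeric sketch of the two value ranges (the inputs are illustrative extremes):
import torch

t = torch.tensor([-10.0, 0.0, 10.0])   # raw network outputs near the extremes
s = t.sigmoid()
xy_offset = s * 2. - 0.5               # ≈ [-0.5, 0.5, 1.5]: xy can cross half a grid cell
wh_scale = (s * 2) ** 2                # ≈ [0.0, 1.0, 4.0]: wh is at most 4x the anchor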
The second piece is grid construction: assigning each feature point its grid cell and anchors, which requires multiplying by the stride (the downsampling rate). Each prediction feature layer has a different scale, so a different coefficient maps it back to the input size; and each layer uses anchors of different sizes for targets of different sizes, so predictions are multiplied by the anchor size relative to the feature point. A miniature illustration follows.
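For intuition, a miniature version of what _make_grid produces (stride and anchors assumed; 6.0-era meshgrid signature):
import torch

ny = nx = 3                                          # a tiny 3x3 feature map
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
grid = torch.stack((xv, yv), 2).float()              # (3,3,2): the (cx, cy) offsets of each cell
stride = 32
anchors = torch.tensor([[116., 90.], [156., 198.], [373., 326.]]) / stride  # grid units, as stored on the model
anchor_grid = (anchors * stride).view(1, 3, 1, 1, 2).expand(1, 3, ny, nx, 2)  # back to pixels, one set per cell
Note the round trip: Model.__init__ divides the anchors by the stride (m.anchors /= m.stride), and _make_grid multiplies the stride back in to recover the absolute pixel sizes.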
Everything else is covered in the comments, which I have tried to keep thorough.
4. Model Class parsing
This module builds the whole model. The yolov5 author has made it very complete: beyond model construction, it extends many functions, such as feature visualization, printing model information, TTA inference augmentation, fusing Conv+Bn to speed up inference, and the autoshape function (the model then contains pre-processing + inference + post-processing, i.e. nms). The detailed analysis is in the comments and can be read directly. TTA inference augmentation and Conv+Bn fusion are two of yolov5's tricks, which I cover separately; see these posts:
YOLOv5 Tricks | 【Trick3】Test Time Augmentation (TTA)
YOLOv5 Tricks | 【Trick4】Re-parameterization (fusing Conv+BatchNorm2d)
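A hedged usage sketch of these extension hooks (all four methods are defined in the code below):
model = Model('yolov5s.yaml')    # build the network from the yaml
model.info(verbose=True)         # print per-layer model information
model = model.fuse()             # fold BatchNorm into Conv for faster inference
model = model.autoshape()        # wrap with pre-processing + inference + nms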
- yolov5 implementation code:
class Model(nn.Module):
def __init__(self, cfg='yolov5s.yaml', ch=3, nc=None, anchors=None): # model, input channels, number of classes
super().__init__()
        # if a dict is passed in directly, no processing is needed; otherwise load the yaml file with yaml.safe_load
if isinstance(cfg, dict):
self.yaml = cfg # model dict
else: # is *.yaml
import yaml # for torch hub
self.yaml_file = Path(cfg).name
            # if the config file contains Chinese characters, pass an encoding parameter when opening
with open(cfg, errors='ignore') as f:
self.yaml = yaml.safe_load(f) # model dict
# Define model
ch = self.yaml['ch'] = self.yaml.get('ch', ch) # input channels
if nc and nc != self.yaml['nc']:
LOGGER.info(f"Overriding model.yaml nc={
self.yaml['nc']} with nc={
nc}")
self.yaml['nc'] = nc # override yaml value
if anchors:
            LOGGER.info(f'Overriding model.yaml anchors with anchors={anchors}')
self.yaml['anchors'] = round(anchors) # override yaml value
        # build the network model
        # self.model: the entire network model (including the Detect layer)
        # self.save: indices of all layers whose 'from' is not -1, sorted: [4, 6, 10, 14, 17, 20, 23]
self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) # model, savelist
# default class names ['0', '1', '2',..., '19']
self.names = [str(i) for i in range(self.yaml['nc'])] # default names
self.inplace = self.yaml.get('inplace', True)
# Build strides, anchors
        # get the Detect module's stride (the downsampling rate relative to the input image) and scale the anchors to each Detect output feature map
m = self.model[-1] # Detect()
if isinstance(m, Detect):
s = 256 # 2x min stride
m.inplace = self.inplace
            # feed a dummy image (1,3,256,256) through to automatically obtain each feature layer's downsampling rate
            # downsampling rates of the three feature maps: [256/32, 256/16, 256/8] -> [8, 16, 32]
m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))]) # forward
            # anchors relative to the current feature map, e.g. [10, 13]/8 -> [1.25, 1.625]
m.anchors /= m.stride.view(-1, 1, 1)
check_anchor_order(m)
self.stride = m.stride
self._initialize_biases() # only run once
# Init weights, biases
        initialize_weights(self)  # lightly configure the bn layers and set the activation functions to inplace
self.info() # Print model information
LOGGER.info('')
    def forward(self, x, augment=False, profile=False, visualize=False):  # when debugging, it takes yet another step-in to land here normally
        if augment:  # Test Time Augmentation (TTA): when enabled, the image is scaled and flipped
return self._forward_augment(x) # augmented inference, None
return self._forward_once(x, profile, visualize) # single-scale inference, train
def _forward_augment(self, x):
img_size = x.shape[-2:] # height, width
s = [1, 0.83, 0.67] # scales
        f = [None, 3, None]  # flips (2 = up-down flip, 3 = left-right flip)
y = [] # outputs
        # equivalent to running 3 differently parameterized test-time augmented inferences on input x, saving each result in the list y
for si, fi in zip(s, f):
            # scale_img resizes the image: plain bilinear interpolation with ratio controlling the zoom,
            # then padding back up to a gs-aligned size
xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max()))
yi = self._forward_once(xi)[0] # forward:torch.Size([1, 25200, 25])
# cv2.imwrite(f'img_{si}.jpg', 255 * xi[0].cpu().numpy().transpose((1, 2, 0))[:, :, ::-1]) # save
            # _descale_pred maps the inference result back to the original image scale; only the xywh coords yi[..., :4] are restored
            # f=2: undo the up-down flip; f=3: undo the left-right flip
yi = self._descale_pred(yi, fi, si, img_size)
y.append(yi) # [b, 25200, 25] / [b, 18207, 25] / [b, 12348, 25]
        # drop the tail of the first (largest) result and the head of the last (smallest) result
        # [b, 24000, 25] / [b, 18207, 25] / [b, 2940, 25]
        # presumably this filters out redundant predictions and speeds things up (corrections welcome)
y = self._clip_augmented(y) # clip augmented tails
return torch.cat(y, 1), None # augmented inference, train
def _forward_once(self, x, profile=False, visualize=False):
        # y saves the intermediate feature maps; dt records each module's average duration over 10 runs
y, dt = [], [] # outputs
        # iterate over the sequential model, feeding x through module by module; intermediate results needed later are stored in the list y
for m in self.model:
            # if the input is not just the previous module's output, pull the saved intermediate feature maps;
            # usually for Concat (splicing the current layer with an earlier one); the Detect module also pulls 3 feature layers
if m.f != -1: # if not from previous layer
x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers
            # with profile on, record each module's average duration over 10 runs and its FLOPs, to find bottlenecks, speed up the model and cut memory use
if profile:
self._profile_one_layer(m, x, dt)
            # run the current module on the feature map(s):
            # if it is Concat: x is a list of feature maps to splice before the next conv module;
            # if it is C3, Conv or another plain module: x is a single feature map;
            # if it is Detect: x is a list of 3 feature maps (training and inference return different things)
x = m(x) # run
            # self.save: sorted 'from' values that are not -1: [4, 6, 10, 14, 17, 20, 23]
y.append(x if m.i in self.save else None) # save output
# Feature Visualization
if visualize:
feature_visualization(x, m.type, m.i, save_dir=visualize)
return x
def _descale_pred(self, p, flips, scale, img_size):
# de-scale predictions following augmented inference (inverse operation)
if self.inplace:
            p[..., :4] /= scale  # de-scale: xywh coordinates back to the original size
            # f=2: undo the up-down flip
if flips == 2:
p[..., 1] = img_size[0] - p[..., 1] # de-flip ud
            # f=3: undo the left-right flip
elif flips == 3:
p[..., 0] = img_size[1] - p[..., 0] # de-flip lr
else:
x, y, wh = p[..., 0:1] / scale, p[..., 1:2] / scale, p[..., 2:4] / scale # de-scale
if flips == 2:
y = img_size[0] - y # de-flip ud
elif flips == 3:
x = img_size[1] - x # de-flip lr
p = torch.cat((x, y, wh, p[..., 4:]), -1)
return p
    # y contains 3 sub-lists: the inference results of 3 differently scaled transforms of the input x
    # I don't fully understand this part, but roughly it filters the first and the last list:
    # drop the tail of the first result and the head of the last, then concat what remains
def _clip_augmented(self, y):
# Clip YOLOv5 augmented inference tails
nl = self.model[-1].nl # Detect(): number of detection layers (P3-P5)
g = sum(4 ** x for x in range(nl)) # grid points
e = 1 # exclude layer count
i = (y[0].shape[1] // g) * sum(4 ** x for x in range(e)) # indices: (25200 // 21) * 1 = 1200
y[0] = y[0][:, :-i] # large: (1,25200,25) -> (1,24000,25)
i = (y[-1].shape[1] // g) * sum(4 ** (nl - 1 - x) for x in range(e)) # indices: (12348 // 21) * 16 = 9408
y[-1] = y[-1][:, i:] # small: (1,12348,25) -> (1,2940,25)
return y
def _profile_one_layer(self, m, x, dt):
c = isinstance(m, Detect) # is final layer, copy input as inplace fix
        # thop.profile returns (flops, params); [0] selects the flops
o = thop.profile(m, inputs=(x.copy() if c else x,), verbose=False)[0] / 1E9 * 2 if thop else 0 # FLOPs
t = time_sync()
        # time 10 executions of the module
for _ in range(10):
m(x.copy() if c else x)
dt.append((time_sync() - t) * 100)
        # log the relevant stats
if m == self.model[0]:
LOGGER.info(f"{
'time (ms)':>10s} {
'GFLOPs':>10s} {
'params':>10s} {
'module'}")
        LOGGER.info(f'{dt[-1]:10.2f} {o:10.2f} {m.np:10.0f} {m.type}')
        # at the final detection head, print the total elapsed time
if c:
LOGGER.info(f"{
sum(dt):10.2f} {
'-':>10s} {
'-':>10s} Total")
    # initialize the biases of the Detect() layer
def _initialize_biases(self, cf=None): # initialize biases into Detect(), cf is class frequency
# https://arxiv.org/abs/1708.02002 section 3.3
# cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
m = self.model[-1] # Detect() module
for mi, s in zip(m.m, m.stride): # from
b = mi.bias.view(m.na, -1) # conv.bias(255) to (3,85)
b.data[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image)
b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls
mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
    # print the bias info of the model's final Detect layer (other layers' biases could be printed the same way)
def _print_biases(self):
m = self.model[-1] # Detect() module
for mi in m.m: # from
b = mi.bias.detach().view(m.na, -1).T # conv.bias(255) to (3,85)
LOGGER.info(
('%6g Conv2d.bias:' + '%10.3g' * 6) % (mi.weight.shape[1], *b[:5].mean(1).tolist(), b[5:].mean()))
# def _print_weights(self):
# for m in self.model.modules():
# if type(m) is Bottleneck:
# LOGGER.info('%10.3g' % (m.w.detach().sigmoid() * 2)) # shortcut weights
    # re-parameterization: fuse conv2d + batchnorm2d (used at inference time; speeds up model inference)
def fuse(self): # fuse model Conv2d() + BatchNorm2d() layers
LOGGER.info('Fusing layers... ')
for m in self.model.modules():
            # fuse the bn of Conv and DWConv (which inherits Conv) structures
if isinstance(m, (Conv, DWConv)) and hasattr(m, 'bn'):
                # fuse the Conv module's conv and bn layers (the activation function is not included); returns the parameter-fused convolution
m.conv = fuse_conv_and_bn(m.conv, m.bn) # update conv
                # the fused conv's parameters already absorb bn, so the bn layer can be deleted
delattr(m, 'bn') # remove batchnorm
                # with the bn layer gone, forward must be rewritten:
                # self.act(self.bn(self.conv(x))) -> self.act(self.conv(x))
m.forward = m.forward_fuse # update forward
self.info()
return self
    # directly calls the AutoShape module in common.py; another function that extends the model
def autoshape(self): # add AutoShape module
LOGGER.info('Adding AutoShape... ')
        # extends the model: it now contains pre-processing + inference + post-processing (nms)
m = AutoShape(self) # wrap model
copy_attr(m, self, include=('yaml', 'nc', 'hyp', 'names', 'stride'), exclude=()) # copy attributes
return m
    # used in __init__; calls model_info from torch_utils.py to print model information (set verbose=True to print details)
def info(self, verbose=False, img_size=640): # print model information
model_info(self, verbose, img_size)
def _apply(self, fn):
# Apply to(), cpu(), cuda(), half() to model tensors that are not parameters or registered buffers
self = super()._apply(fn)
m = self.model[-1] # Detect()
if isinstance(m, Detect):
m.stride = fn(m.stride)
m.grid = list(map(fn, m.grid))
if isinstance(m.anchor_grid, list):
m.anchor_grid = list(map(fn, m.anchor_grid))
return self
Some of these methods call helpers from torch_utils, such as fuse_conv_and_bn, which fuses convolution and batch normalization, and scale_img, which scales an image via bilinear interpolation (padding back up toward the original size). scale_img is pasted below as a bonus; for an introduction to fuse_conv_and_bn see my post: YOLOv5 Tricks | 【Trick4】Re-parameterization (fusing Conv+BatchNorm2d)
The scale_img function:
# plain bilinear interpolation with ratio controlling the zoom, then padded back up to a gs-multiple (pad value = imagenet mean)
def scale_img(img, ratio=1.0, same_shape=False, gs=32): # img(16,3,256,416)
# scales img(bs,3,y,x) by ratio constrained to gs-multiple
if ratio == 1.0:
return img
else:
h, w = img.shape[2:]
s = (int(h * ratio), int(w * ratio)) # new size
img = F.interpolate(img, size=s, mode='bilinear', align_corners=False) # resize
if not same_shape: # pad/crop img
h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)]
return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447) # value = imagenet mean
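A quick hedged usage sketch (shapes assumed):
import torch

x = torch.zeros(1, 3, 640, 640)
out = scale_img(x, ratio=0.83, gs=32)
print(out.shape)   # torch.Size([1, 3, 544, 544]): int(640*0.83)=531, padded up to ceil(640*0.83/32)*32=544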
References:
1. Yolov3-spp series | yolov3spp positive and negative sample matching
2. Yolov3-spp series | yolov3spp training, validation and testing procedures
3. Yolo series | Yolov4/v5 model structure and positive/negative sample matching
4. 【YOLOV5-5.x source code interpretation】yolo.py
5. Model parameters (Params) and computation (FLOPs): how they are calculated
6. YOLOv5 Tricks | 【Trick4】Re-parameterization (fusing Conv+BatchNorm2d)