PyTorch convolutional network regularization: DropBlock
2022-07-03 02:11:00 【Hebi tongzj】
Paper: https://arxiv.org/pdf/1810.12890.pdf
Paper summary
DropBlock is a simple method similar to dropout. The main difference is that DropBlock erases contiguous regions of a layer's feature map instead of independent random units.
Like dropout, DropBlock randomly zeroes responses of the network. This decouples the channels and alleviates overfitting.
The algorithm can be written as follows:
- x: feature map with shape [bs, ch, h, w]
- block_size: size of the contiguous region to erase
- gamma (γ): mean of the Bernoulli distribution, used to select the center points of the erased regions
- training: boolean, True in train mode and False in eval mode
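```python
import torch
import torch.nn.functional as F


def drop_block(x, block_size, gamma, training):
    if not training:
        # eval mode: DropBlock does nothing
        return x
    # Select the center points of the regions to erase
    del_mask = torch.bernoulli(torch.full_like(x, gamma))
    # Erase a block_size x block_size square around every center (block_size odd)
    del_mask = F.max_pool2d(del_mask, kernel_size=block_size,
                            stride=1, padding=block_size // 2)
    keep_mask = 1 - del_mask
    x = x * keep_mask
    # Normalize the feature map: count(all units) / count(kept units)
    return x * keep_mask.numel() / keep_mask.sum().clamp(min=1)
```

A concrete implementation, however, needs several more details.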

γ is derived from the keep_prob parameter. keep_prob is the probability that an active unit (one whose output is greater than 0) is kept, and feat_size is the side length of the feature map:

$$\gamma = \frac{1 - keep\_prob}{block\_size^2} \cdot \frac{feat\_size^2}{(feat\_size - block\_size + 1)^2}$$
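The sketch below checks the γ formula numerically. It computes γ for a square feature map. It then shows that γ · block_size² · (feat_size − block_size + 1)² / feat_size² gives back 1 − keep_prob, which would be the dropped fraction if blocks never overlapped. The function name `compute_gamma` and the example values (keep_prob = 0.9, block_size = 7, feat_size = 14) are illustrative assumptions.

```python
def compute_gamma(keep_prob: float, block_size: int, feat_size: int) -> float:
    """Bernoulli mean for block centers (hypothetical helper).

    gamma = (1 - keep_prob) / block_size**2
            * feat_size**2 / (feat_size - block_size + 1)**2
    Assumes a square feature map with feat_size >= block_size.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if block_size > feat_size:
        raise ValueError("block_size must not exceed feat_size")
    # Side length of the region in which a whole block fits
    valid = feat_size - block_size + 1
    return (1.0 - keep_prob) / block_size ** 2 * feat_size ** 2 / valid ** 2


gamma = compute_gamma(keep_prob=0.9, block_size=7, feat_size=14)
# Expected dropped fraction, ignoring overlap between blocks
dropped = gamma * 7 ** 2 * (14 - 7 + 1) ** 2 / 14 ** 2
print(gamma, dropped)  # gamma is about 0.00625, dropped is about 0.1
```

For non-square maps the implementation later in this post replaces feat_size² with height · width and (feat_size − block_size + 1)² with the product of the two per-axis terms.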

A small keep_prob early in training hurts convergence, so keep_prob is gradually lowered from 1.0 to 0.9.
The experimental results show that ResNet-50 reaches higher validation accuracy when DropBlock is used.
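keep_prob can be lowered in several ways. Below is a minimal sketch that assumes a linear ramp over a fixed number of training steps. The function `linear_keep_prob` and its arguments are hypothetical.

```python
def linear_keep_prob(step: int, total_steps: int,
                     start: float = 1.0, target: float = 0.9) -> float:
    """Linearly anneal keep_prob from `start` to `target` over `total_steps`
    steps, then hold it at `target` (hypothetical schedule)."""
    if total_steps <= 0:
        return target
    # Fraction of the ramp completed, clamped to [0, 1]
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (target - start) * progress


schedule = [linear_keep_prob(s, total_steps=4) for s in range(6)]
print([round(p, 3) for p in schedule])
```

The implementation later in this post uses a multiplicative rule instead, `keep_prob * (1 - keep_prob_decay)` floored at the target value, and applies it in eval mode.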

The following compares how the DropBlock position, the application scheme and block_size affect validation accuracy:
- By row: DropBlock added after group 4 of ResNet-50; DropBlock added after groups 3 and 4 of ResNet-50
- By column: added only to the convolution branch; added to both the convolution branch and the skip connection; added to both, with the keep_prob decay schedule
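The placement options refer to the branches of a residual block. The hedged sketch below is a toy residual block, not the ResNet-50 bottleneck. It applies DropBlock to the convolution branch and, optionally, to the skip connection as well. `SimpleDropBlock` (fixed keep_prob, no schedule) and `ResidualBlock` are hypothetical names, and where DropBlock sits inside the branch is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDropBlock(nn.Module):
    """Hypothetical minimal DropBlock with a fixed keep_prob (no schedule)."""

    def __init__(self, block_size=3, keep_prob=0.9):
        super().__init__()
        assert block_size % 2 == 1, "block_size must be odd"
        self.block_size, self.keep_prob = block_size, keep_prob

    def forward(self, x):
        if not self.training or self.keep_prob >= 1.0:
            return x
        h, w = x.shape[-2:]
        b = self.block_size
        # gamma from keep_prob, as in the formula above (h, w >= block_size)
        gamma = (1 - self.keep_prob) / b ** 2 * h * w / ((h - b + 1) * (w - b + 1))
        centers = torch.bernoulli(torch.full_like(x, gamma))
        # Expand each center into a b x b square of dropped units
        dropped = F.max_pool2d(centers, kernel_size=b, stride=1, padding=b // 2)
        return x * (1 - dropped) / self.keep_prob


class ResidualBlock(nn.Module):
    """Hypothetical toy residual block: DropBlock on the convolution branch,
    and optionally on the skip connection too."""

    def __init__(self, channels, drop_skip=False, block_size=3, keep_prob=0.9):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.drop_conv = SimpleDropBlock(block_size, keep_prob)
        self.drop_skip = (SimpleDropBlock(block_size, keep_prob)
                          if drop_skip else nn.Identity())

    def forward(self, x):
        # Sum of the (dropped) convolution branch and the (optionally dropped) skip path
        return F.relu(self.drop_conv(self.conv(x)) + self.drop_skip(x))


torch.manual_seed(0)
block = ResidualBlock(channels=8, drop_skip=True)
x = torch.randn(2, 8, 16, 16)
print(block(x).shape)
```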

In the paper, the best hyperparameters are block_size = 7 and keep_prob = 0.9, but they should still be tuned according to how the loss evolves.
Reimplementing DropBlock
When implementing DropBlock, note the following details:
- keep_prob is dynamic: it is updated every time the module runs in eval mode
- The centers of the erased regions are drawn only from active units (output greater than 0). Selected positions are set to 1, and max_pool2d then expands each one into a contiguous region, producing del_mask
- Normalization coefficient = area of the original map / retained area. Computing the exact retained area costs extra computation and slows down training, so the coefficient is approximated by 1 / keep_prob
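The max_pool2d trick and the normalization approximation can both be checked on a tiny example. The sketch below uses a hand-picked center on a 7×7 map (illustrative values). It shows that `F.max_pool2d` with stride 1 and padding `block_size // 2` keeps the spatial size and turns one center into a full block_size × block_size square. It then compares the exact normalization (area / retained area) with the 1 / keep_prob approximation.

```python
import torch
import torch.nn.functional as F

block_size = 3
# One hand-picked block center at (3, 3) on a 7x7 map (batch=1, channels=1)
centers = torch.zeros(1, 1, 7, 7)
centers[0, 0, 3, 3] = 1.0

# Stride 1 + padding block_size // 2 keeps the 7x7 size and spreads the
# center into a block_size x block_size square of ones
del_mask = F.max_pool2d(centers, kernel_size=block_size,
                        stride=1, padding=block_size // 2)
keep_mask = 1.0 - del_mask

# Exact per-(sample, channel) normalization: area / retained area
area = keep_mask.shape[-2] * keep_mask.shape[-1]
exact_gain = area / keep_mask.sum(dim=(-2, -1), keepdim=True)

# Cheaper approximation used by the implementation
keep_prob = 0.9
approx_gain = 1.0 / keep_prob

print(int(del_mask.sum().item()), exact_gain.item(), approx_gain)
```

Here the exact gain is 49 / 40 = 1.225, while 1 / 0.9 ≈ 1.11. The two agree only on average, when γ is derived from keep_prob and the map holds many blocks. The approximation accepts this per-map error in exchange for speed.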
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropBlock(nn.Module):
    ''' block_size: size of the erased area
        keep_prob_init: initial value of keep_prob
        keep_prob_tar: target value of keep_prob
        keep_prob_decay: decay rate of keep_prob '''

    def __init__(self, block_size=5, keep_prob_init=1.,
                 keep_prob_tar=0.9, keep_prob_decay=1e-2):
        super(DropBlock, self).__init__()
        self.block_size = block_size
        assert self.block_size & 1, 'block_size must be odd'
        # keep_prob related parameters
        self.keep_prob = keep_prob_init
        self._keep_prob_tar = keep_prob_tar
        self._keep_prob_decay = keep_prob_decay
        # Mean of the Bernoulli distribution (computed lazily)
        self.gamma = None

    def forward(self, x):
        # Training mode
        if self.training:
            *bs_ch, height, width = x.shape
            square = height * width
            # Compute γ if it has not been set yet
            if self.gamma is None:
                self.gamma = (1 - self.keep_prob) * square / self.block_size ** 2
                for f_size in (height, width):
                    self.gamma /= f_size - self.block_size + 1
            # Select the centers of the erased areas among active units (x > 0)
            del_mask = torch.bernoulli((x > 0) * self.gamma)
            # Expand each center into a block_size x block_size area
            keep_mask = 1 - F.max_pool2d(
                del_mask, kernel_size=self.block_size,
                stride=1, padding=self.block_size // 2
            )
            # Normalize the feature map. The exact gain would be
            # gain = square / keep_mask.view(*bs_ch, -1).sum(2).view(*bs_ch, 1, 1)
            # but 1 / keep_prob is used as a cheaper approximation
            return keep_mask * x / self.keep_prob
        # Eval mode: decay keep_prob towards its target (once per forward call)
        # and reset γ so it is recomputed in the next training step
        self.keep_prob = max(
            self._keep_prob_tar,
            self.keep_prob * (1 - self._keep_prob_decay)
        )
        self.gamma = None
        return x
```

Testing the code
```python
import cv2 as cv
import torchvision.transforms as tf

# Use the grayscale image to set low-brightness pixels to 0
image = cv.imread('YouXiZi.jpg')
assert image is not None, 'failed to read YouXiZi.jpg'
mask = cv.cvtColor(image, cv.COLOR_BGR2GRAY) > 100
for i in range(3):
    image[..., i] *= mask
cv.imshow('debug', image)
cv.waitKey(0)
# Convert to a tensor and apply DropBlock (a new module starts in training mode)
tensor = tf.ToTensor()(image)
db = DropBlock(block_size=31, keep_prob_init=0.9)
image = db(tensor.unsqueeze(0))[0]
image = image.permute(1, 2, 0).detach().numpy()
cv.imshow('debug', image)
cv.waitKey(0)
```

The grayscale image is used to set dark pixels to zero, so the bright regions are the active units.

The centers of the erased regions appear only in the bright regions, and the output is brighter than the original image because the normalization coefficient is greater than 1.
