PyTorch convolutional network regularization: DropBlock
2022-07-03 02:11:00 【Hebi tongzj】
Paper: https://arxiv.org/pdf/1810.12890.pdf
Paper summary
DropBlock is a simple method similar to dropout. The main difference is that DropBlock erases contiguous regions of a layer's feature map, whereas dropout erases independent random units.
Like dropout, DropBlock randomly zeroes the responses of the network. This decouples the channels from one another and alleviates overfitting.
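The difference is easiest to see on a toy map. The sketch below is an illustration, not the paper's code: it zeroes units of an all-ones 12×12 map in two ways. `torch.nn.functional.dropout` zeroes scattered individual units, while a DropBlock-style mask zeroes whole 3×3 squares grown from randomly chosen centres. The rates 0.1 and 0.02 are arbitrary display values, and the DropBlock rescaling step is omitted for clarity.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1, 1, 12, 12)

# Dropout: each unit is zeroed independently with probability 0.1,
# survivors are scaled by 1 / (1 - 0.1)
dropout_out = F.dropout(x, p=0.1, training=True)

# DropBlock-style: sample a few centres, then grow each one into a 3x3 square
centres = torch.bernoulli(torch.full_like(x, 0.02))
block_mask = 1 - F.max_pool2d(centres, kernel_size=3, stride=1, padding=1)
dropblock_out = x * block_mask

# 1 marks a zeroed unit: scattered pixels vs. contiguous squares
print((dropout_out[0, 0] == 0).int())
print((dropblock_out[0, 0] == 0).int())
```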
The algorithm is as follows (written here as runnable PyTorch; gamma stands for γ):
- x: feature map of shape [bs, ch, h, w]
- block_size: side length of each contiguous region to erase
- gamma (γ): mean of the Bernoulli distribution used to select the centre points of the erased regions
- training: bool, True in train mode and False in eval mode
import torch
import torch.nn.functional as F

def drop_block(x, block_size, gamma, training):
    if training:
        # Select the centre points of the regions to erase
        del_mask = torch.bernoulli(torch.full_like(x, gamma))
        # Erase a block_size x block_size square around each centre (block_size odd)
        del_mask = F.max_pool2d(del_mask, block_size, stride=1, padding=block_size // 2)
        keep_mask = 1 - del_mask
        x = x * keep_mask
        # Rescale the feature map by (total units) / (kept units)
        return x * keep_mask.numel() / keep_mask.sum().clamp(min=1)
    # No-op in eval mode
    return x
In a concrete implementation, however, several details have to be filled in.
γ is determined by the keep_prob parameter. keep_prob is the probability that an activation unit (i.e. a unit whose output is greater than 0) is kept, and feat_size is the side length of the feature map:
$$\gamma = \frac{1 - \text{keep\_prob}}{\text{block\_size}^2} \cdot \frac{\text{feat\_size}^2}{(\text{feat\_size} - \text{block\_size} + 1)^2}$$
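As a worked example, take block_size = 7 and keep_prob = 0.9 on a 14 × 14 feature map (a size picked only for round numbers). The formula for γ is (1 − keep_prob) / block_size² · feat_size² / (feat_size − block_size + 1)². The first factor spreads the drop rate 1 − keep_prob over the block_size² units that each centre erases; the second compensates for a full block only fitting at (feat_size − block_size + 1)² positions. The helper name `dropblock_gamma` is hypothetical; it takes separate height and width, which gives the same value as the formula when the map is square and matches the loop in the class implementation below.

```python
def dropblock_gamma(keep_prob, block_size, height, width):
    """Bernoulli rate for block centres.

    For height == width == feat_size this is
    (1 - keep_prob) / block_size**2 * feat_size**2 / (feat_size - block_size + 1)**2
    """
    # Each centre erases block_size ** 2 units
    per_block = (1 - keep_prob) / block_size ** 2
    # Number of positions where a full block fits inside the map
    valid = (height - block_size + 1) * (width - block_size + 1)
    return per_block * height * width / valid


gamma = dropblock_gamma(keep_prob=0.9, block_size=7, height=14, width=14)
print(round(gamma, 6))  # 0.00625
```

Here 0.1 / 49 × 196 / 64 = 0.4 / 64 = 0.00625, so on average about 0.6% of positions become block centres.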
Because a small keep_prob early in training hurts the convergence of the network, keep_prob is gradually reduced from 1.0 to 0.9.
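One simple way to realize such a schedule is a linear ramp. The sketch below assumes a hypothetical step counter (`step`, `total_steps`) driven by the training loop; it is an illustration of the idea, not the paper's code. The reimplementation later in this post uses a multiplicative decay instead.

```python
def linear_keep_prob(step, total_steps, start=1.0, target=0.9):
    """Linearly anneal keep_prob from `start` to `target` over `total_steps` steps."""
    # Clamp progress to [0, 1] so keep_prob stays at `target` after the ramp
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (target - start) * progress


for s in (0, 25, 50, 100, 150):
    print(s, round(linear_keep_prob(s, total_steps=100), 3))
```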
The experimental results show that ResNet-50 reaches higher validation accuracy after DropBlock is added.
The following results compare how the position where DropBlock is added, the way it is applied, and block_size affect validation accuracy:
- Rows: DropBlock added after group 4 of ResNet-50; DropBlock added after groups 3 and 4 of ResNet-50
- Columns: added only to the convolution branch; added to the convolution branch and the skip connection; added to the convolution branch and the skip connection, with the keep_prob decay schedule
In the paper, the best hyperparameters are block_size = 7 and keep_prob = 0.9, but they should still be tuned according to how the loss evolves.
DropBlock reimplementation
The implementation of DropBlock below involves the following details:
- keep_prob is dynamic: it is updated every time the module runs in eval mode
- The centres of the erased regions are chosen only among active units (output greater than 0). Selected centres are marked with 1, and max_pool2d expands each of them into a contiguous region, which yields del_mask
- The normalization coefficient is (area of the original map) / (kept area). Computing the kept area exactly costs extra computation and slows down training, so 1 / keep_prob is used as an approximation
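The last two details can be checked on a toy map. In this sketch, block_size = 3 and a single hand-placed centre are arbitrary choices: max_pool2d with stride 1 and padding block_size // 2 turns the centre into a 3×3 erased square, and the exact normalization coefficient (the value that 1 / keep_prob approximates) is computed directly.

```python
import torch
import torch.nn.functional as F

block_size = 3
# A single erase centre in the middle of a 7x7 feature map
centres = torch.zeros(1, 1, 7, 7)
centres[0, 0, 3, 3] = 1.0

# Stride 1 plus padding block_size // 2 keeps the 7x7 shape and spreads
# every centre over a block_size x block_size neighbourhood
del_mask = F.max_pool2d(centres, kernel_size=block_size,
                        stride=1, padding=block_size // 2)
keep_mask = 1 - del_mask
print(int(del_mask.sum()))  # 9

# Exact normalization coefficient: total area / kept area = 49 / 40
exact_gain = keep_mask.numel() / keep_mask.sum().item()
print(exact_gain)  # 1.225
```

The exact value needs a reduction over the mask for every forward pass, which is the cost the 1 / keep_prob shortcut avoids.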
import torch
from torch import nn


class DropBlock(nn.Module):
    ''' block_size: side length of the erased region (must be odd)
        keep_prob_init: initial value of keep_prob
        keep_prob_tar: target value of keep_prob
        keep_prob_decay: decay rate of keep_prob '''

    def __init__(self, block_size=5, keep_prob_init=1.,
                 keep_prob_tar=0.9, keep_prob_decay=1e-2):
        super(DropBlock, self).__init__()
        self.block_size = block_size
        assert self.block_size & 1, 'block_size must be odd'
        # keep_prob-related parameters
        self.keep_prob = keep_prob_init
        self._keep_prob_tar = keep_prob_tar
        self._keep_prob_decay = keep_prob_decay
        # Mean of the Bernoulli distribution (computed lazily in forward)
        self.gamma = None

    def forward(self, x):
        # Training mode
        if self.training:
            *bs_ch, height, width = x.shape
            square = height * width
            # Compute γ when it has not been set yet
            if self.gamma is None:
                self.gamma = (1 - self.keep_prob) * square / self.block_size ** 2
                for f_size in (height, width):
                    self.gamma /= f_size - self.block_size + 1
            # Select erase centres only inside the active region (x > 0)
            del_mask = torch.bernoulli((x > 0) * self.gamma)
            # Grow every centre into a block_size x block_size square
            keep_mask = 1 - torch.max_pool2d(
                del_mask, kernel_size=self.block_size,
                stride=1, padding=self.block_size // 2
            )
            # Rescale the features; 1 / keep_prob approximates the exact gain:
            # gain = square / keep_mask.view(*bs_ch, -1).sum(2).view(*bs_ch, 1, 1)
            return keep_mask * x / self.keep_prob
        # Eval mode: decay keep_prob towards its target (runs on every eval forward call)
        self.keep_prob = max(
            self._keep_prob_tar,
            self.keep_prob * (1 - self._keep_prob_decay)
        )
        self.gamma = None
        return x
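Because the eval-mode branch decays keep_prob on every forward call, that is, once per validation batch rather than once per epoch, the target is reached quickly. The plain-Python sketch below replays the same update rule with the default arguments (keep_prob_init=1.0, keep_prob_tar=0.9, keep_prob_decay=1e-2) and counts the eval calls needed; the helper name `evals_to_target` is hypothetical.

```python
import math


def evals_to_target(keep_prob_init=1.0, keep_prob_tar=0.9, keep_prob_decay=1e-2):
    """Count eval-mode forward calls until keep_prob is clamped at its target."""
    assert keep_prob_decay > 0, 'a zero decay never reaches the target'
    keep_prob, calls = keep_prob_init, 0
    while keep_prob > keep_prob_tar:
        # Same rule as the eval branch of DropBlock.forward
        keep_prob = max(keep_prob_tar, keep_prob * (1 - keep_prob_decay))
        calls += 1
    return calls


print(evals_to_target())  # 11
# Closed form: smallest n with (1 - decay) ** n <= tar / init
print(math.ceil(math.log(0.9) / math.log(0.99)))  # 11
```

So with the defaults, a validation pass of 11 or more batches already pins keep_prob at 0.9, while the first training epoch (keep_prob = 1.0, hence γ = 0) drops nothing. If a slower ramp is wanted, one option (an assumption, not part of the original code) is to perform the update once per epoch from the training loop.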
Code test
import cv2 as cv
import torchvision.transforms as tf

# Using the grayscale image, set the darker pixels to 0
image = cv.imread('YouXiZi.jpg')
mask = cv.cvtColor(image, cv.COLOR_BGR2GRAY) > 100
for i in range(3):
    image[..., i] *= mask
cv.imshow('debug', image)
cv.waitKey(0)
# Convert to a tensor and apply DropBlock (the class defined above)
tensor = tf.ToTensor()(image)
db = DropBlock(block_size=31, keep_prob_init=0.9)
image = db(tensor.unsqueeze(0))[0]
image = image.permute(1, 2, 0).detach().numpy()
cv.imshow('debug', image)
cv.waitKey(0)
The grayscale image is used to set the darker pixels to zero, so the bright region plays the role of the active units.
In the result, the centres of the erased regions appear only in the bright region, and the image is brighter than the original because the normalization coefficient 1 / keep_prob is greater than 1.
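The same two observations can be checked without an image file. This sketch builds a synthetic 3×64×64 input whose left half is 0 (dark) and right half is 0.8 (bright), then applies the class's sampling and rescaling rule inline. The values gamma = 0.05, block_size = 5 and keep_prob = 0.9 are arbitrary illustration choices, and gamma is set directly here instead of being derived from keep_prob.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Synthetic "image": left half dark (0), right half bright (0.8)
x = torch.zeros(1, 3, 64, 64)
x[..., 32:] = 0.8

gamma, block_size, keep_prob = 0.05, 5, 0.9
# Same rule as DropBlock.forward: centres are only sampled where x > 0
centres = torch.bernoulli((x > 0) * gamma)
keep_mask = 1 - F.max_pool2d(centres, block_size, stride=1, padding=block_size // 2)
out = keep_mask * x / keep_prob

print(centres[..., :32].sum().item())  # 0.0, no centres in the dark half
print(out.max().item())                # about 0.889, brighter than the input's 0.8
```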