Understanding weight sharing in convolutional neural networks
2022-07-26 15:58:00 【Hua Weiyun】
First, let's look at the principle of weight sharing in a single-layer network.
Put simply, weight sharing means sharing the values of a filter: the same convolution kernel weights are reused at every position of the input.
Convolutional neural networks are built on two core ideas:
1. Local connectivity (Local Connectivity)
2. Parameter sharing of convolution kernels (Parameter Sharing)
A key function of both is to reduce the number of parameters, making computation simple and efficient, so the network can operate on very large data sets.
Let's use the most intuitive diagrams to clarify the role of each.
The correct way to use a CNN is shown below.
It can be summed up as: a single 3×3 convolution kernel is scanned across the image to extract features. 3×3 and 5×5 are the most commonly used kernel sizes; if the number of output channels is 32 (32 and 64 are commonly used channel counts), then the total number of parameters is 3×3×32 = 288.
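As a quick sanity check, here is a minimal PyTorch sketch (assuming a single-channel input, which the figure implies) confirming the parameter count of a shared 3×3 kernel with 32 output channels:

```python
import torch.nn as nn

# one 3x3 kernel shared across the whole image, 32 output channels,
# single input channel, no bias: 3*3*32 = 288 weights
conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, bias=False)
print(sum(p.numel() for p in conv.parameters()))  # 288
```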
- Without parameter sharing
If the operation in the figure above is implemented without parameter sharing, the convolution kernel structure becomes the one shown in the figure below.
It is not hard to see that the number of parameters in the convolution kernel now matches the size of the image pixel matrix: every output position has its own weight.
For example: Inception V3 takes 192×192 input images. **If the first layer's 3×3×32 convolution kernels give up parameter sharing, the parameter count becomes 192×192×32, about 1.2 million parameters, or 4,096 times the original 288.**
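Following the article's way of counting (one weight per output pixel per channel), a short sketch of the comparison:

```python
# parameters with sharing: one 3x3 kernel per output channel
shared = 3 * 3 * 32            # 288

# without sharing, counting one weight per output pixel and
# channel: the kernel grows to the size of the image
unshared = 192 * 192 * 32      # 1,179,648 -> about 1.2 million

print(unshared / shared)       # 4096.0
```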
- Without local connectivity
If local connectivity is not used, we arrive at a fully connected network (fully connected): every input element is connected to every neuron in the hidden layer. The network structure is shown below.
The parameter count then becomes (number of input pixels) × (number of hidden nodes). Because the pixel matrix is very large, a correspondingly large number of hidden-layer nodes is usually chosen, so a single hidden layer typically has more than 10 million parameters, which makes the network very difficult to train.
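For illustration, here is a hedged sketch of that count; the hidden width H = 1000 is an assumption, since the article only says that many hidden nodes would be chosen:

```python
import torch.nn as nn

H = 1000  # assumed hidden width, for illustration only
fc = nn.Linear(192 * 192, H)  # one fully connected hidden layer
n_params = sum(p.numel() for p in fc.parameters())
print(n_params)  # 36,865,000 weights + biases -> well over 10 million
```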
Below is PyTorch code demonstrating weight sharing across multiple layers of a network.
```python
import torch
import torch.nn as nn
import random
import matplotlib.pyplot as plt


# draw the loss curve
def plot_curve(data):
    fig = plt.figure()
    plt.plot(range(len(data)), data, color='blue')
    plt.legend(['value'], loc='upper right')
    plt.xlabel('step')
    plt.ylabel('value')
    plt.show()


class DynamicNet(nn.Module):
    def __init__(self, D_in, H, D_out):
        super(DynamicNet, self).__init__()
        self.input_linear = nn.Linear(D_in, H)
        self.middle_linear = nn.Linear(H, H)
        self.output_linear = nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.input_linear(x).clamp(min=0)
        # reuse the same middle_linear module 0-3 times:
        # every pass through the loop shares one weight matrix
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is the batch size; D_in is the input dimension
# H is the hidden layer dimension; D_out is the output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# simulated training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = DynamicNet(D_in, H, D_out)
criterion = nn.MSELoss(reduction='sum')
# training this odd model with vanilla stochastic gradient
# descent is difficult, so we use momentum
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

loss_list = []
for t in range(500):
    # forward pass
    y_pred = model(x)
    # compute the loss
    loss = criterion(y_pred, y)
    loss_list.append(loss.item())
    # zero the gradients, backpropagate, update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

plot_curve(loss_list)
```
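Because `middle_linear` is a single module reused inside the loop, every "middle" layer in a given forward pass shares the same weight matrix, and its gradient accumulates a contribution from each reuse. A quick check (a sketch; the expected count follows from the dimensions above):

```python
# the total parameter count is independent of how many times
# middle_linear is applied in forward():
# (1000*100 + 100) + (100*100 + 100) + (100*10 + 10) = 111210
print(sum(p.numel() for p in model.parameters()))  # 111210
```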