Convolution operations in convolutional neural networks (CNN)
2022-07-03 05:49:00 【code bean】
Understanding convolution from the formula:
From the formula, convolution is a process of multiplying and then adding (summing).
From this picture, convolution describes how earlier points influence the current point. As some people put it, convolution is the lasting consequence of instantaneous actions.
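The formula image is not preserved, so as an assumption the sketch below uses the standard discrete 1D convolution, (f * g)[n] = Σ_k f[k]·g[n−k], to make "multiply and add" concrete. The signal values are made up for illustration.

```python
def convolve_1d(f, g):
    """Full discrete convolution: out[n] = sum_k f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        for k in range(len(f)):
            # Only terms whose g index (n - k) is in range contribute.
            if 0 <= n - k < len(g):
                out[n] += f[k] * g[n - k]
    return out

# f: inputs arriving over time; g: how strongly a past input still
# affects the present (a decaying response). Illustrative values only.
f = [1.0, 2.0, 3.0]
g = [1.0, 0.5, 0.25]
print(convolve_1d(f, g))  # [1.0, 2.5, 4.25, 2.0, 0.75]
```

Each out[n] mixes the current input with earlier inputs weighted by g, which is the "earlier points influence the current point" reading from the picture.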
Convolution in a CNN
In the convolution operation of a CNN, the point is not really "multiply first, then add" over time. The convolution kernel serves a different goal: it captures how the surrounding pixels influence the central pixel.
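Each output pixel is the weighted sum of the neighbourhood around the corresponding input pixel. One detail worth knowing: `torch.nn.functional.conv2d` does not flip the kernel (strictly, it computes cross-correlation), unlike the textbook formula. The sketch below checks one output pixel by hand; the 4×4 image and the kernel values are arbitrary.

```python
import torch
import torch.nn.functional as F

# One 4x4 single-channel image, shaped (batch, channels, height, width).
image = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
# One 3x3 kernel, shaped (out_channels, in_channels, kH, kW).
kernel = torch.tensor([[1., 0., -1.],
                       [2., 0., -2.],
                       [1., 0., -1.]]).reshape(1, 1, 3, 3)

out = F.conv2d(image, kernel)        # no padding, stride 1 -> 2x2 output

# Output pixel (0, 0): multiply the top-left 3x3 patch by the kernel and sum.
patch = image[0, 0, 0:3, 0:3]
manual = (patch * kernel[0, 0]).sum()                              # kernel NOT flipped
flipped = (patch * torch.flip(kernel[0, 0], dims=[0, 1])).sum()    # textbook convolution

print(out[0, 0, 0, 0].item(), manual.item(), flipped.item())  # -8.0 -8.0 8.0
```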
Different convolution kernels affect the image in different ways:
Denoising:
Picking out vertical edges and horizontal edges:
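The original kernel figures are not preserved, so the sketch below assumes two common textbook choices: a 3×3 mean (box) filter for smoothing/denoising, and Sobel kernels for vertical and horizontal edges. They are applied without padding to a synthetic image containing a single vertical edge.

```python
import torch
import torch.nn.functional as F

# Synthetic 1x1x6x6 image: columns 0-2 are 0, columns 3-5 are 1 -> one vertical edge.
img = torch.zeros(1, 1, 6, 6)
img[..., 3:] = 1.0

# Assumed example kernels (not taken from the original figures).
mean_k = torch.full((1, 1, 3, 3), 1.0 / 9.0)                  # smoothing / denoising
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).reshape(1, 1, 3, 3)   # responds to vertical edges
sobel_y = torch.tensor([[-1., -2., -1.],
                        [0., 0., 0.],
                        [1., 2., 1.]]).reshape(1, 1, 3, 3)    # responds to horizontal edges

smoothed = F.conv2d(img, mean_k)       # 4x4 output, the edge becomes a gradual ramp
vertical = F.conv2d(img, sobel_x)      # non-zero only in columns that straddle the edge
horizontal = F.conv2d(img, sobel_y)    # all zeros: the image has no horizontal edge

print(smoothed[0, 0, 0])
print(vertical[0, 0])
print(horizontal[0, 0])
```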
Someone may ask: if I want to extract a particular feature, how should I set the values of the convolution kernel? In fact, the kernel values can be initialized randomly and then optimized by backpropagation during training.
In other words, different convolution kernels pick out different features, and these features then serve as the input to the fully connected layer. The number of features is also set in advance: it equals the number of convolution kernels.
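PyTorch works this way: a `Conv2d` layer starts with randomly initialised kernel weights, and an optimizer adjusts them using the gradients computed by backpropagation. The sketch below performs one SGD step against a made-up target; `out_channels=4` (the number of kernels, hence the number of feature maps) is an arbitrary illustrative choice.

```python
import torch

torch.manual_seed(0)
# 4 kernels -> 4 feature maps; each kernel sees 1 input channel and is 3x3.
conv = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
opt = torch.optim.SGD(conv.parameters(), lr=0.1)

x = torch.randn(2, 1, 8, 8)          # a batch of 2 single-channel 8x8 "images"
target = torch.randn(2, 4, 6, 6)     # arbitrary target, for illustration only

before = conv.weight.detach().clone()
loss = torch.nn.functional.mse_loss(conv(x), target)
opt.zero_grad()
loss.backward()                      # gradient of the loss w.r.t. every kernel value
opt.step()                           # kernels move in the direction that lowers the loss

print(conv.weight.shape)             # torch.Size([4, 1, 3, 3]): one 3x3 kernel per feature
print(torch.equal(conv.weight.detach(), before))  # False: the kernels were updated
```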
This also reveals the main purpose of convolution:
Our input starts out as thousands of pixels. If we fed them straight into a fully connected layer, say 4563 pixels, we would have to build a fully connected layer like the one in the figure below:
There would be far too many weights w and biases b to adjust, which makes training harder. So we first extract features with convolution, filtering out irrelevant information, and only then pass them through the fully connected layer to output probabilities.
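A quick parameter count shows the scale of the problem. The figure with the original layer sizes is missing, so the hidden width of 100 is a hypothetical choice; the 4563 inputs come from the text. A 3×3 convolution layer with 16 kernels (also hypothetical) needs far fewer weights, because each kernel is shared across all image positions.

```python
def linear_params(n_in, n_out):
    # One weight per (input, output) pair plus one bias per output.
    return n_in * n_out + n_out

def conv2d_params(in_ch, out_ch, k):
    # Each of the out_ch kernels has in_ch * k * k weights plus one bias.
    return out_ch * (in_ch * k * k + 1)

pixels = 4563   # number of input pixels from the text
hidden = 100    # hypothetical width of the first fully connected layer
print(linear_params(pixels, hidden))   # 456400
print(conv2d_params(1, 16, 3))         # 160
```

Note that the convolution count does not depend on the image size at all, whereas the fully connected count grows with every extra pixel.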
Convolution layer
Now let's look at how convolution is actually computed:
The process is to place the convolution kernel over the original image, multiply the overlapping values and add them up, then slide the kernel along and repeat. The figure also extends the original image by one ring of cells around its border; this operation is called padding, and its purpose is described in the figure's text (with a 3×3 kernel, one ring of zeros keeps the output the same size as the input).
When sliding, there is one more option: moving the kernel several cells at a time, which makes the output image smaller.
This is the stride, and its effect is similar to that of the pooling described next.
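How padding and stride change the output size follows the standard formula documented for `torch.nn.Conv2d` (with dilation 1): out = ⌊(in + 2·padding − kernel) / stride⌋ + 1. A small helper, checked against the 60×60 example used later in this article:

```python
def conv_output_size(in_size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution (dilation assumed to be 1)."""
    return (in_size + 2 * padding - kernel) // stride + 1

print(conv_output_size(60, 3, padding=1, stride=1))  # 60: padding=1 keeps a 3x3 conv size-preserving
print(conv_output_size(60, 3, padding=1, stride=2))  # 30: stride 2 halves the size
print(conv_output_size(60, 3))                       # 58: no padding shrinks by kernel - 1
```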
Pooling layer
In a typical convolutional neural network, each convolution layer is followed by a pooling layer:
Its specific role is shown in the figure above: it downsamples the feature map, reducing its width and height.
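The figure is not preserved; as a minimal stand-in, 2×2 max pooling (`torch.nn.MaxPool2d(2)`, which the network later in this article also uses) keeps the largest value in each non-overlapping 2×2 window, halving width and height. The input values are arbitrary.

```python
import torch

x = torch.tensor([[1., 3., 2., 0.],
                  [4., 2., 1., 5.],
                  [0., 1., 7., 2.],
                  [6., 2., 3., 1.]]).reshape(1, 1, 4, 4)
pool = torch.nn.MaxPool2d(2)   # 2x2 window; stride defaults to the window size
y = pool(x)
print(y.squeeze().tolist())    # [[4.0, 5.0], [6.0, 7.0]]
```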
Now let's look at the whole picture:
Although the figure above draws three convolution kernels, they are actually the three channels (RGB) of a single convolution kernel. Pay close attention to this:
the number of convolution kernels corresponds one-to-one to the number of feature types, not to the number of channels in the image.
As the next figure shows, the results of the three channels are summed at the end and still count as a single feature. The dimensions of the convolution kernel are discussed at the end.
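The summation over channels can be checked directly: convolving a 3-channel (RGB-like) input with one 3-channel kernel gives the same single feature map as convolving each channel with its own kernel slice and adding the three results. Random data and a 5×5 input are used purely for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 5, 5)        # one image with 3 channels (e.g. R, G, B)
w = torch.randn(1, 3, 3, 3)        # ONE kernel that has 3 channels

full = F.conv2d(x, w)              # shape (1, 1, 3, 3): a single feature map
# Convolve channel c with kernel channel c, then add the three results.
per_channel = sum(F.conv2d(x[:, c:c + 1], w[:, c:c + 1]) for c in range(3))

print(full.shape)                                    # torch.Size([1, 1, 3, 3])
print(torch.allclose(full, per_channel, atol=1e-5))  # True
```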
After pooling we get a smaller feature matrix that contains all the data of one feature. To make later computation convenient, it is flattened:
The pooled matrix in the figure above is 13*13 = 169, so flattening it gives a row of 169 values.
Because the figure above has only one convolution kernel, there is only one feature. With two features, flattening would look like this:
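Flattening is just a reshape. Using the 13×13 pooled map from the text, one feature gives 169 values per image, and two features (two kernels) give 2 × 169 = 338 values laid out one after the other. The batch size of 4 is an arbitrary example.

```python
import torch

batch_size = 4
one_feature = torch.randn(batch_size, 1, 13, 13)    # (batch, features, H, W)
two_features = torch.randn(batch_size, 2, 13, 13)

flat1 = one_feature.view(batch_size, -1)   # keep the batch dimension, flatten the rest
flat2 = two_features.view(batch_size, -1)
print(flat1.shape)   # torch.Size([4, 169])
print(flat2.shape)   # torch.Size([4, 338])
# The first 169 entries of each row come from feature 0, the next 169 from feature 1.
print(torch.equal(flat2[:, :169], two_features[:, 0].reshape(batch_size, -1)))  # True
```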
Dimensions of the convolution kernel
1 X is the input image, a three-dimensional tensor. Ci is the number of channels (e.g. RGB), and it must equal the Ci of the convolution kernel. The last two dimensions are the height and width of the image.
2 W is the convolution kernel tensor, which is four-dimensional. Co is the number of features, which is also the number of convolution kernels; Ci is the number of channels. The last two dimensions are the height and width of the kernel.
3 Y is the output of the feature-extraction part. Notice that Ci has disappeared: however many channels the image has, they are summed together to form each feature, so that dimension collapses. Co is the number of feature types, the same as the first dimension of W. The last two dimensions are the height and width of the output feature map.
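Written out element by element (stride s, no padding, and including the per-kernel bias b that appears in the next figure), the three shapes fit together as below; the sum over c_i is exactly where the channel dimension disappears. The index notation is ours, chosen to match X, W and Y above, with h and w denoting the image height and width.

$$
Y[c_o, i, j] = b[c_o] + \sum_{c_i=0}^{C_i-1} \sum_{u=0}^{k_h-1} \sum_{v=0}^{k_w-1} W[c_o, c_i, u, v]\, X[c_i,\ s\,i+u,\ s\,j+v]
$$

$$
X \in \mathbb{R}^{C_i \times h \times w},\quad W \in \mathbb{R}^{C_o \times C_i \times k_h \times k_w},\quad Y \in \mathbb{R}^{C_o \times h' \times w'},\quad h' = \left\lfloor \frac{h - k_h}{s} \right\rfloor + 1,\quad w' = \left\lfloor \frac{w - k_w}{s} \right\rfloor + 1
$$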
Here is an example with two features (two convolution kernels):
W0 and W1 are the two convolution kernels (the figure also draws each kernel's bias term b), and we end up with two kinds of feature output.
Another figure expresses the same idea, and you can clearly see the data change from n dimensions to m dimensions:
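For the two-kernel case, the layer's output is the bias-free convolution plus one bias value per kernel, broadcast over every pixel of that kernel's feature map. The sketch uses `torch.nn.Conv2d` with 3 input channels and 2 kernels on random data; the 6×6 input size is arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
layer = torch.nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3)  # kernels W0 and W1
x = torch.randn(1, 3, 6, 6)

print(layer.weight.shape)   # torch.Size([2, 3, 3, 3]) -> (Co, Ci, kH, kW)
print(layer.bias.shape)     # torch.Size([2])          -> one bias b per kernel

y = layer(x)                                      # (1, 2, 4, 4): two feature maps
no_bias = F.conv2d(x, layer.weight)               # same kernels, bias left out
manual = no_bias + layer.bias.view(1, -1, 1, 1)   # add b0 to map 0 and b1 to map 1
print(torch.allclose(y, manual, atol=1e-5))       # True
```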
Building a convolution layer in code
1 First, build the input image data
import torch
# Number of input and output channels
in_channels, out_channels = 3, 10
# Width and height of the image
width, height = 60, 60
# Convolution kernel size 3*3
kernel_size = 3
# Number of samples in one batch
batch_size = 7
# Build the input; PyTorch expects (batch, channels, height, width)
input = torch.randn(batch_size, in_channels, height, width)
torch.randn fills the tensor with values drawn from a standard normal distribution. The resulting input is a four-dimensional tensor:
torch.Size([7, 3, 60, 60])
2 Build the convolution operation
torch.nn.Conv2d builds the convolution operation; it lets you set how data is convolved once it reaches this layer.
Setting stride and padding:
- stride: the step size by which the kernel slides each time; the default is 1
- padding: the width of the zero-valued margin added on all borders (i.e. how many rings of zeros are added around the feature map); the default is 0
For now, the most important thing is to tell Conv2d the number of input channels, the number of output channels and the kernel size:
conv_layer = torch.nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, padding=1, stride=2)
Conv2d has now built a convolution layer for us. Let's look at the shape of conv_layer.weight:
torch.Size([10, 3, 3, 3])
- 10 means 10 convolution kernels, corresponding to out_channels, the number of output channels.
- The first 3 corresponds to the number of image channels, in_channels.
- The last two 3s are the kernel size, 3*3.
3 Finally, look at the output
This step uses the convolution layer to convolve the input; output is the result of the convolution:
output = conv_layer(input)
Its shape is [7, 10, 30, 30]. Let's see why:
7 is batch_size, which does not change: seven images go in, seven images come out.
10 is out_channels, as specified by our convolution layer.
30*30 is the size of the feature map after convolution:
kernel_size=3 together with padding=1 would keep the image size unchanged, but the stride is set to 2, so the image shrinks from 60*60 to 30*30.
So the code and our analysis give exactly the same result.
Hands-on practice
Next, let's build a convolutional neural network ourselves, with the following structure:
In fact, we only need to replace the model in the last example of my previous article: Loss functions for multi-class classification in PyTorch (code bean's blog, CSDN) https://blog.csdn.net/songhuangong123/article/details/125502262?spm=1001.2014.3001.5501
In that example we used a fully connected neural network directly, and training stopped once accuracy reached 97%. This time let's switch to a convolutional neural network and see what happens.
Replace the model with:
import torch
import torch.nn.functional as F

# Just switch to this network
class Net2(torch.nn.Module):
    def __init__(self):
        super().__init__()  # must be called before registering sub-modules
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(2)
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))
        x = F.relu(self.pooling(self.conv2(x)))
        x = x.view(batch_size, -1)  # flatten
        # No activation on the last layer (no nonlinear transformation)
        x = self.fc(x)
        return x
This code maps one-to-one onto the figure above: convolution then pooling, convolution then pooling again, and finally the view function flattens the result.
A few questions:
1 Why is the input of the fully connected layer torch.nn.Linear(320, 10) equal to 320?
Because after the two convolution and pooling stages each 28*28 MNIST image becomes a (20, 4, 4) tensor, so the batch has shape (batch_size, 20, 4, 4), and 20*4*4 = 320. The total number of output values is therefore 320*batch_size.
batch_size is the number of images, and each sub-element is one image, so when flattening we must keep one dimension fixed, namely batch_size: x = x.view(batch_size, -1)
This x is then passed to torch.nn.Linear(320, 10). Note that these PyTorch functions operate on matrices (whole batches), so the batch_size dimension cannot be dropped. It is the same as when we defined the input data earlier, which had to be two-dimensional:
# Note that this must be written as a two-dimensional matrix
x_data = torch.Tensor([[1.0], [2.0], [3.0]])
y_data = torch.Tensor([[0], [0], [1]])
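The 320 can be confirmed by pushing a dummy MNIST-sized batch (1×28×28 images) through the same layers as Net2 and printing the shape after each stage. The batch of 2 all-zero images is only a placeholder.

```python
import torch
import torch.nn.functional as F

conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
pooling = torch.nn.MaxPool2d(2)

x = torch.zeros(2, 1, 28, 28)     # (batch_size, channels, H, W)
x = F.relu(pooling(conv1(x)))
print(x.shape)                     # torch.Size([2, 10, 12, 12]): 28 -> 24 (conv) -> 12 (pool)
x = F.relu(pooling(conv2(x)))
print(x.shape)                     # torch.Size([2, 20, 4, 4]): 12 -> 8 (conv) -> 4 (pool)
x = x.view(x.size(0), -1)
print(x.shape)                     # torch.Size([2, 320]): 20 * 4 * 4 = 320
```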
2 Why is the last layer not activated (no nonlinear transformation)?
Because we chose the cross-entropy loss, torch.nn.CrossEntropyLoss, which already applies softmax (as log-softmax) internally, so the nonlinearity is handled inside the loss.
Finally, the complete code:
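This can be checked numerically: `torch.nn.CrossEntropyLoss` applied to raw scores (logits) gives the same value as log-softmax followed by the negative log-likelihood loss. The logits and labels below are random placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)              # 5 samples, 10 classes, NOT passed through softmax
labels = torch.randint(0, 10, (5,))

ce = torch.nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, manual))        # True
```

If a softmax were added to the last layer as well, it would be applied twice, which is why the model's final layer outputs raw scores.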
import torch
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.optim as optim
import torch.nn.functional as F
# Prepare the dataset
batch_size = 64
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_dataset = datasets.MNIST(root='./dataset/mnist/', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_dataset = datasets.MNIST(root='./dataset/mnist/', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)
# Construct a network model
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()  # must be called before registering sub-modules
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        # Flatten each C*H*W image into a vector so the fully connected layers can process it
        x = x.view(-1, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        # No activation on the last layer (no nonlinear transformation)
        return self.l5(x)
# Just switch to this network
class Net2(torch.nn.Module):
    def __init__(self):
        super().__init__()  # must be called before registering sub-modules
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(2)
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))
        x = F.relu(self.pooling(self.conv2(x)))
        x = x.view(batch_size, -1)  # flatten
        # No activation on the last layer (no nonlinear transformation)
        x = self.fc(x)
        return x
model = Net2()
# Construct loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()  # Expects raw (un-activated) scores; it combines log-softmax and the NLL loss in one step, which is faster and numerically more stable
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)  # momentum: momentum term of SGD
def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        # Get the inputs and labels of one batch
        inputs, target = data
        # Clear gradients left over from the previous batch
        optimizer.zero_grad()
        # Forward pass
        y_pred = model(inputs)
        # Compute the loss
        loss = criterion(y_pred, target)
        # Backward pass
        loss.backward()
        # Update the weights
        optimizer.step()
        # .item() extracts a Python float so the computation graph is not kept alive
        running_loss += loss.item()
        if batch_idx % 300 == 299:
            print('[%d, %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0
def my_test():
    correct = 0
    total = 0
    # No gradients are needed for evaluation
    with torch.no_grad():
        for data in test_loader:
            inputs, labels = data
            prec = model(inputs)
            # torch.max(input, dim) returns two tensors:
            #   - the maximum values along dim (dim=0: max of each column, dim=1: max of each row);
            #   - the indices of those maxima.
            # The model outputs raw scores (no softmax), so the maximum values are not 1;
            # only the index matters, and it equals the predicted digit.
            _, predicted = torch.max(prec.data, dim=1)  # predicted has shape (batch_size,)
            total += labels.size(0)
            # Element-wise comparison between tensors, then count the matches
            correct += (predicted == labels).sum().item()
    print('accuracy on test set: %d %% ' % (100 * correct / total))
if __name__ == "__main__":
    for epoch in range(10):  # After each training epoch, evaluate once on the test set
        train(epoch)
        my_test()
Training output:
[1, 300] loss: 0.689
[1, 600] loss: 0.214
[1, 900] loss: 0.135
accuracy on test set: 96 %
[2, 300] loss: 0.115
[2, 600] loss: 0.105
[2, 900] loss: 0.084
accuracy on test set: 97 %
[3, 300] loss: 0.081
[3, 600] loss: 0.080
[3, 900] loss: 0.065
accuracy on test set: 98 %
[4, 300] loss: 0.069
[4, 600] loss: 0.061
[4, 900] loss: 0.059
accuracy on test set: 98 %
[5, 300] loss: 0.057
[5, 600] loss: 0.050
[5, 900] loss: 0.055
accuracy on test set: 98 %
[6, 300] loss: 0.050
[6, 600] loss: 0.048
[6, 900] loss: 0.048
accuracy on test set: 98 %
[7, 300] loss: 0.046
[7, 600] loss: 0.045
[7, 900] loss: 0.043
accuracy on test set: 98 %
[8, 300] loss: 0.038
[8, 600] loss: 0.045
[8, 900] loss: 0.039
accuracy on test set: 98 %
[9, 300] loss: 0.040
[9, 600] loss: 0.039
[9, 900] loss: 0.035
accuracy on test set: 98 %
[10, 300] loss: 0.037
[10, 600] loss: 0.035
[10, 900] loss: 0.035
accuracy on test set: 98 %
Very nice: test accuracy rose by 1 percentage point, from 97% to 98%.
Finally, let's look at the overall structure of the first commercially used convolutional neural network model:
In the figure, the part to the left of the blue line is the feature-extraction part, consisting of convolution and pooling; the part to the right is the fully connected layer, which performs the classification.
It feels quite similar to what we built ~~~
The videos in the reference material are classics (the second one can be skipped) and are worth watching again and again.