CNN convolutional neural network
2022-07-29 06:47:00 【yc_ ZZ】
One 、 Application scenarios
In the field of computer vision
Image detection

Image classification and retrieval
Retrieval: input a product image and return similar items, much like Taobao's search-by-image.
Super resolution reconstruction
Make a blurry image sharper.
Medical tasks
Cell detection, character recognition, etc.
...
Two 、 Problem introduction
Why not use a fully connected neural network to process images? Two ideas solve the problems that image data raises, and they lead to the CNN.
Too many parameters
Every neuron in a fully connected layer must connect to every component of the input. For images this blows up quickly: with a 1000×1000 image (10^6 pixels) and 10^6 hidden neurons, a single fully connected layer already needs about 10^6 × 10^6 = 10^12 weights.

With this many parameters the model's capacity is very large, which easily leads to overfitting: excellent results on the training data but poor results on the test data. An ordinary fully connected network therefore cannot meet the needs of image processing.
Solutions:
1、 Local connection
Images are strongly local: each region has little to do with distant regions, so the full connection is replaced by a local connection, which cuts the number of parameters.
If each neuron is connected only to a 10×10 patch (100 pixels) and there are 10^6 neurons, the weights w1, w2, …, wn total 100 × 10^6 = 10^8.

2、 Parameter sharing
Image features do not depend on position, so the same parameters are used for every neuron's local connection:
then w1 = w2 = … = wn, and only 100 parameters remain.
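To check the arithmetic of all three cases, here is a quick sketch in plain Python; the 1000×1000 image size (10^6 pixels) is an assumption based on the usual version of this example, while the 10^6 neurons and 10×10 patches come from the text above.

# Parameter counts for the example above (assumed: 10**6 pixels, 10**6 neurons, 10x10 patches)
n_pixels = 10**6      # pixels in the input image
n_neurons = 10**6     # neurons in the hidden layer
patch = 10 * 10       # one 10x10 local receptive field = 100 weights per neuron

print(n_pixels * n_neurons)   # full connection:   10**12 weights
print(patch * n_neurons)      # local connection:  10**8 weights
print(patch)                  # parameter sharing: 100 weights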
Three 、 Network details

The input is three-dimensional data, and features are extracted directly from the image:
the input goes from two dimensions (a flattened matrix) to three dimensions (H × W × C).

1、 Input layer
The image has three dimensions: height, width, and C (the number of channels).
An RGB image has red, green, and blue channels, so C is usually 3.
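As a quick sanity check in PyTorch (the library used in the code section below), note that PyTorch orders the dimensions as (batch, channels, height, width) rather than (height, width, channels).

import torch

# a batch of 4 RGB images, 28 pixels high and 28 wide: (batch, C, H, W)
images = torch.randn(4, 3, 28, 28)
print(images.shape)   # torch.Size([4, 3, 28, 28])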
2、 Convolution layer
Purpose: convolution kernels extract image features and produce feature maps.
Convolution kernel
The convolution kernel holds the weight parameters of the local connection.
The kernel slides over the input image from left to right and from top to bottom (this is exactly the local-connection and parameter-sharing idea).
Output size = input size − kernel size + 1 (for example, 3 = 5 − 3 + 1).
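A minimal sketch of that computation with F.conv2d: a 3×3 kernel slid over a 5×5 input with stride 1 and no padding gives a 3×3 feature map, matching 3 = 5 − 3 + 1.

import torch
import torch.nn.functional as F

x = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)  # one 5x5 single-channel input
k = torch.ones(1, 1, 3, 3)                                     # one 3x3 kernel

y = F.conv2d(x, k, stride=1, padding=0)
print(y.shape)   # torch.Size([1, 1, 3, 3])  ->  3 = 5 - 3 + 1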

Stride
A small stride gives fine-grained feature extraction; a large stride gives coarse-grained extraction (often used for text tasks). See the output-size sketch after the padding note below.

Padding
After a convolution the output is smaller than the input. A CNN convolves more than once, so if the feature map kept shrinking it would eventually collapse to size 1 and cause problems. Padding keeps the convolution input and output the same size.
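Putting stride and padding together, the output size follows the usual formula output = floor((input + 2 × padding − kernel) / stride) + 1; a small sketch to check a few cases:

def conv_output_size(n_in, kernel, stride=1, padding=0):
    """floor((n_in + 2*padding - kernel) / stride) + 1"""
    return (n_in + 2 * padding - kernel) // stride + 1

print(conv_output_size(5, 3))                        # 3  (the 5-3+1 example above)
print(conv_output_size(5, 3, stride=2))              # 2  (larger stride, coarser output)
print(conv_output_size(28, 3, stride=1, padding=1))  # 28 ("same" padding keeps the size)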

Convolution on multiple channels
Images have three channels, but so far we only demonstrated convolution on a single channel; the three channels must be handled separately and then summed.
The pixels of the three channels differ, so each channel is convolved with its own kernel slice, the results are added position by position, and a bias term is added to get the final value (in the figure, the top-left entry of the feature map is 3).
Note: the stride here is set to 2.
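A minimal sketch of the multi-channel step (random values, not the numbers from the figure): each channel is convolved with its own slice of the kernel, the per-channel results are summed position by position, and the bias is added, which is what F.conv2d computes in one call.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 7, 7)   # one 3-channel image
w = torch.randn(1, 3, 3, 3)   # one kernel: a 3x3 slice per input channel
b = torch.zeros(1)            # bias term

# built-in multi-channel convolution (stride 2, as in the note above)
y = F.conv2d(x, w, bias=b, stride=2, padding=1)

# the same result, computed channel by channel and then summed + biased
manual = sum(F.conv2d(x[:, c:c + 1], w[:, c:c + 1], stride=2, padding=1)
             for c in range(3)) + b
print(torch.allclose(y, manual))   # True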

We can use many convolution kernels, each extracting different features and producing its own feature map, which makes the features richer (in the figure above, filter w0 and filter w1 produce two green matrices that are finally stacked together; a short sketch follows the stacked-convolution note below).
Stacked convolutions
One convolution kernel produces one feature map. A single round of feature extraction is not enough, so we keep convolving on top of the feature maps.
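A sketch of both points with nn.Conv2d (the channel counts are illustrative): two kernels give two stacked feature maps, and a second convolution then operates on those maps rather than on the raw image.

import torch
from torch import nn

x = torch.randn(1, 3, 32, 32)                        # one RGB image

conv1 = nn.Conv2d(3, 2, kernel_size=3, padding=1)    # 2 kernels -> 2 feature maps (like w0 and w1)
conv2 = nn.Conv2d(2, 8, kernel_size=3, padding=1)    # convolve again, now on the 2 feature maps

f1 = conv1(x)
f2 = conv2(f1)
print(f1.shape)   # torch.Size([1, 2, 32, 32])
print(f2.shape)   # torch.Size([1, 8, 32, 32])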
Counting convolution parameters
For example, a 3×3 kernel over 3 input channels has 3 × 3 × 3 = 27 parameters.
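This count can be verified directly on an nn.Conv2d layer; note that PyTorch also adds one bias per output channel by default.

from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, bias=False)
print(sum(p.numel() for p in conv.parameters()))     # 27 = 3 * 3 * 3

conv_b = nn.Conv2d(3, 1, kernel_size=3)              # bias=True by default
print(sum(p.numel() for p in conv_b.parameters()))   # 28 = 27 weights + 1 bias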

3、 Pooling layer
Pooling compresses the feature maps produced by convolution (down-sampling): not every feature matters, so only the important ones are kept.
Here we only discuss max pooling, because it generally works best.
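A minimal max-pooling sketch: a 2×2 window keeps only the largest value in each region, halving the height and width.

import torch
import torch.nn.functional as F

x = torch.tensor([[1., 3., 2., 4.],
                  [5., 6., 7., 8.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]]).reshape(1, 1, 4, 4)

print(F.max_pool2d(x, kernel_size=2))
# tensor([[[[6., 8.],
#           [3., 4.]]]])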

Four 、 The overall architecture
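As a rough sketch of the usual overall layout (a few convolution + ReLU + pooling blocks, then flatten, then fully connected layers), here is an nn.Sequential version of the same pattern that the CNN class in the code below implements; the sizes assume a 28×28 single-channel input such as MNIST.

from torch import nn

# conv -> ReLU -> pool, twice, then flatten and classify (illustrative sizes)
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),                     # 64 x 7 x 7 for a 28x28 input
    nn.Linear(64 * 7 * 7, 10),
)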



Five 、 Code practice
Handwritten digit recognition
import numpy
import torch
from torch import nn
from PIL import Image
import matplotlib.pyplot as plt
import os
from torchvision import datasets, transforms, utils
# Functional interface (activations, pooling, ...)
import torch.nn.functional as F
# Optimizers
import torch.optim as optim

# Scale pixels to [0, 1] and then normalize to roughly [-1, 1]
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize(mean=[0.5], std=[0.5])])
# Load the training data (set download=True on the first run if ./data/ is empty)
train_data = datasets.MNIST(root="./data/",
                            transform=transform,
                            train=True,
                            download=False)
# Load the test data
test_data = datasets.MNIST(root="./data/",
                           transform=transform,
                           train=False)
# print(len(train_data))  # 60000
# print(len(test_data))   # 10000

# Wrap the datasets in DataLoaders and set the batch size to speed up training;
# the basic unit yielded by a loader is one batch.
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64,
                                           shuffle=True, num_workers=2)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64,
                                          shuffle=True, num_workers=2)
# print(len(train_loader))  # 938
# print(len(test_loader))   # 157

# Each sample is a (data, label) pair: the first element is the image, the second the label.
# oneimg, label = train_data[0]   # the image tensor is three-dimensional (1, 28, 28)
# plt.imshow(oneimg.squeeze(0))   # squeeze out the channel dimension to plot the pixel matrix
# plt.show()
# Define the network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1, stride=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        # Input size 28*28*1 -> 14*14 after the first pooling -> 7*7 after the second,
        # so the flattened feature vector is 7*7 times the channel count (64).
        self.fc1 = nn.Linear(64 * 7 * 7, 1024)  # two poolings, hence 7*7 rather than 14*14
        self.fc2 = nn.Linear(1024, 512)
        self.fc3 = nn.Linear(512, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # first convolution + pooling
        x = self.pool(F.relu(self.conv2(x)))   # second convolution + pooling
        x = x.view(-1, 64 * 7 * 7)             # flatten into a one-dimensional feature vector
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = CNN()  # instantiate the model
criterion = nn.CrossEntropyLoss()  # cross-entropy loss
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # SGD with momentum
def train():
    train_loss = []
    net.train()
    os.makedirs("./CNNmodel", exist_ok=True)  # make sure the checkpoint directory exists
    for epoch in range(3):
        running_loss = 0.0
        for i, data in enumerate(train_loader, 0):
            inputs, labels = data[0], data[1]
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # print the average loss once every 100 batches
            running_loss += loss.item()
            if i % 100 == 99:
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 100))
                running_loss = 0.0
                train_loss.append(loss.item())
                # periodically save the model and optimizer state
                torch.save(net.state_dict(), "./CNNmodel/model.pkl")
                torch.save(optimizer.state_dict(), "./CNNmodel/optimizer.pkl")
def test():
    loss_list = []
    acc_list = []
    net.eval()
    for images, target in test_loader:
        with torch.no_grad():
            output = net(images)
            cur_loss = criterion(output, target)
            loss_list.append(cur_loss.item())
            pred = output.max(dim=-1)[-1]              # index of the largest logit in each row
            cur_acc = pred.eq(target).float().mean()   # accuracy of this batch
            acc_list.append(cur_acc.item())
    print("average accuracy, average loss:", numpy.mean(acc_list), numpy.mean(loss_list))


if __name__ == '__main__':
    train()
    test()
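As a possible follow-up (a sketch, assuming the checkpoint path used in train() above and the test_data already loaded): reload the saved weights and classify a single test image.

def predict_one():
    model = CNN()
    model.load_state_dict(torch.load("./CNNmodel/model.pkl"))  # weights saved by train()
    model.eval()
    image, label = test_data[0]                 # one (1, 28, 28) image tensor and its label
    with torch.no_grad():
        logits = model(image.unsqueeze(0))      # add the batch dimension
    print("predicted:", logits.argmax(dim=1).item(), "actual:", label)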