
Convolutional neural networks (CNN) explained, with a TensorFlow2 code implementation

2022-06-26 21:33:00 Romantic data analysis

Convolutional neural networks sound intimidating, but this article explains them in plain terms that anyone can follow.


What is convolution

A convolutional neural network is a traditional neural network extended with matrix convolution.
Two-dimensional linear convolution, matrix example:
[Figure: example of a two-dimensional matrix convolution]

Now take an image (left in the figure below) and a kernel (center). Convolving the two gives the result on the right.
An image is really just a large matrix of numbers, the pixels we so often talk about. A grayscale image is one large two-dimensional matrix in which each element records how light or dark that pixel is, so it can be treated as an ordinary mathematical matrix.
[Figure: input image (left), convolution kernel (center), and convolution result (right)]

Two-dimensional convolution works as follows. Start at the first position of the two-dimensional matrix, placing the center of the convolution kernel on position (1,1), the red box in the figure (the kernel is itself a matrix). Multiply each kernel element by the value at the corresponding position and sum the products; the sum, -8 here, is the new value for that position (see the matrix convolution example above for the arithmetic). Repeat this for every position to obtain all the new values, padding the outermost border of the image with 0. The reach of a two-dimensional convolution is set by the kernel, whose size is usually odd (e.g., 3×3, 5×5): each new pixel value depends on the surrounding pixel values, with weights given by the kernel, so a kernel extracts a specific feature of each region.
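
To make the sliding multiply-and-sum concrete, here is a minimal NumPy sketch (our own illustration, not code from the original article); like the convolution layers in CNN frameworks, it slides the kernel without flipping it, which is technically cross-correlation:

import numpy as np

def conv2d(image, kernel):
    # Minimal 2D convolution sketch: no padding, stride 1 (hypothetical helper)
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # multiply the kernel with the region it covers, then sum the products
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy example: an all-ones 5x5 "image" and an edge-detecting 3x3 kernel
image = np.ones((5, 5))
kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]])
print(conv2d(image, kernel))  # all zeros: a flat region has no edges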


I. Introduction to convolutional neural networks

Convolutional neural networks (CNNs) let machines "see" and perform tasks such as image classification, image recognition, object detection, and instance segmentation. These are the most common applications of CNNs, handwriting recognition being a classic example.

Convolution layer – extracting local image features

[Figure: convolution over a 3-channel RGB image]
An image has three color channels, so the input has 3 layers, one per input channel: red (R), green (G), and blue (B). Every image we see is a 3-channel image, which you can picture as 3 stacked layers, so the convolution kernel also has 3 layers. The operation is like overlaying two small cubes: multiply every pair of corresponding elements and add up all the products. A convolution kernel is therefore a 3-layer 2D (height × width) filter, and the number of channels (layers) in the image always equals the number of layers in the filter.
As in the 2D case, we slide the filter horizontally. Each time the filter moves, we take one weighted sum over all three channels (3 layers) of the image, i.e., a weighted neighborhood of the RGB values. Since we slide the kernel in only two dimensions, left to right and top to bottom, the output of this operation is 2D.
Suppose we have a 7×7 2D input and apply a 3×3 filter starting from the top-left corner of the image. As we slide the kernel over the image from left to right and top to bottom, the output clearly comes out smaller than the input: 5×5.
[Figure: a 3×3 filter sliding over a 7×7 input produces a 5×5 output]
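
A quick sanity check of these shapes (a sketch we added; the arguments follow the standard tf.keras.layers.Conv2D API):

import tensorflow as tf

x = tf.random.normal([1, 7, 7, 1])  # one 7x7 single-channel input: [batch, height, width, channels]
conv_valid = tf.keras.layers.Conv2D(filters=1, kernel_size=3, padding='valid')  # no padding
print(conv_valid(x).shape)  # (1, 5, 5, 1): the output shrinks to 7 - 3 + 1 = 5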
What if we want the output to be the same size as the input ?
If the original input is 7×7 and we want a 7×7 output, what we can do is evenly add an artificial border of zeros around the input. The filter K (3×3) can then be centered on every image pixel, including those at the edges, while computing the weighted average of its neighbors.
One convolution kernel extracts one feature. To extract the image's features more fully, we therefore convolve the image with multiple kernels; the number of kernels is called the depth. The result is multiple 2D outputs stacked together, an output with multiple layers. As pictured:
[Figure: multiple convolution kernels produce multiple stacked 2D feature maps]
Understanding this diagram is the key to understanding the final CNN architecture. Each additional convolution layer changes the number of stacked layers shown in this color; the small sketch below illustrates the stacking.
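
As an illustration (our own sketch, not from the original article), applying 32 kernels to a 3-channel image yields a 32-layer output:

import tensorflow as tf

x = tf.random.normal([1, 28, 28, 3])  # one 28x28 RGB image
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same')
print(conv(x).shape)  # (1, 28, 28, 32): 32 kernels -> 32 stacked 2D feature maps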

Padding – keeping the convolved image's height and width unchanged

By adding this artificial border of zeros around the input, we can keep the output shape identical to the input shape. With a larger filter K (5×5), the number of zeros we need to add increases accordingly so that the output size stays the same. Because the process keeps the output size equal to the input size, it is called padding.
[Figure: zero padding added around the input]
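
Keras implements this via padding='same'; a minimal sketch we added, for comparison with the 'valid' case above:

import tensorflow as tf

x = tf.random.normal([1, 7, 7, 1])
conv_same = tf.keras.layers.Conv2D(filters=1, kernel_size=5, padding='same')  # zeros are added automatically
print(conv_same(x).shape)  # (1, 7, 7, 1): height and width are preserved even with a 5x5 kernel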

Pooling layer – reducing dimensionality, model complexity, and computation

Once we have the feature maps, we usually apply what is called a pooling operation. The number of hidden layers required to learn the complex relationships in an image would otherwise be very large, so we pool to shrink the representation of the input features and reduce the computation the network needs.
Given an input feature map, we slide a filter of fixed shape over it and take the maximum value from each covered part of the feature map. This is called max pooling. It is also called subsampling because, from each part of the feature map covered by the kernel, we sample a single maximum value.
[Figure: max pooling reduces each covered region to its maximum value]
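
A small max-pooling sketch we added for illustration, using the same tf.keras.layers.MaxPool2D layer that appears in the code later in this article:

import tensorflow as tf

fmap = tf.constant([[1., 3., 2., 4.],
                    [5., 7., 6., 8.],
                    [4., 2., 3., 1.],
                    [8., 6., 7., 5.]])
fmap = tf.reshape(fmap, [1, 4, 4, 1])  # [batch, height, width, channels]
pool = tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2)
print(tf.squeeze(pool(fmap)).numpy())
# [[7. 8.]
#  [8. 7.]] -- each 2x2 region is reduced to its maximum value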

Flatten – turning multidimensional data into one long one-dimensional vector

The convolution stages leave us with several feature maps (the pink blocks in the figure at the end of the convolution section), and they are multidimensional.
But our prediction is one-dimensional; in a binary classification, for instance, the answer is either 0 or 1. How do we get a one-dimensional output from multidimensional data?
It's easy: spread all the multidimensional data out into a one-dimensional array, just as you might take a pile of Rubik's cubes apart and line the pieces up in a single row. Each cube is multidimensional, but lay several multidimensional arrays out end to end and you have a one-dimensional array.
[Figure: flattening the stacked feature maps into a one-dimensional vector]
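
In code, flattening is one line; a sketch assuming the 7×7×64 pooled output used later in this article:

import tensorflow as tf

x = tf.random.normal([1, 7, 7, 64])  # pooled feature maps
flat = tf.keras.layers.Reshape(target_shape=(7*7*64,))(x)  # tf.keras.layers.Flatten() would do the same here
print(flat.shape)  # (1, 3136): one long 1D vector per sample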

Fully connected layer – producing the output

After the series of convolution and pooling operations (max pooling or average pooling, also called downsampling), we flatten the output of the final pooling layer into a vector and pass it through fully connected layers (a feedforward neural network) with some number of hidden layers; in other words, the final fit is done by a multi-layer deep neural network.
Finally, the output of the fully connected layers passes through a softmax layer of the required size. The softmax layer outputs a probability distribution vector, which is what makes image classification possible. In the digit recognizer problem (shown above), the output softmax layer has 10 neurons, so the input can be classified as one of 10 categories (the digits 0-9). For a binary problem the softmax layer has 2 neurons, outputting probabilities for classes 0 and 1. The size of the final softmax layer is therefore determined by how many classes the result must be divided into.
[Figure: the flattened vector passing through fully connected layers to a softmax output]
For a 2-class problem, the final softmax layer has only two neurons, one per output class.
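
To see what the softmax layer does, here is a sketch with some hypothetical logits (values invented for illustration):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0]])  # raw scores for a hypothetical 3-class problem
probs = tf.nn.softmax(logits)
print(probs.numpy())                     # ~[[0.79 0.18 0.04]]: a probability distribution summing to 1
print(tf.argmax(probs, axis=1).numpy())  # [0]: the class with the highest probability is the prediction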

II. TensorFlow2 code implementation

1. Import data

We use the MNIST dataset bundled with TensorFlow2 to recognize handwritten digits 0-9: given an image, decide which digit was written.
To import the data, we create a MNISTLoader class.
The code is as follows (saved as testData.py):

import numpy as np
import tensorflow as tf

class MNISTLoader():
    def __init__(self):
        mnist = tf.keras.datasets.mnist
        (self.train_data,self.train_label),(self.test_data,self.test_label) = mnist.load_data()
        # MNIST images default to uint8 (integers 0-255). The code below normalizes them to
        # floats in [0, 1] and appends a channel dimension at the end; without that dimension
        # the array is a plain grayscale image with no color axis.
        self.train_data = np.expand_dims(self.train_data.astype(np.float32)/255.0, axis=-1)   # [60000, 28, 28, 1]
        self.test_data = np.expand_dims(self.test_data.astype(np.float32) / 255.0, axis=-1)        # [10000, 28, 28, 1]
        self.train_label = self.train_label.astype(np.int32)    # [60000]
        self.test_label = self.test_label.astype(np.int32)      # [10000]
        self.num_train_data, self.num_test_data = self.train_data.shape[0], self.test_data.shape[0]   #60000,10000

    def get_batch(self, batch_size):
        # Randomly draw batch_size samples from the training set and return them
        index = np.random.randint(0, self.num_train_data, batch_size)  # the same sample may be drawn more than once
        return self.train_data[index, :], self.train_label[index]

# mnist = MNISTLoader()
# batch_size = 1
# train_data,train_label = mnist.get_batch(batch_size)
# print(train_data*255)
# print(train_label)
# print(train_data[0,:,1])

2. Building the CNN with TensorFlow2

The code is structured as follows:
1. Define the hyperparameters.
2. Define the model structure.
3. Train the model.
4. Predict on the test set and measure accuracy.

import numpy as np
import tensorflow as tf
from testData import *
import time

class CNN(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv1 = tf.keras.layers.Conv2D(
            filters=32,             # number of convolution kernels: 32, extracting 32 features
            kernel_size=[5, 5],     # receptive field: height and width of the kernel
            padding='same',         # padding strategy ('valid' or 'same')
            activation=tf.nn.relu   # activation function
        )
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2)  # pooling windows are typically 2x2
        self.conv2 = tf.keras.layers.Conv2D(
            filters=64,
            kernel_size=[5,5],
            padding='same',
            activation=tf.nn.relu
        )
        self.pool2 = tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2)  # pooling windows are typically 2x2
        self.flatten = tf.keras.layers.Reshape(target_shape=(7*7*64,))  # flatten the 7x7x64 feature maps into a 1D vector
        self.dense1 = tf.keras.layers.Dense(units=1024, activation=tf.nn.relu)  # first fully connected layer, 1024 neurons
        self.dense2 = tf.keras.layers.Dense(units=10)  # last fully connected layer; one neuron per class (softmax is applied in call)

    def call(self, inputs):
        x = self.conv1(inputs)  # first convolution layer
        x = self.pool1(x)       # first pooling layer (downsampling)
        x = self.conv2(x)       # second convolution layer
        x = self.pool2(x)       # second pooling layer (downsampling)
        x = self.flatten(x)     # flatten the intermediate result into one long 1D vector
        x = self.dense1(x)      # first fully connected layer
        x = self.dense2(x)      # second fully connected layer; its output feeds the softmax below
        output = tf.nn.softmax(x)
        return output

# Main program: load the data and train the model
# Hyperparameters
num_epochs = 5  # number of passes over the training set
batch_size = 50
learning_rate = 0.001

print('now begin the train, time is ')
print(time.strftime('%Y-%m-%d %H:%M:%S',time.localtime()))
model = CNN()
data_loader = MNISTLoader()
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

num_batches = int(data_loader.num_train_data//batch_size*num_epochs)
for batch_index in range(num_batches):
    X,y = data_loader.get_batch(batch_size)
    with tf.GradientTape() as tape:
        y_pred = model(X)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y,y_pred=y_pred)
        loss = tf.reduce_sum(loss)
        print("batch %d: loss %f"%(batch_index,loss.numpy()))
    grads = tape.gradient(loss,model.variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))

print('now end the train, time is ')
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))
# Evaluate the model
sparse_categorical_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
num_batches_test = int(data_loader.num_test_data//batch_size)  # split the test set into batches of 50 images
for batch_index in range(num_batches_test):
    start_index,end_index = batch_index*batch_size,(batch_index+1)*batch_size
    y_pred = model.predict(data_loader.test_data[start_index:end_index])
    sparse_categorical_accuracy.update_state(
        y_true = data_loader.test_label[start_index:end_index],
        y_pred=y_pred
    )
print('test accuracy: %f'%sparse_categorical_accuracy.result())
print('now end the test, time is ')
print(time.strftime('%Y-%m-%d %H:%M:%S',time.localtime()))
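
For reference, the same model can also be trained with Keras' built-in loop instead of the manual GradientTape loop above; a sketch assuming the CNN and MNISTLoader classes defined earlier:

model = CNN()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    # from_logits=False because call() already applies softmax
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
data_loader = MNISTLoader()
model.fit(data_loader.train_data, data_loader.train_label, epochs=5, batch_size=50)
model.evaluate(data_loader.test_data, data_loader.test_label)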

The prediction accuracy reaches 99.15%. Impressive.
Output:

batch 5999: loss 0.094517
now end the train, time is 
2021-03-18 17:15:46
test accuracy: 0.991500
now end the test, time is 
2021-03-18 17:16:05

Summary

Building a convolutional neural network comes down to: choosing the number of layers; defining each layer in terms of convolution, activation, and pooling; and matching each layer's input to the previous layer's output. To produce the final result, the 2D (or higher-dimensional) matrices are flattened into one long 1D vector, fully connected layers then form a multi-layer neural network at the output end, and the last output layer applies the softmax function to classify; the class with the highest output probability is our prediction.
With that, a convolutional neural network is built, and its accuracy is already quite good.

Copyright notice: this article was created by [Romantic data analysis]. Please include a link to the original when republishing. Thanks.
https://yzsam.com/2022/177/202206262127177449.html