Convolutional neural network (CNN) explanation and TensorFlow 2 code implementation
2022-06-26 21:33:00 【Romantic data analysis】
Convolutional neural networks sound intimidating, but this article explains them in an easy-to-understand way that anyone can follow.
Contents

- What is convolution
- 1. Introduction to convolutional neural networks
- Convolution layer: extracting local image features
- Extension: padding, keeping the convolved image's length and width unchanged
- Pooling layer: reducing dimensionality, model complexity, and computation
- Flatten: turning multidimensional data into one long one-dimensional vector
- Fully connected layer: outputting the result
- 2. TensorFlow 2 code implementation
- Summary
What is convolution
A convolutional neural network is a traditional neural network augmented with matrix convolution.
Two-dimensional linear convolution:
Matrix example:
(Part of this content is excerpted from another article.)
Suppose we have an image (left of the figure below) and a kernel (middle of the figure below). Convolving them yields the result on the right.
As we know, an image is really one huge numeric matrix: what we usually call pixels. A grayscale image is a large two-dimensional matrix in which each element represents a degree of black or white, so it can be treated as an ordinary mathematical matrix.
Two-dimensional convolution: take the first value of the two-dimensional matrix (the 1 in the red box in the figure) and align the center of the convolution kernel with it (the kernel is itself a matrix). Multiply each kernel element by the value at the corresponding position and sum the products; the result, -8 here, is the new value (see the matrix convolution example above for the calculation). Repeat this at every position to get all the new values, padding the outermost layer of the image with zeros where the kernel extends past the border. How much context each output value uses depends on the kernel, whose size is usually odd (e.g. 3×3 or 5×5): each pixel's new value depends on its surrounding pixels, weighted by the kernel, which extracts a specific local feature of the region.
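As a minimal sketch of this sliding-window computation in plain NumPy (the image and kernel values below are made up purely for illustration; no zero padding, stride 1):

import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1          # output shrinks by kernel size - 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = image[i:i + kh, j:j + kw]   # neighborhood under the kernel
            out[i, j] = np.sum(region * kernel)  # weighted sum -> one output pixel
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(conv2d(image, kernel))  # 3x3 output for a 5x5 input and a 3x3 kernel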
1. Introduction to convolutional neural networks
Convolutional neural networks (CNNs) let a machine "see" and perform tasks such as image classification, image recognition, object detection, and instance segmentation. These are the most common applications of CNNs, handwriting recognition being a classic example.
Convolution layer: extracting local image features

An image has three color channels (red R, green G, and blue B), so the input has 3 layers; any color image we see is a 3-channel image. You can think of it as 3 stacked layers, so the convolution kernel also has 3 layers. For each layer, it is like overlaying two 3×3 faces of a Rubik's cube: the 9 pairs of elements are multiplied and the 9 products summed, and the sums from the 3 layers are then added into one value. A convolution kernel is therefore a filter with 3 two-dimensional (height × width) layers, and the number of channels in the image (its layer count) always equals the number of layers in the filter.
Just as in the 2D convolution above, we slide the filter across the image. Each time the filter moves, we get a weighted average over all three channels (3 layers) of the covered region, i.e. a weighted neighborhood of the RGB values. Since we slide the kernel in only two dimensions, from left to right and from top to bottom, the output of this operation is 2D.
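To make the arithmetic concrete, here is a minimal sketch of one filter step on a single 3×3 neighborhood of a 3-channel image (the values are random placeholders): all 3×3×3 = 27 products collapse into one output number.

import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((3, 3, 3))   # one 3x3 image neighborhood, 3 channels deep (R, G, B)
kernel = rng.random((3, 3, 3))  # the filter has the same depth as the image

value = float(np.sum(patch * kernel))  # 27 elementwise products summed into one value
print(value)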
Suppose we have a 7x7 2D input and apply a 3x3 filter to it, starting from the top-left corner of the image. As we slide the kernel over the image from left to right and top to bottom, the output is clearly smaller than the input: 5x5 (in general, n - k + 1 for an n×n input and a k×k kernel).
What if we want the output to be the same size as the input?
If the original input is 7x7 and we also want the output to be 7x7, what we can do is evenly add an artificial border of zeros around the input, so that the filter K (3x3) can be centered on every image pixel and compute the weighted average of its neighbors.
One convolution kernel extracts one feature. To extract the image's features more fully, we therefore apply multiple convolution kernels to the image; their number is called the depth of the convolution layer. The result is multiple 2D outputs stacked together, giving an output with multiple layers. As shown in the figure:
Understanding this diagram is the key to understanding the final convolutional network architecture. Stacking more convolution kernels increases the number of these colored layers.
Extension: padding, keeping the convolved image's length and width unchanged
By adding this artificial ring of zeros around the input, we can keep the output shape the same as the input. If we had a bigger filter K (5x5), the number of zeros we need to add would also grow in order to keep the same output size. Because in this process the output size matches the input size, the technique is called padding. See this link for the original text.
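A quick way to see the effect is to compare output shapes under the two padding strategies of tf.keras.layers.Conv2D; a sketch on a dummy 7x7 single-channel input (the filter count of 1 is an arbitrary choice for illustration):

import tensorflow as tf

x = tf.random.normal([1, 7, 7, 1])  # a batch of one 7x7 single-channel image

valid = tf.keras.layers.Conv2D(filters=1, kernel_size=3, padding='valid')(x)
same = tf.keras.layers.Conv2D(filters=1, kernel_size=3, padding='same')(x)

print(valid.shape)  # (1, 5, 5, 1): no padding, the output shrinks to 7 - 3 + 1 = 5
print(same.shape)   # (1, 7, 7, 1): zeros around the border keep the output 7x7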
Pooling layer: reducing dimensionality, model complexity, and computation
Once we have the feature maps, we usually perform an operation called pooling, because the number of hidden units needed to learn the complex relationships in an image would otherwise be very large. We apply pooling to shrink the representation of the input features and thus reduce the computation the network requires.
Given an input feature map, we slide a filter of fixed shape over it and take the maximum value of the part of the feature map it covers. This is called max pooling. It is also called subsampling, because from each region of the feature map covered by the kernel we sample a single maximum value.
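As a minimal sketch of 2x2 max pooling with stride 2 (the input values are arbitrary), using the same MaxPool2D layer that appears in the model code later:

import tensorflow as tf

x = tf.constant([[1., 3., 2., 4.],
                 [5., 6., 1., 2.],
                 [7., 2., 9., 0.],
                 [4., 8., 3., 1.]])
x = tf.reshape(x, [1, 4, 4, 1])  # [batch, height, width, channels]

pooled = tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2)(x)
print(tf.reshape(pooled, [2, 2]))
# [[6. 4.]
#  [8. 9.]] : the maximum of each non-overlapping 2x2 window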

Flatten: turning multidimensional data into one long one-dimensional vector
The convolution stage leaves us with several (pink, in the figure) stacked results, a multidimensional array, as shown in the figure at the end of the convolution section.
But our prediction is one-dimensional. For binary classification, for example, the answer is either 0 or 1. How do we get a one-dimensional output from multidimensional data?
It's easy: spread all of the multidimensional data out into a single one-dimensional array, just as you might take a pile of Rubik's cubes apart piece by piece and line the pieces up in a row. Each cube is multidimensional, but laying several multidimensional arrays out end to end gives you one one-dimensional array, as the sketch below shows.
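In code, flattening is just a reshape. Here is a minimal sketch (with made-up values) of a stack of two 2x2 feature maps becoming one 8-element vector:

import numpy as np

feature_maps = np.array([[[1, 2],
                          [3, 4]],
                         [[5, 6],
                          [7, 8]]])  # shape (2, 2, 2): two stacked 2x2 feature maps
flat = feature_maps.reshape(-1)      # lay every element out in a single row
print(flat)                          # [1 2 3 4 5 6 7 8]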

Fully connected layer: outputting the result
Once we have performed the series of convolution and pooling operations (max pooling or average pooling, also called downsampling), we flatten the output of the final pooling layer into a vector and pass it through fully connected layers (a feed-forward neural network) with some number of hidden layers, finally fitting the target with a multi-layer deep network.
Finally, the output of the fully connected layers passes through a softmax layer of the required size. The softmax layer outputs a probability-distribution vector, which is what makes image classification work. In the digit recognizer problem (shown above), the output softmax layer has 10 neurons, so the input can be classified as one of 10 categories (the digits 0-9).
If the problem is binary classification, the final softmax layer has just 2 neurons, whose outputs correspond to classes 0 and 1. In general, the size of the final softmax layer is determined by how many classes the result must be divided into.
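As a sketch of what the softmax layer does (the logits below are made-up raw scores from the last fully connected layer): it turns them into probabilities that sum to 1, and the class with the largest probability is the prediction.

import tensorflow as tf

logits = tf.constant([2.0, 1.0, 0.1])  # made-up raw outputs of the last Dense layer
probs = tf.nn.softmax(logits)
print(probs.numpy())             # approx. [0.659 0.242 0.099], summing to 1
print(tf.argmax(probs).numpy())  # 0: the index of the predicted class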
2. TensorFlow 2 code implementation
1. Importing the data
We use the MNIST dataset that ships with TensorFlow 2 to test handwriting recognition: given an image of a handwritten digit 0-9, decide which digit it is.
To import the data, create a new MNISTLoader class.
The code is as follows (save it as testData.py):
import numpy as np
import tensorflow as tf

class MNISTLoader():
    def __init__(self):
        mnist = tf.keras.datasets.mnist
        (self.train_data, self.train_label), (self.test_data, self.test_label) = mnist.load_data()
        # MNIST images are uint8 (values 0-255). The code below normalizes them to
        # floats between 0 and 1 and appends one dimension at the end as the color
        # channel; without that dimension the images are plain grayscale matrices.
        self.train_data = np.expand_dims(self.train_data.astype(np.float32) / 255.0, axis=-1)  # [60000, 28, 28, 1]
        self.test_data = np.expand_dims(self.test_data.astype(np.float32) / 255.0, axis=-1)    # [10000, 28, 28, 1]
        self.train_label = self.train_label.astype(np.int32)  # [60000]
        self.test_label = self.test_label.astype(np.int32)    # [10000]
        self.num_train_data, self.num_test_data = self.train_data.shape[0], self.test_data.shape[0]  # 60000, 10000

    def get_batch(self, batch_size):
        # Draw batch_size random elements from the training set and return them
        index = np.random.randint(0, self.num_train_data, batch_size)  # the same sample may be drawn more than once
        return self.train_data[index, :], self.train_label[index]

# mnist = MNISTLoader()
# batch_size = 1
# train_data, train_label = mnist.get_batch(batch_size)
# print(train_data * 255)
# print(train_label)
# print(train_data[0,:,1])
2. Building the CNN with TensorFlow 2
The code is structured as follows:
1. Define the hyperparameters
2. Define the model structure
3. Train the model
4. Predict on the test set and measure the accuracy
import numpy as np
import tensorflow as tf
from testData import *
import time

class CNN(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.conv1 = tf.keras.layers.Conv2D(
            filters=32,             # 32 convolution kernels, extracting 32 features
            kernel_size=[5, 5],     # receptive field: the height and width of the kernel
            padding='same',         # padding strategy ('valid' or 'same')
            activation=tf.nn.relu   # activation function
        )
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2)  # pooling windows are typically 2x2
        self.conv2 = tf.keras.layers.Conv2D(
            filters=64,
            kernel_size=[5, 5],
            padding='same',
            activation=tf.nn.relu
        )
        self.pool2 = tf.keras.layers.MaxPool2D(pool_size=[2, 2], strides=2)  # pooling windows are typically 2x2
        self.flatten = tf.keras.layers.Reshape(target_shape=(7 * 7 * 64,))  # flatten the 7x7x64 feature maps into one dimension
        self.dense1 = tf.keras.layers.Dense(units=1024, activation=tf.nn.relu)  # first fully connected layer, 1024 neurons
        self.dense2 = tf.keras.layers.Dense(units=10)  # last fully connected layer: one neuron per class; softmax is applied in call()

    def call(self, inputs):
        x = self.conv1(inputs)  # first convolution layer
        x = self.pool1(x)       # first pooling layer (downsampling)
        x = self.conv2(x)       # second convolution layer
        x = self.pool2(x)       # second pooling layer (downsampling)
        x = self.flatten(x)     # flatten the intermediate result into one long vector
        x = self.dense1(x)      # first fully connected layer
        x = self.dense2(x)      # second (and last) fully connected layer, feeding the softmax
        output = tf.nn.softmax(x)
        return output

# Main program: load the data and train the model
# Define the hyperparameters
num_epochs = 5       # number of passes over the training data
batch_size = 50
learning_rate = 0.001
print('now begin the train, time is ')
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))
model = CNN()
data_loader = MNISTLoader()
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
num_batches = int(data_loader.num_train_data // batch_size * num_epochs)
for batch_index in range(num_batches):
    X, y = data_loader.get_batch(batch_size)
    with tf.GradientTape() as tape:
        y_pred = model(X)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)
        loss = tf.reduce_sum(loss)
        print("batch %d: loss %f" % (batch_index, loss.numpy()))
    grads = tape.gradient(loss, model.variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))
print('now end the train, time is ')
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))

# Evaluate the model
sparse_categorical_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
num_batches_test = int(data_loader.num_test_data // batch_size)  # split the test data into batches of 50 images
for batch_index in range(num_batches_test):
    start_index, end_index = batch_index * batch_size, (batch_index + 1) * batch_size
    y_pred = model.predict(data_loader.test_data[start_index:end_index])
    sparse_categorical_accuracy.update_state(
        y_true=data_loader.test_label[start_index:end_index],
        y_pred=y_pred
    )
print('test accuracy: %f' % sparse_categorical_accuracy.result())
print('now end the test, time is ')
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))
The prediction accuracy reaches 99.15%, which is impressive.
Output:
batch 5999: loss 0.094517
now end the train, time is
2021-03-18 17:15:46
test accuracy: 0.991500
now end the test, time is
2021-03-18 17:16:05
Summary
Building a convolutional neural network only requires a few steps: decide on the number of layers; define each layer in terms of convolution, activation, and pooling; and make sure each layer's input matches the previous layer's output. To produce the final result, the 2-dimensional (or higher-dimensional) matrices are flattened into one long 1-dimensional vector, fully connected layers then form a multi-layer neural network at the output end, and the final output layer uses the softmax function to classify; the class with the highest output probability is our prediction.
With that, the convolutional neural network is complete, and its accuracy is already quite good.