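MNIST is an introductory dataset for machine learning. Its full name is the Modified National Institute of Standards and Technology database; it comes from the US National Institute of Standards and Technology (NIST) and is a reduced version of the larger NIST collection.
The training set consists of digits handwritten by 250 different people, 50% of them high-school students and 50% employees of the Census Bureau. It contains 60,000 images.
The test set contains handwritten digits in the same proportions. It contains 10,000 images.
The MNIST dataset can be downloaded from the official MNIST site, or imported directly through the keras library (the approach used below).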
from keras import datasets
# Load the dataset
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()  # keras.datasets ships a loader for MNIST
The MNIST dataset contains 70,000 images in total. Each image is 28*28 pixels, so each image has 784 pixel values.
A traditional neural network can therefore hold the data in a two-dimensional array [70000][784]. Once the raw 0~255 gray levels are divided by 255, every array element lies between 0 and 1.
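A minimal sketch of that layout, assuming NumPy and a small random stand-in array rather than the real images (the names `images` and `flat` are illustrative only):

```python
import numpy as np

# Stand-in for MNIST: 5 random 28x28 grayscale images with values 0..255.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image to 784 values and scale them into the range 0~1.
flat = images.reshape(len(images), 28 * 28).astype("float32") / 255.0

print(flat.shape)
```

With the real training set the same two steps would give an array of shape (60000, 784).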
A convolutional neural network (CNN) improves on the traditional network above. It addresses the problems caused by having too many parameters: the network is hard to train, inefficient, and prone to overfitting.
A CNN can handle most computer-vision tasks:
1. Detection: identifying what is in an image.
2. Retrieval and classification.
Classification assigns the image to a category; retrieval finds images that are similar to a given image.
3. Super-resolution reconstruction, which improves image sharpness.
4. OCR, autonomous driving, face recognition, and other advanced applications.
A traditional neural network turns the image into a vector, with each pixel value varying between 0 and 1.
A CNN no longer treats the image as a vector. Its input is three-dimensional data of shape h × w × c (h and w are the height and width of the image, c is the number of channels; an RGB image has the three channels r, g, and b, and each channel is processed separately).
Because a CNN processes three-dimensional data, it generally runs faster on a GPU (graphics processing unit).
Input layer: feeds the image data into the network.
Convolution layer: each channel is split into regions according to the specified kernel size and stride. For each region, the inner product with that channel's kernel weights gives a feature value. The values from the corresponding regions of all channels are summed and the bias parameter is added, which yields the feature map of the current convolution ("feature extraction").
In the worked example, the raw input is padded with one layer on every edge (+pad 1) to counter the edge effect, the convolution stride is 2, and there are 2 convolution kernels.
The first feature map is computed with kernel W0; the second feature map is computed the same way with kernel W1.
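The convolution step described above can be sketched in NumPy. This is only an illustration: the 5*5 input with 3 channels, the 3*3 kernels filled with ones, and the bias values are assumptions, since the text does not give the actual numbers, and `conv2d` is a hypothetical helper, not a library function.

```python
import numpy as np

def conv2d(x, kernels, biases, stride=1, pad=0):
    """x: (h, w, c); kernels: (n, kh, kw, c); biases: (n,). Returns (oh, ow, n)."""
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad height and width only
    n, kh, kw, _ = kernels.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow, n))
    for k in range(n):
        for i in range(oh):
            for j in range(ow):
                r, c = i * stride, j * stride
                window = x[r:r + kh, c:c + kw, :]
                # Inner product over every channel, summed, plus the bias.
                out[i, j, k] = np.sum(window * kernels[k]) + biases[k]
    return out

# Assumed sizes: 5x5 input with 3 channels, two 3x3 kernels standing in for W0 and W1.
x = np.ones((5, 5, 3))
kernels = np.ones((2, 3, 3, 3))
biases = np.array([1.0, 0.0])
feature_maps = conv2d(x, kernels, biases, stride=2, pad=1)
print(feature_maps.shape)
```

Each kernel produces one feature map, so the last axis of the result has length 2.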
Pooling layer: pooling selects among the feature values. Max pooling is the usual way to compress the feature maps ("feature compression").
With max pooling, the largest value in each pooling window is kept as the resulting feature value.
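As a small illustration of max pooling, here is a sketch with a 2*2 window and stride 2 on a made-up 4*4 feature map (the values and the helper name `max_pool` are assumptions for the example):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    h, w = feature_map.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.empty((oh, ow), dtype=feature_map.dtype)
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            # Keep only the largest value inside the window.
            out[i, j] = feature_map[r:r + size, c:c + size].max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]])
pooled = max_pool(fm)
print(pooled)  # [[6 4]
               #  [8 9]]
```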
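Fully connected layer: takes the feature maps after they have been transformed into a vector and maps them to the sample label space ("feature classification").
Convolution stride (sliding-window step): the number of units the convolution kernel moves across the channel data at each step.
The smaller the stride, the finer the result, but the lower the training efficiency.
For image processing the stride is usually 1; for text processing it is usually not 1.
Convolution kernel size: how many data points of the channel are used to compute each feature value.
The smaller the kernel, the finer the result, but the lower the training efficiency.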
Edge padding:
When the convolution layer computes feature values, consecutive window positions overlap, so part of the data is reused. Values near the edge therefore take part in fewer computations than values near the center. To reduce this imbalance, we can surround the image with several layers of zeros, which do not affect the computed results, so that the real image values take part in a more even number of computations.
Usually the padding is 1, that is, one layer of zeros is added.
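To make the edge effect concrete, the following sketch counts how many windows touch each real pixel. The 5*5 image, 3*3 kernel, and stride 1 are assumed values chosen for illustration, and `coverage` is a hypothetical helper.

```python
def coverage(n, k, pad, stride=1):
    """Count how many k x k windows touch each pixel of an n x n image."""
    size = n + 2 * pad
    counts = [[0] * size for _ in range(size)]
    for top in range(0, size - k + 1, stride):
        for left in range(0, size - k + 1, stride):
            for r in range(top, top + k):
                for c in range(left, left + k):
                    counts[r][c] += 1
    # Drop the padded border so only real image pixels remain.
    return [row[pad:size - pad] for row in counts[pad:size - pad]]

no_pad = coverage(5, 3, pad=0)
with_pad = coverage(5, 3, pad=1)
print(no_pad[0][0], no_pad[2][2])      # 1 9: corner vs center without padding
print(with_pad[0][0], with_pad[2][2])  # 4 9: corner vs center with one layer of zeros
```

The corner pixel goes from 1 use to 4 while the center stays at 9, so the gap narrows.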
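Number of convolution kernels and number of resulting feature maps
Each convolution kernel produces one feature map, so the number of feature maps equals the number of kernels.
Computing the width and height of the feature map
$$W_2 = \frac{W_1 - F_W + 2P}{d} + 1, \qquad H_2 = \frac{H_1 - F_H + 2P}{d} + 1$$
where $W_1$ and $H_1$ are the width and height of the input image, $F_W$ and $F_H$ are the width and height of the convolution kernel, $P$ is the number of padding layers, $d$ is the stride, and $W_2$ and $H_2$ are the width and height of the feature map.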
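The size calculation can be written as a one-line helper. This is a sketch: `output_size` is a hypothetical name, the first call uses assumed numbers (5 wide, 3*3 kernel, padding 1, stride 2), and the second uses a 28-wide image with a 3*3 kernel and no padding.

```python
def output_size(in_size, kernel, pad, stride):
    # (input - kernel + 2 * padding) / stride + 1, rounded down
    return (in_size - kernel + 2 * pad) // stride + 1

print(output_size(5, 3, pad=1, stride=2))   # 3
print(output_size(28, 3, pad=0, stride=1))  # 26
```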
Computing the number of parameters
The number of parameters in a convolution layer is: all kernel weight parameters + bias parameters.
For example, a layer with 10 convolution kernels has 10 bias parameters, one per kernel.
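A short sketch of that count, assuming 10 kernels of size 3*3 over 3 input channels (the kernel size is an assumption made for the example, and `conv_params` is a hypothetical helper):

```python
def conv_params(kernel_h, kernel_w, in_channels, n_kernels):
    weights = kernel_h * kernel_w * in_channels * n_kernels
    biases = n_kernels  # one bias per kernel
    return weights + biases

print(conv_params(3, 3, 3, 10))  # 270 weights + 10 biases = 280
```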
When counting the layers of a neural network, only layers with parameters to compute are counted, such as convolution layers and fully connected layers;
pooling layers are not counted.
Network structure for recognizing MNIST
The network structure used by K (K同学):
The network is a 3-layer neural network. Each convolution is followed by a pooling step, and there are two such convolution-pooling pairs.
The input layer is 28*28 with a single channel.
(Edge information is not important for these images, so no edge padding is used.)
First convolution: 32 convolution kernels of size 3*3, stride 1.
First pooling: max pooling over every 4 values (a 2*2 window), which compresses the width and height to 0.5 times the original; stride 2.
Second convolution: 64 convolution kernels, same kernel size, same stride.
Second pooling: same pooling specification.
relu layer: ReLU is an activation function applied when a convolution layer finishes. It removes the negative values in the convolution result (sets them to 0) and leaves positive values unchanged.
Flatten layer: flattens the feature maps into a vector; the fully connected layer then classifies the features, and the result is output.
import tensorflow as tf
from keras import datasets, layers, models  # K's original import raised an error here
import matplotlib.pyplot as plt
Set up training on the computer's GPU
# Configure the program to train on the GPU
gpus = tf.config.list_physical_devices("GPU")  # list the GPUs on this machine
if gpus:  # the list is not empty
    gpu0 = gpus[0]  # pick the first GPU in the list
    tf.config.experimental.set_memory_growth(gpu0, True)  # allocate GPU memory on demand
    tf.config.set_visible_devices([gpu0], "GPU")  # set the visible devices; all are visible by default, here only gpu0
Reshape the data to the format (count, width, height, channels) according to the dataset size
# Reshape the data into the format we need, based on the dataset size
print(train_images.size)  # 47040000
print(test_images.size)  # 7840000
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))
47040000 = 60000*28*28 and 7840000 = 10000*28*28, so the element counts leave room for exactly one channel; three channels cannot be built from this data.
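The channel count can be checked with plain arithmetic. A small sketch, using the value printed by `train_images.size` above:

```python
total_elements = 47040000  # value printed by train_images.size
count, height, width = 60000, 28, 28

# Elements left per pixel position = number of channels.
channels, remainder = divmod(total_elements, count * height * width)
print(channels, remainder)  # 1 0
```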
Build a sequential model with the Sequential class in the models package
Here a three-layer neural network is chosen: 2 convolution-pooling layers, 1 Flatten layer, 1 fully connected layer, then the output
# Build the CNN model
model = models.Sequential([  # Sequential model: layers run in order
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # convolution layer 1: 32 kernels of size 3*3*1; relu zeroes negatives and keeps positives; input is 28*28*1
    layers.MaxPooling2D((2, 2)),  # pooling layer 1, 2*2 window
    layers.Conv2D(64, (3, 3), activation='relu'),
    # convolution layer 2: 64 kernels of size 3*3; relu zeroes negatives and keeps positives
    layers.MaxPooling2D((2, 2)),  # pooling layer 2, 2*2 window
    layers.Flatten(),  # Flatten layer, connects the convolution layers to the fully connected layer
    layers.Dense(64, activation='relu'),  # fully connected layer with 64 units, further feature extraction
    layers.Dense(10)  # output layer, one score per digit class
])
# Print the network structure
model.summary()
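The shapes and parameter counts of this network can also be worked out by hand. The sketch below assumes no padding (the Keras default for Conv2D) and 2*2 pooling with stride 2; `conv_out` and `pool_out` are hypothetical helpers, and the totals should agree with what the model summary reports.

```python
def conv_out(size, kernel=3, stride=1):
    return (size - kernel) // stride + 1  # no padding

def pool_out(size, window=2):
    return size // window  # 2*2 window with stride 2, rounded down

size, channels, total = 28, 1, 0
for n_kernels in (32, 64):
    total += 3 * 3 * channels * n_kernels + n_kernels  # weights + biases
    size, channels = conv_out(size), n_kernels
    size = pool_out(size)

flat = size * size * channels  # length of the Flatten output
total += flat * 64 + 64        # Dense(64)
total += 64 * 10 + 10          # Dense(10)
print(size, flat, total)  # 5 1600 121930
```

The spatial size goes 28, 26, 13, 11, 5, so the Flatten layer hands 5*5*64 = 1600 values to the fully connected layer.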
Activation functions apply a nonlinear transformation to the data, which lets the neural network model nonlinear relationships.
One of them is similar to Sigmoid and needs fewer iterations, but it still relies on exponentiation, so it remains time-consuming to compute.
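For comparison, here is a sketch of three common activation functions in plain Python. Including tanh is an assumption on my part: the text does not name the Sigmoid-like function it describes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # needs an exponential

def tanh(x):
    return math.tanh(x)  # also exponential-based, but centered on 0

def relu(x):
    return max(0.0, x)  # negatives become 0, positives pass through

for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 3), round(tanh(x), 3), relu(x))
```

ReLU needs only a comparison, which is one reason it is cheap to evaluate.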
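The optimizer tunes the parameters: it is the algorithm used to update the neural network model's parameters.
Common optimizers include the following:
BGD: batch gradient descent
Advantages: can reach the global optimum (for convex problems); easy to parallelize; needs fewer iterations. Disadvantages: training is slow when the dataset is large.
SGD: stochastic gradient descent
Advantages: fast training. Disadvantages: lower accuracy, and the result is not necessarily the global optimum; hard to parallelize. SGD also needs more iterations, and its search through the solution space looks rather blind.
MBGD: mini-batch gradient descent
Advantages: more stable convergence, and it can make full use of the highly optimized matrix operations in deep-learning libraries to compute gradients efficiently. Disadvantages: the learning rate is hard to choose; if it is too small, convergence is slow, and if it is too large, the loss function keeps oscillating around the minimum or even diverges.
Momentum: momentum algorithm
An improved variant of Momentum is more responsive, but it still has a manually set parameter that is very hard to choose.
Disadvantages: may fail to converge and may miss the global optimum.
In general, use adam; if the results are poor, try another optimizer.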
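The three gradient-descent variants differ only in how many samples feed each update. The following sketch fits a line to noise-free toy data; the data, learning rate, epoch count, and the helper `fit_line` are all assumptions chosen for illustration, not settings from the text.

```python
import random

def fit_line(xs, ys, batch_size, lr=0.1, epochs=500, seed=0):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Average gradient over the current batch.
            gw = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            gb = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w, b = w - lr * gw, b - lr * gb
    return w, b

xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]  # toy data generated from w = 3, b = 1

results = {}
for name, size in (("BGD", len(xs)), ("SGD", 1), ("MBGD", 4)):
    results[name] = fit_line(xs, ys, batch_size=size)
    print(name, [round(v, 2) for v in results[name]])
```

A batch size equal to the dataset gives BGD, a batch size of 1 gives SGD, and anything in between gives MBGD.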
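The loss function measures the loss, that is, the degree of error; the smaller the loss, the better the model.
MSE: mean squared error
MAE: mean absolute error
MAPE: mean absolute percentage error
MSLE: MSE with a logarithm applied (mean squared logarithmic error)
KLD: KL divergence, the information gain from the predicted probability distribution Q to the true probability distribution P, used to measure the difference between the two distributions
cosine: the negative of the mean cosine proximity between the predictions and the true labels
binary_crossentropy: logarithmic loss (logloss)
categorical_crossentropy: multi-class logarithmic loss
sparse_categorical_crossentropy: the same as categorical_crossentropy, but accepts sparse (integer) labels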
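The relation between the last two losses can be shown with a small hand-written sketch. These are simplified single-sample versions written for illustration, not the Keras implementations, and the probabilities are made up.

```python
import math

def categorical_crossentropy(one_hot, probs):
    # Label given as a one-hot vector.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(label, probs):
    # Label given as an integer class index.
    return -math.log(probs[label])

probs = [0.1, 0.2, 0.7]  # hypothetical predicted probabilities
loss_dense = categorical_crossentropy([0, 0, 1], probs)
loss_sparse = sparse_categorical_crossentropy(2, probs)
print(loss_dense, loss_sparse)
```

Both calls describe the same label (class 2), so they give the same loss. MNIST labels are integers 0-9, which is why the sparse form fits them directly.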
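# model.fit needs a compiled model; adam and sparse categorical cross-entropy on logits are assumed here
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
# validation_data names the data used for validation, here the test set
# epochs is the number of training rounds, here 10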
epochs is the number of training rounds. As gradient descent proceeds, the network moves gradually from underfitting to its best fit, and after passing that point it enters overfitting. So a larger epoch count is not always better; typical values lie between 50 and 200. The more diverse the data, the larger the epoch count should be.
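One way to act on this is to watch the validation loss and stop once it stops improving. The sketch below is a plain-Python illustration of that idea; the loss values, the `patience` setting, and the helper `best_epoch` are all hypothetical.

```python
def best_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the lowest validation loss,
    stopping once the loss has not improved for `patience` epochs."""
    best, best_at, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_at, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_at

# Hypothetical validation losses: falling (underfitting), then rising (overfitting).
val_losses = [0.90, 0.55, 0.40, 0.35, 0.37, 0.41, 0.48]
print(best_epoch(val_losses))  # 4
```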
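1. Plot test-set images (the first 20) to judge whether the predictions are correct
# Plot test-set images
plt.figure(figsize=(20, 10))  # only 20 images are shown; this visualization step is optional
for i in range(20):
    plt.subplot(5, 10, i + 1)
    plt.imshow(test_images[i].reshape(28, 28), cmap="gray")  # drop the channel axis for display
    plt.axis("off")
plt.show()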
# Predict
pre = model.predict(test_images)  # predict on all test images
for x in range(5):
    print(pre[x])  # scores for the digits 0-9: the more positive, the more likely; the more negative, the less likely
The more positive a score, the more likely that digit; the more negative, the less likely.
Reading off the highest score in each row, the first five images are predicted as 7 2 1 0 4.
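The raw scores can be turned into probabilities with softmax, which makes them easier to read. A minimal sketch in plain Python; the scores below are made up for illustration and are not real model output.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the digits 0-9.
scores = [-4.1, -6.3, -1.0, 0.5, -8.2, -3.7, -12.0, 9.8, -2.9, -0.4]
probs = softmax(scores)
print(probs.index(max(probs)))  # 7
```

Softmax keeps the ordering of the scores, so the most likely digit is the same before and after.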
The predicted digit can also be printed directly.
The modified prediction code is as follows:
# Predict
pre = model.predict(test_images)  # predict on all test images
for x in range(5):
    print(pre[x].argmax())  # index of the highest score, i.e. the predicted digit
