Image classification on the CIFAR-10 dataset using TensorFlow/Keras



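The dataset contains 60,000 color images of 32×32 pixels in 10 classes, with 6,000 images per class. 50,000 images are used for training, split into 5 training batches of 10,000 images each; the other 10,000 are used for testing and form a single test batch. The test batch contains exactly 1,000 randomly selected images from each of the 10 classes. The remaining images are shuffled to form the training batches. Note that a single training batch does not necessarily contain the same number of images from each class, but across all training batches every class has exactly 5,000 images. The figure below lists the 10 classes and shows 10 random images from each: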

Image text




conda                     4.10.3
keras                     2.6.0
markdown                  3.3.4
matplotlib                3.4.2
numpy                     1.19.5
python                    3.9.7
tensorflow-gpu            2.6.0


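The Baseline network is the most basic model I built after studying the fundamentals of convolutional neural networks. The input image passes in turn through the five CBAPD layers (Convolution, Batch normalization, Activation, Pooling and Dropout), and the network then outputs class probabilities. The structure is simple and well suited to testing, so I used this model first, training and testing it on the CIFAR10 dataset.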


Model: "baseline"
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              multiple                  456       
batch_normalization (BatchNo multiple                  24        
activation (Activation)      multiple                  0         
max_pooling2d (MaxPooling2D) multiple                  0         
dropout (Dropout)            multiple                  0         
flatten (Flatten)            multiple                  0         
dense (Dense)                multiple                  196736    
dropout_1 (Dropout)          multiple                  0         
dense_1 (Dense)              multiple                  1290      
Total params: 198,506
Trainable params: 198,494
Non-trainable params: 12
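
The counts in the summary above can be reproduced by hand. The sketch below is not the author's code: the 5x5 kernel, 6 filters, "same" padding, 2x2 pooling and 128 hidden units are inferred from the parameter counts, while the ReLU activations, the dropout rate of 0.2 and all function names are assumptions made for illustration.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units


def baseline_param_counts():
    """Reproduce the non-zero rows of the "baseline" summary."""
    flat = 16 * 16 * 6  # 32x32 halved once by pooling, 6 channels
    return {
        "conv2d": conv2d_params(5, 3, 6),   # 5x5 kernel, RGB input, 6 filters
        "batch_normalization": 4 * 6,       # gamma, beta, moving mean, moving variance
        "dense": dense_params(flat, 128),
        "dense_1": dense_params(128, 10),
    }


def build_baseline(dropout_rate=0.2):
    """Keras version of the same stack; dropout_rate is an assumed value."""
    import tensorflow as tf  # imported lazily so the arithmetic runs without it
    layers = tf.keras.layers
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(6, (5, 5), padding="same"),                       # C
        layers.BatchNormalization(),                                    # B
        layers.Activation("relu"),                                      # A
        layers.MaxPool2D(pool_size=(2, 2), strides=2, padding="same"),  # P
        layers.Dropout(dropout_rate),                                   # D
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation="softmax"),
    ], name="baseline")


counts = baseline_param_counts()
total_params = sum(counts.values())
print(counts, total_params)
```

Calling `build_baseline().summary()` should report the same per-layer counts, with concrete output shapes instead of `multiple` because the sketch uses a `Sequential` model with a known input shape.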


Image text


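LeNet-5 is a convolutional neural network designed by Yann LeCun in 1998 for handwritten digit recognition. It was deployed by banks in the United States to read the handwritten digits on checks, and it is one of the most representative experimental systems among early convolutional neural networks.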


Image text


Model: "le_net5"
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              multiple                  456       
max_pooling2d (MaxPooling2D) multiple                  0         
conv2d_1 (Conv2D)            multiple                  2416      
max_pooling2d_1 (MaxPooling2 multiple                  0         
flatten (Flatten)            multiple                  0         
dense (Dense)                multiple                  48120     
dense_1 (Dense)              multiple                  10164     
dense_2 (Dense)              multiple                  850       
Total params: 62,006
Trainable params: 62,006
Non-trainable params: 0
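
The 48,120 parameters of the first dense layer follow from how the feature map shrinks through the network. The sketch below traces the spatial size for a 32x32 input, assuming 5x5 convolutions without padding and 2x2 pooling with stride 2 (both inferred from the counts above, not stated in the text); the helper names are hypothetical.

```python
def valid_out(size, kernel, stride=1):
    """Output side length of an unpadded ("valid") convolution or pooling."""
    return (size - kernel) // stride + 1


def lenet5_flatten_size(input_size=32):
    """Number of features entering the first dense layer."""
    size = valid_out(input_size, kernel=5)       # conv 5x5: 32 -> 28
    size = valid_out(size, kernel=2, stride=2)   # pool 2x2: 28 -> 14
    size = valid_out(size, kernel=5)             # conv 5x5: 14 -> 10
    size = valid_out(size, kernel=2, stride=2)   # pool 2x2: 10 -> 5
    return size * size * 16                      # 16 feature maps after the second conv


flat = lenet5_flatten_size()
first_dense = flat * 120 + 120                   # 120 units: weights plus biases
print(flat, first_dense)
```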


Image text


LeCun, Yann, et al. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE 86.11 (1998): 2278-2324.


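AlexNet, the winner of the 2012 ImageNet competition, was designed by Hinton and his student Alex Krizhevsky. From that year on, more and deeper neural networks were proposed, such as VGG and GoogLeNet. Its results were already far ahead of traditional machine learning classification algorithms. AlexNet introduced several then-new techniques and was among the first CNNs to successfully apply tricks such as ReLU, Dropout and LRN. It also used GPUs to accelerate computation. AlexNet carried the ideas of LeNet forward, applying the basic principles of CNNs to a much deeper and wider network. The main new techniques used in AlexNet are as follows: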





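(5) Using CUDA to accelerate the training of deep convolutional networks, exploiting the parallel computing power of GPUs to handle the large number of matrix operations in neural network training. AlexNet was trained on two GTX 580 GPUs. A single GTX 580 has only 3 GB of memory, which limits the maximum size of the network that can be trained, so the authors split AlexNet across the two GPUs and stored the parameters of half of the neurons in the memory of each. Because the GPUs can communicate easily and access each other's memory without going through host memory, using several GPUs at once is also very efficient. In addition, AlexNet is designed so that the GPUs communicate only at certain layers of the network, which keeps the communication overhead under control.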


Image text


Model: "alex_net8"
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              multiple                  2688      
batch_normalization (BatchNo multiple                  384       
activation (Activation)      multiple                  0         
max_pooling2d (MaxPooling2D) multiple                  0         
conv2d_1 (Conv2D)            multiple                  221440    
batch_normalization_1 (Batch multiple                  1024      
activation_1 (Activation)    multiple                  0         
max_pooling2d_1 (MaxPooling2 multiple                  0         
conv2d_2 (Conv2D)            multiple                  885120    
conv2d_3 (Conv2D)            multiple                  1327488   
conv2d_4 (Conv2D)            multiple                  884992    
max_pooling2d_2 (MaxPooling2 multiple                  0         
flatten (Flatten)            multiple                  0         
dense (Dense)                multiple                  2099200   
dropout (Dropout)            multiple                  0         
dense_1 (Dense)              multiple                  4196352   
dropout_1 (Dropout)          multiple                  0         
dense_2 (Dense)              multiple                  20490     
Total params: 9,639,178
Trainable params: 9,638,474
Non-trainable params: 704
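
The 704 non-trainable parameters in the summary above come entirely from the two batch normalization layers. A short check, assuming the standard Keras behaviour of four values per channel (two trainable, two non-trainable); the channel counts 96 and 256 are read off the preceding convolutions and the function name is hypothetical:

```python
def batch_norm_params(channels):
    """Return (trainable, non_trainable) parameter counts of one BN layer."""
    trainable = 2 * channels       # gamma (scale) and beta (offset)
    non_trainable = 2 * channels   # moving mean and moving variance
    return trainable, non_trainable


bn_channels = [96, 256]            # filters of the two convolutions followed by BN
non_trainable_total = sum(batch_norm_params(c)[1] for c in bn_channels)
bn_total = sum(sum(batch_norm_params(c)) for c in bn_channels)
trainable_total = 9_639_178 - non_trainable_total

print(bn_total, non_trainable_total, trainable_total)
```

The moving statistics are updated during training but not by gradient descent, which is why Keras lists them as non-trainable.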


Image text


Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems 25 (2012).


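The authors of VGG describe it in their paper as a Very Deep Convolutional Network. The VGG16 network shown in the figure below has 16 weight layers, which was already very deep at the time. In the first part of the network, a max-pooling layer follows every 2 to 3 convolutional layers, so the 5 pooling stages span 13 convolutional layers in total. Together with the final 3 fully connected layers this gives 16 layers, which is why the network is called VGG16.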


Image text


Model: "vgg16"
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              multiple                  1792      
batch_normalization (BatchNo multiple                  256       
activation (Activation)      multiple                  0         
conv2d_1 (Conv2D)            multiple                  36928     
batch_normalization_1 (Batch multiple                  256       
activation_1 (Activation)    multiple                  0         
max_pooling2d (MaxPooling2D) multiple                  0         
dropout (Dropout)            multiple                  0         
conv2d_2 (Conv2D)            multiple                  73856     
batch_normalization_2 (Batch multiple                  512       
activation_2 (Activation)    multiple                  0         
conv2d_3 (Conv2D)            multiple                  147584    
batch_normalization_3 (Batch multiple                  512       
activation_3 (Activation)    multiple                  0         
max_pooling2d_1 (MaxPooling2 multiple                  0         
dropout_1 (Dropout)          multiple                  0         
conv2d_4 (Conv2D)            multiple                  295168    
batch_normalization_4 (Batch multiple                  1024      
activation_4 (Activation)    multiple                  0         
conv2d_5 (Conv2D)            multiple                  590080    
batch_normalization_5 (Batch multiple                  1024      
activation_5 (Activation)    multiple                  0         
conv2d_6 (Conv2D)            multiple                  590080    
batch_normalization_6 (Batch multiple                  1024      
activation_6 (Activation)    multiple                  0         
max_pooling2d_2 (MaxPooling2 multiple                  0         
dropout_2 (Dropout)          multiple                  0         
conv2d_7 (Conv2D)            multiple                  1180160   
batch_normalization_7 (Batch multiple                  2048      
activation_7 (Activation)    multiple                  0         
conv2d_8 (Conv2D)            multiple                  2359808   
batch_normalization_8 (Batch multiple                  2048      
activation_8 (Activation)    multiple                  0         
conv2d_9 (Conv2D)            multiple                  2359808   
batch_normalization_9 (Batch multiple                  2048      
activation_9 (Activation)    multiple                  0         
max_pooling2d_3 (MaxPooling2 multiple                  0         
dropout_3 (Dropout)          multiple                  0         
conv2d_10 (Conv2D)           multiple                  2359808   
batch_normalization_10 (Batc multiple                  2048      
activation_10 (Activation)   multiple                  0         
conv2d_11 (Conv2D)           multiple                  2359808   
batch_normalization_11 (Batc multiple                  2048      
activation_11 (Activation)   multiple                  0         
conv2d_12 (Conv2D)           multiple                  2359808   
batch_normalization_12 (Batc multiple                  2048      
activation_12 (Activation)   multiple                  0         
max_pooling2d_4 (MaxPooling2 multiple                  0         
dropout_4 (Dropout)          multiple                  0         
flatten (Flatten)            multiple                  0         
dense (Dense)                multiple                  262656    
dropout_5 (Dropout)          multiple                  0         
dense_1 (Dense)              multiple                  262656    
dropout_6 (Dropout)          multiple                  0         
dense_2 (Dense)              multiple                  5130      
Total params: 15,262,026
Trainable params: 15,253,578
Non-trainable params: 8,448
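
The layer count and the parameter total of the summary above can be checked from a compact configuration list. This is a sketch under assumptions inferred from the summary rather than taken from the author's code: 3x3 kernels, batch normalization after every convolution, a 1x1x512 feature map before flattening, and fully connected layers of 512, 512 and 10 units. The names `VGG16_CFG` and `vgg16_counts` are hypothetical.

```python
# Numbers are output channels of a 3x3 convolution, "M" marks a max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_counts(cfg=VGG16_CFG, in_ch=3, flat=512, fc_units=(512, 512, 10)):
    """Return (conv layers, pooling layers, dense layers, total parameters)."""
    conv_layers = pools = params = 0
    for item in cfg:
        if item == "M":
            pools += 1
            continue
        params += 3 * 3 * in_ch * item + item   # convolution weights plus biases
        params += 4 * item                      # batch normalization after the conv
        conv_layers += 1
        in_ch = item
    units_in = flat                             # 32x32 halved five times gives 1x1x512
    for units in fc_units:
        params += units_in * units + units      # dense weights plus biases
        units_in = units
    return conv_layers, pools, len(fc_units), params


convs, pools, denses, total = vgg16_counts()
print(convs, pools, denses, convs + denses, total)
```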


Image text


Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).


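The authors of ResNet (Residual Neural Network) drew on the concept of residual representation, which is widely used in classical computer vision, and applied it to the construction of CNN models, which gave rise to the basic residual learning block. It uses several parameterized layers to learn the residual between input and output, rather than using parameterized layers to learn the input-output mapping directly as ordinary CNNs (such as AlexNet and VGG) do. Experiments show that using parameterized layers to learn the residual is much easier (faster convergence) and much more effective (higher classification accuracy can be reached by using more layers) than learning the input-output mapping directly.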

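The main idea of ResNet is to add direct (shortcut) connections to the network, which is also the idea behind Highway Networks. Earlier architectures apply a nonlinear transformation to the input at every layer, whereas a Highway Network allows a certain proportion of the output of earlier layers to be preserved. ResNet is very similar in spirit: it allows the original input to be passed directly to later layers, as shown in the figure below.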

Image text
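
The difference between learning a mapping directly and learning a residual can be illustrated with a few lines of NumPy. This is a deliberately simplified sketch: it uses dense matrix multiplications instead of convolutions, omits batch normalization, and the function names are made up for this example.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def plain_block(x, w1, w2):
    """Two stacked layers that have to learn the full mapping H(x)."""
    return relu(relu(x @ w1) @ w2)


def residual_block(x, w1, w2):
    """The same two layers learn only the residual F(x); x is added back."""
    residual = relu(x @ w1) @ w2
    return relu(residual + x)


x = np.array([[1.0, 2.0, 3.0]])
zeros = np.zeros((3, 3))

plain_out = plain_block(x, zeros, zeros)        # zero weights wipe out the input
residual_out = residual_block(x, zeros, zeros)  # zero weights leave the input intact
print(plain_out, residual_out)
```

With all weights at zero the residual block already behaves as the identity for non-negative inputs, so extra blocks only need to learn a correction to the input, not a whole new mapping.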


Model: "res_net18"
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              multiple                  1728      
batch_normalization (BatchNo multiple                  256       
activation (Activation)      multiple                  0         
sequential (Sequential)      (None, 4, 4, 512)         11176448  
global_average_pooling2d (Gl multiple                  0         
dense (Dense)                multiple                  5130      
Total params: 11,183,562
Trainable params: 11,173,962
Non-trainable params: 9,600


Image text


He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016): 770-778.



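(1) Use tf.keras.datasets.cifar10.load_data() to read the CIFAR10 dataset directly and split it into x_train, y_train, x_test and y_test, which hold the images and labels of the training set and the test set respectively.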
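
A minimal sketch of this loading step is shown here. `tf.keras.datasets.cifar10.load_data()` is the real Keras call; the wrapper `load_cifar10`, the `CIFAR10_SHAPES` table and `check_shapes` are hypothetical helpers added for illustration, and the first call needs network access to download the data.

```python
# Array shapes returned by the Keras loader for CIFAR-10.
CIFAR10_SHAPES = {
    "x_train": (50000, 32, 32, 3),
    "y_train": (50000, 1),
    "x_test": (10000, 32, 32, 3),
    "y_test": (10000, 1),
}


def load_cifar10():
    """Download (on first use) and return the four CIFAR-10 arrays."""
    import tensorflow as tf  # lazy import: only needed when data is actually loaded
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    return x_train, y_train, x_test, y_test


def check_shapes(x_train, y_train, x_test, y_test):
    """Return True if every array has the shape listed in CIFAR10_SHAPES."""
    arrays = dict(zip(CIFAR10_SHAPES, (x_train, y_train, x_test, y_test)))
    return all(arrays[name].shape == shape for name, shape in CIFAR10_SHAPES.items())
```

A typical use would be `x_train, y_train, x_test, y_test = load_cifar10()` followed by `check_shapes(x_train, y_train, x_test, y_test)` as a quick sanity check before training.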










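(1) optimizer: selects the optimizer used for training. Common choices are sgd, adagrad, adadelta and adam; they differ in how the first-order and second-order momentum terms are defined.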



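(4) batch_size: the number of samples fed to the network per gradient update. If batch_size is too large, both optimization (the training loss stops decreasing) and generalization (the generalization gap becomes large) can suffer. If batch_size is too small, training converges too slowly. A common batch_size is around 32.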


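(6) validation_freq: the number of training epochs to run before a new validation run is performed. For example, with validation_freq = 1 validation runs after every epoch. The default is 1.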
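
The effect of batch_size and validation_freq can be made concrete with a little arithmetic, followed by a training call that wires the settings together. This is a sketch, not the author's training script: the helper names, the choice of adam, the epoch count and the sparse categorical cross-entropy loss (which assumes integer labels and a softmax output layer) are all assumptions.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Gradient updates in one epoch; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)


def validation_epochs(epochs, validation_freq):
    """1-based epochs after which validation runs for an integer validation_freq."""
    return [e for e in range(1, epochs + 1) if e % validation_freq == 0]


def compile_and_fit(model, x_train, y_train, x_test, y_test, epochs=5):
    """Hypothetical training call using the parameters described above."""
    import tensorflow as tf  # lazy import so the helpers run without TensorFlow
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
        metrics=["sparse_categorical_accuracy"],
    )
    return model.fit(
        x_train, y_train,
        batch_size=32,
        epochs=epochs,
        validation_data=(x_test, y_test),
        validation_freq=1,
    )


updates = steps_per_epoch(50000, 32)   # 50,000 training images, batch_size 32
val_runs = validation_epochs(6, 2)     # validation_freq = 2 over 6 epochs
print(updates, val_runs)
```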


Last updated: 2021.10.9. Author: 黄一骏. Email: [email protected]

