Cat-vs-Dog Image Classification with a Convolutional Neural Network (CNN) (TensorFlow 2.1, Python 3.6)
2022-06-12 08:07:00 【HNU_Liu Yuan】
Preface: Convolutional neural networks are used more and more widely in everyday life, and they genuinely solve practical problems such as fruit classification and gender classification. Most of the information we encounter is in image form, so we want to use a neural network to classify and recognize pictures. Since I had not touched deep learning before, this project is also an exercise in trying and learning. The network here is not limited to classifying cats and dogs: given corresponding training images, it can be trained for any binary classification task. For example, trained on images from a face database, it can classify gender from photos, so it has a certain generality.
This walkthrough uses the Kaggle Dogs vs. Cats dataset: Dogs vs. Cats dataset download link
The first convolutional neural network was the Time Delay Neural Network (TDNN), proposed by Alexander Waibel et al. in 1987; TDNN was a convolutional network applied to speech recognition.
Convolutional networks then developed further. After the theory of deep learning was proposed in 2006, the representation-learning ability of convolutional neural networks attracted attention, helped along by advances in computing hardware. Starting with AlexNet in 2012, complex convolutional networks backed by GPU computing clusters have repeatedly produced the winning algorithms of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
In 2015, ResNet won ILSVRC 2015. ResNet's biggest contribution was making very deep networks trainable (mitigating vanishing gradients during back-propagation); its network depth reached 152 layers!
As computing power improves, both the complexity and the accuracy of neural networks keep rising.
The accuracy and complexity of typical CNN architectures as of 2017:

The dataset used for this training run is a subset of the Kaggle Dogs vs. Cats competition data: roughly 2000 training images and roughly 200 validation images. In the competition itself, the top 2% of entries exceeded 99% accuracy, and the first-place entry reached a log loss of 0.03302.
Of course, limited by hardware, algorithms, and so on, this tutorial is just an introduction.
Convolutional networks (ConvNets) are a special type of neural network particularly suited to computer vision applications, because their local operations give them strong abstract representation power.
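As a minimal illustration (my sketch, not from the original post), a single convolution layer applies small local filters across the image, so each output value depends only on a local neighbourhood:

import tensorflow as tf

# A 3x3 convolution scans local 3x3 neighbourhoods of the input image;
# 16 filters produce 16 feature maps of the same spatial size ('same' padding).
layer = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')
feature_maps = layer(tf.random.normal([1, 128, 128, 3]))
print(feature_maps.shape)  # (1, 128, 128, 16)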

The general workflow of the neural network is as follows:

The training and validation data are stored in folders laid out as in the following example.

In Data, the cat and dog training pictures are stored separately, roughly 1000 per class.
In Validation, the cat and dog validation photos are stored separately, roughly 100 per class.
Remove holds the data-cleaning program;
Test holds the program that classifies a given picture;
Training holds the program that trains the neural network; a sketch of the layout follows below.
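Assuming the layout described above (the data, validation, and remove paths appear in the scripts later in the post; the other folder names are my guesses based on the descriptions), the directory tree looks roughly like this:

catvsdog/
├── data/           # training images, ~1000 per class
│   ├── cat/
│   └── dog/
├── validation/     # validation images, ~100 per class
│   ├── cat/
│   └── dog/
├── remove/         # data-cleaning program and rejected images
├── test/           # program that classifies a given picture
└── training/       # program that trains the network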
Training round (1)
First, run the first version of the program; the neural network output is shown below:

The left plot shows accuracy on the training and validation sets (dots: training set; line: validation set);
the right plot shows loss on the training and validation sets (dots: training set; line: validation set).
The curves show that the model barely changes, which suggests a problem with the model's structure. Analysis: the default activation function is not appropriate here. The task is binary classification with outputs 0 and 1, so the output layer should use the sigmoid function, while the other neurons use the ReLU function.
Solution: set the output layer's activation function to 'sigmoid', i.e. tf.keras.layers.Dense(1, activation='sigmoid'), for binary classification. The remaining layers use the ReLU activation function.
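A minimal sketch of that change (the full model appears later in the post; the hidden-layer size here is illustrative):

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu', input_shape=(64,)),  # hidden layers keep ReLU
    tf.keras.layers.Dense(1, activation='sigmoid'),  # single sigmoid unit: output in (0, 1) for two classes
])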
Training round (2)
Then run the second version of the program; the neural network output is shown below:

The curves show that training-set accuracy is consistently higher than validation-set accuracy: the model fits the training data too closely, performing well on the training set but not generalizing to the validation set. This is overfitting.
One remedy is to enlarge the training data. In the image domain, the training set can be expanded by stretching, rotating, transforming, and cropping images; overfitting can also be reduced by Dropout, which randomly zeroes parameters. We used both methods. First, the ImageDataGenerator class provided by tf.keras applies random transformations to the images (zoom, rotation, mirroring, etc.), so many pictures can be derived from each original. Second, a random-deactivation layer, tf.keras.layers.Dropout(x) with x around 0.3, randomly zeroes part of the hidden layers' weights or outputs. Together these address the overfitting problem.
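In miniature, the two fixes look like this (the parameter values match the full training script later in the post):

import tensorflow as tf

# 1. Augmentation: derive many variants of each training image on the fly.
image_gen_train = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=5,        # small random rotations
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    horizontal_flip=True,    # random mirroring
    zoom_range=0.1,          # random zoom
)

# 2. Dropout: randomly zero a fraction of activations during training.
dropout_layer = tf.keras.layers.Dropout(0.3)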
Training round (3)
Then run the third version of the program; the neural network output is shown below:

After solving the overfitting problem, we found that validation-set accuracy was almost always higher than training-set accuracy. The likely reason is that the training data is not clean and needs to be filtered.
Solution: write the Remove program, load the previously saved neural network, and clean the training set by removing training images whose labels differ sharply from the network's prediction. The code below moves training images that disagree with the prediction into another folder.
# A prediction near 0 means the network confidently sees a cat; presumably the dog
# training folder is being scanned here, so such an image contradicts its label.
if abs(predict[0]) < 0.05:
    shutil.move(image_path, 'D:/Python/AI/catvsdog/remove/cat')

Above are some of the photos that were cleaned out. The reasons they were removed appear to be:
1. The dog or cat is dark-colored, or the lighting is dim
2. The photo has a cluttered background, often with people in it
3. The picture is blurry, etc.
Training round (4)
Then run the fourth version of the program; the neural network output is shown below:

Finally, we repeatedly tuned the parameters (for example, the Dropout rates, the number of epochs, the batch size, and the learning rate) until model accuracy approached 90%, which meets our requirements.
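For reference, these knobs take the following values in the final training script below:

batch_size = 64             # images fed to the network per step
epochs = 40                 # epochs in the listing (the screenshot run below used 30)
dropout_rates = (0.4, 0.3)  # the two Dropout layers
learning_rate = 0.0005      # RMSprop learning rate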
Calling model_new.save('the_save_model.h5') saves the final trained model in the complete HDF5 format, so it can be loaded later for model verification.
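In miniature (model_new is the trained model from the script below):

model_new.save('the_save_model.h5')  # full model: architecture + weights, HDF5 format
new_model = tf.keras.models.load_model('the_save_model.h5')  # reload later for inference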

Screenshot of the training run, accelerated with a GPU; the training set contains 2041 pictures and the validation set 201 pictures.
Training ran for 30 epochs, the final accuracy was above 85%, and the trained neural network model was saved.

Showing the trained network's results
First use tf.keras.models.load_model('') to load the model saved during training, then read the picture to be classified, preprocess it (adjust its data format and dimensions), and feed it to the neural network; predict = new_model.predict() returns the output.
Because the output layer uses the sigmoid function for binary classification, the model's output during testing lies between 0 and 1.
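A minimal sketch of that pipeline (paths are placeholders; the full prediction script appears at the end of the post):

import cv2
import numpy as np
import tensorflow as tf

new_model = tf.keras.models.load_model('the_save_model.h5')     # load the saved model
image = cv2.imread('some_picture.jpg')                          # read the picture to classify
test = cv2.resize(image, (128, 128)).astype('float32') / 255.0  # resize and normalize
test = test[np.newaxis, ...]                                    # add the batch dimension
predict = new_model.predict(test)                               # sigmoid output between 0 and 1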

With 0.5 as the dividing line, the closer the output is to 1 the more likely the image is a dog, and the closer it is to 0 the more likely it is a cat.
Since the actual input may be a picture of neither a cat nor a dog, we set a threshold around 0.5: when the absolute difference between the predicted value and 0.5 is less than 0.05, the image is treated as neither cat nor dog.
The core classification code is as follows:
if abs(predict[0] - 0.5) < 0.05:
    print("can not predict")
else:
    if predict[0] > 0.5:
        print("it is a dog")
    else:
        print("it is a cat")
Actual sample test results



The training code for the convolutional neural network is as follows:
import os
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.optimizers import RMSprop
base_dir = 'D:/Python/AI/catvsdog/'
train_dir = os.path.join(base_dir, 'data/')
validation_dir = os.path.join(base_dir, 'validation/')
train_cats_dir = os.path.join(train_dir, 'cat') # directory with our training cat pictures
train_dogs_dir = os.path.join(train_dir, 'dog') # directory with our training dog pictures
validation_cats_dir = os.path.join(validation_dir, 'cat') # directory with our validation cat pictures
validation_dogs_dir = os.path.join(validation_dir, 'dog') # directory with our validation dog pictures
batch_size = 64  # images fed to the network per step
epochs = 40      # number of training epochs
IMG_HEIGHT = 128
IMG_WIDTH = 128
num_cats_tr = len(os.listdir(train_cats_dir))
num_dogs_tr = len(os.listdir(train_dogs_dir))
num_cats_val = len(os.listdir(validation_cats_dir))
num_dogs_val = len(os.listdir(validation_dogs_dir))
total_train = num_cats_tr + num_dogs_tr
total_val = num_cats_val + num_dogs_val
def plotImages(images_arr):
    fig, axes = plt.subplots(1, 5, figsize=(20, 20))
    axes = axes.flatten()
    for img, ax in zip(images_arr, axes):
        ax.imshow(img)
        ax.axis('off')
    plt.tight_layout()
    plt.show()
# Training data generator with random augmentation
image_gen_train = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=5,
    width_shift_range=.1,
    height_shift_range=.1,
    horizontal_flip=True,
    zoom_range=0.1
)
train_data_gen = image_gen_train.flow_from_directory(batch_size=batch_size,
                                                     directory=train_dir,
                                                     shuffle=True,
                                                     target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                     class_mode='binary')
augmented_images = [train_data_gen[0][0][0] for _ in range(5)]  # sample five augmented training images
plotImages(augmented_images)
# Create validation set data generator (rescaling only, no augmentation)
image_gen_val = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1. / 255)
val_data_gen = image_gen_val.flow_from_directory(batch_size=batch_size,
                                                 directory=validation_dir,
                                                 target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                 class_mode='binary')
# Creating the model
model_new = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu',
                           input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 1, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # sigmoid output for binary classification
])
# Compile the model with the RMSprop optimizer and binary cross-entropy loss.
# Pass the metrics argument to track accuracy on each epoch.
model_new.compile(optimizer=RMSprop(lr=0.0005),
                  loss='binary_crossentropy',
                  metrics=['acc'])
model_new.summary()
print(model_new.trainable_variables)
# With data augmentation and Dropout in place, train the new network:
history = model_new.fit_generator(
    train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=total_val // batch_size
)
model_new.save('the_save_model.h5') # Save the model
print("model save")
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(len(acc))  # avoid shadowing the epochs hyperparameter
plt.plot(epochs_range, acc, 'bo', label='Training accuracy')
plt.plot(epochs_range, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs_range, loss, 'bo', label='Training Loss')
plt.plot(epochs_range, val_loss, 'b', label='Validation Loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
The data-cleaning code for the convolutional neural network is as follows:
import glob
import shutil
import cv2
import numpy as np
import tensorflow as tf

new_model = tf.keras.models.load_model('path')  # Load the trained network ('path' is a placeholder)
new_model.summary()
for image_path in glob.glob(r'path/*.jpg'):  # Iterate over all pictures in one class folder ('path' is a placeholder)
    image = cv2.imread(image_path)
    test = cv2.resize(np.array(image), (128, 128))  # Resize the picture
    test = test.astype('float32')
    test = test / 255.0  # Normalize
    test = test[tf.newaxis, ...]  # Add the batch dimension
    test = np.array(test)
    predict = new_model.predict(test)  # Make a prediction
    print(predict[0])
    if abs(predict[0]) < 0.05:  # Unqualified: prediction contradicts the label (see note below)
        shutil.move(image_path, 'path')  # Move the suspect image to another folder ('path' is a placeholder)
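Note that the check above flags only one direction of disagreement: a prediction near 0 means the network confidently sees a cat, which is suspicious when scanning the dog folder. Scanning the cat folder would presumably need the mirrored condition (a prediction near 1), and the 0.05 threshold can be tuned to control how aggressively images are discarded.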
Neural network prediction code :
import cv2
import numpy as np
import tensorflow as tf

new_model = tf.keras.models.load_model('D:/Python/AI/catvsdog/__pycache__/87the_save_model.h5')  # Load the model
new_model.summary()
image = cv2.imread('d:/Python/AI/catvsdog/pre/predict/cat1.jpg')  # Read the picture
test_img = cv2.resize(np.array(image), (128, 128))  # Resize
test = test_img.astype('float32')  # Adjust the data type
test = test / 255.0  # Normalize
test = test[tf.newaxis, ...]  # Add the batch dimension
test = np.array(test)
print(test.shape)  # (1, 128, 128, 3)
predict = new_model.predict(test)  # Make a prediction
print(predict)
if abs(predict[0] - 0.5) < 0.05:  # Classify with a rejection band around 0.5
    print("can not predict")
    title = "can not predict"
else:
    if predict[0] > 0.5:
        print("it is a dog")
        title = "it is a dog"
    else:
        print("it is a cat")
        title = "it is a cat"
cv2.imshow(title, image)  # Display the picture with the verdict as window title
cv2.waitKey(0)
cv2.destroyAllWindows()  # Destroy the window
Conclusion and outlook
The neural network built here can be used for binary classification and applies to most two-class problems; for multi-class problems its capacity is insufficient and it would need to be extended.
As a simple introduction to convolutional neural networks, this project taught me the process of building a network and setting its parameters, giving me a basic understanding that paves the way for further study.
This network could be applied elsewhere in the future, for example classifying photos of men and women, or peach versus pear trees. Increasing the amount of training, i.e. more training pictures and more epochs, would give better results; adding checkpointing to resume interrupted training would improve efficiency; and optimizing the network structure would achieve higher accuracy.