Facial expression recognition based on PyTorch convolution -- graduation project
2022-07-03 08:49:00 【Thebluewinds】
Facial expression recognition based on convolutional neural network
This article records my undergraduate graduation project. My topic is facial expression recognition. Following the tradition of previous students, it would have been done in MATLAB with traditional machine learning classifiers. But since I had some prior exposure to deep learning, I thought a convolutional neural network might do better. So I collected a lot of material online and, based on it, built a convolutional model in PyTorch and added a program that calls the camera for real-time recognition. This is my first contact with machine vision and I have no experience, so I would welcome any advice. The design draws on the following references:
1. Facial expression recognition based on convolutional neural network (PyTorch implementation), by Autumn showers. Link: LINK
2. PyTorch facial expression recognition based on convolutional neural network, by marika. Link: LINK
3. Python neural network programming, by Tariq
Introduction to the content of graduation project
This article uses the PyTorch deep learning framework to build a convolutional neural network for facial expression recognition. Part of FER2013 is used as the training set and the rest as the test set. The network has three convolution layers and four fully connected layers. Training-set accuracy reaches 99.4%, and the highest test-set accuracy is 60.5%. The trained model is also used in a real-time facial expression recognition system: it calls the camera, analyzes facial expressions in real time, recognizes the basic expression categories, and shows the label in the window together with the predicted probability.
Design of convolutional neural network
The model of convolution network
The data set is FER2013 (Baidu Netdisk link below).
Link: https://pan.baidu.com/s/1MTQ12vq60vVOWIPTLlfOGw
Extraction code: xili
The data is stored in csv format. The labels 0-6 stand for: (0) angry, (1) disgust, (2) fear, (3) happy, (4) sad, (5) surprise, (6) neutral. The data has to be processed and converted to jpg images before it is fed to the network for training. The following figure shows the processed data set.
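The conversion from csv rows to images can be sketched as follows. This is a minimal sketch, assuming the usual FER2013 layout in which each row has an `emotion` label and a `pixels` field holding 48x48 = 2304 space-separated gray values. The column names, the helper `decode_rows` and the inline sample row are assumptions for illustration, not taken from the project code. Saving each grid as a jpg (for example with OpenCV or Pillow) is left out.

```python
import csv
import io

IMG_SIZE = 48  # FER2013 images are 48x48 grayscale


def decode_rows(csv_text):
    """Yield (label, grid) pairs; grid is a 48x48 list of ints in 0-255."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        values = [int(v) for v in row["pixels"].split()]
        if len(values) != IMG_SIZE * IMG_SIZE:
            raise ValueError("expected %d pixels, got %d" % (IMG_SIZE * IMG_SIZE, len(values)))
        # Cut the flat pixel list into 48 rows of 48 values
        grid = [values[r * IMG_SIZE:(r + 1) * IMG_SIZE] for r in range(IMG_SIZE)]
        yield int(row["emotion"]), grid


# Hypothetical one-row sample standing in for the real csv file
sample = "emotion,pixels\n3," + " ".join(str(i % 256) for i in range(IMG_SIZE * IMG_SIZE)) + "\n"
decoded = list(decode_rows(sample))
label, grid = decoded[0]
print(label, len(grid), len(grid[0]))
```

With the real file, the same loop would read the csv from disk and write one image per row into a folder named after its label.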
The following figure shows the convolutional neural network model used in this system. The data set images are 48x48. The model consists of three convolution-plus-pooling layers followed by four fully connected layers. The structure is simple, so the recognition accuracy of the final trained model is limited.
Detailed description of convolution pooling process
With the model defined, how does the data actually flow through it? Below I describe each convolution and pooling step in detail. The model input is a 1-channel 48x48 image, and the output of the convolutional part is a 256-channel 6x6 feature map. The convolutions use 3x3 kernels. To keep the feature map size unchanged after convolution, one ring of edge padding is added before each convolution.
Padding: with padding 1, one pixel is added on every side, so a 48x48 image becomes 50x50 before the kernel is applied.
Stride: the distance the convolution kernel or pooling kernel moves at each step.
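The effect of padding and stride on the feature map size follows the standard output-size rule, out = floor((in + 2 * padding - kernel) / stride) + 1. A small sketch, where the helper name `out_size` is only for illustration, traces the sizes used in this model:

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling step."""
    return (size + 2 * padding - kernel) // stride + 1


size = 48
trace = [size]
for _ in range(3):  # three convolution + pooling stages
    size = out_size(size, kernel=3, stride=1, padding=1)  # 3x3 conv with padding 1 keeps the size
    size = out_size(size, kernel=2, stride=2)             # 2x2 pooling with stride 2 halves it
    trace.append(size)
print(trace)  # [48, 24, 12, 6]
```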
The first layer convolution pooling process
First layer: as shown in the figure below, the first convolution applies 3x3 kernels to the 48x48 input image and produces a 64-channel 48x48 feature map. A ReLU function then sets the negative values in the feature map to zero and leaves the positive values unchanged. Finally, a pooling kernel of size 2 with stride 2 gives a 64-channel 24x24 feature map.
The second layer convolution pooling process
Second layer: as shown in the figure, the second convolution layer uses 128 3x3 kernels with stride 1 and padding 1, and outputs a 128-channel 24x24 feature map. After the ReLU function, a 2x2 pooling kernel gives a 128-channel 12x12 feature map.
The third layer convolution pooling process
Third layer: as shown in the figure below, the third convolution layer applies 256 kernels to the 128-channel feature map output by the previous layer and produces a 256-channel 12x12 feature map. After the ReLU function and a 2x2 pooling operation, the result is a 256-channel 6x6 feature map.
Full connection layer process
After the three convolution layers, the fourth layer flattens the 256-channel 6x6 feature map output by the third layer into one dimension (256 x 6 x 6 = 9216 values) and fully connects it to 4096 neuron nodes, followed by a ReLU activation layer and Dropout. The fifth layer connects the 4096 nodes to 1024 neuron nodes, again followed by ReLU and Dropout. The sixth layer fully connects the 1024 nodes to 256 nodes, which are then connected to the 7 neuron nodes of the output layer. A Softmax function turns the output into the probability of each of the 7 facial expressions.
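The layer sizes described above can be written as a PyTorch module. This is a sketch rather than the project's actual code: the class name `FaceCNN`, the dropout probability of 0.5 and the use of max pooling in every stage are assumptions. The module returns raw class scores, and softmax is applied afterwards, as in the camera code at the end of this article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FaceCNN(nn.Module):
    def __init__(self, num_classes=7, p_drop=0.5):  # p_drop is an assumed value
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 48x48 -> 24x24
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 24x24 -> 12x12
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 12x12 -> 6x6
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(4096, 1024), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(1024, 256),          # the text names no activation after this layer
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)  # (N, 256, 6, 6) -> (N, 9216)
        return self.classifier(x)          # raw class scores (logits)


model = FaceCNN().eval()                   # eval() disables dropout
with torch.no_grad():
    batch = torch.zeros(2, 1, 48, 48)      # two dummy grayscale images
    probs = F.softmax(model(batch), dim=1)
print(probs.shape)
```

Keeping softmax outside the module lets the same model be trained with a loss that expects raw scores and still report probabilities at inference time.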
The training process of the model
Convolution and pooling principle
After the model is built, its parameters are still random and meaningless. So how does a convolutional neural network learn to classify? First, how the network computes. Convolution is a process of extracting features from image content. The convolution kernel is like a small window that slides over the image position by position and measures how well the patch under the window matches the kernel. If the match is good, the sum of the element-wise products is large; otherwise it is small. The result of this operation is a feature map. The following figure shows the convolution process in detail; in the figure, the kernel uses one bias shared across all positions. The shallow convolution layers extract edge and contour information from the original image, while the deeper convolution layers build more abstract image information from the lower-level feature maps.
Convolution process
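The sliding-window computation can be sketched in plain Python. The example below is an illustration with a made-up 4x5 image and a hypothetical vertical-edge kernel, using no padding and stride 1. Like most deep learning frameworks, it does not flip the kernel.

```python
def conv2d(image, kernel, bias=0.0):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Sum of element-wise products between the kernel and the patch
            total = sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total + bias)  # the same bias is added at every position
        out.append(row)
    return out


# Made-up image: dark on the left, bright on the right
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]
# Hypothetical kernel that responds to a dark-to-bright vertical edge
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)
```

The output is zero over the flat dark region and large where the window covers the edge, which is the "good match gives a high value" behaviour described above.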
Pooling process: the pooling layer reduces the spatial size of the feature maps, and with it the amount of computation and the number of parameters in later layers. When the input image is large, the network tends to have a large number of parameters, so pooling operations are inserted between adjacent convolution layers. This compresses the image size while retaining the useful information and speeds up the calculation. A pooling layer can use average pooling or max pooling. Average pooling takes the mean of all elements in each window; max pooling takes the largest element in each window as the pooled output. The following figure shows the pooling process, using max pooling.
Animation of the pooling process
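Max pooling with a 2x2 window and stride 2 can be sketched in a few lines of plain Python. The feature map values are made up for illustration.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Made-up 4x4 feature map
fmap = [
    [1, 3, 2, 0],
    [5, 4, 1, 1],
    [0, 2, 7, 6],
    [1, 1, 8, 3],
]
pooled = max_pool2d(fmap)
print(pooled)  # [[5, 2], [2, 8]]
```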
How does the model train
Expression feature training is the process of letting the designed model learn expression features from the preprocessed data set. Each training step consists of: forward propagation, loss calculation, back propagation to compute the weight gradients, and the weight update. The following figure shows the neural network training process.
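One training step with these stages looks roughly like this in PyTorch. It is a sketch on random stand-in data: the tiny linear model, the tensor contents and the choice of plain SGD are assumptions for illustration (the article does not say which optimizer was used). Only the batch size of 128 and the learning rate of 0.1 come from the training settings reported later in this article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model and data; the real project trains the CNN on FER2013 images
model = nn.Sequential(nn.Flatten(), nn.Linear(48 * 48, 7))
images = torch.randn(128, 1, 48, 48)      # one batch of 128 fake images
labels = torch.randint(0, 7, (128,))      # fake labels in 0-6

criterion = nn.CrossEntropyLoss()         # applies log-softmax and cross entropy together
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # optimizer choice is assumed

losses = []
for step in range(20):
    logits = model(images)                # forward propagation
    loss = criterion(logits, labels)      # loss calculation
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss.backward()                       # back propagation: compute weight gradients
    optimizer.step()                      # weight update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

A full run would wrap this loop in an outer loop over epochs and draw the batches from a data loader instead of reusing one fixed batch.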
How back propagation works: deep learning models are generally trained with back propagation, which is built on gradient descent. It introduces the neural unit error delta, which turns the cumbersome direct calculation of partial derivatives in gradient descent into a recurrence relation between layers. In plain terms, the chain rule gives the partial derivative of the global loss function with respect to any weight or bias, and that derivative is multiplied by eta, the learning rate, to decide how much to change the parameter. The following formula defines the neural unit error delta.
$$\delta_{j}^{l}=\frac{\partial C}{\partial z_{j}^{l}} \quad \left( l=2,3,\cdots \right)$$
How is the neural unit error calculated? By the chain rule, the errors of layer l and layer l+1 are related as in the following formula. Here z_i^l is the weighted input of unit i in layer l, a is the activation function, w_{ji}^{l+1} is the weight from unit i in layer l to unit j in layer l+1, and m is the number of units in layer l+1. Once the neural unit errors of the last layer are known, the errors of each earlier layer follow by recursion, and from them the gradients needed to update the parameters.
$$\delta_{i}^{l}=\left\{ \delta_{1}^{l+1}w_{1i}^{l+1}+\delta_{2}^{l+1}w_{2i}^{l+1}+\cdots+\delta_{m}^{l+1}w_{mi}^{l+1} \right\} a'\left( z_{i}^{l} \right)$$
Using the formula above, the neural unit errors of all layers can be computed one by one through the recurrence. Once the deltas are known, the partial derivatives of the loss function with respect to the weights and biases follow from the relationship between the neural unit error and the weights and biases, as shown in the following formula.
$$\begin{cases} \frac{\partial C}{\partial w_{ji}^{l}}=\delta_{j}^{l}a_{i}^{l-1}\\ \frac{\partial C}{\partial b_{j}^{l}}=\delta_{j}^{l} \end{cases} \quad \left( l=2,3,\cdots \right)$$
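The recurrence and the gradient formulas can be checked numerically on a tiny network. The sketch below is an illustration only: the 2-2-1 network, the sigmoid activation, the squared-error cost and all numbers are assumptions chosen to keep the example short. It computes one weight gradient through the deltas and compares it with a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w2, b2, w3, b3, x):
    """Tiny network: 2 inputs (layer 1) -> 2 units (layer 2) -> 1 unit (layer 3)."""
    z2 = [sum(w2[j][i] * x[i] for i in range(2)) + b2[j] for j in range(2)]
    a2 = [sigmoid(z) for z in z2]
    z3 = sum(w3[j] * a2[j] for j in range(2)) + b3
    a3 = sigmoid(z3)
    return z2, a2, z3, a3


def cost(w2, b2, w3, b3, x, t):
    """Squared-error cost for one sample (assumed for this example)."""
    return 0.5 * (forward(w2, b2, w3, b3, x)[3] - t) ** 2


# Made-up weights, input and target
w2 = [[0.1, -0.2], [0.4, 0.3]]
b2 = [0.05, -0.1]
w3 = [0.7, -0.5]
b3 = 0.2
x, t = [1.0, 0.5], 1.0

z2, a2, z3, a3 = forward(w2, b2, w3, b3, x)

# Output-layer error: delta^3 = dC/dz^3 = (a3 - t) * a'(z3), with a'(z) = a(z)(1 - a(z))
delta3 = (a3 - t) * a3 * (1.0 - a3)
# Recurrence: delta_i^2 = (delta^3 * w_i^3) * a'(z_i^2)
delta2 = [delta3 * w3[i] * a2[i] * (1.0 - a2[i]) for i in range(2)]
# Gradients: dC/dw_ji^2 = delta_j^2 * a_i^1 (a^1 is the input x), dC/db_j^2 = delta_j^2
grad_w2 = [[delta2[j] * x[i] for i in range(2)] for j in range(2)]

# Compare one entry with a central finite difference
eps = 1e-6
w_plus = [row[:] for row in w2]
w_minus = [row[:] for row in w2]
w_plus[1][0] += eps
w_minus[1][0] -= eps
numeric = (cost(w_plus, b2, w3, b3, x, t) - cost(w_minus, b2, w3, b3, x, t)) / (2 * eps)
print(grad_w2[1][0], numeric)
```

The two printed numbers should agree to several decimal places, which is a convenient way to test a hand-written backward pass.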
After obtaining the partial derivatives of the loss function, the goal is to update the weights of the network so that the error is further reduced. The weights are updated through the following formula.
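Assuming the standard gradient descent step with the learning rate $\eta$ mentioned earlier, the update can be written as:

$$w_{ji}^{l}\leftarrow w_{ji}^{l}-\eta\frac{\partial C}{\partial w_{ji}^{l}},\qquad b_{j}^{l}\leftarrow b_{j}^{l}-\eta\frac{\partial C}{\partial b_{j}^{l}}$$

Each parameter moves against its own gradient, and $\eta$ controls the step size.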
Evaluation indicators of the model
After the model is built, its effect has to be evaluated, and the evaluation then serves as a reference for further adjusting the weights and biases of the model to get the best result. Different evaluation metrics give different results, and no model is best under every metric, so how good a model is also depends on the problem it has to solve. Take a fruit quality inspection machine on an automated production line: its job is to pass good fruit along the line and pick out the bad fruit, and a few misjudgments do little harm. Criminal investigation is different: the model is meant to identify criminals, and a high misjudgment rate is not acceptable.
The evaluation metric commonly used for convolutional neural networks, that is, the loss function, is the cross entropy loss. Cross entropy measures the difference between two probability distributions over the same random variable. In machine learning, these are the true probability distribution and the predicted probability distribution. The smaller the cross entropy, the better the model's predictions. The following formula is the cross entropy loss function.
$$C=-\frac{1}{n}\sum_{i=1}^{n}\left[ y_i\ln\left( \sigma\left( z \right) \right)+\left( 1-y_i \right)\ln\left( 1-\sigma\left( z \right) \right) \right]$$
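As a quick numeric check of the formula, here is a plain-Python sketch. It implements the two-class form as written above, taking one input z per sample; the function names and sample values are made up. For the 7-class problem in this project, a framework's multi-class cross entropy (for example `torch.nn.CrossEntropyLoss`) would normally be used instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(labels, zs):
    """Mean of -[y ln(sigma(z)) + (1 - y) ln(1 - sigma(z))] over the samples."""
    total = 0.0
    for y, z in zip(labels, zs):
        p = sigmoid(z)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)


labels = [1, 0, 1]
good = binary_cross_entropy(labels, [4.0, -4.0, 3.0])    # confident and correct
bad = binary_cross_entropy(labels, [-4.0, 4.0, -3.0])    # confident and wrong
print(round(good, 4), round(bad, 4))
```

The loss is close to zero for the confident correct predictions and much larger for the confident wrong ones, matching the statement that a smaller cross entropy means a better prediction.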
Analysis of training results
Through training curve analysis
The following figure shows the loss value, training-set accuracy and test-set accuracy for each epoch of training. The batch size is 128, the learning rate is 0.1, and the model is trained for 50 epochs. The horizontal axis is the epoch number; the vertical axis is the loss value and the accuracy. As the figure shows, by about epoch 35 the training-set and test-set accuracy have basically stabilized. Training-set accuracy reaches 99.4%, and the highest test-set accuracy is 60.5%. The loss settles at a small value with slight fluctuations and stays stable, at which point the trained model performs well.
Output curve implementation code :
import matplotlib.pyplot as plt
import torch

def draw(train_acc_list, val_acc_list, epoch_list, loss_list):
    # Plot training accuracy, validation accuracy and loss against the epoch number
    x = torch.tensor(train_acc_list)
    y = torch.tensor(val_acc_list)
    l = torch.tensor(loss_list)
    z = torch.tensor(epoch_list)
    plt.ylim((0, 2))
    # Optional: mark each epoch with a point
    # plt.scatter(z.numpy(), x.numpy(), c='r', marker='x')
    # plt.scatter(z.numpy(), y.numpy(), c='b')
    # plt.scatter(z.numpy(), l.numpy(), c='g', marker='o')
    plt.plot(z.numpy(), x.numpy(), color='red', linestyle='--', label='train_acc')
    plt.plot(z.numpy(), y.numpy(), color='blue', linestyle='-.', label='val_acc')
    plt.plot(z.numpy(), l.numpy(), color='green', label='loss_rate')
    # plt.pause(0.005)
This function is called once per training epoch to update the plot.
The following animation shows the plot during training. To make the effect easier to see, a point is marked for each epoch.
Analyze the effect through confusion matrix
A confusion matrix, also called an error matrix, is a standard format for accuracy evaluation, written as a matrix with n rows and n columns. The figure below shows the confusion matrix for the 7 expressions. The main diagonal gives the recognition accuracy of each expression. The horizontal axis is the predicted expression category and the vertical axis is the true category.
The figure shows that happiness is recognized with 85% accuracy, that is, 85% of the happy images are correctly identified. Happy is misjudged as angry, sad, surprised and neutral at rates of 3%, 5%, 2% and 5% respectively. Of the 7 expressions, fear has the lowest recognition accuracy, only 28%: of all the fear images, only 28% are judged correctly. Because there are relatively few training samples for fear, its final error rate is high.
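How such a matrix is obtained from predictions can be sketched in plain Python. The helper name and the 3-class toy labels below are made up for illustration; each row is divided by the number of samples of that true class, so the diagonal holds the per-class accuracy as in the figure.

```python
def confusion_matrix(true_labels, pred_labels, num_classes):
    """Row-normalized confusion matrix: rows are true classes, columns are predictions."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        counts[t][p] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Each non-empty row sums to 1, so the diagonal entry is that class's accuracy
        matrix.append([n / total if total else 0.0 for n in row])
    return matrix


# Made-up labels for a 3-class toy problem
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print([round(v, 2) for v in row])
```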
Confusion matrix implementation code :
import matplotlib.pyplot as plt
import numpy as np

# Class names in label order 0-6 (assumed to match the data set labels listed earlier)
classes = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']

def plot_confusion_matrix(cm, savename, title='Confusion Matrix'):
    plt.figure(figsize=(12, 8), dpi=100)
    np.set_printoptions(precision=2)  # print arrays with 2 decimal places
    # Write the probability value into each cell of the confusion matrix
    ind_array = np.arange(len(classes))  # cell indices [0, 1, 2, ...]
    x, y = np.meshgrid(ind_array, ind_array)  # coordinate grid of the cell positions
    for x_val, y_val in zip(x.flatten(), y.flatten()):
        c = cm[y_val][x_val]
        if c > 0.001:
            plt.text(x_val, y_val, "%0.2f" % (c,), color='red', fontsize=15, va='center', ha='center')
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Pastel2)
    plt.title(title)
    xlocations = np.array(range(len(classes)))
    plt.xticks(xlocations, classes, rotation=90)
    plt.yticks(xlocations, classes)
    plt.ylabel('Actual label')
    plt.xlabel('Predict label')
    # Offset the minor ticks so the grid lines fall between cells
    tick_marks = np.array(range(len(classes))) + 0.5
    plt.gca().set_xticks(tick_marks, minor=True)
    plt.gca().set_yticks(tick_marks, minor=True)
    plt.grid(True, which='minor', linestyle='-')
    # Save the confusion matrix figure
    plt.savefig(savename, format='png')
Recognize the expression through the camera
Design process
Using the model trained in the previous section, I designed a system that calls the camera and recognizes facial expressions in real time. The system recognizes 7 expressions: sad, happy, fear, anger, neutral, disgust and surprise. The recognition result is shown as a label on the recognition interface together with the recognition probability. The system works in the following steps:
(1) Run the program, which turns on the camera automatically;
(2) Once the camera is on, frames are captured automatically, the face is detected and a box is drawn around it;
(3) The face inside the box is cropped, converted to grayscale, resized to the network input size, converted to a tensor and fed into the convolutional neural network;
(4) The convolutional neural network runs a forward pass and produces an array of predicted probabilities for the 7 expression classes;
(5) The class with the largest output is taken as the prediction, and the result label and its probability are shown on the system interface.
Effect demonstration
The figure above demonstrates expression recognition. The number is the largest value in the probability array output by the convolutional neural network, rounded to two decimal places. It is the probability the system assigns to the expression it predicts.
Code excerpt
# Runs inside the camera loop; model, img, frame and emotion are defined earlier in the program
pre = model(img)  # forward pass, raw class scores
pre = F.softmax(pre, dim=1)  # dim=1 normalizes across the classes of each sample
pro = pre.detach().cpu().numpy()
max_num = max(pro[0])  # highest class probability
result = round(float(max_num), 2)
pred = np.argmax(pro, axis=1)  # index of the predicted class
# Draw the predicted label and its probability on the frame
frame = cv2.putText(frame, emotion[pred[0]], (100, 100), cv2.FONT_HERSHEY_SIMPLEX, 2.5, (55, 255, 155), 2)
frame = cv2.putText(frame, str(result), (400, 100), cv2.FONT_HERSHEY_SIMPLEX, 2.5, (55, 255, 155), 2)
# cv2.imshow takes the window name first, then the image to display
cv2.imshow('emotion', frame)
if cv2.waitKey(1) == ord('q'):  # press q to quit
    break
I wrote this blog post to record some of what I learned. It is my first post, so there are probably mistakes in many places. Please bear with me, and any advice is welcome.