4. Cross entropy
2022-07-08 01:02:00 【booze-J】
Cross entropy (cross-entropy)
1. Quadratic cost function (quadratic cost)
The quadratic cost function is:
$$C = \frac{1}{2n} \sum_x \lVert y(x) - a(x) \rVert^2$$
where C is the cost function, x is a sample, y is the actual (target) value, a is the output value, and n is the total number of samples. For simplicity, take a single sample as an example. The quadratic cost function then becomes:
$$C = \frac{(y - a)^2}{2}$$
Suppose we use gradient descent to adjust the weight parameters. With z = wx + b and a = σ(z), the gradients with respect to a weight w and the bias b are:
$$\frac{\partial C}{\partial w} = (a - y)\,\sigma'(z)\,x, \qquad \frac{\partial C}{\partial b} = (a - y)\,\sigma'(z)$$
where z is the input of the neuron and σ is the activation function. The gradients with respect to w and b are proportional to the gradient of the activation function: the larger the gradient of the activation function, the faster w and b are adjusted and the faster training converges. Suppose our activation function is the sigmoid function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\,(1 - \sigma(z))$$
Its gradient is largest around an output of 0.5 and shrinks toward zero as the output approaches 0 or 1.
Suppose our goal is to converge to 1. Point 1, with output 0.82, is relatively far from the target; its gradient is larger, so the weight adjustment is larger. Point 2, with output 0.98, is closer to the target; its gradient is smaller, so the weight adjustment is smaller. This adjustment scheme is reasonable.
Now suppose our goal is to converge to 0. Point 1, with output 0.82, is relatively close to the target, yet its gradient is larger, so the weight adjustment is larger. Point 2, with output 0.98, is far from the target, yet its gradient is smaller, so the weight adjustment is smaller. This adjustment scheme is unreasonable.
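The two cases can be checked numerically. The following is a minimal sketch, assuming a single sigmoid neuron and the single-sample quadratic cost C = (y - a)^2 / 2. The function name quadratic_grad_wrt_z is made up for illustration. It evaluates dC/dz = (a - y) * a * (1 - a) at the outputs 0.82 and 0.98 for both targets.

```python
def quadratic_grad_wrt_z(a, y):
    """Gradient of C = (y - a)^2 / 2 with respect to z, for a = sigmoid(z).

    Uses sigmoid'(z) = a * (1 - a), so z itself is not needed.
    """
    return (a - y) * a * (1.0 - a)


for target in (1.0, 0.0):
    for a in (0.82, 0.98):
        grad = quadratic_grad_wrt_z(a, target)
        print(f"target={target:.0f}  output={a:.2f}  "
              f"error={abs(a - target):.2f}  |dC/dz|={abs(grad):.6f}")
```

With target 1, the output with the larger error (0.82) also has the larger gradient magnitude. With target 0, the output 0.98 has the larger error but the smaller gradient magnitude, which is the mismatch described above.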
2. Cross entropy cost function (cross-entropy)
Taking a different approach, we keep the activation function and change the cost function instead, switching to the cross-entropy cost function:
$$C = -\frac{1}{n} \sum_x \left[ y \ln a + (1 - y) \ln(1 - a) \right]$$
where C is the cost function, x is a sample, y is the actual (target) value, a is the output value, and n is the total number of samples.
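To see why this change helps, here is a short derivation for a single sample. It assumes a sigmoid output neuron, writing σ for the sigmoid activation, with a = σ(z), z = wx + b, and σ'(z) = a(1 - a). This setup is an assumption carried over from part 1 rather than something stated at this point in the text.
$$\frac{\partial C}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)}$$
$$\frac{\partial C}{\partial w} = \frac{\partial C}{\partial a}\,\sigma'(z)\,x = \frac{a - y}{a(1 - a)} \cdot a(1 - a) \cdot x = (a - y)\,x, \qquad \frac{\partial C}{\partial b} = a - y$$
The σ'(z) factor cancels, so under these assumptions the gradient depends only on the error a - y: the larger the error, the larger the adjustment.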
If the output neurons are linear, the quadratic cost function is a suitable choice. If the output neurons use an S-shaped (sigmoid) function, the cross-entropy cost function is more suitable.
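As a rough numerical check of why cross entropy suits a sigmoid output, the sketch below estimates dC/dz by finite differences for a single sample with target 0 and compares it with a - y. The helper names (sigmoid, cross_entropy, numeric_grad) are illustrative, and the outputs 0.82 and 0.98 are reused from part 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(z, y):
    """Single-sample cross-entropy cost for a sigmoid output."""
    a = sigmoid(z)
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))


def numeric_grad(f, z, eps=1e-6):
    """Central finite-difference estimate of df/dz."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)


target = 0.0
for a in (0.82, 0.98):
    z = math.log(a / (1.0 - a))  # the z for which sigmoid(z) equals a
    grad = numeric_grad(lambda t: cross_entropy(t, target), z)
    print(f"output={a:.2f}  dC/dz={grad:.4f}  a - y={a - target:.4f}")
```

Here the output that is further from the target (0.98) gets the larger gradient, unlike the quadratic cost.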
3. Log-likelihood cost function (log-likelihood cost)
The log-likelihood function is often used as the cost function for softmax regression. If the output-layer neurons use the sigmoid function, the cross-entropy cost function can be used. The more common practice in deep learning, however, is to use softmax as the last layer, in which case the usual cost function is the log-likelihood cost function.
Combining the log-likelihood cost function with softmax is very similar to combining the cross-entropy cost function with sigmoid. In binary classification, the log-likelihood cost function reduces to the form of the cross-entropy cost function.
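The binary-classification reduction can be checked numerically. This sketch assumes the log-likelihood cost for one sample is C = -ln(a_k), where a_k is the softmax output for the true class k, and it relies on the fact that a two-class softmax over the logits [z, 0] gives sigmoid(z) for the first class. The helper names and the value z = 1.5 are illustrative.

```python
import numpy as np


def softmax(logits):
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def log_likelihood_cost(probs, true_class):
    """C = -ln(a_k), where k is the index of the true class."""
    return -np.log(probs[true_class])


def binary_cross_entropy(a, y):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))


z = 1.5
probs = softmax(np.array([z, 0.0]))  # probs[0] equals sigmoid(z)
a = 1.0 / (1.0 + np.exp(-z))

for y in (1, 0):
    true_class = 0 if y == 1 else 1  # class 0 plays the role of label y = 1
    print(y, log_likelihood_cost(probs, true_class), binary_cross_entropy(a, y))
```

For each label the two printed costs agree up to floating-point error.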
In TensorFlow:
tf.nn.sigmoid_cross_entropy_with_logits() is the cross entropy to pair with sigmoid.
tf.nn.softmax_cross_entropy_with_logits() is the cross entropy to pair with softmax.
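Both functions take logits, that is, the raw values before sigmoid or softmax is applied, and fold the activation into the loss. The following NumPy sketch illustrates what such a computation looks like. It is not TensorFlow's implementation, and the helper names are hypothetical. The sigmoid version uses the numerically stable form max(x, 0) - x * y + log(1 + exp(-|x|)).

```python
import numpy as np


def sigmoid_xent_from_logits(labels, logits):
    """Element-wise sigmoid cross entropy computed directly from logits."""
    x = np.asarray(logits, dtype=float)
    y = np.asarray(labels, dtype=float)
    return np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))


def softmax_xent_from_logits(labels, logits):
    """Per-row softmax cross entropy computed directly from logits."""
    x = np.asarray(logits, dtype=float)
    y = np.asarray(labels, dtype=float)
    shifted = x - x.max(axis=-1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(y * log_softmax).sum(axis=-1)


logits = np.array([[2.0, -1.0, 0.5]])
one_hot = np.array([[1.0, 0.0, 0.0]])
print(sigmoid_xent_from_logits(one_hot, logits))  # one value per element
print(softmax_xent_from_logits(one_hot, logits))  # one value per row
```

The sigmoid variant treats every output as an independent binary problem, while the softmax variant treats each row as one multi-class problem.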
Simple usage
We apply it to the code from "3. MNIST data set classification"; only one line needs to change.
In the model-training step of part 3, change
# Set the optimizer and loss function, and track accuracy during training
model.compile(
    optimizer=sgd,
    loss="mse",
    metrics=['accuracy']
)
to
# Set the optimizer and loss function, and track accuracy during training
model.compile(
    optimizer=sgd,
    loss="categorical_crossentropy",
    metrics=['accuracy']
)
Then run the whole code.
Compared with the results from "3. MNIST data set classification", the classification model that uses cross entropy as its loss function converges faster and achieves better results.
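One way to see why this change can speed up convergence is to compare the gradients of the two losses with respect to the inputs of the softmax layer. The sketch below is a NumPy illustration under assumed values: the logits and label are made up, and this is not Keras's internal code. It takes "mse" to be the mean of the squared differences over the classes and cross entropy to be -sum(y * ln p).

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def grad_cross_entropy(z, y):
    """d/dz of -sum(y * log(softmax(z))) for a one-hot y: softmax(z) - y."""
    return softmax(z) - y


def grad_mse(z, y):
    """d/dz of mean((softmax(z) - y)^2), via the softmax Jacobian."""
    p = softmax(z)
    dloss_dp = 2.0 * (p - y) / y.size
    jacobian = np.diag(p) - np.outer(p, p)  # entry (i, j) is dp_i / dz_j
    return jacobian @ dloss_dp


z = np.array([5.0, 0.0, 0.0])  # confidently predicts class 0
y = np.array([0.0, 1.0, 0.0])  # the true class is 1

print("cross entropy grad norm:", np.linalg.norm(grad_cross_entropy(z, y)))
print("mse grad norm:          ", np.linalg.norm(grad_mse(z, y)))
```

For this confidently wrong prediction the cross-entropy gradient is much larger than the MSE gradient, so a gradient step corrects the mistake more strongly.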
Complete code
The code was run in jupyter-notebook. The code blocks below follow the order of the notebook cells, so to run them you can paste them directly into jupyter-notebook.
1. Import third-party libraries
import numpy as np
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
2. Load and preprocess the data
# Load the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# (60000, 28, 28)
print("x_shape:\n", x_train.shape)
# (60000,) The labels are not one-hot encoded yet; this is done below
print("y_shape:\n", y_train.shape)
# (60000, 28, 28) -> (60000, 784): passing -1 to reshape() lets that dimension be inferred automatically; dividing by 255.0 normalizes the pixels to [0, 1]
x_train = x_train.reshape(x_train.shape[0], -1) / 255.0
x_test = x_test.reshape(x_test.shape[0], -1) / 255.0
# Convert the labels to one-hot format
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
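To make the reshape and one-hot steps concrete without downloading MNIST, here is a small NumPy sketch on made-up data. The random images and labels are placeholders, and np.eye(10)[labels] stands in for to_categorical as one way to build the same kind of one-hot matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)  # fake images
labels = np.array([3, 0, 9, 3])                                  # fake labels

# (4, 28, 28) -> (4, 784); -1 lets NumPy infer 28 * 28 = 784
flat = images.reshape(images.shape[0], -1) / 255.0

# One-hot encoding: row i of the identity matrix is the code for class i
one_hot = np.eye(10)[labels]

print(flat.shape, flat.min() >= 0.0, flat.max() <= 1.0)
print(one_hot.shape)
print(one_hot[0])
```

Each row of one_hot contains a single 1 at the index of its label, which is the format categorical_crossentropy expects for the targets.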
3. Train the model
# Create the model: 784 input neurons, 10 output neurons
model = Sequential([
    # 10 outputs, 784 inputs, bias initialized to 1, softmax activation
    Dense(units=10, input_dim=784, bias_initializer='ones', activation="softmax"),
])
# Define the optimizer
sgd = SGD(learning_rate=0.2)
# Set the optimizer and loss function, and track accuracy during training
model.compile(
    optimizer=sgd,
    loss="categorical_crossentropy",
    metrics=['accuracy']
)
# Train the model
model.fit(x_train, y_train, batch_size=32, epochs=10)
# Evaluate the model
loss, accuracy = model.evaluate(x_test, y_test)
print("\ntest loss", loss)
print("accuracy:", accuracy)