4. Cross entropy
2022-07-08 01:02:00 【booze-J】
Cross entropy (cross-entropy)
1. Quadratic cost function (quadratic cost)
The quadratic cost function is defined as

$C = \frac{1}{2n}\sum_x \left\| y(x) - a(x) \right\|^2$

where $C$ is the cost function, $x$ denotes a sample, $y$ the target value, $a$ the output value, and $n$ the total number of samples. For simplicity, take a single sample as an example; the quadratic cost function then becomes

$C = \frac{(y-a)^2}{2}$
Suppose we use gradient descent (Gradient descent) to adjust the weight parameters. The gradients of the cost with respect to a weight $w$ and a bias $b$ are derived as

$\frac{\partial C}{\partial w} = (a-y)\,\sigma'(z)\,x \qquad \frac{\partial C}{\partial b} = (a-y)\,\sigma'(z)$
where $z$ is the input to the neuron and $\sigma$ is the activation function. The gradients of $w$ and $b$ are proportional to the gradient of the activation function: the larger the activation function's gradient, the faster $w$ and $b$ are adjusted and the faster training converges. Suppose our activation function is the sigmoid function, and consider two points on its curve: point 1 with output 0.82 and point 2 with output 0.98.
Suppose our goal is to converge to 1. Point 1's output of 0.82 is far from the target, its gradient is larger, and the weight adjustment is correspondingly large; point 2's output of 0.98 is close to the target, its gradient is smaller, and the weight adjustment is correspondingly small. This adjustment scheme is reasonable.
If instead our goal is to converge to 0: point 1's output of 0.82 is closer to the target, yet its gradient is larger and the weight adjustment is larger, while point 2's output of 0.98 is farther from the target, yet its gradient is smaller and the weight adjustment is smaller. This adjustment scheme is unreasonable: the farther the output is from the target, the slower the learning.
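To make the two cases concrete, here is a minimal numpy sketch (my illustration, not from the original article) that evaluates the quadratic-cost gradient factor $(a-y)\,\sigma'(z)$ at both points, using the sigmoid identity $\sigma'(z) = a(1-a)$:

import numpy as np

def sigmoid_grad_from_output(a):
    # For the sigmoid, sigma'(z) = a * (1 - a), where a = sigma(z)
    return a * (1.0 - a)

for target in (1.0, 0.0):
    for a in (0.82, 0.98):
        grad = (a - target) * sigmoid_grad_from_output(a)
        print(f"target={target}, output={a}: gradient factor = {grad:+.4f}")

With target 1.0, the gradient magnitude at 0.82 (about 0.027) exceeds that at 0.98 (about 0.0004), so the farther point learns faster, which is reasonable. With target 0.0, the farther point 0.98 gets the smaller gradient (about 0.019 versus 0.121), which is the unreasonable case.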
2. Cross entropy cost function (cross-entropy)
Another line of thinking: instead of changing the activation function, change the cost function and use the cross-entropy cost function instead:

$C = -\frac{1}{n}\sum_x \left[ y \ln a + (1-y)\ln(1-a) \right]$
where $C$ is the cost function, $x$ denotes a sample, $y$ the target value, $a$ the output value, and $n$ the total number of samples.
If the output neurons are linear, the quadratic cost function is a suitable choice; if the output neurons are sigmoid (S-shaped) functions, the cross-entropy cost function is the more suitable choice.
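The reason, sketched below (a standard derivation, not spelled out in the original), is that for a sigmoid output $a = \sigma(z)$ the factor $\sigma'(z) = a(1-a)$ cancels when the cross-entropy cost is differentiated:

$\frac{\partial C}{\partial w} = -\frac{1}{n}\sum_x \left( \frac{y}{a} - \frac{1-y}{1-a} \right) \sigma'(z)\,x = \frac{1}{n}\sum_x \frac{a-y}{a(1-a)}\,\sigma'(z)\,x = \frac{1}{n}\sum_x x\,(a-y)$

The update rate now depends only on the error $a-y$: the farther the output is from the target, the faster the learning, which is exactly the behavior the quadratic cost lacks.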
3. Log-likelihood cost function (log-likelihood cost)
The log-likelihood function is commonly used as the cost function for softmax regression. When the output-layer neurons are sigmoid functions, the cross-entropy cost function can be used; the more common practice in deep learning, however, is to use softmax as the last layer, in which case the commonly used cost function is the log-likelihood cost function.
The pairing of the log-likelihood cost with softmax is very similar to the pairing of cross-entropy with the sigmoid function. In binary classification, the log-likelihood cost function reduces to the form of the cross-entropy cost function.
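A quick check of that claim (my sketch, not from the original): with two classes, softmax produces outputs $a$ and $1-a$, and the log-likelihood cost of the true class $y \in \{0,1\}$ is

$C = -\left[ y \ln a + (1-y)\ln(1-a) \right]$

which is exactly the per-sample cross-entropy cost.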
In TensorFlow, tf.nn.sigmoid_cross_entropy_with_logits() implements cross-entropy paired with sigmoid, and tf.nn.softmax_cross_entropy_with_logits() implements cross-entropy paired with softmax.
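As a minimal illustration (my example, assuming TensorFlow 2.x), note that both functions take raw logits rather than already-activated outputs:

import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])  # raw, pre-activation scores

# Sigmoid version: independent binary labels, one per output unit
binary_labels = tf.constant([[1.0, 0.0, 1.0]])
sig_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=binary_labels, logits=logits)

# Softmax version: one-hot labels over mutually exclusive classes
onehot_labels = tf.constant([[1.0, 0.0, 0.0]])
soft_loss = tf.nn.softmax_cross_entropy_with_logits(labels=onehot_labels, logits=logits)

print(sig_loss.numpy())   # per-unit losses, shape (1, 3)
print(soft_loss.numpy())  # per-example loss, shape (1,)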
Simple usage
We apply this to the code from 3. MNIST Data set classification; only one line needs to be changed.
In the training step, change
# Define the optimizer and loss function, and compute accuracy during training
model.compile(
optimizer=sgd,
loss="mse",
metrics=['accuracy']
)
to:
# Define the optimizer and loss function, and compute accuracy during training
model.compile(
optimizer=sgd,
loss="categorical_crossentropy",
metrics=['accuracy']
)
Then run the whole code.
Comparing against the results of 3. MNIST Data set classification, you can see that the classification model using cross-entropy as the loss function converges faster and achieves better accuracy.
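To reproduce that comparison side by side, here is a sketch (my addition; it assumes x_train and y_train have been prepared as in the preprocessing step below, and that the 'accuracy' history key of TF 2.x Keras is available):

from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD

histories = {}
for loss_name in ("mse", "categorical_crossentropy"):
    model = Sequential([Dense(units=10, input_dim=784, bias_initializer='one', activation="softmax")])
    model.compile(optimizer=SGD(lr=0.2), loss=loss_name, metrics=['accuracy'])
    # fit() returns a History object whose .history dict records per-epoch metrics
    h = model.fit(x_train, y_train, batch_size=32, epochs=10, verbose=0)
    histories[loss_name] = h.history['accuracy']

for name, acc in histories.items():
    print(name, ["%.4f" % a for a in acc])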
Complete code
The code runs on jupyter-notebook. The code blocks in this article are split to match the jupyter-notebook cells; to run the code, paste each block into jupyter-notebook as-is.
1. Import third-party libraries
import numpy as np
from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
2. Load and preprocess the data
# Load data
(x_train,y_train),(x_test,y_test) = mnist.load_data()
# (60000, 28, 28)
print("x_shape:\n",x_train.shape)
# (60000,) labels are not one-hot encoded yet; that is handled below
print("y_shape:\n",y_train.shape)
# (60000, 28, 28) -> (60000, 784); -1 lets reshape() infer that dimension; dividing by 255.0 normalizes pixels to [0, 1]
x_train = x_train.reshape(x_train.shape[0],-1)/255.0
x_test = x_test.reshape(x_test.shape[0],-1)/255.0
# convert the labels to one-hot format
y_train = np_utils.to_categorical(y_train,num_classes=10)
y_test = np_utils.to_categorical(y_test,num_classes=10)
3. Train the model
# Create the model: 784 input neurons, 10 output neurons
model = Sequential([
# Dense layer: 10 outputs, 784 inputs, biases initialized to 1, softmax activation
Dense(units=10,input_dim=784,bias_initializer='one',activation="softmax"),
])
# Define optimizer
sgd = SGD(lr=0.2)
# Define the optimizer and loss function, and compute accuracy during training
model.compile(
optimizer=sgd,
loss="categorical_crossentropy",
metrics=['accuracy']
)
# Train the model
model.fit(x_train,y_train,batch_size=32,epochs=10)
# Evaluate the model
loss,accuracy = model.evaluate(x_test,y_test)
print("\ntest loss",loss)
print("accuracy:",accuracy)