[deep learning 05] cross entropy loss function
2022-06-10 13:34:00 【Only a little of everything】
Basic loss function
What the loss function does:
- Computes the gap between the actual output and the target
- Provides the signal for updating the model's parameters (backpropagation)
$$\text{output}: 10, 10, 20 \qquad \text{target}: 30, 20, 50$$
$$\text{loss} = |30-10| + |20-10| + |50-20| = 60$$
$$\text{L1 loss} = 60/3 = 20$$
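A quick check of this arithmetic with PyTorch's built-in L1 loss (a minimal sketch; the tensors just reproduce the numbers above):

```python
import torch
import torch.nn as nn

output = torch.tensor([10.0, 10.0, 20.0])
target = torch.tensor([30.0, 20.0, 50.0])

loss_fn = nn.L1Loss()           # mean absolute error; reduction='mean' by default
print(loss_fn(output, target))  # tensor(20.) = (20 + 10 + 30) / 3
```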
Cross entropy

To compare two different models, entropy is needed as an intermediary. For example, to compare the value of gold and silver, both need to be converted into dollars first.
1. The amount of information
Different messages carry different amounts of information. Suppose the probability of Argentina winning the championship is 1/8. If classmate A tells me Argentina won the championship, the amount of information is large (it implies Argentina reached the semifinals and the final); if classmate B tells me Argentina reached the final, the amount of information is smaller.

Define $f(x) :=$ the amount of information (":=" means "is defined as"), where $x$ is the message.
$$f(\text{Argentina wins the championship}) = f(\text{Argentina reaches the final}) + f(\text{Argentina wins the final})$$
Because the more uncertain an event is, the more information it contains, we can take the probability of the event as the independent variable. Then:

$$f(1/8) = f(1/4) + f(1/2)$$
At the same time, the probabilities must satisfy:

$$P(\text{Argentina wins the championship}) = P(\text{Argentina reaches the final}) \cdot P(\text{Argentina wins the final})$$

Therefore:

$$f(P(\text{Argentina reaches the final}) \cdot P(\text{Argentina wins the final})) = f(P(\text{Argentina reaches the final})) + f(P(\text{Argentina wins the final}))$$
Therefore, the expression must contain a $\log$, since it turns that product into a sum. And because the amount of information decreases as the probability of the event increases, a minus sign is needed: $f(p) := -\log_2 p$.
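A tiny numeric check of this derivation (a sketch; `information` is an illustrative helper, not a standard API):

```python
import math

def information(p: float) -> float:
    """Amount of information of an event with probability p, in bits: f(p) = -log2(p)."""
    return -math.log2(p)

print(information(1/8))                     # 3.0 bits: Argentina wins the championship
print(information(1/4) + information(1/2))  # 2.0 + 1.0 = 3.0 bits: reaches the final, then wins it
```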

2. Entropy
Entropy measures how hard it is to take a system from its original uncertainty to complete certainty; it is the expectation of the amount of information, written $H(P)$.

$$H(P) := E(P_f) = \sum_{i=1}^{m} p_i \cdot f(p_i) = \sum_{i=1}^{m} p_i \left(-\log_2 p_i\right) = -\sum_{i=1}^{m} p_i \cdot \log_2 p_i$$
$P_f$ is the total amount of information, $f(p_i)$ is the amount of information of event $i$, and $p_i$ is the probability that event $i$ occurs.
The smaller the cross entropy, the closer the two models are.
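A minimal sketch of the entropy formula (assuming every $p_i > 0$ so the log is defined; the distributions are made up):

```python
import numpy as np

def entropy(p) -> float:
    """Entropy H(P) of a discrete distribution, in bits."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.99, 0.01]))  # ~0.08 bits: an almost-certain outcome has little uncertainty
```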
3. Relative entropy (KL divergence)
$f_Q(q_i)$ is the amount of information of event $i$ in system $Q$; $f_P(p_i)$ is the amount of information in system $P$.
$D_{KL}(P \| Q)$ denotes the relative entropy of the two systems, also called the KL divergence.
$D_{KL}(P \| Q)$ takes $P$ as the baseline and asks how much $Q$ differs from $P$.
$D_{KL}(Q \| P)$ takes $Q$ as the baseline.
$f_Q(q_i) - f_P(p_i)$ is, for a single event, its amount of information in system $Q$ minus its amount of information in system $P$.

$$D_{KL}(P \| Q) := \sum_{i=1}^{m} p_i \cdot \left(f_Q(q_i) - f_P(p_i)\right) = \sum_{i=1}^{m} p_i \cdot \left((-\log_2 q_i) - (-\log_2 p_i)\right) = \sum_{i=1}^{m} p_i \cdot (-\log_2 q_i) - \sum_{i=1}^{m} p_i \cdot (-\log_2 p_i)$$
$\sum_{i=1}^{m} p_i \cdot (-\log_2 p_i)$ is the entropy of $P$. Since we take $P$ as the fixed baseline, when judging the divergence we only need to look at the remaining part, $\sum_{i=1}^{m} p_i \cdot (-\log_2 q_i)$, and that part is exactly the cross entropy.
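A numeric sketch of the decomposition $D_{KL}(P \| Q) = H(P, Q) - H(P)$ (the distributions are made up for illustration):

```python
import numpy as np

def cross_entropy(p, q) -> float:
    """H(P, Q) = sum_i p_i * (-log2 q_i)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(-np.sum(p * np.log2(q)))

def kl_divergence(p, q) -> float:
    """D_KL(P || Q), with P as the baseline."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log2(p / q)))

p = [0.7, 0.2, 0.1]  # baseline distribution P
q = [0.5, 0.3, 0.2]  # model distribution Q

h_p = float(-np.sum(np.asarray(p) * np.log2(p)))  # entropy of P
assert np.isclose(kl_divergence(p, q), cross_entropy(p, q) - h_p)
print(kl_divergence(p, q))  # ~0.123, positive because Q differs from P
```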

Binary classification
Cross entropy must cover all possible outcomes. Binary classification has exactly two outcomes (yes/no), so there must also be a $(1-x_i) \cdot \log_2(1-y_i)$ term for the "no" case.
$$H(P, Q) = -\sum_{i=1}^{n}\left(x_i \cdot \log_2 y_i + (1-x_i) \cdot \log_2(1-y_i)\right)$$

Multi-class classification

$$H(P, Q) = \sum_{i=1}^{m} p_i \cdot (-\log_2 q_i)$$
Note: in Python, `log` uses base $e$ by default, i.e. it computes $\ln$.
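A hand-rolled sketch of the binary formula (labels and probabilities are made up; the mean over samples follows common library convention, and `np.log` is base $e$ as noted):

```python
import numpy as np

def binary_cross_entropy(x, y) -> float:
    """Binary cross entropy, averaged over samples, in nats (np.log is ln)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(-np.mean(x * np.log(y) + (1 - x) * np.log(1 - y)))

labels = [1, 0, 1]        # x_i: ground-truth labels
probs  = [0.9, 0.2, 0.7]  # y_i: predicted probabilities
print(binary_cross_entropy(labels, probs))  # ~0.228
```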

4. Cross entropy

The cross entropy in PyTorch is slightly different: it uses the softmax function to turn the raw outputs into event probabilities (`nn.CrossEntropyLoss` combines `LogSoftmax` with a negative log-likelihood loss).
$w_c$ is the per-class weight (the optional `weight` argument).
The theory is hard, but using it is really easy; it is just one line of code:
```python
loss_fn = nn.CrossEntropyLoss()  # cross entropy loss
```
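A minimal usage sketch (shapes and values here are made up for illustration). Note that `nn.CrossEntropyLoss` expects raw logits, since it applies the softmax itself:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

logits = torch.randn(4, 3, requires_grad=True)  # raw scores for 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])            # ground-truth class indices
loss = loss_fn(logits, targets)                 # softmax + log + NLL in one call
loss.backward()                                 # gradients for the parameter update
print(loss.item())
```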