[deep learning 05] cross entropy loss function
2022-06-10 13:34:00 【Only a little of everything】
Basic loss function
What the loss function does:
- Measure the difference between the actual output and the target
- Provide the signal for updating the weights (back propagation)
$$\text{output}: 10,\ 10,\ 20 \qquad \text{target}: 30,\ 20,\ 50$$
$$\text{loss} = |30-10| + |20-10| + |50-20| = 60$$
$$\text{L1 loss} = 60 / 3 = 20$$
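The L1 loss above (mean absolute error) can be sketched in a few lines of plain Python; the helper name `l1_loss` is mine, not from the original:

```python
# Mean absolute error (L1 loss): average absolute difference
# between the network's output and the target.
def l1_loss(output, target):
    return sum(abs(t - o) for o, t in zip(output, target)) / len(output)

output = [10, 10, 20]
target = [30, 20, 50]
print(l1_loss(output, target))  # 20.0
```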
Cross entropy

To compare two different models, we need entropy as a common yardstick. It is like comparing the value of gold and silver: both must first be converted into dollars before they can be compared.
1. The amount of information
Different messages carry different amounts of information. Suppose the probability that Argentina wins the championship is 1/8. If classmate A tells me "Argentina won the championship", that message carries a lot of information (it implies Argentina also reached the semifinals and the final). If classmate B tells me "Argentina reached the final", that message carries less information.

Define $f(x) :=$ the amount of information ($:=$ denotes a definition), where $x$ is an event. Then:
$$f(\text{Argentina wins the championship}) = f(\text{Argentina reaches the final}) + f(\text{Argentina wins the final})$$
Because the more uncertain an event is, the more information it carries, we can take the probability of the event as the argument. Then:
$$f(1/8) = f(1/4) + f(1/2)$$
At the same time, the probabilities must satisfy
$$P(\text{Argentina wins the championship}) = P(\text{Argentina reaches the final}) \cdot P(\text{Argentina wins the final})$$
therefore
$$f\big(P(\text{reaches the final}) \cdot P(\text{wins the final})\big) = f\big(P(\text{reaches the final})\big) + f\big(P(\text{wins the final})\big)$$
So $f$ turns products into sums, which means the expression must contain a $\log$. And because the probability of an event varies inversely with its information content, there is a minus sign: $f(p) = -\log_2 p$.
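A minimal sketch of this definition in plain Python (the function name `info` is mine), checking the additivity from the Argentina example:

```python
import math

# Information content of an event with probability p, in bits.
def info(p):
    return -math.log2(p)

# Additivity from the example: f(1/8) = f(1/4) + f(1/2)
assert math.isclose(info(1/8), info(1/4) + info(1/2))
print(info(1/8))  # 3.0 bits
```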

2. entropy
Entropy measures how hard it is to take a system from its original uncertainty to complete certainty. It is the expectation of the information content, written $H(P)$.

$$H(P) := E(P_f) = \sum_{i=1}^{m} p_i \cdot f(p_i) = \sum_{i=1}^{m} p_i \left(-\log_2 p_i\right) = -\sum_{i=1}^{m} p_i \cdot \log_2 p_i$$
$P_f$ is the total information content of the system, $f(p_i)$ is the information content of event $i$, and $p_i$ is the probability that the event occurs.
The smaller the cross entropy, the closer the two models are.
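The entropy formula can be sketched directly (the helper name `entropy` is mine). A fair coin is maximally uncertain and has 1 bit of entropy; a biased coin is more predictable, so its entropy is lower:

```python
import math

# Entropy: expected information content of a distribution, in bits.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 (fair coin)
print(entropy([0.9, 0.1]))  # ~0.469 (biased coin, more predictable)
```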
3. Relative entropy (KL The divergence )
$f_Q(q_i)$ is the information content of event $i$ in system Q; $f_P(p_i)$ is its information content in system P.
$D_{KL}(P \| Q)$ denotes the relative entropy of the two systems, also called the KL divergence.
$D_{KL}(P \| Q)$ takes P as the benchmark and asks how far Q is from P; $D_{KL}(Q \| P)$ takes Q as the benchmark.
$f_Q(q_i) - f_P(p_i)$ is, for one event, its information content in system Q minus its information content in system P.

$$\begin{aligned} D_{KL}(P \| Q) &:= \sum_{i=1}^{m} p_i \cdot \left(f_Q(q_i) - f_P(p_i)\right) \\ &= \sum_{i=1}^{m} p_i \cdot \left(\left(-\log_2 q_i\right) - \left(-\log_2 p_i\right)\right) \\ &= \sum_{i=1}^{m} p_i \cdot \left(-\log_2 q_i\right) - \sum_{i=1}^{m} p_i \cdot \left(-\log_2 p_i\right) \end{aligned}$$
$\sum_{i=1}^{m} p_i \cdot (-\log_2 p_i)$ is the entropy of P. Since we chose P as the benchmark, this term is fixed, so when judging the divergence we only need to look at $\sum_{i=1}^{m} p_i \cdot (-\log_2 q_i)$ — and this part is exactly the cross entropy.
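The decomposition above — KL divergence equals cross entropy minus the entropy of the benchmark P — can be sketched in plain Python (all function names here are mine):

```python
import math

# H(P, Q) = sum_i p_i * (-log2 q_i)
def cross_entropy(p, q):
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

# H(P) = sum_i p_i * (-log2 p_i)
def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p)

# D_KL(P || Q) = H(P, Q) - H(P): cross entropy minus the fixed entropy of P.
def kl_divergence(p, q):
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.5]  # benchmark distribution P
q = [0.9, 0.1]  # approximating distribution Q
print(kl_divergence(p, q))  # positive; zero only when Q matches P
print(kl_divergence(p, p))  # 0.0
```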

Binary classification problem
Cross entropy must cover all possible outcomes. In binary classification the outcome is yes / no, so the term $(1-x_i) \cdot \log_2(1-y_i)$ must also appear:
$$H(P, Q) = -\sum_{i=1}^{n}\left(x_i \cdot \log_2 y_i + (1-x_i) \cdot \log_2(1-y_i)\right)$$
Multi-class classification problem
$$H(P, Q) = \sum_{i=1}^{m} p_i \cdot \left(-\log_2 q_i\right)$$
Note: in Python, `log` defaults to base $e$, i.e. $\ln$.
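A plain-Python sketch of the binary cross entropy above, using the natural log as Python does (the helper name `bce` and the sample values are mine; averaging over $n$ mirrors the mean reduction that loss functions typically default to):

```python
import math

# Binary cross entropy: x_i is the true label (0 or 1),
# y_i is the predicted probability of the positive class.
# Uses natural log (base e), as Python's math.log does by default.
def bce(labels, preds):
    n = len(labels)
    return -sum(x * math.log(y) + (1 - x) * math.log(1 - y)
                for x, y in zip(labels, preds)) / n

labels = [1, 0, 1]
preds = [0.9, 0.2, 0.7]
print(bce(labels, preds))  # ~0.228
```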

4. Cross entropy

The cross entropy in PyTorch is slightly different: it first applies a softmax to the raw outputs, turning them into event probabilities.
$w_c$ is the per-class weight.
The theory is hard, but using it is easy — just one line of code:
loss_fn = nn.CrossEntropyLoss() # Cross entropy loss
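To see what that one line computes, here is a plain-Python sketch for a single sample (the function name and the sample logits are mine): softmax over the raw logits, then the negative natural log of the probability assigned to the true class. This is the unweighted case; `nn.CrossEntropyLoss` additionally supports the per-class weights $w_c$ mentioned above.

```python
import math

# Sketch of cross-entropy loss for one sample: softmax the raw
# logits into probabilities, then take -ln of the true class's
# probability. (PyTorch fuses these as LogSoftmax + NLLLoss.)
def cross_entropy_loss(logits, target):
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    softmax = [e / total for e in exps]  # probabilities summing to 1
    return -math.log(softmax[target])

logits = [2.0, 1.0, 0.1]  # raw network outputs, not probabilities
print(cross_entropy_loss(logits, target=0))  # ~0.417
```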