
[deep learning 05] cross entropy loss function

2022-06-10 13:34:00 Only a little of everything

Basic loss function

What the loss function does

  1. Measures the difference between the actual output and the target.
  2. Provides the signal used to update the network parameters (backpropagation).

$$\text{output}: 10,\ 10,\ 20 \qquad \text{target}: 30,\ 20,\ 50$$

$$\text{loss} = |30-10| + |20-10| + |50-20| = 60$$

$$\text{L1 loss} = 60 / 3 = 20$$
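A minimal sketch of the same computation with PyTorch's `nn.L1Loss`, using the toy numbers above:

```python
import torch
import torch.nn as nn

output = torch.tensor([10.0, 10.0, 20.0])
target = torch.tensor([30.0, 20.0, 50.0])

# Sum of absolute differences: |30-10| + |20-10| + |50-20| = 60
print(nn.L1Loss(reduction="sum")(output, target).item())   # 60.0
# Mean absolute difference (the usual L1 loss): 60 / 3 = 20
print(nn.L1Loss(reduction="mean")(output, target).item())  # 20.0
```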

Cross entropy


To compare two different models we need entropy as a common yardstick, just as comparing the value of gold and silver requires converting both into dollars first.

1. The amount of information

  1. Different messages carry different amounts of information. Suppose the probability that Argentina wins the championship is 1/8. If classmate A tells me that Argentina won the championship, the message carries a lot of information (it implies Argentina also reached the semifinals and the final). If classmate B only tells me that Argentina reached the final, the message carries less information.

  2. Define $f(x) :=$ the amount of information ("$:=$" means "is defined as"), where $x$ is the message.

    $$f(\text{Argentina wins the championship}) = f(\text{Argentina reaches the final}) + f(\text{Argentina wins the final})$$

    Because the more uncertain an event is, the more information it carries, the independent variable can be taken to be the probability of the event.

    Then:

    $$f(1/8) = f(1/4) + f(1/2)$$

    Meanwhile, the probabilities must satisfy:

    $$P(\text{Argentina wins the championship}) = P(\text{Argentina reaches the final}) \cdot P(\text{Argentina wins the final})$$

    Therefore:

    $$f\big(P(\text{Argentina reaches the final}) \cdot P(\text{Argentina wins the final})\big) = f\big(P(\text{Argentina reaches the final})\big) + f\big(P(\text{Argentina wins the final})\big)$$

    Therefore the expression must contain a $\log$, since it has to turn products of probabilities into sums.

    And because the amount of information grows as the probability of the event shrinks, a minus sign is needed: $f(p) = -\log_2 p$ (verified numerically in the sketch below).
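A minimal Python sketch of this definition, using the 1/4 and 1/2 probabilities from the Argentina example to check that information content adds up:

```python
import math

# Information content of an event with probability p: f(p) = -log2(p)
def information(p: float) -> float:
    return -math.log2(p)

# P(champion) = P(reaches final) * P(wins final) = 1/4 * 1/2 = 1/8
print(information(1 / 8))                       # 3.0 bits
print(information(1 / 4) + information(1 / 2))  # 2.0 + 1.0 = 3.0 bits, additivity holds
```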


2. Entropy

Entropy measures how hard it is to take a system from its original uncertainty to complete certainty. It is the expectation of the information content, written $H(P)$.


$$H(P) := E(P_f) = \sum_{i=1}^{m} p_i \cdot f(p_i) = \sum_{i=1}^{m} p_i \left(-\log_2 p_i\right) = -\sum_{i=1}^{m} p_i \cdot \log_2 p_i$$

$P_f$ is the total information content of the system $P$, $f(p_i)$ is the information content of event $i$, and $p_i$ is the probability that event $i$ occurs.

The smaller the cross entropy, the closer the two models are (this is made precise via relative entropy below).
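A quick sketch of the entropy formula $H(P)$ in plain Python (the two example distributions are made up for illustration):

```python
import math

# Entropy H(P) = -sum_i p_i * log2(p_i): the expected information content of system P
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.99, 0.01]))  # ~0.08 bits: an almost-certain outcome carries little information
```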

3. Relative entropy (KL divergence)

$f_Q(q_i)$ is the information content in system $Q$; $f_P(p_i)$ is the information content in system $P$.

$D_{KL}(P \| Q)$ denotes the relative entropy, or KL divergence, between the two systems.

$D_{KL}(P \| Q)$ takes $P$ as the baseline and asks how different $Q$ is from $P$.

$D_{KL}(Q \| P)$ takes $Q$ as the baseline.

$f_Q(q_i) - f_P(p_i)$ is, for a single event, the information content in system $Q$ minus the information content in system $P$.


$$D_{KL}(P \| Q) := \sum_{i=1}^{m} p_i \cdot \left(f_Q(q_i) - f_P(p_i)\right) = \sum_{i=1}^{m} p_i \cdot \left((-\log_2 q_i) - (-\log_2 p_i)\right) = \sum_{i=1}^{m} p_i \cdot (-\log_2 q_i) - \sum_{i=1}^{m} p_i \cdot (-\log_2 p_i)$$

$\sum_{i=1}^{m} p_i \cdot (-\log_2 p_i)$ is the entropy of $P$. Since $P$ is fixed as the baseline, when judging the divergence we only need to look at $\sum_{i=1}^{m} p_i \cdot (-\log_2 q_i)$, and this part is the cross entropy.
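A minimal sketch of this decomposition (the two distributions are invented for illustration): cross entropy minus the entropy of $P$ gives the KL divergence, which is zero only when the two distributions coincide.

```python
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # D_KL(P || Q) = H(P, Q) - H(P)
    return cross_entropy(p, q) - entropy(p)

p = [0.7, 0.2, 0.1]  # baseline distribution P
q = [0.5, 0.3, 0.2]  # model distribution Q
print(kl_divergence(p, q))  # > 0: Q differs from P
print(kl_divergence(p, p))  # 0.0: identical distributions
```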

  1. Binary classification

    The cross entropy must cover every possible outcome. For binary classification the outcomes are yes/no, so the term $(1-x_i) \cdot \log_2(1-y_i)$ must also appear:
    $$H(P, Q) = -\sum_{i=1}^{n}\left(x_i \cdot \log_2 y_i + (1-x_i) \cdot \log_2(1-y_i)\right)$$

  2. Multi-class classification

$$H(P, Q) = \sum_{i=1}^{m} p_i \cdot \left(-\log_2 q_i\right)$$

Note: in Python (and PyTorch), `log` is base $e$ by default, i.e. $\ln$, as in the sketch below.
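A small sketch of both formulas in plain Python, written with the natural log as the note above mentions (all label and prediction values are made up for illustration):

```python
import math

# Binary cross entropy: H(P, Q) = -sum_i (x_i * log(y_i) + (1 - x_i) * log(1 - y_i))
def binary_cross_entropy(x, y):
    return -sum(xi * math.log(yi) + (1 - xi) * math.log(1 - yi) for xi, yi in zip(x, y))

# Multi-class cross entropy for one sample: H(P, Q) = -sum_i p_i * log(q_i)
def multiclass_cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

labels = [1.0, 0.0, 1.0]   # targets x_i
preds = [0.9, 0.2, 0.8]    # predicted probabilities y_i
print(binary_cross_entropy(labels, preds))

target = [0.0, 1.0, 0.0]   # one-hot target distribution P
pred = [0.1, 0.7, 0.2]     # predicted distribution Q
print(multiclass_cross_entropy(target, pred))  # -ln(0.7) ≈ 0.357
```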


4. Cross entropy


The cross entropy in PyTorch is slightly different: it first applies a softmax to the raw outputs and uses the result as the event probabilities.

$w_c$ is the per-class weight.

The theory is the hard part; using it is easy, just one line of code:

loss_fn = nn.CrossEntropyLoss()  # cross entropy loss
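A short usage sketch (the logits and targets are arbitrary example values): it checks that `nn.CrossEntropyLoss` applied to raw logits matches the mean of $-\log(\text{softmax})$ at the target class, and shows where the weight $w_c$ is passed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.CrossEntropyLoss()  # expects raw logits; applies log-softmax internally

logits = torch.tensor([[2.0, 0.5, 0.3],   # raw network outputs: 2 samples, 3 classes
                       [0.2, 1.5, 0.1]])
targets = torch.tensor([0, 1])            # class indices

loss = loss_fn(logits, targets)

# Equivalent manual computation: mean of -log(softmax(logits)) at the target class
probs = F.softmax(logits, dim=1)
manual = -torch.log(probs[torch.arange(2), targets]).mean()
print(loss.item(), manual.item())         # the two values match

# The per-class weight w_c is passed via the `weight` argument, e.g. for class imbalance
weighted_loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 1.0]))
```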