
[deep learning 05] cross entropy loss function

2022-06-10 13:34:00 Only a little of everything

Basic loss function

What the loss function does

  1. Measures the difference between the actual output and the target.
  2. Provides the signal used to update the network parameters (backpropagation).

$$\text{output}: 10,\ 10,\ 20 \qquad \text{target}: 30,\ 20,\ 50$$

$$\text{loss} = |30-10| + |20-10| + |50-20| = 60$$

$$\text{L1 loss} = 60 / 3 = 20$$
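A minimal sketch of the same computation with PyTorch's `nn.L1Loss`, using the toy numbers above:

```python
import torch
import torch.nn as nn

output = torch.tensor([10.0, 10.0, 20.0])
target = torch.tensor([30.0, 20.0, 50.0])

# Sum of absolute differences: |30-10| + |20-10| + |50-20| = 60
print(nn.L1Loss(reduction="sum")(output, target).item())   # 60.0
# Mean absolute difference (the usual L1 loss): 60 / 3 = 20
print(nn.L1Loss(reduction="mean")(output, target).item())  # 20.0
```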

Cross entropy


To compare two different models we need entropy as a common yardstick, just as comparing the value of gold and silver requires converting both into dollars first.

1. The amount of information

  1. Different messages carry different amounts of information. Suppose the probability that Argentina wins the championship is 1/8. If classmate A tells me that Argentina won the championship, the message carries a lot of information (it implies Argentina also reached the semifinals and the final). If classmate B only tells me that Argentina reached the final, the message carries less information.

  2. Define $f(x) :=$ the amount of information ("$:=$" means "is defined as"), where $x$ is the message.

    $$f(\text{Argentina wins the championship}) = f(\text{Argentina reaches the final}) + f(\text{Argentina wins the final})$$

    Because the more uncertain an event is, the more information it carries, the independent variable can be taken to be the probability of the event.

    Then:

    $$f(1/8) = f(1/4) + f(1/2)$$

    Meanwhile, the probabilities must satisfy:

    $$P(\text{Argentina wins the championship}) = P(\text{Argentina reaches the final}) \cdot P(\text{Argentina wins the final})$$

    Therefore:

    $$f\big(P(\text{Argentina reaches the final}) \cdot P(\text{Argentina wins the final})\big) = f\big(P(\text{Argentina reaches the final})\big) + f\big(P(\text{Argentina wins the final})\big)$$

    Therefore the expression must contain a $\log$, since it has to turn products of probabilities into sums.

    And because the amount of information grows as the probability of the event shrinks, a minus sign is needed: $f(p) = -\log_2 p$ (verified numerically in the sketch below).
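A minimal Python sketch of this definition, using the 1/4 and 1/2 probabilities from the Argentina example to check that information content adds up:

```python
import math

# Information content of an event with probability p: f(p) = -log2(p)
def information(p: float) -> float:
    return -math.log2(p)

# P(champion) = P(reaches final) * P(wins final) = 1/4 * 1/2 = 1/8
print(information(1 / 8))                       # 3.0 bits
print(information(1 / 4) + information(1 / 2))  # 2.0 + 1.0 = 3.0 bits, additivity holds
```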


2. Entropy

Entropy measures how hard it is to take a system from its original uncertainty to complete certainty. It is the expectation of the information content, written $H(P)$.


$$H(P) := E(P_f) = \sum_{i=1}^{m} p_i \cdot f(p_i) = \sum_{i=1}^{m} p_i \left(-\log_2 p_i\right) = -\sum_{i=1}^{m} p_i \cdot \log_2 p_i$$

$P_f$ is the total information content of the system $P$, $f(p_i)$ is the information content of event $i$, and $p_i$ is the probability that event $i$ occurs.

The smaller the cross entropy, the closer the two models are (this is made precise via relative entropy below).
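A quick sketch of the entropy formula $H(P)$ in plain Python (the two example distributions are made up for illustration):

```python
import math

# Entropy H(P) = -sum_i p_i * log2(p_i): the expected information content of system P
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.99, 0.01]))  # ~0.08 bits: an almost-certain outcome carries little information
```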

3. Relative entropy (KL divergence)

$f_Q(q_i)$ is the information content in system $Q$; $f_P(p_i)$ is the information content in system $P$.

$D_{KL}(P \| Q)$ denotes the relative entropy, or KL divergence, between the two systems.

$D_{KL}(P \| Q)$ takes $P$ as the baseline and asks how different $Q$ is from $P$.

$D_{KL}(Q \| P)$ takes $Q$ as the baseline.

$f_Q(q_i) - f_P(p_i)$ is, for a single event, the information content in system $Q$ minus the information content in system $P$.


$$D_{KL}(P \| Q) := \sum_{i=1}^{m} p_i \cdot \left(f_Q(q_i) - f_P(p_i)\right) = \sum_{i=1}^{m} p_i \cdot \left((-\log_2 q_i) - (-\log_2 p_i)\right) = \sum_{i=1}^{m} p_i \cdot (-\log_2 q_i) - \sum_{i=1}^{m} p_i \cdot (-\log_2 p_i)$$

$\sum_{i=1}^{m} p_i \cdot (-\log_2 p_i)$ is the entropy of $P$. Since $P$ is fixed as the baseline, when judging the divergence we only need to look at $\sum_{i=1}^{m} p_i \cdot (-\log_2 q_i)$, and this part is the cross entropy.
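A minimal sketch of this decomposition (the two distributions are invented for illustration): cross entropy minus the entropy of $P$ gives the KL divergence, which is zero only when the two distributions coincide.

```python
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # D_KL(P || Q) = H(P, Q) - H(P)
    return cross_entropy(p, q) - entropy(p)

p = [0.7, 0.2, 0.1]  # baseline distribution P
q = [0.5, 0.3, 0.2]  # model distribution Q
print(kl_divergence(p, q))  # > 0: Q differs from P
print(kl_divergence(p, p))  # 0.0: identical distributions
```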

  1. Binary classification

    The cross entropy must cover every possible outcome. For binary classification the outcomes are yes/no, so the term $(1-x_i) \cdot \log_2(1-y_i)$ must also appear:
    $$H(P, Q) = -\sum_{i=1}^{n}\left(x_i \cdot \log_2 y_i + (1-x_i) \cdot \log_2(1-y_i)\right)$$

  2. Multi-class classification

$$H(P, Q) = \sum_{i=1}^{m} p_i \cdot \left(-\log_2 q_i\right)$$

Note: in Python (and PyTorch), `log` is base $e$ by default, i.e. $\ln$, as in the sketch below.
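A small sketch of both formulas in plain Python, written with the natural log as the note above mentions (all label and prediction values are made up for illustration):

```python
import math

# Binary cross entropy: H(P, Q) = -sum_i (x_i * log(y_i) + (1 - x_i) * log(1 - y_i))
def binary_cross_entropy(x, y):
    return -sum(xi * math.log(yi) + (1 - xi) * math.log(1 - yi) for xi, yi in zip(x, y))

# Multi-class cross entropy for one sample: H(P, Q) = -sum_i p_i * log(q_i)
def multiclass_cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

labels = [1.0, 0.0, 1.0]   # targets x_i
preds = [0.9, 0.2, 0.8]    # predicted probabilities y_i
print(binary_cross_entropy(labels, preds))

target = [0.0, 1.0, 0.0]   # one-hot target distribution P
pred = [0.1, 0.7, 0.2]     # predicted distribution Q
print(multiclass_cross_entropy(target, pred))  # -ln(0.7) ≈ 0.357
```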


4. Cross entropy


The cross entropy in PyTorch is slightly different: it first applies a softmax to the raw outputs and uses the result as the event probabilities.

$w_c$ is the per-class weight.

The theory is the hard part; using it is easy, just one line of code:

loss_fn = nn.CrossEntropyLoss()  # cross entropy loss
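A short usage sketch (the logits and targets are arbitrary example values): it checks that `nn.CrossEntropyLoss` applied to raw logits matches the mean of $-\log(\text{softmax})$ at the target class, and shows where the weight $w_c$ is passed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.CrossEntropyLoss()  # expects raw logits; applies log-softmax internally

logits = torch.tensor([[2.0, 0.5, 0.3],   # raw network outputs: 2 samples, 3 classes
                       [0.2, 1.5, 0.1]])
targets = torch.tensor([0, 1])            # class indices

loss = loss_fn(logits, targets)

# Equivalent manual computation: mean of -log(softmax(logits)) at the target class
probs = F.softmax(logits, dim=1)
manual = -torch.log(probs[torch.arange(2), targets]).mean()
print(loss.item(), manual.item())         # the two values match

# The per-class weight w_c is passed via the `weight` argument, e.g. for class imbalance
weighted_loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 1.0]))
```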