Learning CV from Scratch, Part II: Loss Functions (4)
2022-07-08 02:19:00 【pogg_】
Note: most of the content of this blog is not original; I have organized material I collected previously and integrated it with my own crude explanations, for easy review. All references are cited, and the originals have been liked and bookmarked ~
Continued from: Learning CV from Scratch, Part II: Loss Functions (3)
1.3 Sphereface
SphereFace was first proposed in the CVPR 2017 paper 《SphereFace: Deep Hypersphere Embedding for Face Recognition》 and is also known as the A-Softmax loss. Paper link:
https://arxiv.org/abs/1704.08063
The authors argue that:
- triplet loss requires carefully constructed triplets and is inflexible;
- center loss only emphasizes intra-class compactness and pays insufficient attention to inter-class separability.
Therefore, the authors pose the question: is a loss function based on Euclidean distance really suitable for learning discriminative features?
First, recall the softmax loss (i.e., softmax + cross-entropy):
$$L_{softmax}=\frac{1}{N}\sum_{i}-\log\left(\frac{e^{W_{y_i}^{T}x_i+b_{y_i}}}{\sum_{j}e^{W_{j}^{T}x_i+b_{j}}}\right)=\frac{1}{N}\sum_{i}-\log\left(\frac{e^{\|W_{y_i}\|\,\|x_i\|\cos(\theta_{y_i,i})+b_{y_i}}}{\sum_{j}e^{\|W_{j}\|\,\|x_i\|\cos(\theta_{j,i})+b_{j}}}\right)$$
where $\theta_{j,i}$ ($0\leq\theta_{j,i}\leq\pi$) is the angle between the vectors $W_j$ and $x_i$. The loss thus depends on $\|W_j\|$, $\theta_{j,i}$ and $b_j$. Setting $\|W_j\|=1$ and $b_j=0$ yields the modified-softmax loss, which pays more attention to the angle information:
$$L_{modified}=\frac{1}{N}\sum_{i}-\log\left(\frac{e^{\|x_i\|\cos(\theta_{y_i,i})}}{\sum_{j}e^{\|x_i\|\cos(\theta_{j,i})}}\right)$$
Although the modified-softmax loss can learn features with angular discrimination, the discrimination is still not strong enough. Therefore, the target-class angle $\theta_{y_i,i}$ is multiplied by an integer $m>1$ to strengthen it:
$$L_{A\text{-}softmax}=\frac{1}{N}\sum_{i}-\log\left(\frac{e^{\|x_i\|\cos(m\theta_{y_i,i})}}{e^{\|x_i\|\cos(m\theta_{y_i,i})}+\sum_{j\neq y_i}e^{\|x_i\|\cos(\theta_{j,i})}}\right)$$
(valid for $\theta_{y_i,i}\in[0,\pi/m]$; the paper replaces $\cos(m\theta)$ with a monotonic variant $\psi(\theta)$ so the loss is defined for all angles). Intuitively, requiring $\cos(m\theta_{y_i,i})>\cos(\theta_{j,i})$ forces $\theta_{y_i,i}<\theta_{j,i}/m$, which both enlarges the inter-class distance and reduces the intra-class distance.
In short, A-Softmax builds on the softmax loss by adding the two constraints $\|W\|=1$ and $b=0$, so that the prediction depends only on the angle between $W$ and $x$, and then multiplies that angle by the margin factor $m$.
The figure below compares the original softmax, modified-softmax, and A-Softmax losses on the same batch of data.
The figure below shows the paper's experimental results, interpreted from the hypersphere perspective for different values of $m$; dots of different colors represent different categories. With the A-Softmax loss, the learned feature vectors are mapped onto a hypersphere. $m=1$ degenerates to the modified-softmax loss: each category has a visible distribution, but the separation is not obvious. As $m$ increases, the separation becomes more and more pronounced, but training also becomes harder and harder.
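To make the two constraints concrete, here is a minimal PyTorch-style sketch of a modified-softmax classification layer (my own illustration, not the authors' code): the weights are L2-normalized and the bias is dropped, so the logit for class $j$ is $\|x\|\cos(\theta_j)$; feeding these logits to the ordinary cross-entropy gives the modified-softmax loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModifiedSoftmaxLinear(nn.Module):
    """Classification layer with the two constraints ||W_j|| = 1 and b = 0,
    so that the logit for class j is ||x|| * cos(theta_j)."""
    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = F.normalize(self.weight, dim=1)             # enforce ||W_j|| = 1
        cos_theta = F.linear(F.normalize(x, dim=1), w)  # cos(theta_{j,i})
        return x.norm(dim=1, keepdim=True) * cos_theta  # ||x_i|| * cos(theta_{j,i})

# usage sketch:
# layer = ModifiedSoftmaxLinear(512, num_classes=10)
# loss = F.cross_entropy(layer(features), labels)
```

A-Softmax would additionally replace $\cos(\theta_{y_i,i})$ with $\cos(m\theta_{y_i,i})$ for the target class before the cross-entropy.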
1.4 Cosface
CosFace was proposed by Tencent AI Lab in the CVPR 2018 paper 《CosFace: Large Margin Cosine Loss for Deep Face Recognition》. Paper link:
https://arxiv.org/pdf/1801.09414.pdf
The CosFace loss, also known as the Large Margin Cosine Loss (LMCL), maximizes the margin in cosine space in order to enlarge the inter-class distance and reduce the intra-class distance.
Starting from softmax (similar to SphereFace), the authors found that effective feature learning requires $\|W_j\|=1$, i.e., the weights must be normalized. Meanwhile, in the test phase the score of a face pair is usually computed from the cosine similarity between the two feature vectors, which indicates that $\|x\|$ contributes little to the score. Therefore, during training $\|x\|$ is fixed to $s$ (in the paper, $s=64$):
$$L_{ns}=\frac{1}{N}\sum_{i}-\log\frac{e^{s\cos(\theta_{y_i,i})}}{\sum_{j}e^{s\cos(\theta_{j,i})}}$$
where $L_{ns}$ denotes the normalized version of the softmax loss and $\theta_{j,i}$ is the angle between $W_j$ and $x_i$. To increase the discrimination, a constant margin $m$ is introduced, similar to SphereFace:
$$L_{lmc}=\frac{1}{N}\sum_{i}-\log\frac{e^{s(\cos(\theta_{y_i,i})-m)}}{e^{s(\cos(\theta_{y_i,i})-m)}+\sum_{j\neq y_i}e^{s\cos(\theta_{j,i})}}$$
subject to $W=\frac{W^{*}}{\|W^{*}\|}$, $x=\frac{x^{*}}{\|x^{*}\|}$, and $\cos(\theta_{j,i})=W_j^{T}x_i$.
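As a concrete illustration, the LMCL logits fit in a few lines of PyTorch (my own sketch, not the official implementation; $s=64$ follows the paper, and $m=0.35$ is the value the paper reports working well):

```python
import torch
import torch.nn.functional as F

def cosface_logits(x, weight, labels, s=64.0, m=0.35):
    """LMCL logits: s*(cos(theta)-m) for the target class, s*cos(theta) otherwise."""
    cos_theta = F.linear(F.normalize(x, dim=1), F.normalize(weight, dim=1))
    one_hot = F.one_hot(labels, num_classes=weight.size(0)).to(cos_theta.dtype)
    return s * (cos_theta - m * one_hot)  # subtract the margin m only where j == y_i

# loss = F.cross_entropy(cosface_logits(features, W, labels), labels)
```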
The figure above is the authors' illustration:
- The first panel shows the ordinary softmax loss: the classification boundaries of the two categories overlap, i.e., the discrimination is weak;
- The second shows the normalized softmax loss: the boundary is now distinct with no overlap, but the discrimination is still insufficient;
- The third shows A-Softmax, where the axes become $\theta$. Viewed this way, two straight lines serve as the decision boundary. The authors also point out that this margin is not constant: A-Softmax constrains the angle $\theta$, so in $\cos(\theta)$ coordinates the separating region is fan-shaped, and the margin shrinks as $\theta$ decreases, vanishing entirely when $\theta$ equals 0;
- The fourth shows CosFace: in $\cos(\theta)$ coordinates, two straight lines serve as the decision boundary, the features of the two classes do not intersect, and the discrimination is quite obvious.
As $m$ increases, the discrimination becomes more and more obvious, but training becomes harder. The decision boundaries behind these four panels can be written out explicitly, as shown below.
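For the binary case, with $\theta_i$ denoting the angle between the feature and class $i$'s weight vector, the boundaries summarized in the CosFace paper are (the last two rows give the condition for assigning class 1):
$$\begin{aligned}\text{Softmax:}\quad&\|W_1\|\cos\theta_1=\|W_2\|\cos\theta_2\\\text{Normalized softmax:}\quad&\cos\theta_1=\cos\theta_2\\\text{A-Softmax:}\quad&\cos(m\theta_1)\geq\cos\theta_2\\\text{CosFace (LMCL):}\quad&\cos\theta_1-m\geq\cos\theta_2\end{aligned}$$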
1.5 Arcface
The ArcFace loss was proposed in 《ArcFace: Additive Angular Margin Loss for Deep Face Recognition》. Paper link:
https://arxiv.org/abs/1801.07698
Similar to SphereFace and CosFace, ArcFace also requires $\|W\|=1$ and $\|x\|=s$, and likewise introduces a constant $m$; but unlike the previous two, here $m$ is added to $\theta$ itself:
$$L_{arc}=\frac{1}{N}\sum_{i}-\log\frac{e^{s\cos(\theta_{y_i,i}+m)}}{e^{s\cos(\theta_{y_i,i}+m)}+\sum_{j\neq y_i}e^{s\cos(\theta_{j,i})}}$$
The figure below is the ArcFace computation flow chart. First, normalize $x$ and $W$; multiply them to get $\cos(\theta_{j,i})$; apply $\arccos(\cos(\theta_{j,i}))$ to recover the angle $\theta_{j,i}$; add the constant $m$ to enlarge the margin, giving $\theta_{j,i}+m$; then compute $\cos(\theta_{j,i}+m)$ and multiply by the constant $s$; finally, feed the result to the softmax function and the cross-entropy loss.
The authors also give pseudocode for this procedure in the paper.
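The paper's pseudocode targets MXNet; here is a sketch of the same steps in PyTorch (my own re-creation, not the authors' code; $s=64$ and $m=0.5$ are the paper's default values):

```python
import torch
import torch.nn.functional as F

def arcface_logits(x, weight, labels, s=64.0, m=0.5):
    """ArcFace: normalize -> cos(theta) -> arccos -> add m to the target angle
    -> cos -> scale by s, following the flow chart above."""
    cos_theta = F.linear(F.normalize(x, dim=1), F.normalize(weight, dim=1))
    theta = torch.acos(cos_theta.clamp(-1.0 + 1e-7, 1.0 - 1e-7))  # arccos(cos(theta))
    one_hot = F.one_hot(labels, num_classes=weight.size(0)).to(x.dtype)
    return s * torch.cos(theta + m * one_hot)  # add the margin m only to the target class

# loss = F.cross_entropy(arcface_logits(features, W, labels), labels)
```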
As an aside, I think one of ArcFace's more important contributions was popularizing the MXNet framework; today MXNet is indeed one of the commonly used frameworks for face recognition.
Analyzed from the perspective of the feature space, ArcFace yields a more compact feature distribution and a more distinct decision boundary than softmax.
Supplementary notes:
The experiments in the ArcFace paper are very thorough. The ideas elaborated in the paper are as follows:
- ArcFace loss: Additive Angular Margin Loss. Normalize the feature vectors and weights, then add an angular margin $m$ to $\theta$. An angular margin affects the angle more directly than a cosine margin, and geometrically it yields a constant linear margin over the whole angular range.
- ArcFace maximizes the classification boundary directly in the angular space $\theta$, while CosFace maximizes it in the cosine space $\cos(\theta)$.
- Preprocessing (face alignment): facial keypoints are detected with MTCNN, and the cropped, aligned face is then obtained via a similarity transformation.
- Training (face classifier): ResNet50 + ArcFace loss.
- Testing: extract the 512-dimensional embedding feature (the "embedding" we often speak of) from the output of the classifier's FC1 layer, compute the cosine distance between the two input features, then perform face verification and face identification (a minimal sketch of this step follows after this list).
- In the actual code, training uses ResNet + ArcFace loss + softmax + cross-entropy loss: ResNet extracts the image (face) features (a multi-dimensional tensor); the ArcFace loss applies the angular margin to the feature and weight parameters and outputs the prediction logits; softmax + cross-entropy computes the error between the predicted and actual labels (i.e., the degree of inconsistency between the model prediction $f(x)$ and the true value $y$ mentioned in the previous article).
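For the test stage described above, comparing two embeddings by cosine similarity is straightforward; here is a minimal sketch (the threshold value is purely illustrative and would be tuned on a validation set):

```python
import torch
import torch.nn.functional as F

def same_identity(emb1: torch.Tensor, emb2: torch.Tensor, threshold: float = 0.3) -> bool:
    """Face verification: compare the cosine similarity of two 512-d embeddings
    against a decision threshold."""
    sim = F.cosine_similarity(emb1.unsqueeze(0), emb2.unsqueeze(0)).item()
    return sim > threshold
```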
2. Summary
This article covered several loss functions commonly used in face recognition. Borrowing the illustrations from the ArcFace paper and several points made by the Zhihu author Mengcius, the comparison can be summarized as follows:
- ArcFace has a constant linear angular margin over the whole interval; the proposed additive angular margin has good geometric properties.
- CosFace's margin is a nonlinear angular interval in angular space, but a straight line in cosine space;
- SphereFace's margin is a fan-shaped separating region, but it is discontinuous: as $\theta$ decreases, the margin also decreases, and when $\theta$ equals 0 the margin disappears entirely;
- The normalized softmax loss has no margin, or rather its margin provides no obvious discrimination.
References:
[1] https://www.cnblogs.com/dengshunge/p/12252820.html
[2] https://blog.csdn.net/u014380165/article/details/76946380
[3] https://zhuanlan.zhihu.com/p/45153595