当前位置：网站首页>A specially designed loss is used to deal with data sets with unbalanced categories

A specially designed loss is used to deal with data sets with unbalanced categories

2022-07-02 21:42:00 【Xiaobai learns vision】

Click on the above “ Xiaobai studies vision ”, Optional plus " Star standard " or “ Roof placement ”

Heavy dry goods , First time delivery

Reading guide

This article is about Google CVPR ' 19 A summary of an article published on , The title of the article is Class-Balanced Loss Based on Effective Number of Samples. It is the most commonly used loss (softmax-cross-entropy、focal loss etc. ) A reweighting scheme for each category is proposed , It can quickly improve the accuracy , Especially when dealing with highly unbalanced data .

This article is about Google CVPR ' 19 A summary of an article published on , The title of the article is Class-Balanced Loss Based on Effective Number of Samples.

It is the most commonly used loss (softmax-cross-entropy、focal loss etc. ) A reweighting scheme for each category is proposed , It can quickly improve the accuracy , Especially when dealing with highly unbalanced data

Of the paper PyTorch Implementation source code ：https://github.com/vandit15/Class-balanced-loss-pytorch

Effective number of samples

Dealing with long tailed data sets ( Most of the samples belong to very few classes , Many other classes have very few samples ) When , How to weight different types of losses may be tricky . Usually , The weight is set to the reciprocal of the class sample or the reciprocal of the square root of the class sample .

The traditional weight adjustment is different from the weight adjustment proposed here

However , As shown in the figure above , This excess is due to As the number of samples increases , The benefits of new data points will be reduced . The newly added sample is most likely an approximate copy of the existing sample , Especially when training neural networks, a large amount of data is used to enhance ( Such as rescaling 、 Random cutting 、 Flip, etc ) When , Many are such samples . Reweighting with the number of effective samples can get better results .

The number of effective samples can be imagined as n The actual volume covered by samples , The total volume N Represented by total samples .

Number of effective samples

We wrote ：

Number of effective samples

We can also write it as follows ：

Contribution of each sample

This means No j The contribution of samples to the number of effective samples is βj-1.

Another meaning of the above formula is , If β=0, be En=1. Again , When β→1 When En→n. The latter can be easily proved by lobida's law . It means to be N When a large , Number of effective samples and number of samples N identical . under these circumstances , Number of unique prototypes N It's big , Each sample is unique . However , If N=1, This means that all data can be represented by a prototype .

Category equilibrium loss

If there is no additional information , We cannot set separate for each class Beta value , therefore , When using the whole data , We will set it to a specific value ( Usually set to 0.9、0.99、0.999、0.9999 One of them ).

therefore , The category equilibrium loss can be expressed as ：

here , L(p,y) It can be any loss .

Class balance Focal Loss

Original version of focal loss There is one α Equilibrium variables . here , We will reweight each class with the number of valid samples .

Similarly , Such a reweighted term can also be applied to other well-known losses (sigmod -cross-entropy, softmax-cross-entropy etc. ).

Realization

Before we start the implementation , One thing to note is that , In use based on sigmoid The loss of training , Use b=-log(C-1) Initialize the deviation of the last layer , among C It's the number of classes , instead of 0. This is because of the settings b=0 It will cause huge losses at the beginning of training , Because the output probability of each class is close to 0.5. therefore , We can assume that a priori class is 1/C, And set it accordingly b Value .

Calculation of weight of each class

Calculate the normalized weight

The above code line is a simple implementation of getting weights and standardizing them .

Get labeled onehot tensor

ad locum , We get the exclusive calorific value of the weight , In this way, they can be multiplied by the loss value of each class .

experiment

Class equilibrium provides significant benefits , Especially when the data set is highly unbalanced ( out-off-balance = 200,100).

Conclusion

Using the concept of effective sample number , It can solve the problem of data overlap . Because we don't make any assumptions about the dataset itself , Therefore, reweighting is usually applied to multiple data sets and multiple loss functions . therefore , A more appropriate structure can be used to deal with class imbalance , This is important , Because most actual data sets have a large amount of data imbalance .

—END—

The original English text ：https://towardsdatascience.com/handling-class-imbalanced-data-using-a-loss-specifically-made-for-it-6e58fd65ffab

download 1：OpenCV-Contrib Chinese version of extension module

stay 「 Xiaobai studies vision 」 Official account back office reply ： Extension module Chinese course , You can download the first copy of the whole network OpenCV Extension module tutorial Chinese version , cover Expansion module installation 、SFM Algorithm 、 Stereo vision 、 Target tracking 、 Biological vision 、 Super resolution processing And more than 20 chapters .

download 2：Python Visual combat project 52 speak

stay 「 Xiaobai studies vision 」 Official account back office reply ：Python Visual combat project , You can download the Image segmentation 、 Mask detection 、 Lane line detection 、 Vehicle count 、 Add Eyeliner 、 License plate recognition 、 Character recognition 、 Emotional tests 、 Text content extraction 、 face recognition etc. 31 A visual combat project , Help fast school computer vision .

download 3：OpenCV Actual project 20 speak

stay 「 Xiaobai studies vision 」 Official account back office reply ：OpenCV Actual project 20 speak , You can download the 20 Based on OpenCV Realization 20 individual Actual project , Realization OpenCV Learn advanced .

Communication group

Welcome to join the official account reader group to communicate with your colleagues , There are SLAM、 3 d visual 、 sensor 、 Autopilot 、 Computational photography 、 testing 、 Division 、 distinguish 、 Medical imaging 、GAN、 Wechat groups such as algorithm competition （ It will be subdivided gradually in the future ）, Please scan the following micro signal clustering , remarks ：” nickname + School / company + Research direction “, for example ：” Zhang San + Shanghai Jiaotong University + Vision SLAM“. Please note... According to the format , Otherwise, it will not pass . After successful addition, they will be invited to relevant wechat groups according to the research direction . Do not Send ads within the group , Or you'll be invited out , Thanks for your understanding ~

原网站

版权声明
本文为[Xiaobai learns vision]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/02/202202151257496550.html

当前位置：网站首页>A specially designed loss is used to deal with data sets with unbalanced categories

A specially designed loss is used to deal with data sets with unbalanced categories

Calculation of weight of each class

边栏推荐

猜你喜欢

随机推荐