当前位置:网站首页>A specially designed loss is used to deal with data sets with unbalanced categories
A specially designed loss is used to deal with data sets with unbalanced categories
2022-07-02 21:42:00 【Xiaobai learns vision】
Click on the above “ Xiaobai studies vision ”, Optional plus " Star standard " or “ Roof placement ”
Heavy dry goods , First time delivery 
Reading guide
This article is about Google CVPR ' 19 A summary of an article published on , The title of the article is Class-Balanced Loss Based on Effective Number of Samples. It is the most commonly used loss (softmax-cross-entropy、focal loss etc. ) A reweighting scheme for each category is proposed , It can quickly improve the accuracy , Especially when dealing with highly unbalanced data .
This article is about Google CVPR ' 19 A summary of an article published on , The title of the article is Class-Balanced Loss Based on Effective Number of Samples.
It is the most commonly used loss (softmax-cross-entropy、focal loss etc. ) A reweighting scheme for each category is proposed , It can quickly improve the accuracy , Especially when dealing with highly unbalanced data
Of the paper PyTorch Implementation source code :https://github.com/vandit15/Class-balanced-loss-pytorch
Effective number of samples
Dealing with long tailed data sets ( Most of the samples belong to very few classes , Many other classes have very few samples ) When , How to weight different types of losses may be tricky . Usually , The weight is set to the reciprocal of the class sample or the reciprocal of the square root of the class sample .

However , As shown in the figure above , This excess is due to As the number of samples increases , The benefits of new data points will be reduced . The newly added sample is most likely an approximate copy of the existing sample , Especially when training neural networks, a large amount of data is used to enhance ( Such as rescaling 、 Random cutting 、 Flip, etc ) When , Many are such samples . Reweighting with the number of effective samples can get better results .
The number of effective samples can be imagined as n The actual volume covered by samples , The total volume N Represented by total samples .

We wrote :

We can also write it as follows :

This means No j The contribution of samples to the number of effective samples is βj-1.
Another meaning of the above formula is , If β=0, be En=1. Again , When β→1 When En→n. The latter can be easily proved by lobida's law . It means to be N When a large , Number of effective samples and number of samples N identical . under these circumstances , Number of unique prototypes N It's big , Each sample is unique . However , If N=1, This means that all data can be represented by a prototype .
Category equilibrium loss
If there is no additional information , We cannot set separate for each class Beta value , therefore , When using the whole data , We will set it to a specific value ( Usually set to 0.9、0.99、0.999、0.9999 One of them ).
therefore , The category equilibrium loss can be expressed as :

here , L(p,y) It can be any loss .
Class balance Focal Loss

Original version of focal loss There is one α Equilibrium variables . here , We will reweight each class with the number of valid samples .
Similarly , Such a reweighted term can also be applied to other well-known losses (sigmod -cross-entropy, softmax-cross-entropy etc. ).
Realization
Before we start the implementation , One thing to note is that , In use based on sigmoid The loss of training , Use b=-log(C-1) Initialize the deviation of the last layer , among C It's the number of classes , instead of 0. This is because of the settings b=0 It will cause huge losses at the beginning of training , Because the output probability of each class is close to 0.5. therefore , We can assume that a priori class is 1/C, And set it accordingly b Value .
Calculation of weight of each class

The above code line is a simple implementation of getting weights and standardizing them .

ad locum , We get the exclusive calorific value of the weight , In this way, they can be multiplied by the loss value of each class .
experiment

Class equilibrium provides significant benefits , Especially when the data set is highly unbalanced ( out-off-balance = 200,100).
Conclusion
Using the concept of effective sample number , It can solve the problem of data overlap . Because we don't make any assumptions about the dataset itself , Therefore, reweighting is usually applied to multiple data sets and multiple loss functions . therefore , A more appropriate structure can be used to deal with class imbalance , This is important , Because most actual data sets have a large amount of data imbalance .

—END—
The original English text :https://towardsdatascience.com/handling-class-imbalanced-data-using-a-loss-specifically-made-for-it-6e58fd65ffab
download 1:OpenCV-Contrib Chinese version of extension module
stay 「 Xiaobai studies vision 」 Official account back office reply : Extension module Chinese course , You can download the first copy of the whole network OpenCV Extension module tutorial Chinese version , cover Expansion module installation 、SFM Algorithm 、 Stereo vision 、 Target tracking 、 Biological vision 、 Super resolution processing And more than 20 chapters .
download 2:Python Visual combat project 52 speak
stay 「 Xiaobai studies vision 」 Official account back office reply :Python Visual combat project , You can download the Image segmentation 、 Mask detection 、 Lane line detection 、 Vehicle count 、 Add Eyeliner 、 License plate recognition 、 Character recognition 、 Emotional tests 、 Text content extraction 、 face recognition etc. 31 A visual combat project , Help fast school computer vision .
download 3:OpenCV Actual project 20 speak
stay 「 Xiaobai studies vision 」 Official account back office reply :OpenCV Actual project 20 speak , You can download the 20 Based on OpenCV Realization 20 individual Actual project , Realization OpenCV Learn advanced .
Communication group
Welcome to join the official account reader group to communicate with your colleagues , There are SLAM、 3 d visual 、 sensor 、 Autopilot 、 Computational photography 、 testing 、 Division 、 distinguish 、 Medical imaging 、GAN、 Wechat groups such as algorithm competition ( It will be subdivided gradually in the future ), Please scan the following micro signal clustering , remarks :” nickname + School / company + Research direction “, for example :” Zhang San + Shanghai Jiaotong University + Vision SLAM“. Please note... According to the format , Otherwise, it will not pass . After successful addition, they will be invited to relevant wechat groups according to the research direction . Do not Send ads within the group , Or you'll be invited out , Thanks for your understanding ~


边栏推荐
- Research Report on crude oil tanker industry - market status analysis and development prospect forecast
- [shutter] shutter layout component (fractionallysizedbox component | stack layout component | positioned component)
- Structured text language XML
- Research Report on market supply and demand and strategy of Chinese garden equipment industry
- MySQL learning record (1)
- Investment strategy analysis of China's electronic information manufacturing industry and forecast report on the demand outlook of the 14th five year plan 2022-2028 Edition
- Centos7 installation and configuration of redis database
- Go web programming practice (1) -- basic syntax of go language
- Redis -- three special data types
- [shutter] shutter layout component (opacity component | clipprect component | padding component)
猜你喜欢
![[shutter] shutter layout component (opacity component | clipprect component | padding component)](/img/6b/4304be6a4c5427dcfc927babacb4d7.jpg)
[shutter] shutter layout component (opacity component | clipprect component | padding component)

How does esrally perform simple custom performance tests?

26 FPS video super-resolution model DAP! Output 720p Video Online
![[dynamic planning] p1220: interval DP: turn off the street lights](/img/b6/405e29ca88fac40caee669a3b7893f.jpg)
[dynamic planning] p1220: interval DP: turn off the street lights

【零基础一】Navicat下载链接

Off chip ADC commissioning record

MySQL learning record (2)

System (hierarchical) clustering method and SPSS implementation

Cardinality sorting (detailed illustration)
![[hands on deep learning]02 softmax regression](/img/47/eb67ec2c51f6bb7d6b2879b36e769d.jpg)
[hands on deep learning]02 softmax regression
随机推荐
[shutter] shutter layout component (opacity component | clipprect component | padding component)
Analyze comp-206 advanced UNIX utils
[dynamic planning] p1220: interval DP: turn off the street lights
Research Report on micro vacuum pump industry - market status analysis and development prospect prediction
Analysis of enterprise financial statements [3]
[shutter] the shutter plug-in is used in the shutter project (shutter plug-in management platform | search shutter plug-in | install shutter plug-in | use shutter plug-in)
股票开户要找谁?手机开户是安全么?
Analysis of enterprise financial statements [4]
[shutter] statefulwidget component (floatingactionbutton component | refreshindicator component)
How to test the process of restoring backup files?
[CV] Wu Enda machine learning course notes | Chapter 12
基本IO接口技术——微机第七章笔记
System (hierarchical) clustering method and SPSS implementation
Who do you want to open a stock account? Is it safe to open a mobile account?
[shutter] shutter layout component (wrap component | expanded component)
Internet Explorer ignores cookies on some domains (cannot read or set cookies)
Read a doctor, the kind that studies cows! Dr. enrollment of livestock technology group of Leuven University, milk quality monitoring
[shutter] shutter layout component (Introduction to layout component | row component | column component | sizedbox component | clipoval component)
Share the easy-to-use fastadmin open source system - Installation
Golang embeds variables in strings