How to balance multiple losses in deep learning?
2022-06-12 12:58:00 【Xiaobai learns vision】
Source | Zhihu
Link | https://www.zhihu.com/question/375794498
Editor | AI youdao
When training a network end to end, if the final loss = a*loss1 + b*loss2 + c*loss3 + ..., is there a principled way to choose the hyperparameters a, b, c?
Author: Evan
Link: https://www.zhihu.com/question/375794498/answer/1052779937
This is actually an important issue that has been somewhat neglected in deep learning. With the recent rise of multi-task learning and generative adversarial networks, many machine learning tasks and methods run into it, yet in plenty of papers the weights are simply the product of brute-force tuning and alchemy. Let me quietly share two very interesting research directions.
1. From the perspective of predictive uncertainty, introduce a Bayesian framework that automatically sets the weight of each loss component according to its current magnitude. Representative work: the CVPR 2018 paper by Alex Kendall et al.:
Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
https://arxiv.org/abs/1705.07115
The paper's second author, Yarin Gal, studied under Zoubin Ghahramani and has done a lot of solid work in recent years combining Bayesian ideas with deep learning.
2. Construct the Pareto front of all the losses, obtaining the results of many hyperparameter combinations at the ultra-low cost of a single training run. Representative work: the paper Intel published at NeurIPS 2018 (yes, the conference that just changed its name):
Multi-Task Learning as Multi-Objective Optimization
http://papers.nips.cc/paper/7334-multi-task-learning-as-multi-objective-optimization
Since I am an old acquaintance of the paper's authors, I won't flatter them here; if you are interested, read it carefully, it is packed with substance.
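For a rough feel of the core idea, here is a minimal PyTorch sketch of the two-task special case only (not the paper's full Frank-Wolfe algorithm; the model and loss tensors are assumed, and gradients are taken over all trainable parameters rather than just the shared ones, as a simplification). The minimum-norm convex combination of the two task gradients has a closed form:

```python
import torch

def mgda_two_task_step(model, loss1, loss2):
    """One descent step for the two-task case sketched from Sener & Koltun (2018).

    Finds alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||^2,
    i.e. the minimum-norm convex combination of the two task gradients,
    then backpropagates the correspondingly weighted total loss.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    def flat_grad(loss):
        grads = torch.autograd.grad(loss, params, retain_graph=True,
                                    allow_unused=True)
        # Parameters unused by this loss get zero gradient.
        return torch.cat([(g if g is not None else torch.zeros_like(p)).flatten()
                          for g, p in zip(grads, params)])

    g1, g2 = flat_grad(loss1), flat_grad(loss2)

    # Closed-form minimizer of ||alpha*g1 + (1-alpha)*g2||^2 over [0, 1]:
    # alpha = <g2, g2 - g1> / ||g1 - g2||^2, clipped to the interval.
    diff = g1 - g2
    alpha = (torch.dot(g2, g2 - g1) / (diff.dot(diff) + 1e-12)).clamp(0.0, 1.0)

    model.zero_grad()
    (alpha * loss1 + (1.0 - alpha) * loss2).backward()
    return alpha.item()  # caller then runs optimizer.step()
```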

Author: Yang Kuiyuan (DeepMotion)
Link: https://www.zhihu.com/question/375794498/answer/1050963528
1. There is usually a balance to strike among multiple losses; even a single-task setup has a weight decay term. Relatively simple combinations can be handled by tuning the hyperparameters.
2. For a more complex balance among multiple task losses, here is a method that lets the network directly predict the loss weights [1]. Take two losses as an example. The network additionally outputs σ_1 and σ_2, and the total loss takes the form

L_total = L_1 / (2σ_1²) + L_2 / (2σ_2²) + log σ_1 + log σ_2

To minimize the total loss, the first two terms want σ_1 and σ_2 to be as large as possible, while the last term, which prevents degeneration, wants them to be as small as possible. When one of the two losses is larger, its corresponding σ also takes a larger value so that the total loss is minimized. This handles inconsistent scales across losses, or the problem of some losses having large variance.
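A minimal PyTorch sketch of this uncertainty weighting, using the common log-variance parameterization of the loss in [1] (constant factors are dropped and all names here are my own, so treat it as an illustration rather than the paper's reference implementation):

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Weights losses by learned uncertainty, after Kendall et al. [1].

    total = sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2),
    so a high-variance loss gets down-weighted by exp(-s_i), while the
    +s_i term keeps the sigmas from growing without bound.
    """
    def __init__(self, num_losses=2):
        super().__init__()
        # s_i = log(sigma_i^2), initialized to 0 (i.e. sigma_i = 1)
        self.log_vars = nn.Parameter(torch.zeros(num_losses))

    def forward(self, *losses):
        total = 0.0
        for s, loss in zip(self.log_vars, losses):
            total = total + torch.exp(-s) * loss + s
        return total

# Usage: the log-variances are optimized jointly with the network weights.
# criterion = UncertaintyWeightedLoss(num_losses=2)
# optimizer = torch.optim.Adam(list(net.parameters()) + list(criterion.parameters()))
# total = criterion(loss1, loss2)
```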

This method was later extended to the field of object detection [2], where it is used to model the uncertainty in each 2D box annotation.

[1] Alex Kendall, Yarin Gal, Roberto Cipolla. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. CVPR, 2018.
[2] Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, Xiangyu Zhang. Bounding Box Regression with Uncertainty for Accurate Object Detection. CVPR, 2019.
Author: Zheng Zejia
Link: https://www.zhihu.com/question/375794498/answer/1056695768
A focal loss can automatically adjust these weights for you according to each task's performance.
Our practice is generally to train in several stages: stage 0 trains task 0, stage 1 trains tasks 0 and 1, and so on. From stage 1 onward we use the focal loss described below.
========== Update: didn't expect this answer to break two hundred upvotes ==========
Here is how it works.
First, for each task you have a loss function and a KPI (key performance indicator) mapped to [0, 1]. For a classification task, the loss function can be cross-entropy and the KPI can be accuracy or average precision. For regression you need something like IoU normalized to [0, 1]. The higher the KPI, the better the task is performing.
For each incoming batch, every task i has a loss_i and its own KPI k_i. On this basis, define the focal loss FL(k_i, gamma_i) = -((1 - k_i)^gamma_i) * log(k_i). We generally take gamma = 2.
So for this batch, the total loss = sum_i FL(k_i, gamma_i) * loss_i.
Intuitively, this FL tends to infinity when a task's KPI is near 0, so the total loss becomes completely dominated by the task that is doing badly, and backprop adjusts all the weights according to that badly performing task. When a task performs particularly well, with its KPI near 1, its FL goes to 0 and its share of the total loss becomes very small.
Of course, depending on the learning rate, a task that learns badly at the start may lag behind the other tasks. This paper, http://svl.stanford.edu/assets/papers/guo2018focus.pdf, discusses how to update the KPI gradually, in a momentum-like fashion.
Since the total loss would otherwise also be differentiated with respect to the KPI, there are some subtleties around taking derivatives through the KPI.
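A minimal sketch of this KPI-weighted focal loss, assuming the per-task KPIs have already been computed and normalized to [0, 1]; the detach below reflects the caveat about differentiating through the KPI, and the helper name is mine:

```python
import torch

def focal_weighted_loss(losses, kpis, gamma=2.0, eps=1e-6):
    """Total loss = sum_i FL(k_i, gamma) * loss_i, with
    FL(k, gamma) = -((1 - k)^gamma) * log(k).

    losses: list of per-task loss tensors.
    kpis:   list of per-task KPIs in [0, 1] (higher = better).
    KPIs are clamped away from 0/1 and detached, so no gradient
    flows through the KPI itself.
    """
    total = 0.0
    for loss, k in zip(losses, kpis):
        k = torch.as_tensor(k, dtype=torch.float32).clamp(eps, 1.0 - eps).detach()
        fl = -((1.0 - k) ** gamma) * torch.log(k)
        total = total + fl * loss
    return total

# Maintaining KPIs with a momentum-style running average, as suggested above:
# kpi_running = momentum * kpi_running + (1 - momentum) * kpi_batch
```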
As mentioned, when the KPI is near 0 the loss blows up, so do not train with the focal loss from the very beginning; make sure the network weights have been updated for a while before switching it on.
Happy training.
Author: Hanson
Link: https://www.zhihu.com/question/375794498/answer/1077922077
In multi-task learning, each loss differs in magnitude and learning difficulty, which makes it hard to find a balance. Let me give two examples of problems I have run into in practical applications.
The first is the multi-task learning algorithm MTCNN, one of the most classical algorithms in face detection. It has been adapted by many vendors, its performance is very good, and there are plenty of open-source implementations (if you are not familiar with it, look it up). But when I tested the various implementations, I found none that surpassed the original version. The figure below compares different versions; the "code" entry is the result of my own reproduction.

This is quite troubling: the implementations differ considerably in hyperparameters and network structure, and their results differ considerably too.
Here cls_loss denotes the loss on the confidence score, box_loss the loss on the predicted box position, and landmark_loss the loss on the key-point (landmark) positions.
So what should the weights a, b, c on these three losses be set to in order to get a good result?

Actually I have a good approach: keep only the two essential weights and set the remaining one to 0, for example set the landmark weight c = 0. Why do this? The first reason is that key-point regression is not essential in face detection; removing this part causes no big problem, and only under this assumption can the next experiment be carried out.

Take ONet in MTCNN, which regresses score, bbox, and landmarks. When I reproduced it in PyTorch, something interesting happened: after freezing the landmark task (that is, setting c = 0), ONet's performance improved greatly, even surpassing the original version.

But adding the landmark task back (c > 0) hurts cls_loss, which is a contradictory phenomenon, and the outcome depends heavily on the relative sizes of a, b, c. With the landmark weight set small, landmark accuracy is truly terrible, almost useless. With the weights set so that the losses reach the same order of magnitude, landmark accuracy does come up, but the score becomes unsatisfactory. When this happens, it shows that the network structure has design flaws: you need to modify the multi-task branches after the backbone to minimize the correlation between the two tasks, or drop the key points from ONet and predict them with a separate network instead (for example, add an LNet). Box regression is not particularly affected by the key points; most of the time box and landmarks promote each other, and their influence on the score is consistent. Even if box accuracy drops by 5%, the box can still frame the target, so there is no need to worry too much.
The experiment above is meant to illustrate that for a good loss-weight combination to exist at all, your network structure must be well designed. Otherwise you may need experiments like the above to validate your network structure, and to design various strategies to resolve the problems caused by loss imbalance.
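For concreteness, a minimal sketch of the weighting scheme used in the experiments above: the total loss is just a weighted sum of the three branch losses, and "freezing" the landmark task amounts to setting c = 0 (the default weight values here are placeholders, not MTCNN's original settings):

```python
def mtcnn_total_loss(cls_loss, box_loss, landmark_loss, a=1.0, b=0.5, c=0.5):
    """Weighted sum of the three ONet task losses.

    Setting c = 0 freezes the landmark branch: no gradient flows back
    from landmark_loss, which is exactly the ablation described above.
    """
    return a * cls_loss + b * box_loss + c * landmark_loss

# Ablation: detection-only training, landmark branch frozen.
# total = mtcnn_total_loss(cls_loss, box_loss, landmark_loss, a=1.0, b=0.5, c=0.0)
```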