How to balance multiple losses in deep learning?
2022-06-12 12:58:00 【Xiaobai learns vision】
Source | Zhihu
Link | https://www.zhihu.com/question/375794498
Editor | AI youdao
When training a network end to end, if the final loss = a*loss1 + b*loss2 + c*loss3 + ..., is there a principled way to choose the hyperparameters a, b, c?
Author: Evan
Link: https://www.zhihu.com/question/375794498/answer/1052779937
This is actually an important issue that has been somewhat neglected in deep learning. With the recent rise of multi-task learning and generative adversarial networks, many machine learning tasks and methods run into it, yet in plenty of papers the weights are simply the product of brute-force tuning and alchemy. Let me quietly share two very interesting research directions.
1. From the perspective of predictive uncertainty, introduce a Bayesian framework that automatically sets the weight of each loss component according to its current magnitude. Representative work: the CVPR 2018 paper by Alex Kendall et al.:
Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
https://arxiv.org/abs/1705.07115
The paper's second author, Yarin Gal, studied under Zoubin Ghahramani and has done a lot of solid work in recent years combining Bayesian ideas with deep learning.
2. Construct the Pareto front of all the losses, obtaining the results of many hyperparameter combinations at the ultra-low cost of a single training run. Representative work: the paper Intel published at NeurIPS 2018 (yes, the conference that just changed its name):
Multi-Task Learning as Multi-Objective Optimization
http://papers.nips.cc/paper/7334-multi-task-learning-as-multi-objective-optimization
Since I am an old acquaintance of the paper's authors, I won't flatter them here; if you are interested, read it carefully, it is packed with substance.
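For a rough feel of the core idea, here is a minimal PyTorch sketch of the two-task special case only (not the paper's full Frank-Wolfe algorithm; the model and loss tensors are assumed, and gradients are taken over all trainable parameters rather than just the shared ones, as a simplification). The minimum-norm convex combination of the two task gradients has a closed form:

```python
import torch

def mgda_two_task_step(model, loss1, loss2):
    """One descent step for the two-task case sketched from Sener & Koltun (2018).

    Finds alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||^2,
    i.e. the minimum-norm convex combination of the two task gradients,
    then backpropagates the correspondingly weighted total loss.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    def flat_grad(loss):
        grads = torch.autograd.grad(loss, params, retain_graph=True,
                                    allow_unused=True)
        # Parameters unused by this loss get zero gradient.
        return torch.cat([(g if g is not None else torch.zeros_like(p)).flatten()
                          for g, p in zip(grads, params)])

    g1, g2 = flat_grad(loss1), flat_grad(loss2)

    # Closed-form minimizer of ||alpha*g1 + (1-alpha)*g2||^2 over [0, 1]:
    # alpha = <g2, g2 - g1> / ||g1 - g2||^2, clipped to the interval.
    diff = g1 - g2
    alpha = (torch.dot(g2, g2 - g1) / (diff.dot(diff) + 1e-12)).clamp(0.0, 1.0)

    model.zero_grad()
    (alpha * loss1 + (1.0 - alpha) * loss2).backward()
    return alpha.item()  # caller then runs optimizer.step()
```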

Author: Yang Kuiyuan (DeepMotion)
Link: https://www.zhihu.com/question/375794498/answer/1050963528
1. There is usually a balance to strike among multiple losses; even a single-task setup has a weight decay term. Relatively simple combinations can be handled by tuning the hyperparameters.
2. For a more complex balance among multiple task losses, here is a method that lets the network directly predict the loss weights [1]. Take two losses as an example. The network additionally outputs σ_1 and σ_2, and the total loss takes the form

L_total = L_1 / (2σ_1²) + L_2 / (2σ_2²) + log σ_1 + log σ_2

To minimize the total loss, the first two terms want σ_1 and σ_2 to be as large as possible, while the last term, which prevents degeneration, wants them to be as small as possible. When one of the two losses is larger, its corresponding σ also takes a larger value so that the total loss is minimized. This handles inconsistent scales across losses, or the problem of some losses having large variance.
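A minimal PyTorch sketch of this uncertainty weighting, using the common log-variance parameterization of the loss in [1] (constant factors are dropped and all names here are my own, so treat it as an illustration rather than the paper's reference implementation):

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Weights losses by learned uncertainty, after Kendall et al. [1].

    total = sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2),
    so a high-variance loss gets down-weighted by exp(-s_i), while the
    +s_i term keeps the sigmas from growing without bound.
    """
    def __init__(self, num_losses=2):
        super().__init__()
        # s_i = log(sigma_i^2), initialized to 0 (i.e. sigma_i = 1)
        self.log_vars = nn.Parameter(torch.zeros(num_losses))

    def forward(self, *losses):
        total = 0.0
        for s, loss in zip(self.log_vars, losses):
            total = total + torch.exp(-s) * loss + s
        return total

# Usage: the log-variances are optimized jointly with the network weights.
# criterion = UncertaintyWeightedLoss(num_losses=2)
# optimizer = torch.optim.Adam(list(net.parameters()) + list(criterion.parameters()))
# total = criterion(loss1, loss2)
```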

This method was later extended to the field of object detection [2], where it is used to model the uncertainty in each 2D box annotation.

[1] Alex Kendall, Yarin Gal, Roberto Cipolla. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. CVPR, 2018.
[2] Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, Xiangyu Zhang. Bounding Box Regression with Uncertainty for Accurate Object Detection. CVPR, 2019.
Author: Zheng Zejia
Link: https://www.zhihu.com/question/375794498/answer/1056695768
A focal loss can automatically adjust these weights for you according to each task's performance.
Our practice is generally to train in several stages: stage 0 trains task 0, stage 1 trains tasks 0 and 1, and so on. From stage 1 onward we use the focal loss described below.
========== Update: didn't expect this answer to break two hundred upvotes ==========
Here is how it works.
First, for each task you have a loss function and a KPI (key performance indicator) mapped to [0, 1]. For a classification task, the loss function can be cross-entropy and the KPI can be accuracy or average precision. For regression you need something like IoU normalized to [0, 1]. The higher the KPI, the better the task is performing.
For each incoming batch, every task i has a loss_i and its own KPI k_i. On this basis, define the focal loss FL(k_i, gamma_i) = -((1 - k_i)^gamma_i) * log(k_i). We generally take gamma = 2.
So for this batch, the total loss = sum_i FL(k_i, gamma_i) * loss_i.
Intuitively, this FL tends to infinity when a task's KPI is near 0, so the total loss becomes completely dominated by the task that is doing badly, and backprop adjusts all the weights according to that badly performing task. When a task performs particularly well, with its KPI near 1, its FL goes to 0 and its share of the total loss becomes very small.
Of course, depending on the learning rate, a task that learns badly at the start may lag behind the other tasks. This paper, http://svl.stanford.edu/assets/papers/guo2018focus.pdf, discusses how to update the KPI gradually, in a momentum-like fashion.
Since the total loss would otherwise also be differentiated with respect to the KPI, there are some subtleties around taking derivatives through the KPI.
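A minimal sketch of this KPI-weighted focal loss, assuming the per-task KPIs have already been computed and normalized to [0, 1]; the detach below reflects the caveat about differentiating through the KPI, and the helper name is mine:

```python
import torch

def focal_weighted_loss(losses, kpis, gamma=2.0, eps=1e-6):
    """Total loss = sum_i FL(k_i, gamma) * loss_i, with
    FL(k, gamma) = -((1 - k)^gamma) * log(k).

    losses: list of per-task loss tensors.
    kpis:   list of per-task KPIs in [0, 1] (higher = better).
    KPIs are clamped away from 0/1 and detached, so no gradient
    flows through the KPI itself.
    """
    total = 0.0
    for loss, k in zip(losses, kpis):
        k = torch.as_tensor(k, dtype=torch.float32).clamp(eps, 1.0 - eps).detach()
        fl = -((1.0 - k) ** gamma) * torch.log(k)
        total = total + fl * loss
    return total

# Maintaining KPIs with a momentum-style running average, as suggested above:
# kpi_running = momentum * kpi_running + (1 - momentum) * kpi_batch
```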
As mentioned, when the KPI is near 0 the loss blows up, so do not train with the focal loss from the very beginning; make sure the network weights have been updated for a while before switching it on.
Happy training.
Author: Hanson
Link: https://www.zhihu.com/question/375794498/answer/1077922077
In multi-task learning, each loss differs in magnitude and learning difficulty, which makes it hard to find a balance. Let me give two examples of problems I have run into in practical applications.
The first is the multi-task learning algorithm MTCNN, one of the most classical algorithms in face detection. It has been adapted by many vendors, its performance is very good, and there are plenty of open-source implementations (if you are not familiar with it, look it up). But when I tested the various implementations, I found none that surpassed the original version. The figure below compares different versions; the "code" entry is the result of my own reproduction.

This is quite troubling: the implementations differ considerably in hyperparameters and network structure, and their results differ considerably too.
Here cls_loss denotes the loss on the confidence score, box_loss the loss on the predicted box position, and landmark_loss the loss on the key-point (landmark) positions.
So what should the weights a, b, c on these three losses be set to in order to get a good result?

Actually I have a good approach: keep only the two essential weights and set the remaining one to 0, for example set the landmark weight c = 0. Why do this? The first reason is that key-point regression is not essential in face detection; removing this part causes no big problem, and only under this assumption can the next experiment be carried out.

Take ONet in MTCNN, which regresses score, bbox, and landmarks. When I reproduced it in PyTorch, something interesting happened: after freezing the landmark task (that is, setting c = 0), ONet's performance improved greatly, even surpassing the original version.

But adding the landmark task back (c > 0) hurts cls_loss, which is a contradictory phenomenon, and the outcome depends heavily on the relative sizes of a, b, c. With the landmark weight set small, landmark accuracy is truly terrible, almost useless. With the weights set so that the losses reach the same order of magnitude, landmark accuracy does come up, but the score becomes unsatisfactory. When this happens, it shows that the network structure has design flaws: you need to modify the multi-task branches after the backbone to minimize the correlation between the two tasks, or drop the key points from ONet and predict them with a separate network instead (for example, add an LNet). Box regression is not particularly affected by the key points; most of the time box and landmarks promote each other, and their influence on the score is consistent. Even if box accuracy drops by 5%, the box can still frame the target, so there is no need to worry too much.
The experiment above is meant to illustrate that for a good loss-weight combination to exist at all, your network structure must be well designed. Otherwise you may need experiments like the above to validate your network structure, and to design various strategies to resolve the problems caused by loss imbalance.
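For concreteness, a minimal sketch of the weighting scheme used in the experiments above: the total loss is just a weighted sum of the three branch losses, and "freezing" the landmark task amounts to setting c = 0 (the default weight values here are placeholders, not MTCNN's original settings):

```python
def mtcnn_total_loss(cls_loss, box_loss, landmark_loss, a=1.0, b=0.5, c=0.5):
    """Weighted sum of the three ONet task losses.

    Setting c = 0 freezes the landmark branch: no gradient flows back
    from landmark_loss, which is exactly the ablation described above.
    """
    return a * cls_loss + b * box_loss + c * landmark_loss

# Ablation: detection-only training, landmark branch frozen.
# total = mtcnn_total_loss(cls_loss, box_loss, landmark_loss, a=1.0, b=0.5, c=0.0)
```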