[0701] [Paper Reading] Alleviating Data Imbalance Issue with Perturbed Input During Inference
2022-07-02 19:04:00 【xiongxyowo】
[Paper] [Code] [MICCAI 21]
Abstract
Because of the data imbalance between common and rare diseases, intelligent diagnosis systems tend to be biased toward common diseases. This bias may persist even when a re-balancing strategy is applied during model training. To further alleviate it, we propose a novel method that operates at the inference stage rather than the training stage. For any test input, the input is slightly perturbed, in a manner similar to adversarial learning, based on the difference between the temperature-tuned classifier output and a target probability distribution derived from the inverse frequency of each disease. Compared with the original input, the classifier's prediction for the perturbed input is less biased toward common diseases. The proposed inference-stage method can be naturally combined with any training-stage re-balancing strategy. Extensive evaluation on three medical image classification tasks and three classifier backbones shows that the method consistently improves classifier performance, even after training with any re-balancing strategy. The improvement is especially large on minority classes, which confirms that the method alleviates the classifier's bias toward the dominant classes.
Method
This paper addresses dataset imbalance in medical diagnosis, that is, the difficulty of collecting samples of rare diseases. The method is a form of test-time post-processing, which is somewhat novel compared with conventional training-time pre-processing. The overall procedure is as follows:
Suppose the dataset contains training data from $C$ classes, where class $c$ has $n_c$ samples. If some class $i$ dominates the training set, then for any input sample $x$ the softmax output $p$ also tends to favor class $i$. The approach of the paper is simple: perturb the sample at test time so that its softmax output shifts toward the minority classes.
Feeding a sample $x$ into the network yields the logit vector of the FC layer, $z = [z_1, z_2, ..., z_C]^T$. The final class probabilities are obtained by passing $z$ through softmax. Softmax has a temperature coefficient $T$. In ordinary classification it is set to 1, while in some tasks, such as knowledge distillation, it is set above 1 to make the softmax output "smoother". The same reasoning applies here: increasing $T$ lowers the predicted probability of common classes and raises that of uncommon classes:

$$\hat{p}_{c}=\frac{\exp \left(z_{c} / T\right)}{\sum_{k=1}^{C} \exp \left(z_{k} / T\right)}$$

This step alone only narrows the gap between the predicted probabilities of different classes. It cannot by itself promote the class with the second-highest probability to the top, because dividing all logits by the same $T$ preserves their ranking. To achieve that, the paper defines a target probability distribution:

$$p_{c}^{*}=\frac{g\left(n_{c}\right)}{\sum_{k=1}^{C} g\left(n_{k}\right)}$$

where $g(n_c) = \log(M/n_c)$, so the more often a class appears in the training set, the smaller $g(n_c)$ is. The idea is quite basic. Treating $p_{c}^{*}$ as the ground truth, we can measure the difference between $p_{c}^{*}$ and the temperature-adjusted prediction $\hat{p}_{c}$.
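The two distributions above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the function names, the example logits, and the class counts are made up, and $M$ is assumed here to be the total number of training samples, which the text above does not state.

```python
import math

def temperature_softmax(logits, T=1.0):
    """Softmax of logits / T; T > 1 flattens the distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_frequency_target(class_counts):
    """Target p*_c = g(n_c) / sum_k g(n_k) with g(n) = log(M / n).

    ASSUMPTION: M is taken to be the total number of training samples.
    """
    M = sum(class_counts)
    g = [math.log(M / n) for n in class_counts]
    total = sum(g)
    return [v / total for v in g]

# Hypothetical 3-class example: class 0 is common, class 2 is rare.
logits = [4.0, 1.0, 0.5]
counts = [900, 80, 20]

p_t1 = temperature_softmax(logits, T=1.0)
p_t3 = temperature_softmax(logits, T=3.0)
target = inverse_frequency_target(counts)

print("T=1   :", [round(p, 3) for p in p_t1])
print("T=3   :", [round(p, 3) for p in p_t3])
print("target:", [round(p, 3) for p in target])
```

With a larger $T$ the common class loses probability mass and the rare classes gain some, but the ranking of the classes is unchanged, whereas the target distribution puts the most mass on the rarest class.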
Based on this difference, the perturbation to add to the input can be derived:

$$\tilde{\mathbf{x}}=\mathbf{x}-\varepsilon \cdot \operatorname{sign}\left(\nabla \ell\left(\hat{\mathbf{p}}, \mathbf{p}^{*}\right)\right)$$

Here the gradient is taken with respect to the input $\mathbf{x}$ and $\varepsilon$ controls the size of the perturbation. The prediction is thereby corrected at test time.
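A minimal sketch of this signed-gradient step, under several assumptions that do not come from the text above: the classifier is a toy linear model $z = Wx + b$ so that the gradient can be written by hand, $\ell$ is taken to be cross-entropy, and all weights, inputs, and the target distribution are invented for illustration. A real network would obtain the gradient through automatic differentiation.

```python
import math

def softmax(logits, T=1.0):
    """Temperature softmax, computed stably."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def linear_logits(x, W, b):
    """Toy classifier (ASSUMPTION): z = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]

def cross_entropy(p_hat, p_star):
    """Loss l(p_hat, p*) = -sum_c p*_c log p_hat_c (ASSUMPTION: cross-entropy)."""
    return -sum(t * math.log(p) for t, p in zip(p_star, p_hat))

def perturb_input(x, W, b, p_star, T, eps):
    """One step x~ = x - eps * sign(grad_x l(p_hat, p*))."""
    p_hat = softmax(linear_logits(x, W, b), T)
    # For cross-entropy on softmax(z / T): dl/dz_c = (p_hat_c - p*_c) / T
    dz = [(p - t) / T for p, t in zip(p_hat, p_star)]
    # Chain rule through z = W x + b: dl/dx_j = sum_c W[c][j] * dl/dz_c
    grad = [sum(W[c][j] * dz[c] for c in range(len(W))) for j in range(len(x))]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi - eps * s for xi, s in zip(x, sign)]

# Hypothetical setup: 2 input features, 3 classes (class 0 common, class 2 rare).
W = [[1.0, 0.5], [-0.5, 1.0], [0.2, -1.0]]
b = [1.0, 0.0, -0.5]
x = [1.0, 0.5]
p_star = [0.02, 0.38, 0.60]  # made-up inverse-frequency target
T, eps = 2.0, 0.05

x_tilde = perturb_input(x, W, b, p_star, T, eps)
p_before = softmax(linear_logits(x, W, b), T)
p_after = softmax(linear_logits(x_tilde, W, b), T)
loss_before = cross_entropy(p_before, p_star)
loss_after = cross_entropy(p_after, p_star)

print("perturbed input:", x_tilde)
print("loss before/after:", round(loss_before, 4), round(loss_after, 4))
```

Because $\mathbf{p}^{*}$ is constant, using KL divergence instead of cross-entropy would give the same gradient and therefore the same perturbation. Every input coordinate moves by exactly $\varepsilon$, which keeps the perturbation small regardless of the gradient's magnitude.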
Experiment
Experiments were conducted on three imbalanced medical datasets: Skin7, OCTMNIST, and X-ray6.
The compared methods include the traditional class-level re-weighting and focal loss, as well as the more recent two-stage deferred re-sampling and the margin-based method LDAM.