
[0701] [Paper reading] Alleviating Data Imbalance Issue with Perturbed Input during Inference

2022-07-02 19:04:00 【xiongxyowo】

[Paper] [Code] [MICCAI 21]

Abstract

Because of the data imbalance between common and rare diseases, intelligent diagnosis tends to be biased toward common diseases. This bias may persist even when a re-balancing strategy is applied during model training. To alleviate it further, we propose a new method that works at the inference stage rather than the training stage. For any test input, the input is slightly perturbed, in a way similar to adversarial learning, based on the difference between the temperature-scaled classifier output and a target probability distribution derived from the inverse frequencies of the diseases. Compared with the original input, the classifier's prediction on the perturbed input is less biased toward common diseases. The proposed inference-stage method can be combined naturally with any training-stage re-balancing strategy. Extensive evaluation on three medical image classification tasks with three classifier backbones shows that our method consistently improves classifier performance, even after training with any re-balancing strategy. The improvement is especially large on minority classes, which confirms the effectiveness of the method in alleviating the classifier's bias toward dominant classes.

Method

This paper addresses dataset imbalance in medical diagnosis, which arises because samples of rare diseases are hard to collect. The method is a form of test-time post-processing, which is somewhat novel compared with conventional training-time pre-processing approaches. The overall pipeline is shown in the figure:
[Figure: overall pipeline of the method]
Suppose the training set contains $C$ classes, where class $c$ has $n_c$ samples. If the samples of some class $i$ dominate, then for any input sample $x$ the softmax output $p$ also tends to favor class $i$. The idea of the paper is simple: perturb the sample at test time so that its softmax output shifts toward the minority classes.

Feeding a sample $x$ into the network yields the logit vector of the FC layer, $z = [z_1, z_2, \ldots, z_C]^T$. The final class probabilities are obtained by passing $z$ through a softmax classifier. Softmax has a temperature coefficient $T$: ordinary classification sets $T = 1$, while some tasks, such as knowledge distillation, set $T > 1$ to make the softmax output smoother. The same reasoning applies here: increasing $T$ lowers the predicted probability of common classes and raises that of uncommon classes:

$$\hat{p}_{c}=\frac{\exp \left(z_{c} / T\right)}{\sum_{k=1}^{C} \exp \left(z_{k} / T\right)}$$

This step alone only narrows the gap between the predicted probabilities of different classes. It cannot turn the class with the second-highest probability into the class with the highest, because temperature scaling leaves the ranking of the classes unchanged. To achieve that, the paper defines a target probability distribution:

$$p_{c}^{*}=\frac{g\left(n_{c}\right)}{\sum_{k=1}^{C} g\left(n_{k}\right)}$$

The idea is quite straightforward. Here $g(n_c) = \log(M/n_c)$, so the more frequently a class appears in the training set, the smaller $g(n_c)$ is. Treating $p_{c}^{*}$ as the ground truth, we can measure the difference between $p_{c}^{*}$ and the original prediction $\hat{p}_{c}$ with a loss $\ell$. From this difference, the perturbation to add to the input is derived:

$$\tilde{\mathbf{x}}=\mathbf{x}-\varepsilon \cdot \operatorname{sign}\left(\nabla_{\mathbf{x}} \ell\left(\hat{\mathbf{p}}, \mathbf{p}^{*}\right)\right)$$

In this way the prediction is corrected at test time.
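The three formulas above can be sketched in a few lines of plain Python. This is only an illustration under several assumptions that are not stated in this note: a toy linear classifier `z = W x + b` stands in for the real network so that the input gradient can be written by hand, the loss $\ell$ is taken to be cross-entropy between $\mathbf{p}^{*}$ and $\hat{\mathbf{p}}$, and $M$ is taken to be the total number of training samples. All function names, weights, and numbers are hypothetical.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """p_c = exp(z_c / T) / sum_k exp(z_k / T)."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def target_distribution(class_counts, M):
    """p*_c = g(n_c) / sum_k g(n_k), with g(n) = log(M / n)."""
    g = [math.log(M / n) for n in class_counts]
    total = sum(g)
    return [v / total for v in g]

def linear_logits(W, b, x):
    """Toy stand-in for a network (assumption): z = W x + b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def cross_entropy(p_hat, p_star):
    """Assumed loss: -sum_c p*_c * log(p_hat_c)."""
    return -sum(ps * math.log(ph) for ps, ph in zip(p_star, p_hat))

def sign(v):
    return (v > 0) - (v < 0)

def perturb_input(W, b, x, p_star, T, eps):
    """x_tilde = x - eps * sign(grad_x loss), using the analytic gradient."""
    p_hat = softmax_with_temperature(linear_logits(W, b, x), T)
    # For cross-entropy on softmax(z / T): d loss / d z_c = (p_hat_c - p*_c) / T
    dz = [(ph - ps) / T for ph, ps in zip(p_hat, p_star)]
    # Chain rule through z = W x + b: d loss / d x_j = sum_c W[c][j] * dz[c]
    grad_x = [sum(W[c][j] * dz[c] for c in range(len(W))) for j in range(len(x))]
    return [xj - eps * sign(gj) for xj, gj in zip(x, grad_x)]

# Hypothetical 3-class problem: class 0 is common, class 2 is rare.
class_counts = [900, 90, 10]
p_star = target_distribution(class_counts, M=sum(class_counts))  # M: assumption
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.5, 0.0, 0.0]
x = [0.2, 0.1]
T, eps = 2.0, 0.05

p_before = softmax_with_temperature(linear_logits(W, b, x), T)
x_tilde = perturb_input(W, b, x, p_star, T, eps)
p_after = softmax_with_temperature(linear_logits(W, b, x_tilde), T)
print("before:", p_before)
print("after: ", p_after)
```

With a real network the hand-written gradient would be replaced by automatic differentiation with respect to the input. The structure stays the same: compute $\hat{\mathbf{p}}$ with temperature $T$, compare it with $\mathbf{p}^{*}$, and take one signed gradient step of size $\varepsilon$ on the input.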

Experiment

Experiments were conducted on three imbalanced medical datasets: Skin7, OCTMNIST, and X-ray6.
The compared methods include the traditional class-level re-weighting and focal loss, as well as the more recent two-stage deferred re-sampling and the margin-based method LDAM.
