CVPR 2022: Tsinghua University Proposes Unsupervised Domain Generalization (UDG)

2022-06-10 20:23:00 · I love computer vision

Recently, Cui Peng's team at Tsinghua University published a paper at CVPR 2022 targeting a weakness of traditional domain generalization (DG): its reliance on large amounts of labeled data. The paper proposes the unsupervised domain generalization (UDG) problem, which aims to improve a model's generalization to unseen domains by pre-training on unlabeled data, and introduces the DARLING algorithm for UDG. Pre-trained on unlabeled data amounting to only 1/10 of ImageNet, DARLING surpasses the effect of ImageNet pre-training on DomainNet.

  • Paper title: Towards Unsupervised Domain Generalization

  • Paper link: https://arxiv.org/abs/2107.06219

1. A Brief Introduction to DG and the Problems of Existing DG Methods

Deep learning has made unprecedented progress in many research fields, especially computer vision. Most deep learning algorithms assume that the training data (data available before deployment) and the test data (examples encountered in real applications) are independent and identically distributed. When the two distributions differ, a conventional deep model that fully fits the training data will fail on the test data, which undermines the model's credibility when it is applied in new environments.

Figure 1. Traditional domain generalization (DG)

More and more researchers are therefore studying domain generalization (DG), i.e., transferring a model to data from unknown distributions, as shown in Figure 1. DG aims to improve prediction performance on an unseen target domain by learning from data of multiple source domains.

Many existing DG methods rely on sufficient labeled training data to learn cross-domain invariant representations, but manually annotating large amounts of cross-domain data is expensive or impractical (for example, labeling specialized domains such as medical images is difficult and costly). Moreover, existing DG algorithms default to initializing the model with parameters pre-trained on ImageNet, and ImageNet is itself a mixture of several domains, so using it for pre-training can bias the model. For the DG benchmark PACS, ImageNet roughly corresponds to the "photo" domain; for DomainNet, it corresponds to the "real" domain. Pre-training with category labels on what is effectively one of the DG domains therefore introduces a domain bias (indeed, almost all methods perform best on the "photo" and "real" domains).

2. The Unsupervised Domain Generalization (UDG) Problem

To reduce the dependence on annotated cross-domain data when improving model generalization, this paper proposes the unsupervised domain generalization (UDG) problem, whose goal is to improve generalization to unknown domains using data without category annotations. Experiments show that unsupervised pre-training on suitable heterogeneous data outperforms the ImageNet pre-training strategy on DG benchmarks.

The unsupervised domain generalization (UDG) problem is illustrated in Figure 2. In UDG, to avoid the bias that pre-training data such as ImageNet introduces into the DG problem, the model is randomly initialized. It is first pre-trained on unlabeled data from multiple source domains to learn a representation space, then trained on labeled source-domain data to fine-tune that representation and learn a classifier, and finally tested on a target domain never seen during training. The domains of the pre-training data may overlap with those of the training data, but none of the training data (labeled or unlabeled) shares a domain with the test data. The labeled data used for fine-tuning shares the same category space as the test data, so that the learned representation can be mapped to class predictions. A minimal sketch of this three-stage protocol follows.
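To make the protocol concrete, here is a minimal PyTorch-style sketch of the three UDG stages. The stage functions and data loaders (`pretrain_unlabeled`, `unlabeled_source_loader`, and so on) are hypothetical placeholders for illustration, not code from the paper.

```python
import torch.nn as nn
from torchvision.models import resnet18

# Stage 0: random initialization, deliberately avoiding ImageNet weights
# so that no pre-training domain bias is introduced.
backbone = resnet18(weights=None)
backbone.fc = nn.Identity()  # use the backbone as a feature extractor

# Stage 1: unsupervised pre-training on unlabeled multi-domain data,
# e.g., with a contrastive objective (DARLING's loss is sketched later).
# pretrain_unlabeled(backbone, unlabeled_source_loader)   # hypothetical

# Stage 2: supervised fine-tuning on labeled source-domain data.
num_classes = 7  # e.g., PACS has 7 classes; must match the test categories
model = nn.Sequential(backbone, nn.Linear(512, num_classes))
# finetune_labeled(model, labeled_source_loader)          # hypothetical

# Stage 3: evaluation on a held-out target domain whose domain never
# appears in stages 1-2.
# accuracy = evaluate(model, unseen_target_loader)        # hypothetical
```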

Figure 2. The unsupervised domain generalization (UDG) problem

3. Method Design

In recent years, self-supervised learning has made great progress. Algorithms represented by SimCLR v2 and MoCo v2 learn a representation space from large amounts of unlabeled data that are easy to obtain on the Internet, eliminating pre-training's dependence on labeled data, and have already surpassed the effect of ImageNet pre-training on many downstream tasks.

A natural idea is therefore to apply self-supervised learning to the unsupervised domain generalization (UDG) problem, using abundant, easily obtained unlabeled data to improve generalization to unknown domains. In UDG, however, the pre-training data is strongly heterogeneous, so applying contrastive learning directly leads the model to learn domain-related features rather than object-category features, and the model then cannot recognize object categories when tested on an unseen target domain. Concretely, the key to contrastive learning is distinguishing the two images in a negative pair. If those images come from different domains that are very easy to tell apart (for example, a sketch domain and a photo domain), the model can separate them using domain-related cues alone, without learning the information that actually matters for downstream tasks (such as object-category features), so the learned feature space performs poorly downstream.

Based on these observations, this paper proposes the Domain-Aware Representation LearnING (DARLING) algorithm, which addresses the significant and misleading cross-domain heterogeneity in UDG pre-training data and learns features that are domain-independent yet object-related. The structure of DARLING is shown in Figure 3.

Figure 3. The DARLING framework

As mentioned above, existing contrastive learning methods do not consider the effect of data heterogeneity in the contrastive loss, which is computed as

$$\mathcal{L}=-\log\frac{\exp(z_i^{\top}z_i'/\tau)}{\exp(z_i^{\top}z_i'/\tau)+\sum_{z_j\in\mathcal{Q}}\exp(z_i^{\top}z_j/\tau)}$$

where $\mathcal{Q}$ is the negative-sample queue, $\tau$ is a temperature hyperparameter, and $z_i$ and $z_i'$ are the feature vectors obtained by encoding two differently augmented views of the same image.
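As a reference point, a minimal PyTorch implementation of this standard MoCo-style loss might look as follows; the tensor names and the temperature value 0.07 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z, z_pos, queue, tau=0.07):
    """Standard InfoNCE loss with a negative-sample queue.

    z:      (B, D) query features (one augmented view, encoded)
    z_pos:  (B, D) key features of the other view of the same images
    queue:  (K, D) features of negative samples from past batches
    """
    z = F.normalize(z, dim=1)
    z_pos = F.normalize(z_pos, dim=1)
    queue = F.normalize(queue, dim=1)

    l_pos = (z * z_pos).sum(dim=1, keepdim=True)     # (B, 1) positive logits
    l_neg = z @ queue.t()                            # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau  # (B, 1 + K)

    # The positive key sits at index 0 for every query.
    labels = torch.zeros(z.size(0), dtype=torch.long, device=z.device)
    return F.cross_entropy(logits, labels)
```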

DARLING, by contrast, accounts for the domain differences among negative samples. The generation mechanism of an image's pseudo label (its instance-discrimination label) can then be modeled per domain as

$$p(\hat{y}=j \mid x_i, d)=\frac{\exp(z_i^{\top}z_j/\tau)}{\sum_{k\in\mathcal{N}_d}\exp(z_i^{\top}z_k/\tau)},\qquad j\in\mathcal{N}_d$$

where $\mathcal{N}_d$ is the set of sample indexes belonging to domain $d$. The generation mechanism of each image's domain can in turn be modeled as

$$p(d \mid x_i)=\frac{\exp\big(h(x_i;\theta)_d\big)}{\sum_{d'}\exp\big(h(x_i;\theta)_{d'}\big)}$$

where the function $h$ is realized by a convolutional neural network with parameters $\theta$. Given an input sample $x_i$, the predicted probability of its pseudo label can therefore be expressed as

$$p(\hat{y}=j \mid x_i)=\sum_{d=1}^{D} p(\hat{y}=j \mid x_i, d)\, p(d \mid x_i)$$

The contrastive loss function of DARLING can thus be expressed as

$$\mathcal{L}_{\text{DARLING}}=-\log\sum_{d=1}^{D}p(d\mid x_i)\,\frac{\exp(z_i^{\top}z_i'/\tau)}{\exp(z_i^{\top}z_i'/\tau)+\sum_{k\in\mathcal{N}_d}\exp(z_i^{\top}z_k/\tau)}$$

Intuitively, the closer the "domain-related" features of the two samples in a negative pair are, the more the network must rely on "domain-independent" features to distinguish them, so that pair should carry a higher weight in the training loss. Conversely, when the "domain-related" features of the two samples differ markedly, the network tends to use those domain-related features to push the pair further apart in representation space, which hurts downstream tasks, so that pair's loss weight should be reduced.

DARLING uses a subnetwork to learn the domain similarity of negative pairs and weights the training loss accordingly. In the extreme case where the two samples of every negative pair come from the same domain, the network can only rely on "domain-independent" features, so the learned representation focuses on object-category information. A sketch of this weighted loss follows.
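The sketch below adds the domain-aware reweighting on top of the standard loss above. It is a simplified reading of the equations, not the authors' released code: `domain_probs` stands for the output $p(d \mid x_i)$ of the subnetwork $h$, and the queue is partitioned using the (known) source domain of each negative.

```python
import torch

def darling_loss(z, z_pos, queue, queue_domains, domain_probs, tau=0.07):
    """Domain-aware contrastive loss (a sketch of DARLING's objective).

    z:             (B, D) query features, L2-normalized
    z_pos:         (B, D) positive key features, L2-normalized
    queue:         (K, D) negative features, L2-normalized
    queue_domains: (K,)   source-domain index of each negative in the queue
    domain_probs:  (B, n_domains) p(d | x_i) from the domain subnetwork h
    """
    l_pos = torch.exp((z * z_pos).sum(dim=1) / tau)       # (B,)
    sim = torch.exp(z @ queue.t() / tau)                  # (B, K)

    n_domains = domain_probs.size(1)
    prob = torch.zeros_like(l_pos)
    for d in range(n_domains):
        mask = (queue_domains == d).float()               # negatives in domain d
        denom = l_pos + (sim * mask).sum(dim=1)           # per-domain partition
        prob = prob + domain_probs[:, d] * (l_pos / denom)  # marginalize over d

    return -torch.log(prob + 1e-8).mean()
```

Negative pairs drawn from the query's likely domains dominate the denominator, which is exactly the reweighting described above: same-domain negatives are emphasized, easily distinguished cross-domain negatives are down-weighted.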

In addition, as an unsupervised pre-training method, DARLING produces parameters that can serve as model initialization for any existing DG algorithm, further improving generalization, as the short sketch below illustrates.
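In practice, this fusion amounts to swapping the initialization. A minimal sketch, where the checkpoint file name and the DG trainer API are hypothetical:

```python
import torch
from torchvision.models import resnet18

backbone = resnet18(weights=None)
state = torch.load("darling_pretrained.pth", map_location="cpu")  # hypothetical checkpoint
backbone.load_state_dict(state, strict=False)  # tolerate missing projection heads

# Hand the initialized backbone to any existing DG algorithm, e.g. a
# hypothetical trainer wrapping ERM, CORAL, or MixStyle:
# dg_trainer = DGTrainer(backbone, algorithm="ERM")   # hypothetical API
# dg_trainer.fit(labeled_source_loaders)
```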

4. Experimental Results

The paper verifies the significance of the UDG problem and the effectiveness of DARLING on PACS, DomainNet, CIFAR-C, and other benchmarks.

As shown in Table 1, on DomainNet DARLING outperforms all existing SOTA unsupervised/self-supervised learning algorithms, and its margin over the other methods widens as the number of categories in the pre-training data grows.

Table 1. Results on DomainNet

Table 2 reports the results of DARLING and the SOTA algorithms on CIFAR-C. Because CIFAR-C contains more domains, the splits can be arranged so that the pre-training, training, and test data share no domains, and the pre-training and test data share no categories, completely ruling out any leakage of domain or category information during pre-training. DARLING exceeds the existing SOTA algorithms on every test domain.

Table 2. Results on CIFAR-C

Table 3 shows the results on DomainNet when DARLING is combined with existing DG algorithms: using DARLING as the pre-trained initialization improves the generalization ability of the existing DG algorithms.

Table 3. DARLING combined with existing DG methods

Figure 4 compares DARLING with ImageNet pre-training: once the pre-training data covers more than 100 object categories, DARLING outperforms ImageNet pre-training. Note that at 100 object categories, the data used for DARLING pre-training amounts to only 1/10 of ImageNet in both sample count and category count, and carries no category labels. This fully demonstrates that ImageNet pre-training is not the best initialization for DG algorithms: a UDG algorithm using far less data than ImageNet can surpass the effect of ImageNet pre-training, which lays a foundation and opens up space for future UDG algorithms.

Figure 4. Comparison with ImageNet pre-training

5. Summary

The unsupervised domain generalization (UDG) problem not only alleviates DG algorithms' dependence on labeled data; with only a small amount of unlabeled data (1/10 of ImageNet) it reaches an effect similar to ImageNet pre-training. This fully shows that ImageNet pre-training is not the optimal initialization for DG algorithms, and it offers inspiration and a basis for future research on how pre-training methods affect model generalization.
