Roundup | ICLR 2022 Transfer Learning and Vision Transformer Paper Summary
2022-06-11 04:54:00 【Shenlan Shenyan AI】
This post collects some of the more interesting high-scoring papers from ICLR 2022, with an emphasis on domain generalization (DG).
01 Oral——A Fine-Grained Analysis on Distribution Shift
Article from ICLR 2022 Oral: A Fine-Grained Analysis on Distribution Shift
A model's robustness to distribution shift is critical for deployment, and domain generalization (DG) is the field devoted to this problem. Although a great deal of DG research has appeared, the vast majority of works simply run their methods on a few common benchmarks and rely on accuracy to validate the model's generalization ability. How to precisely define distribution shift, and how to systematically measure a model's robustness to it, remain open questions. This paper attempts to answer these two questions.
The authors evaluate six classes of methods:
1. Network architecture: ResNet18, ResNet50, ResNet101, ViT, and MLP.
2. Heuristic data augmentation. Weighted resampling causes some samples to be reused many times, which leads to overfitting; heuristic data augmentation (e.g., color jitter) can be used to enlarge the training data and reduce overfitting. Methods in this category include AugMix (without JSD), RandAugment, and AutoAugment (see the sketch after this list).
3. Learned data augmentation: a conditional generative model is trained on the source domains, and new samples are obtained by sampling from it. CycleGAN is a representative method.
4. Domain generalization methods: mainly the common invariant-representation approaches, with representatives such as IRM, DeepCORAL, DANN, and MixUp.
5. Weighted resampling: each data point in the source domain is assigned a weight during training. This works well for domain generalization and long-tailed distributions; representative methods are JTT and BN-Adapt.
6. Representation learning: mainly disentanglement methods such as β-VAE.
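As a concrete illustration of the heuristic-augmentation category above, here is a minimal sketch using torchvision; the specific transforms and magnitudes are illustrative choices, not the paper's exact pipeline.

```python
# Minimal sketch of heuristic data augmentation with torchvision.
# The transform choices and magnitudes below are illustrative, not the paper's settings.
from torchvision import transforms

heuristic_aug = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    transforms.RandAugment(num_ops=2, magnitude=9),   # policy-based ops (rotation, shear, color, ...)
    transforms.ToTensor(),
])
```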
The authors define three different types of distribution shift, as shown in the figure below:

1. Spurious correlation: certain attributes are correlated with the labels in the training distribution, but these correlations do not hold in the test set, e.g., shape and color in panel (a) above (see the sketch after this list).
2. Low-data drift: the distribution over attribute values is unbalanced during training but uniform at test time.
3. Unseen data shift: some attribute values never appear during training but do appear at test time.
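To make the spurious-correlation setting concrete, below is a minimal sketch of how such a split can be constructed (my own illustrative construction in the spirit of colored-attribute benchmarks, not the datasets used in the paper): a nuisance attribute, here the color channel, predicts the label at training time but not at test time.

```python
# Sketch: build a spuriously-correlated split. At train time the color channel
# almost always agrees with the label; at test time the correlation is broken.
# Illustrative construction only, not the paper's datasets.
import numpy as np

def colorize(images, labels, correlation):
    """images: (N, H, W) grayscale in [0, 1]; labels: (N,) binary."""
    n = len(images)
    # With probability `correlation`, the color channel matches the label.
    color = np.where(np.random.rand(n) < correlation, labels, 1 - labels)
    out = np.zeros((n, 2, *images.shape[1:]), dtype=np.float32)
    out[np.arange(n), color] = images          # place each image in its assigned channel
    return out, labels

images = np.random.rand(1000, 28, 28).astype(np.float32)   # stand-in for real images
labels = np.random.randint(0, 2, size=1000)

x_train, y_train = colorize(images, labels, correlation=0.9)  # color is a shortcut
x_test,  y_test  = colorize(images, labels, correlation=0.5)  # shortcut breaks at test time
```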
In addition, two dataset-level factors are considered: label noise and dataset size.
The main conclusions are as follows:
1. Although current methods can improve over ERM, no single method always performs best.
2. Pretraining performs quite well across datasets and distribution shifts, unless the test dataset is very large and highly domain-specific, such as the CAMELYON17 medical imaging dataset.
3. Heuristic augmentation does not always help generalization; it depends on the dataset, and choosing a suitable augmentation is very important. For example, on rotated MNIST, image rotation is the best augmentation.
4. Learned data augmentation brings stable performance gains in all cases.
5. Domain generalization methods can bring gains on specific datasets, especially under low-data drift and unseen data shift, but overall the improvement is smaller than that of heuristic data augmentation.
6. The optimal algorithm differs across settings; in other words, there is currently no single strongest algorithm.
The authors also offer some practical tips for improving generalization performance:
1. Heuristic data augmentation is simple; try several different combinations.
2. If heuristic data augmentation does not work, consider techniques such as CycleGAN to learn new data.
3. In general, pretraining has been found useful for learning robust representations.
4. Current domain generalization, disentanglement, and reweighting methods bring limited improvements and do not help on all datasets; the more complex a method is, the harder it is to generalize to other datasets.
02 Oral——Fine-Tuning Distorts Pretrained Features and Underperforms Out-of-Distribution
There are two common ways to adapt a pretrained model to a downstream task: fine-tuning, which updates all parameters of the model, and linear probing, which updates only the last linear layer. It is generally believed that fine-tuning leads to better in-distribution accuracy and better generalization. This paper runs experiments on six datasets (Breeds-Living17, Breeds-Entity30, DomainNet, CIFAR→STL, CIFAR10.1, FMoW): compared with linear probing, fine-tuning gives roughly a 2% in-distribution performance improvement on average, but a 6% performance drop OOD.
The paper gives a theoretical analysis of this tradeoff for fine-tuning an over-parameterized two-layer linear network, describing how fine-tuning distorts high-quality pretrained features and thereby lowers OOD accuracy. Based on this theory, the paper proposes the improvement shown in part (c) of the figure below: a two-stage training strategy that uses linear probing as an initialization phase before fine-tuning. This simple change yields a solid improvement both ID and OOD (about 1% better ID, 8% better OOD).
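A minimal sketch of this two-stage "linear probe, then fine-tune" idea follows, assuming a generic torchvision backbone; the backbone choice, learning rates, epoch counts, and the `train_loader` data loader are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of two-stage training: linear probing first, then full fine-tuning.
# Backbone, learning rates, epochs, and `train_loader` are illustrative assumptions.
import torch
from torch import nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 10)   # new head for a 10-class downstream task

def run_epochs(params, lr, epochs, loader):
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Stage 1: linear probing -- freeze the backbone, train only the new head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
run_epochs(model.fc.parameters(), lr=1e-2, epochs=5, loader=train_loader)

# Stage 2: full fine-tuning, initialized from the probed head (typically with a smaller lr).
for p in model.parameters():
    p.requires_grad = True
run_epochs(model.parameters(), lr=1e-4, epochs=5, loader=train_loader)
```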

03 Spotlight——Towards a Unified View of Parameter-Efficient Transfer Learning
Fine-tuning large pretrained language models on downstream tasks has become a common learning paradigm in NLP. Conventional fine-tuning of all parameters of the pretrained model is costly because the number of parameters is so large. Recent work has proposed a variety of parameter-efficient transfer learning methods that adjust only a small number of (additional) parameters to achieve strong performance. They fall into the following categories:

Adapter tuning: small neural modules called adapters are inserted into each layer of the pretrained network, and only the adapters are trained during fine-tuning (figure (a) above).
Prefix tuning and prompt tuning: extra trainable tokens are prepended to the input tokens, and only these tokens are trained during fine-tuning (figure (b) above).
LoRA: trainable low-rank matrices are injected into the transformer layers to approximate the weight updates (figure (c) above; see the sketch after this list).
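Below is a minimal sketch of the LoRA idea, where the pretrained weight is frozen and the update is parameterized as a low-rank product; this is a simplified module written for illustration, not the official implementation, and the rank and scaling values are arbitrary.

```python
# Sketch of a LoRA-style linear layer: the pretrained weight W is frozen and the weight
# update is parameterized as a low-rank product B @ A. Simplified for illustration.
import torch
from torch import nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                            # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # Equivalent to using W + scale * (B @ A) as the effective weight.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 12288 trainable parameters vs. ~590k in the frozen linear layer
```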
These methods achieve performance comparable to full fine-tuning across different sets of tasks, usually by updating less than 1% of the original model parameters. Besides saving parameters, parameter-efficient tuning can quickly adapt to new tasks without catastrophic forgetting, and it often shows excellent robustness in OOD evaluation.
However, little is known about the factors that make these parameter-efficient tuning methods successful, and the connections among them remain unclear. This paper aims to answer three questions: (1) How are these methods related? (2) Do they share design elements that are critical to their effectiveness, and if so, what are they? (3) Can the effective ingredients of each method be transferred to other methods to produce more effective variants?
The paper first identifies a connection between adapters and prefix tuning, and on this basis designs a unified framework. This framework reveals, to a certain extent, the success factors of parameter-efficient tuning and gives rise to two new variants. Experiments on NLP benchmarks covering text summarization, machine translation (MT), text classification, and general language understanding show that the proposed methods use fewer parameters than existing methods while being more effective.

04 Spotlight——How Do Vision Transformers Work?
Multi-head self-attention (MSA) has achieved great success in computer vision, but the reasons are not well understood. This paper provides explanations that help us understand the desirable properties of Vision Transformers (ViTs). The properties identified in the paper are summarized as follows:
MSAs improve not only accuracy but also generalization, by flattening the loss landscape.
MSAs and convolutions (Convs) show opposite behaviors; for example, MSAs act as low-pass filters, while Convs act as high-pass filters (an intuition-building sketch follows this list).
Multi-stage neural networks behave like a series of small individual models. In addition, the MSAs at the end of a stage play a key role in prediction.
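The low-pass vs. high-pass claim can be illustrated with a small toy example (my own intuition-building sketch, not the paper's Fourier analysis of actual feature maps): spatial averaging, which is roughly what attention with near-uniform weights performs, suppresses high frequencies, while a difference-style convolution kernel suppresses low frequencies.

```python
# Toy illustration: a moving average (attention-like averaging) behaves as a low-pass
# filter, while a difference kernel (conv-like) behaves as a high-pass filter.
# Intuition-building sketch only, not the paper's analysis of real feature maps.
import numpy as np

n = 256
t = np.arange(n)
signal = np.sin(2 * np.pi * 3 * t / n) + np.sin(2 * np.pi * 60 * t / n)  # low + high frequency

avg_kernel = np.ones(9) / 9.0               # uniform averaging ("low-pass")
diff_kernel = np.array([-1.0, 2.0, -1.0])   # Laplacian-style difference ("high-pass")

averaged = np.convolve(signal, avg_kernel, mode="same")
differenced = np.convolve(signal, diff_kernel, mode="same")

def band_energy(x, cutoff=30):
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    return spectrum[:cutoff].sum(), spectrum[cutoff:].sum()   # (low-band, high-band)

print("original   :", band_energy(signal))
print("averaged   :", band_energy(averaged))      # high-band energy is strongly attenuated
print("differenced:", band_energy(differenced))   # low-band energy is strongly attenuated
```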
05 Spotlight——On Predicting Generalization using GANs
Generalization bounds for deep networks aim to predict test error using only the training dataset and the network parameters. Although generalization bounds provide a great deal of intuition about architecture design, training algorithms, and so on, very few current bounds give good predictions of actual test error. This paper examines a simple idea: can test error be predicted using synthetic data, i.e., data produced by a generative adversarial network (GAN) trained on the same training set? After studying several GAN models and architectures, the paper finds that GAN-generated data can indeed predict the test error, without any additional hyperparameter tuning.
The upper and lower bounds on generalization error in most current papers can be summarized in the following generic form:
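The bound shown in the (omitted) figure is roughly of the following generic form, reconstructed here from the surrounding description rather than copied from the paper:

$$
\mathbb{E}\big[\text{test error}\big] \;\le\; \text{training error on } S \;+\; \sqrt{\frac{C}{|S|}}
$$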

Here S is the training dataset and C is a measure of complexity. Bounds of this type are generally very loose and often bear little relationship to actual generalization, which has motivated more principled empirical studies of how useful generalization bounds really are.
This paper explores a very simple baseline for predicting generalization: train a generative adversarial network (GAN) on the training set and predict generalization from the classifier's performance on the synthetic data it generates. GAN generators are known to suffer from mode collapse, meaning the generated distribution covers only a small subset of the true distribution, and both theory and experiments suggest this may be hard to avoid. Nevertheless, this paper finds that GAN-generated data allows a good estimate of the test error (and hence the generalization error).
The algorithm in this paper is very simple, as described above.
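A condensed sketch of the procedure is given below (my own paraphrase, not the authors' code); it assumes a class-conditional generator trained on the same training set, plus the classifier whose generalization we want to predict.

```python
# Sketch: predict a classifier's test error from GAN-generated synthetic data.
# `generator` is a class-conditional GAN trained on the classifier's training set.
# Both models are assumed to exist; this is a paraphrase, not the authors' code.
import torch

@torch.no_grad()
def predicted_test_error(classifier, generator, num_classes, n_per_class=500):
    errors, total = 0, 0
    for c in range(num_classes):
        labels = torch.full((n_per_class,), c, dtype=torch.long)
        synthetic = generator(labels)                  # sample fake images of class c
        preds = classifier(synthetic).argmax(dim=1)
        errors += (preds != labels).sum().item()
        total += n_per_class
    return errors / total   # error on synthetic data, used as the test-error estimate
```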

Some experimental results are shown in the figure below; this simple method achieves very good results on most benchmarks:

However, the data distribution generated by a GAN is not necessarily very diverse and, as mentioned above, is prone to mode collapse, so why it can nevertheless be used to predict generalization remains theoretically unexplained.
06 Poster——Uncertainty Modeling for Out-of-Distribution Generalization
Domain generalization is currently a very hot research topic; it assumes a distribution difference between the training data and the test data. Commonly used methods treat feature statistics as deterministic values without considering their uncertainty. This paper instead assumes that feature statistics carry uncertainty and follow a multivariate Gaussian distribution: each feature statistic is no longer a deterministic value but is drawn from a probability distribution. By exploiting these uncertain feature statistics, the model can be trained to mitigate domain perturbations and achieve better robustness to potential domain shifts. In implementation, the algorithm resembles data augmentation: a variance is estimated for each statistic, the statistic itself is taken as the mean, and new values are sampled from this distribution (a simplified sketch follows).
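A simplified sketch of this "resample the feature statistics" idea is shown below; it is my own condensed version for illustration, and details such as where the module is inserted and how often the perturbation is applied follow the paper only loosely.

```python
# Sketch: treat channel-wise feature statistics as Gaussian random variables and resample
# them during training. Condensed for illustration; insertion points and application
# probability are design choices of the paper that are not reproduced here.
import torch
from torch import nn

class FeatureStatPerturbation(nn.Module):
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, x):                        # x: (N, C, H, W) feature maps
        if not self.training:
            return x
        mu = x.mean(dim=(2, 3), keepdim=True)    # per-sample, per-channel mean
        sig = (x.var(dim=(2, 3), keepdim=True) + self.eps).sqrt()

        # Uncertainty of the statistics, estimated as their variation across the batch.
        mu_std = mu.std(dim=0, keepdim=True)
        sig_std = sig.std(dim=0, keepdim=True)

        # Sample new statistics with the original values as the mean.
        new_mu = mu + torch.randn_like(mu) * mu_std
        new_sig = sig + torch.randn_like(sig) * sig_std

        # Re-normalize the features with the perturbed statistics.
        return (x - mu) / sig * new_sig + new_mu
```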

07 Poster——Gradient Matching for Domain Generalization
This paper proposes a new method for domain generalization. The main idea is to encourage a larger inner product between gradients computed on different domains. Instead of adding an explicit regularizer, the authors propose an optimization algorithm called Fish to accomplish this. They further show that the proposed method is competitive on the WILDS and DomainBed benchmarks.
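A condensed sketch of a Fish-style update step is given below (inner SGD steps over minibatches from different domains, followed by a Reptile-like interpolation toward the adapted weights); the hyperparameters and batching are illustrative, not the paper's exact settings.

```python
# Sketch of a Fish-style meta step: clone the model, take one inner SGD step per domain
# minibatch in sequence, then move the original weights a small step toward the clone.
# Hyperparameters and batching here are illustrative only.
import copy
import torch
from torch import nn

def fish_step(model, domain_batches, inner_lr=1e-3, meta_lr=0.5):
    loss_fn = nn.CrossEntropyLoss()
    clone = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(clone.parameters(), lr=inner_lr)

    # Inner loop: one SGD step per domain, in sequence, on the cloned model.
    for x, y in domain_batches:                  # each (x, y) comes from a different domain
        inner_opt.zero_grad()
        loss_fn(clone(x), y).backward()
        inner_opt.step()

    # Outer update: theta <- theta + meta_lr * (theta_tilde - theta), which implicitly
    # encourages gradient alignment (inner products) across domains.
    with torch.no_grad():
        for p, p_tilde in zip(model.parameters(), clone.parameters()):
            p.add_(meta_lr * (p_tilde - p))
```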

Author: yearn