【Transformer】TransMix: Attend to Mix for Vision Transformers
2022-07-29 06:03:00 【Dull cat】
Code: https://github.com/Beckschen/TransMix
1. Background and motivation
Mixup-based data augmentation is very useful for ViT-style architectures, because they overfit easily. However, previous mixup-based methods carry an implicit prior: the linear interpolation ratio of the targets is the same as the interpolation ratio of the whole input images. As a result, the mixed image may contain no valid object at all, yet it still receives the corresponding label.
To address this problem, the authors propose TransMix, which mixes the labels based on the attention map.
2. Method
2.1 Mixup
Raw input:
Mixup takes a pair of images $x_A$ and $x_B$, together with their labels $y_A$ and $y_B$, as input.
Input and ground-truth processing:
From these two images it builds a virtual training sample $\lambda x_A + (1-\lambda)x_B$ with ground truth $\lambda y_A + (1-\lambda)y_B$, where $\lambda \in [0,1]$ is a random number drawn from a Beta distribution.
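The Mixup step above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `mixup`, the flattened-list image representation, and the choice of Beta(alpha, alpha) with `alpha=1.0` are assumptions made here.

```python
import random


def mixup(x_a, x_b, y_a, y_b, alpha=1.0):
    """Mix two flattened images and their one-hot labels with one shared ratio.

    Hypothetical helper: the Beta(alpha, alpha) parameterisation is an assumption.
    """
    lam = random.betavariate(alpha, alpha)  # lambda in [0, 1]
    # The same lambda is applied to every pixel and to the labels.
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


random.seed(0)
x_mix, y_mix, lam_mix = mixup([1.0] * 4, [0.0] * 4, [1.0, 0.0], [0.0, 1.0])
# Every pixel of x_mix equals lam_mix, and y_mix is [lam_mix, 1 - lam_mix].
```

Note that the label ratio is tied to the pixel ratio regardless of what the pixels contain, which is exactly the prior that TransMix relaxes.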
As Figure 1 shows, background pixels cannot contribute to the label as much as foreground pixels do; that is, not every pixel contributes equally to the label.
This work therefore focuses on using a learnable approach to bring the input space and the label space into agreement.
The authors find that the attention maps produced by a vision transformer are well suited to this task.
As Figure 1 shows, the authors use the attention map to set the value of $\lambda$, so the label can be re-weighted: each pixel carries a different weight, instead of all pixels of an image being mixed with the same value. Because it relies only on the attention map, the method can be applied to any ViT-based model and adds no extra parameters.
2.2 TransMix
CutMix data augmentation:
CutMix is a simple augmentation method that combines two samples and their labels to create a new training sample with a new label:
- $M \in \{0,1\}^{HW}$ is a binary mask that decides which pixels to drop and which to keep
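For reference, the standard CutMix formulation can be written as below. The notation is reconstructed here rather than copied from the post: $\odot$ is element-wise multiplication, and the convention that $M = 1$ keeps pixels of $x_A$ is an assumption chosen to match the Mixup notation of section 2.1.

```latex
\tilde{x} = M \odot x_A + (1 - M) \odot x_B,
\qquad
\tilde{y} = \lambda\, y_A + (1 - \lambda)\, y_B
```

Under this convention $\lambda$ is the fraction of the image area that still comes from $x_A$, so the label weight depends only on the size of the pasted box, not on what it covers.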
TransMix:
$A$ is the attention map from the cls token to the input image tokens; it represents how important each patch is to the final classification result. For multi-head attention, the authors average over the heads.
Using the attention map to process the label:
The down arrow denotes nearest-neighbor interpolation, which downsamples $M$ from $HW$ pixels to $p$ pixels.
In this way, the network can dynamically assign a weight to each point of the label based on the attention map.
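Written out from the definitions of $A$, $M$ and the down arrow, the attention-based label weight takes the following form. This is a reconstruction and should be checked against the paper; it assumes $A$ is normalized to sum to 1 over the $p$ patches and that $M = 1$ marks the region kept from image A.

```latex
\lambda = A \cdot \downarrow(M),
\qquad
\tilde{y} = \lambda\, y_A + (1 - \lambda)\, y_B
```

Compared with area-based mixing, $\lambda$ is now the share of class-token attention that falls on the region of image A, so a small but highly attended region can still dominate the label.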
Pseudocode:
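A minimal Python sketch of the label computation described in this section follows. It is not the authors' implementation: the helper names `cls_attention`, `downsample_nearest` and `transmix_label` are hypothetical, the renormalization of the attention weights is an assumption made so that $\lambda$ stays in $[0, 1]$, and the mask convention ($M = 1$ keeps image A) is also assumed.

```python
def cls_attention(attn_heads):
    """Average the cls-token attention over heads, keeping only patch columns."""
    n_heads = len(attn_heads)
    n_patches = len(attn_heads[0][0]) - 1
    # Row 0 is the cls token's query; columns 1.. are the patch tokens.
    avg = [sum(h[0][j + 1] for h in attn_heads) / n_heads
           for j in range(n_patches)]
    total = sum(avg)
    # Assumption: renormalise so the patch weights sum to 1.
    return [a / total for a in avg]


def downsample_nearest(mask, out_h, out_w):
    """Nearest-neighbour downsampling of a 2-D 0/1 mask to out_h x out_w."""
    in_h, in_w = len(mask), len(mask[0])
    return [[mask[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def transmix_label(attn, mask, y_a, y_b, grid_h, grid_w):
    """Re-weight the mixed label with lambda = A . down(M)."""
    small = downsample_nearest(mask, grid_h, grid_w)
    flat = [m for row in small for m in row]
    lam = sum(a * m for a, m in zip(attn, flat))
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return y, lam


# 4x4 image with a 2x2 patch grid; the top-right quarter is pasted from B.
mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]
attn = [0.1, 0.6, 0.2, 0.1]  # hypothetical attention over the 4 patches
y_new, lam_new = transmix_label(attn, mask, [1.0, 0.0], [0.0, 1.0], 2, 2)
# Area-based mixing would give lambda = 12/16 = 0.75; here lambda is about 0.4,
# because most of the attention falls on the patch pasted from image B.
```

In a real training loop the attention would come from the last transformer block of the model being trained, which is why no extra parameters are needed.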
3. Results
The TransMix visualization is as follows:
The first row shows area-based label assignment, where a patch of image A is pasted onto image B. TransMix uses the attention map to correct the label, raising the label weight of the salient region.