14. Transformer: ViT, TNT, DETR

2022-07-05 20:29:00 【C--G】

ViT (Vision Transformer)

ViT architecture diagram

ViT targets image classification and uses only the Transformer encoder. The image is divided into nine patches; each patch is flattened into a one-dimensional vector, combined with a position encoding, and fed into the encoder. Together with the extra token 0, the encoder receives 10 input tokens. Token 0 interacts with the other nine position tokens through attention and so integrates their feature information, which is why only the output of token 0 is needed. That output then goes to the MLP Head, which performs the classification.
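The token layout described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the code from the linked repository: the 6x6 image, the constant `cls_token`, and the `pos` vectors are stand-ins for learned parameters, and the linear patch projection and the learned query/key/value projections are left out. The helper names `patchify` and `attend_from_token0` are hypothetical.

```python
import math
import random


def patchify(image, grid=3):
    """Split a square image (list of rows) into grid*grid flattened patches."""
    size = len(image) // grid  # side length of one patch
    patches = []
    for pr in range(grid):
        for pc in range(grid):
            patch = [image[pr * size + r][pc * size + c]
                     for r in range(size) for c in range(size)]
            patches.append(patch)
    return patches


def attend_from_token0(tokens):
    """One attention read-out for token 0: a softmax-weighted average of all tokens."""
    query = tokens[0]
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in tokens]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    output = [sum(w * tok[d] for w, tok in zip(weights, tokens))
              for d in range(dim)]
    return output, weights


random.seed(0)
image = [[random.random() for _ in range(6)] for _ in range(6)]  # toy grayscale image

patches = patchify(image, grid=3)   # 9 patches, each flattened to 2*2 = 4 values
dim = len(patches[0])
cls_token = [0.1] * dim             # stand-in for the learnable token 0
tokens = [cls_token] + patches      # token 0 plus 9 patch tokens

# Stand-in for learned position encodings: one vector per token position.
pos = [[0.01 * i] * dim for i in range(len(tokens))]
tokens = [[t + p for t, p in zip(tok, pe)] for tok, pe in zip(tokens, pos)]

cls_out, weights = attend_from_token0(tokens)
print(len(patches), len(tokens), len(cls_out))  # 9 10 4
```

In this sketch `cls_out` is a weighted mix of all 10 tokens, which is the sense in which token 0 gathers information from the patch tokens before it is handed to the classification head.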

Problems with CNNs
Advantages of the Transformer
Formulas
ViT pattern
Position encoding
Results analysis
Code link
https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_classification/vision_transformer

TNT (Transformer in Transformer)

Basic composition
Sequence construction

Basic calculation

Position encoding

PatchEmbedding visualization

DETR

Object detection
Basic idea

DETR predicts 100 bounding boxes in parallel; a box that contains no object is classified as background.
Network architecture

A CNN extracts a feature map that is flattened into a one-dimensional sequence, and positional encoding supplies the position information. Unlike ViT, DETR has no token 0. Unlike the traditional Transformer decoder, DETR uses object queries to generate a fixed number of bounding boxes in one pass. Each query is matched against the encoder output in parallel, and the prediction heads then decide whether each box is a target box.
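The last step, where the prediction heads decide which boxes are targets and which are background, can be sketched as follows. This is a minimal illustration with made-up numbers, not DETR's actual post-processing code: the two class names, the logits, the `threshold` value, the normalized (cx, cy, w, h) box layout, and the helper `keep_detections` are all assumptions for the example, and only 4 queries are shown instead of 100.

```python
import math

NO_OBJECT = "no object"  # extra label used for background


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def keep_detections(class_logits, boxes, class_names, threshold=0.5):
    """Keep the queries whose most likely label is a real class."""
    labels = class_names + [NO_OBJECT]  # background occupies the last slot
    detections = []
    for logits, box in zip(class_logits, boxes):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if labels[best] != NO_OBJECT and probs[best] >= threshold:
            detections.append((labels[best], round(probs[best], 3), box))
    return detections


class_names = ["cat", "dog"]

# Toy output of 4 object queries: logits over [cat, dog, no object].
class_logits = [
    [4.0, 0.5, 0.1],  # most likely a cat
    [0.2, 0.1, 3.0],  # background
    [0.3, 3.5, 0.2],  # most likely a dog
    [0.1, 0.2, 2.5],  # background
]

# One box per query, assumed here to be (cx, cy, w, h) in normalized coordinates.
boxes = [
    (0.30, 0.40, 0.20, 0.30),
    (0.50, 0.50, 0.10, 0.10),
    (0.70, 0.60, 0.25, 0.35),
    (0.10, 0.90, 0.05, 0.05),
]

detections = keep_detections(class_logits, boxes, class_names)
for detection in detections:
    print(detection)
```

Every query always yields a box, so the number of outputs is fixed; the "no object" label is what lets the model report fewer detections than queries.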

The task of the encoder

The attention that the encoder provides for the targets works better than the CNN feature map alone, which helps the decoder identify targets quickly. As shown in the figure, the encoder can still recognize objects well under occlusion.
Network architecture
Output matching
The role of attention
Google source code
https://github.com/google-research/bert
Reference material: an expert's blog
https://blog.csdn.net/qq_37774399/article/details/121748163