Transformer in CV learning notes (continuously updated)
2022-07-03 18:23:00 【ZRX_GIS】
Table of contents
Why study Transformer in CV
Research background
Transformer is only beginning to emerge in the CV field. Transformer was proposed for NLP and has achieved good results in that direction. Its all-attention structure not only enhances feature-extraction ability but also preserves parallel computation, so it can handle most NLP tasks quickly, which has greatly promoted its development. However, it had hardly been used in the CV direction. Before this, only DETR in object detection used Transformer on a large scale; other fields such as semantic segmentation had seen no substantial application, and pure-Transformer network structures did not yet exist.
Transformer advantages
1. Parallel computation; 2. Global view; 3. Flexible stacking capability
Transformer + classification
ViT
Original paper: "AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE"
Historical significance of ViT
1. Demonstrated the possibility of using a pure Transformer structure in CV
2. Pioneering work in this field
Abstract
Although the Transformer architecture has become the de facto standard for natural language processing tasks, its application in computer vision remains limited. In vision, attention is either used in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure unchanged. We show that this reliance on CNNs is unnecessary: a pure transformer applied directly to sequences of image patches can perform image classification tasks very well. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
Summary: 1. Transformer has become a classic in NLP; 2. In CV, the attention mechanism has so far only been used as a supplement; 3. A pure Transformer structure can achieve good results on image classification tasks; 4. After training on a large enough dataset, ViT can obtain results comparable to CNN SOTA.
ViT structure
Core idea: split the image into patches and rearrange them into a sequence
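As an illustration of this split-and-rearrange step, here is a minimal PyTorch-style sketch. The 16x16 patch size comes from the paper's title; the function name and shapes are assumptions for illustration, not ViT's official code:

```python
import torch

def patchify(images: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """Cut (B, C, H, W) images into non-overlapping P x P patches and flatten
    each patch, giving a sequence of patch vectors of shape (B, N, P*P*C)."""
    B, C, H, W = images.shape
    assert H % patch_size == 0 and W % patch_size == 0, "image size must be divisible by patch size"
    x = images.reshape(B, C, H // patch_size, patch_size, W // patch_size, patch_size)
    x = x.permute(0, 2, 4, 3, 5, 1)  # (B, H/P, W/P, P, P, C): group pixels by patch
    return x.reshape(B, (H // patch_size) * (W // patch_size), -1)

# A 224x224 RGB image becomes 14*14 = 196 patch vectors of length 16*16*3 = 768.
print(patchify(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 196, 768])
```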
Attention
Core idea: a weighted average, with the weights computed from similarity
Advantages: 1. Parallel computation; 2. Global view
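A minimal sketch of this weighted-average-by-similarity idea for a single head, under assumed (B, N, d) shapes; the scaled dot product used here is the standard formulation rather than code from the paper:

```python
import torch
import torch.nn.functional as F

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (B, N, d). Every token attends to every other token (global view),
    and the whole computation is matrix multiplication (parallel)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # pairwise query-key similarities, (B, N, N)
    weights = F.softmax(scores, dim=-1)          # normalize each row into weights
    return weights @ v                           # weighted average of the value vectors
```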
Multi-Head Attention
Core idea: similarity computation; for however many sets of W(Q, K, V) there are, the attention operation is repeated that many times, and the results are concatenated once
Q: query; K: key; V: value
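Below is a hedged sketch of that repeat-and-concat scheme: a fused projection produces the W_Q, W_K, W_V for all heads, the single-head attention runs per head, and the concatenated result is projected once. The dimensions (768, 12 heads) mirror ViT-Base but are assumptions here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 12):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)   # fused W_Q, W_K, W_V for all heads
        self.proj = nn.Linear(dim, dim)      # output projection after concat

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)              # each: (B, heads, N, head_dim)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        out = F.softmax(scores, dim=-1) @ v               # per-head weighted averages
        out = out.transpose(1, 2).reshape(B, N, D)        # concat the heads
        return self.proj(out)
```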
Input adapter
Core idea: cut the image directly into patches, then feed them into the network as a sequence
Why Patch 0: a vector that integrates information is needed. If only the vectors of the original input patches are available, there is a selection problem: no single patch vector is well suited for classification, and using all of them costs a lot of computation. So a learnable vector, namely Patch 0 (the class token), is added to integrate the information.
Positional encoding (Positional Encoding)
After the image is split into patches and rearranged, the position information is lost, and the operations inside the Transformer are independent of spatial position, so the position information must be encoded and fed back into the network. ViT uses a learnable vector for the encoding; the encoding vector and the patch vector are added directly to form the input.
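A short sketch of how these pieces could be assembled into the Transformer's input: a learnable class token ("Patch 0") is prepended to the linearly projected patch vectors, and a learnable positional embedding is added element-wise. Names and default shapes are illustrative assumptions, not the official implementation:

```python
import torch
import torch.nn as nn

class ViTInput(nn.Module):
    def __init__(self, num_patches: int = 196, patch_dim: int = 768, dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(patch_dim, dim)                  # linear patch embedding
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))  # learnable "Patch 0"
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))  # learnable positions

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        B = patches.size(0)
        x = self.proj(patches)                 # (B, N, dim)
        cls = self.cls_token.expand(B, -1, -1) # (B, 1, dim)
        x = torch.cat([cls, x], dim=1)         # prepend the class token
        return x + self.pos_embed              # add the positional encoding
```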
Training methods
Pre-training is used on a large scale: first pre-train on a large dataset, then fine-tune on the smaller target dataset.
After transferring, the original MLP head needs to be replaced with an FC layer matching the number of target classes (as usual).
When handling inputs of a different size, the positional encoding needs to be interpolated.
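A hedged sketch of these two fine-tuning adjustments, assuming a model that exposes a `.head` attribute and a positional embedding laid out as (1, 1 + grid*grid, dim); neither assumption comes from the paper's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def replace_head(model: nn.Module, dim: int, num_classes: int) -> None:
    # Swap the pre-trained classification head for a fresh FC layer
    # sized for the target dataset's class count.
    model.head = nn.Linear(dim, num_classes)

def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    # pos_embed: (1, 1 + old_grid**2, dim). Keep the class-token position,
    # 2-D interpolate the patch positions to the new grid size.
    cls_pos, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    old_grid = int(patch_pos.size(1) ** 0.5)
    dim = patch_pos.size(-1)
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_pos, patch_pos], dim=1)
```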
Relationship between attention distance and network depth
Attention distance can be regarded as the equivalent of the receptive field size in convolution. The deeper the layer, the longer the attention distance. Even in the lower layers, some heads already cover long distances, which shows that they are indeed responsible for integrating global information.
Paper summary
Model structure: Transformer encoder
Input adaptation: split the image into patches and rearrange them into a sequence
Positional encoding: represented by a learnable vector
A pure Transformer can do classification tasks with only simple input adaptation; a large number of experiments reveal the possibility of using pure Transformers for CV.
PVT
Swin Transformer
Transformer + detection
DETR
Deformable DETR
Sparse RCNN