14. Transformer: ViT, TNT, DETR
2022-07-05 20:29:00 【C--G】
ViT – Vision Transformer
ViT architecture diagram
ViT is designed for image classification and uses only the Transformer encoder. In the diagram the image is divided into nine patches. Each patch is flattened into a one-dimensional vector, combined with a positional encoding, and fed into the encoder. The encoder therefore receives 10 input tokens: token 0 plus the 9 patch tokens. Token 0 interacts with the 9 patch tokens through attention and aggregates their feature information, so only token 0 is needed at the output. It is passed to the MLP Head, which performs the classification.
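The token construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from the linked repository: the function name `image_to_tokens`, the embedding size `dim`, and the random matrices standing in for the learned projection, class token and positional encoding are all assumptions made for the example.

```python
import numpy as np

def image_to_tokens(image, grid=3, dim=16, seed=0):
    """Build a ViT-style input sequence: 1 class token + grid*grid patch tokens."""
    rng = np.random.default_rng(seed)
    h, w, c = image.shape
    ph, pw = h // grid, w // grid  # patch height and width

    # Cut the image into grid x grid patches and flatten each patch to 1-D.
    patches = (image[:grid * ph, :grid * pw]
               .reshape(grid, ph, grid, pw, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(grid * grid, ph * pw * c))

    # Linear projection to the embedding size (random stand-in for learned weights).
    w_proj = rng.normal(size=(patches.shape[1], dim)) * 0.02
    patch_tokens = patches @ w_proj

    # Token 0 is the class token (a learned vector in practice, zeros here).
    cls_token = np.zeros((1, dim))
    tokens = np.concatenate([cls_token, patch_tokens], axis=0)

    # Add one positional encoding per token (random stand-in for learned values).
    pos = rng.normal(size=tokens.shape) * 0.02
    return tokens + pos

tokens = image_to_tokens(np.zeros((6, 6, 3)))
print(tokens.shape)  # (10, 16): token 0 plus 9 patch tokens
```

After the encoder, only row 0 of the output would be passed to the MLP Head.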
- Problems with CNNs
- Advantages of the Transformer
- Formulas
- ViT pattern
- Positional encoding
- Results analysis
- Code link
https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_classification/vision_transformer
TNT – Transformer in Transformer
- Basic composition
- Sequence construction
- Basic calculation
- Positional encoding
- Patch embedding visualization
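As a rough illustration of the sequence construction step, TNT builds two levels of sequences: the image is split into patches for the outer Transformer, and each patch is split again into smaller sub-patches for the inner Transformer. The sketch below only shows the resulting shapes. The function name `build_tnt_sequences` and the sizes `patch` and `sub` are assumptions for the example, and the learned projections are omitted.

```python
import numpy as np

def build_tnt_sequences(image, patch=4, sub=2):
    """Return (outer, inner): flattened patches and flattened sub-patches per patch."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch  # number of patches along each axis
    n = gh * gw

    # Outer level: split the image into patches of size patch x patch.
    patches = (image[:gh * patch, :gw * patch]
               .reshape(gh, patch, gw, patch, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(n, patch, patch, c))

    # Inner level: split every patch into sub-patches of size sub x sub.
    s = patch // sub  # sub-patches per side
    inner = (patches.reshape(n, s, sub, s, sub, c)
             .transpose(0, 1, 3, 2, 4, 5)
             .reshape(n, s * s, sub * sub * c))

    outer = patches.reshape(n, patch * patch * c)
    return outer, inner

outer, inner = build_tnt_sequences(np.zeros((8, 8, 3)))
print(outer.shape, inner.shape)  # (4, 48) (4, 4, 12)
```

Each row of `outer` would feed the outer Transformer, while each `inner[i]` is the short sequence processed by the inner Transformer for patch `i`.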
DETR
Object detection
The basic idea
Predict 100 bounding boxes in parallel. A box that contains no object is classified as background.
Network architecture
A CNN backbone extracts a feature map, which is flattened into a one-dimensional sequence, and a positional encoding supplies the location information. Unlike ViT, DETR has no token 0. Unlike a conventional Transformer decoder, which produces its outputs one at a time, DETR uses object queries to generate all the bounding boxes in a single pass. Each query is matched against the encoder output in parallel, and the prediction heads then decide whether the resulting box is a target box.
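The parallel prediction step can be sketched as a single cross-attention between the object queries and the encoder output, followed by two prediction heads. This is a deliberately simplified NumPy illustration: it uses one attention head and one layer, omits the self-attention between queries, and replaces every learned weight with random values. The names `decode_in_parallel`, `num_classes` and the feature size are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_in_parallel(memory, num_queries=100, num_classes=3, seed=0):
    """Predict all boxes at once from the encoder output `memory` of shape (N, d)."""
    rng = np.random.default_rng(seed)
    d = memory.shape[1]

    # Object queries: one vector per candidate box (learned in practice).
    queries = rng.normal(size=(num_queries, d))

    # Cross-attention: every query attends to the whole encoder output at once.
    attn = softmax(queries @ memory.T / np.sqrt(d))  # (num_queries, N)
    hidden = attn @ memory                           # (num_queries, d)

    # Prediction heads: class scores (with one extra "no object" class) and boxes.
    w_cls = rng.normal(size=(d, num_classes + 1))
    w_box = rng.normal(size=(d, 4))
    class_probs = softmax(hidden @ w_cls)
    boxes = 1.0 / (1.0 + np.exp(-(hidden @ w_box)))  # normalized to [0, 1]
    return class_probs, boxes

memory = np.random.default_rng(1).normal(size=(49, 32))  # e.g. a flattened 7x7 feature map
class_probs, boxes = decode_in_parallel(memory)
print(class_probs.shape, boxes.shape)  # (100, 4) (100, 4)
```

In this sketch a query whose highest score falls on the last class index would be treated as background.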
The task of the encoder
The attention the encoder places on targets gives better results than the CNN feature map alone, which helps the decoder identify targets quickly. As the figure shows, the encoder can also distinguish objects well when they are occluded.
Network architecture
Output matching
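During training, DETR assigns each ground-truth object to exactly one of the predicted boxes and treats the remaining predictions as background. DETR solves this assignment with the Hungarian algorithm. The sketch below swaps in a brute-force search over permutations, which gives the same optimal assignment for tiny inputs but does not scale. The cost used here (negative class probability plus L1 box distance) is a simplification, and the names `match_cost` and `best_matching` are hypothetical.

```python
from itertools import permutations

def match_cost(pred, target):
    """Cost of assigning one prediction to one ground-truth object."""
    probs, box = pred
    label, gt_box = target
    l1 = sum(abs(a - b) for a, b in zip(box, gt_box))
    return -probs[label] + l1

def best_matching(preds, targets):
    """Return, for each target, the index of the prediction assigned to it."""
    best, best_cost = None, float("inf")
    # Brute force: try every one-to-one assignment of predictions to targets.
    for assignment in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t])
                   for t, p in enumerate(assignment))
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best

# Three predictions: (class probabilities, box as cx, cy, w, h).
preds = [
    ([0.1, 0.9], (0.5, 0.5, 0.2, 0.2)),
    ([0.8, 0.2], (0.1, 0.1, 0.1, 0.1)),
    ([0.5, 0.5], (0.9, 0.9, 0.1, 0.1)),
]
# Two ground-truth objects: (class label, box).
targets = [
    (0, (0.1, 0.1, 0.1, 0.1)),
    (1, (0.5, 0.5, 0.2, 0.2)),
]
print(best_matching(preds, targets))  # (1, 0): prediction 2 is left as background
```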
The role of attention
Google source code
https://github.com/google-research/bert
Reference material – an expert's blog
https://blog.csdn.net/qq_37774399/article/details/121748163