14、Transformer: ViT, TNT, DETR
2022-07-05 20:29:00 【C--G】
ViT: Vision Transformer
ViT architecture diagram
ViT is used for image classification and relies only on the transformer encoder. The picture is divided into nine patches; each patch is flattened into a one-dimensional vector, combined with its positional encoding, and fed to the encoder. In addition to the 9 patch tokens, the encoder input contains an extra token 0. Through attention, token 0 interacts with the other 9 position tokens and aggregates their feature information, so only the output at token 0 is needed. It is passed to the MLP Head, which performs the classification.
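The token construction described above can be sketched in NumPy. This is only an illustration under stated assumptions: the function names `build_vit_tokens` and `attend_from_class_token`, the 3x3 grid, the embedding size of 8, and the random matrices standing in for learned parameters are all hypothetical and are not taken from the linked repository.

```python
import numpy as np


def build_vit_tokens(image, grid=3, dim=8, seed=0):
    """Turn an (H, W, C) image into a ViT-style token sequence.

    Returns an array of shape (grid*grid + 1, dim): row 0 is the extra
    token 0, rows 1..grid*grid are the patch tokens.
    """
    rng = np.random.default_rng(seed)
    h, w, c = image.shape
    ph, pw = h // grid, w // grid  # patch height and width

    # Cut the image into grid x grid patches and flatten each patch
    # into a one-dimensional vector.
    patches = (
        image[: ph * grid, : pw * grid]
        .reshape(grid, ph, grid, pw, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(grid * grid, ph * pw * c)
    )

    # Random stand-ins (assumption) for parameters that would be learned.
    w_embed = rng.normal(size=(ph * pw * c, dim))
    token_zero = rng.normal(size=(1, dim))
    pos_embed = rng.normal(size=(grid * grid + 1, dim))

    # Prepend token 0, then add the positional encoding to every token.
    tokens = np.concatenate([token_zero, patches @ w_embed], axis=0)
    return tokens + pos_embed


def attend_from_class_token(tokens):
    """Single-head attention with token 0 as the query.

    Simplified on purpose: no learned query/key/value projections.
    """
    query = tokens[0]
    scores = tokens @ query / np.sqrt(tokens.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Token 0 becomes a weighted mix of all token features.
    return weights @ tokens, weights


tokens = build_vit_tokens(np.ones((6, 6, 3)))
fused, weights = attend_from_class_token(tokens)
print(tokens.shape)  # (10, 8): token 0 plus 9 patch tokens
```

In a real model the mixed feature at token 0 would pass through several encoder layers before reaching the MLP Head; a single attention step is shown here only to make the aggregation visible.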
- Problems with CNNs
- Advantages of the transformer
- The formulas
- ViT pattern
- Positional encoding
- Analysis of results
- Code link
https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_classification/vision_transformer
TNT: Transformer in Transformer
- Basic composition
- Sequence construction
- Basic calculation
- Positional encoding
- PatchEmbedding visualization
DETR
Object detection
The basic idea
Predict 100 bounding boxes in parallel; a box that contains no object is treated as background.
Network architecture
A CNN extracts a feature map, which is flattened into a one-dimensional sequence, and positional encoding supplies the location information. Unlike ViT, DETR has no token 0. Unlike the traditional Transformer decoder, DETR uses object queries to generate a fixed number of boxes in a single pass. Each query is matched against the encoder output in parallel, and the prediction heads then decide whether it is a target box.
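The parallel box prediction described above can be sketched as a single cross-attention step followed by two prediction heads. This is a rough NumPy sketch, not the reference implementation: the function name `predict_boxes_in_parallel`, the feature size, the number of classes, the `(cx, cy, w, h)` box layout, and the random matrices standing in for learned queries and head weights are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def predict_boxes_in_parallel(memory, num_queries=100, num_classes=3, seed=0):
    """memory: encoder output of shape (num_positions, dim).

    Returns
      class_probs: (num_queries, num_classes + 1); the last column is
                   the "no object" (background) class.
      boxes:       (num_queries, 4) with values in [0, 1], read here as
                   (cx, cy, w, h) relative to the image (assumption).
    """
    rng = np.random.default_rng(seed)
    dim = memory.shape[1]

    # Random stand-in (assumption) for the learned object queries.
    queries = rng.normal(size=(num_queries, dim))

    # Cross-attention: every query attends to the encoder output at the
    # same time, so all boxes are produced in one pass.
    attn = softmax(queries @ memory.T / np.sqrt(dim))
    decoded = attn @ memory

    # Prediction heads (random stand-ins): one for the class, one for the box.
    w_cls = rng.normal(size=(dim, num_classes + 1))
    w_box = rng.normal(size=(dim, 4))
    class_probs = softmax(decoded @ w_cls)
    boxes = 1.0 / (1.0 + np.exp(-(decoded @ w_box)))  # sigmoid keeps boxes in [0, 1]
    return class_probs, boxes


memory = np.random.default_rng(1).normal(size=(49, 16))  # e.g. a flattened 7x7 feature map
class_probs, boxes = predict_boxes_in_parallel(memory)
is_background = class_probs.argmax(axis=1) == class_probs.shape[1] - 1
print(class_probs.shape, boxes.shape, is_background.shape)
```

Queries whose most likely class is the extra last column are the ones discarded as background; the remaining queries are the detected boxes.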
The task of the encoder
The attention the encoder places on targets gives better results than the CNN feature map alone, which helps the decoder identify targets quickly. As shown in the figure, the encoder can also recognize objects well when they are occluded.
Network architecture
Output matching
The role of attention
Google source code
https://github.com/google-research/bert
Reference material: an expert's blog
https://blog.csdn.net/qq_37774399/article/details/121748163