Understanding the Transformer
2022-07-28 06:11:00 【Alan and fish】
1. Two-head attention mechanism
- Introducing two heads
The two-head attention mechanism simply subdivides the earlier q, k, v into two groups, so that the attended information becomes more fine-grained.
The weight $W^q$ is split into two weight matrices $W^{q,1}$ and $W^{q,2}$; the input $a$ is then multiplied by each of them, giving $q^{i,1}$ and $q^{i,2}$ (and likewise for k and v).
- Computing $\alpha$
At this point, the first head of q is matched against the first head of each k, and the second head of q against the second head of each k, yielding two sets of attention scores $\alpha_1$ and $\alpha_2$.
- Computing b

The remaining steps are the same as in single-head attention. The difference is that multi-head attention introduces several heads, so the information is more finely subdivided; more computation is required, but the results are more accurate.
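To make the split concrete, here is a minimal NumPy sketch of two-head self-attention. All sizes and variable names (seq_len, d_model, W_q, etc.) are illustrative, not from the original post:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy sizes: 4 tokens, model dim 8, 2 heads of dim 4 each.
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads
rng = np.random.default_rng(0)

a = rng.normal(size=(seq_len, d_model))    # input vectors a^i
W_q = rng.normal(size=(d_model, d_model))  # weight W^q; its column halves
W_k = rng.normal(size=(d_model, d_model))  # play the role of W^{q,1}/W^{q,2}
W_v = rng.normal(size=(d_model, d_model))

q, k, v = a @ W_q, a @ W_k, a @ W_v

# Split q, k, v into two heads -> shape (n_heads, seq_len, d_head).
def split_heads(x):
    return x.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)

# alpha_1, alpha_2: head-1 queries score only head-1 keys, likewise head 2.
alpha = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head))

# b: weighted sums per head, then concatenate the heads back together.
b = (alpha @ vh).transpose(1, 0, 2).reshape(seq_len, d_model)
print(b.shape)  # (4, 8)
```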
2. Introducing position information
The self-attention mechanism has a flaw: it carries no position information, so we introduce a position matrix with a one-hot structure.
The weight matrix $W$ is split into $W^I$ and $W^P$, which are multiplied with the input value $x$ and the position information $p$ respectively, giving $a^i$ and the positional embedding $e^i$, which are then added together.
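A small sketch of why this works (the names W_I, W_P, p are illustrative): appending a one-hot position vector to the input and multiplying once by the combined weight matrix is exactly equivalent to adding a learned positional embedding $e^i = W^P p^i$ to $a^i = W^I x^i$:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_model, seq_len = 6, 8, 4

W_I = rng.normal(size=(d_model, d_in))     # acts on the input x^i
W_P = rng.normal(size=(d_model, seq_len))  # acts on the one-hot position p^i

x = rng.normal(size=d_in)                  # input vector x^i
p = np.zeros(seq_len)
p[2] = 1.0                                 # one-hot vector for position 2

# Concatenating [W_I, W_P] and [x; p] and multiplying once...
W = np.concatenate([W_I, W_P], axis=1)
lhs = W @ np.concatenate([x, p])

# ...equals computing a^i = W_I x and e^i = W_P p separately, then adding.
rhs = W_I @ x + W_P @ p
print(np.allclose(lhs, rhs))               # True
```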
3. A visual understanding of the Transformer framework

Take machine translation as an example: we input the phrase "machine learning", which is first encoded and then decoded to obtain the desired output. The Transformer is precisely such an encode-decode process.
The input $x$ is combined with one-hot encoded position information and then fed into a multi-head self-attention layer. The encoder's result then serves as input to the decoder: the decoder input first passes through a masked multi-head attention layer, then through another self-attention layer, and finally through a series of further operations to produce the final output.
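As a rough illustration of this encode-decode flow, the sketch below uses PyTorch's built-in nn.Transformer; the hyperparameters are arbitrary, and the causal mask plays the role of the masked attention layer described above:

```python
import torch
import torch.nn as nn

# Illustrative sizes; layer counts and dims are not from the post.
d_model, n_heads, src_len, tgt_len = 32, 4, 5, 3
model = nn.Transformer(d_model=d_model, nhead=n_heads,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)

src = torch.randn(1, src_len, d_model)  # encoder input, e.g. "machine learning"
tgt = torch.randn(1, tgt_len, d_model)  # decoder input (tokens produced so far)

# The causal mask makes the decoder's first attention block "masked":
# position i is only allowed to attend to positions <= i.
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_len)

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)  # torch.Size([1, 3, 32])
```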
During encoding, a Norm layer is also added. The difference between Layer Norm and Batch Norm is that Layer Norm is horizontal (it normalizes across the features of one sample), while Batch Norm is vertical (it normalizes each feature across the batch).
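A quick way to see the horizontal/vertical distinction on a toy tensor (assuming PyTorch):

```python
import torch
import torch.nn as nn

x = torch.randn(4, 8)       # (batch of 4 samples, 8 features each)

# Layer Norm: "horizontal" -- each sample normalized across its features.
ln = nn.LayerNorm(8)(x)
print(ln.mean(dim=1))       # ~0 for every sample

# Batch Norm: "vertical" -- each feature normalized across the batch.
bn = nn.BatchNorm1d(8)(x)
print(bn.mean(dim=0))       # ~0 for every feature
```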
4. Visualizing the effect of the attention mechanism

As shown in the figure:
In this sentence, "it" is a pronoun, and here "it" refers to "animal", so "it" depends more strongly on "animal", and the line connecting them is darker.
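A heatmap like the one in the figure can be drawn with matplotlib. The attention matrix below is randomly generated as a stand-in; in practice it would be the softmaxed q·kᵀ scores extracted from a trained model:

```python
import numpy as np
import matplotlib.pyplot as plt

tokens = ["the", "animal", "didn't", "cross", "because", "it", "was", "tired"]

# Made-up attention weights standing in for real model output.
rng = np.random.default_rng(0)
attn = rng.random((len(tokens), len(tokens)))
attn /= attn.sum(axis=-1, keepdims=True)   # each row sums to 1, like softmax

fig, ax = plt.subplots()
ax.imshow(attn, cmap="Blues")              # darker cell = stronger dependency
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=45)
ax.set_yticks(range(len(tokens)))
ax.set_yticklabels(tokens)
plt.tight_layout()
plt.show()
```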
5. Comparing the effect of single-head and multi-head attention

The green lines above show multi-head attention, and the red lines below show single-head attention. As the figure shows, multi-head attention attends to more information.