Understanding the Transformer
2022-07-28 06:11:00 【Alan and fish】
1. Two-head attention mechanism
- Introducing two heads
This is the two-head attention mechanism: in essence it simply subdivides the earlier q, k, v into two groups, so that the information being attended to is more fine-grained.
The weight W_q is split into two weight matrices, W_q,1 and W_q,2; the input a is then multiplied with each of them, giving the two query heads q_1 and q_2.
- Computing α
At this step, the first head of q is matched against the first head of each k, and the second head of q against the second head of each k, yielding two sets of attention scores α_1 and α_2.
- Computing b

The remaining steps are the same as in the single-head attention mechanism. The difference is that multi-head attention introduces several heads, so the information is divided more finely, more computations are required, and the results are more precise.
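Below is a minimal NumPy sketch of this two-head computation. The sizes, variable names (seq_len, d_model, n_heads) and the scaling by the square root of the head size are illustrative assumptions, not taken from the article.

```python
# A minimal sketch of two-head attention (illustrative shapes and names).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def two_head_attention(a, Wq, Wk, Wv, n_heads=2):
    """a: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_model)."""
    seq_len, d_model = a.shape
    d_head = d_model // n_heads

    # Project once, then split the result into two heads; this is equivalent
    # to splitting each weight matrix into two column blocks (W_q,1 and W_q,2)
    # and multiplying a with each block separately.
    q = (a @ Wq).reshape(seq_len, n_heads, d_head)
    k = (a @ Wk).reshape(seq_len, n_heads, d_head)
    v = (a @ Wv).reshape(seq_len, n_heads, d_head)

    outputs = []
    for h in range(n_heads):
        # alpha: attention scores of head h, computed only within head h
        alpha = softmax(q[:, h] @ k[:, h].T / np.sqrt(d_head), axis=-1)
        # b: weighted sum of the values of head h
        outputs.append(alpha @ v[:, h])

    # Concatenate the per-head results back to (seq_len, d_model)
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))                       # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(two_head_attention(a, Wq, Wk, Wv).shape)    # (4, 8)
```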
2. Introducing positional information
The attention mechanism has a flaw: it carries no positional information, so we introduce a position matrix with a one-hot structure.
The weight matrix W is split into W_I and W_P, which are multiplied with the input value x and the one-hot position vector p respectively, giving e_i and a_i.
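The small sketch below illustrates this point: appending a one-hot position vector p to the input x and multiplying by the combined weight W is the same as computing W_I·x + W_P·p, i.e. adding a positional embedding e_i to the content embedding a_i. All dimensions here are illustrative assumptions.

```python
# One-hot positional trick: W @ [x; p] == W_I @ x + W_P @ p  (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_model, max_len = 6, 4, 10

W_I = rng.normal(size=(d_model, d_in))     # acts on the token content
W_P = rng.normal(size=(d_model, max_len))  # acts on the one-hot position
W = np.concatenate([W_I, W_P], axis=1)     # combined weight on [x; p]

x = rng.normal(size=d_in)                  # token representation
i = 3                                      # position of the token
p = np.eye(max_len)[i]                     # one-hot position vector

a = W_I @ x                                # content embedding a_i
e = W_P @ p                                # positional embedding e_i (column i of W_P)
combined = W @ np.concatenate([x, p])

print(np.allclose(combined, a + e))        # True
```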
3. A visual understanding of the Transformer framework

Take machine translation as an example: the input (for example, the phrase "machine learning") is first encoded and then decoded to obtain the desired output; the Transformer is exactly such an encode-then-decode process.
The input x is combined with one-hot encoded positional information and then fed into a self-attention, multi-head attention block. The encoded result is then used as input to the decoder: this input first passes through a masked multi-head attention block, then through another self-attention block, and finally through a series of further operations to produce the final output.
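As a small illustration of the "masked" part of the decoder's masked multi-head attention, the sketch below builds a causal mask that blocks each position from attending to later positions; the sizes and variable names are assumptions for demonstration only.

```python
# Causal mask for masked attention: later positions are hidden before the softmax.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len = 4
scores = rng.normal(size=(seq_len, seq_len))        # raw q·k scores

mask = np.triu(np.ones((seq_len, seq_len)), k=1)    # 1 above the diagonal
masked_scores = np.where(mask == 1, -np.inf, scores)

alpha = softmax(masked_scores, axis=-1)
print(np.round(alpha, 2))   # each row attends only to itself and earlier tokens
```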
In the encoder, a Norm layer is also added. The difference between Layer Norm and Batch Norm is that Layer Norm works horizontally (within one sample, across its features), while Batch Norm works vertically (per feature, across the samples in a batch).
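The sketch below contrasts these two normalization directions on a (batch, features) matrix: Layer Norm normalizes each row, Batch Norm normalizes each column. The sizes are illustrative assumptions.

```python
# Layer Norm (per row / per sample) vs Batch Norm (per column / per feature).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(3, 4))   # 3 samples, 4 features

# Layer Norm: normalize each sample across its features (horizontal).
layer_norm = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

# Batch Norm: normalize each feature across the batch (vertical).
batch_norm = (x - x.mean(axis=0, keepdims=True)) / x.std(axis=0, keepdims=True)

print(np.round(layer_norm.mean(axis=1), 6))   # ~0 per sample
print(np.round(batch_norm.mean(axis=0), 6))   # ~0 per feature
```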
4. Visualizing the effect of the attention mechanism

As shown in the figure:
In this sentence, "it" is a pronoun that refers to "animal", so "it" depends more heavily on "animal", and the connection between the two is drawn darker.
5. Comparing the effects of single-head and multi-head attention

The green lines above show the multi-head attention mechanism, and the red lines below show the single-head attention mechanism. As the diagram shows, the multi-head attention mechanism attends to more of the information.