当前位置:网站首页>Transformer's understanding
Transformer's understanding
2022-07-28 06:11:00 【Alan and fish】
1. Dichotomous attention mechanism
- Lead in two ends
Here is the two head attention mechanism , In fact, it's just putting the front q,k,v Subdivide , Divided into two groups. , In this way, the information concerned is refined .
Weight w Split into wq,1,wq,2 Two weight parameters , And then a Dot multiply with two weight parameters respectively , Got it qq,1 and qq,2. - Calculation α \alpha α
This is the time , take q The first head with each k The first head of ,q The second head of k Calculate at the second end of , You will get two α \alpha α1, α \alpha α2 - Calculation b

The following steps are the same as the single head attention mechanism , The difference is that the multi head attention mechanism introduces multiple heads , The information is more subdivided , Multiple calculations are required , The results are more accurate .
2. Introduce location information
There is a flaw in the attention mechanism , There is no location information , So we introduce a one-hot Position matrix of structure .
Set the weight matrix W Split into WI and WP, Then with the input value x And location information p Do point multiplication , obtain ei and α \alpha αi
3.transformer Visual understanding of framework

Take machine translation for example , Enter a machine learning , First, it will be encoded , Then it is decoded , Get the information you want ,tansformer Mechanism is a process of encoding and decoding .
Information entered x There will be one with one-hot Combined with encoded location information , Then enter a self-attention The long attention mechanism . Then take the encoded result as the input of decoding , Put the input into a masked The long attention mechanism , And then pass by self-attention Attention mechanism , Finally, the final output is obtained through a series of operations .
When coding , Added a Norm layer ,Norm and Layer The difference is that ,Norm It's horizontal ,Layer It's vertical .
4. See the effect of attention mechanism through visualization

As shown in the figure :
In this paper, the it It's a pronoun , In this text ,it It means animal, So it's with animal Rely more on , The relationship between them is darker .
5. The effect comparison between single head attention mechanism and multi head attention mechanism

The green one above is the long attention mechanism , The red one below is the single head attention mechanism , As you can see from the diagram , The long attention mechanism pays more attention to information .
边栏推荐
- Four perspectives to teach you to choose applet development tools?
- Solution to the crash after setting up a cluster
- 神经网络实现鸢尾花分类
- Boosting unconstrained face recognition with auxiliary unlabeled data to enhance unconstrained face recognition
- Distributed lock redis implementation
- NLP中基于Bert的数据预处理
- D2SC-GAN:基于双深浅通道生成对抗网络的课堂场景低分辨率人脸识别
- Installing redis under Linux (centos7)
- transformer的理解
- Deep learning (self supervision: simpl) -- a simple framework for contractual learning of visual representations
猜你喜欢

Deep learning (incremental learning) - (iccv) striking a balance between stability and plasticity for class incremental learning

5、 Video processing and GStreamer

What are the detailed steps of wechat applet development?

Small program development solves the anxiety of retail industry

3: MySQL master-slave replication setup

深度学习——Pay Attention to MLPs

Deep learning - patches are all you need

Deep learning pay attention to MLPs

Boosting unconstrained face recognition with auxiliary unlabeled data to enhance unconstrained face recognition

无约束低分辨率人脸识别综述一:用于低分辨率人脸识别的数据集
随机推荐
Deep learning (self supervision: CPC V2) -- data efficient image recognition with contractual predictive coding
uView上传组件upload上传auto-upload模式图片压缩
【2】 Redis basic commands and usage scenarios
Kubesphere installation version problem
tf.keras搭建神经网络功能扩展
使用神经网络实现对天气的预测
Four perspectives to teach you to choose applet development tools?
5、 Video processing and GStreamer
NLP中常用的utils
Using neural network to predict the weather
The business of digital collections is not so easy to do
【1】 Introduction to redis
Marsnft: how do individuals distribute digital collections?
Automatic scheduled backup of remote MySQL scripts
Wechat applet development and production should pay attention to these key aspects
Dataset class loads datasets in batches
Deep learning (self supervision: Moco V2) -- improved bases with momentum contractual learning
卷积神经网络
自动定时备份远程mysql脚本
How much does it cost to make a small program mall? What are the general expenses?