Transformer's understanding
2022-07-28 06:11:00 【Alan and fish】
1. Two-head attention mechanism
- Introducing two heads

The two-head attention mechanism simply takes the earlier q, k, and v and subdivides each into two groups, so that the attended information is more fine-grained. The weight W^q is split into two weight matrices, W^{q,1} and W^{q,2}; the input a^i is then multiplied by each of them in turn, yielding q^{i,1} and q^{i,2}.

- Computing α

At this step, the first head of q is matched with the first head of each k, and the second head of q with the second head of each k, giving two sets of attention scores, α^1 and α^2.

- Computing b

The remaining steps are the same as in single-head attention. The difference is that multi-head attention introduces several heads, so the information is more finely subdivided; multiple computations are required, and the results are more precise.
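The two-head computation described above can be sketched as follows. This is a minimal NumPy sketch, not an optimized implementation: the projection weights are hypothetical random matrices, and each head computes its own α scores and output b before the heads are concatenated.

```python
import numpy as np

def two_head_attention(X, d_head, seed=0):
    """Minimal two-head self-attention over a sequence X of shape (n, d_model).

    Each head has its own projection weights (random placeholders here),
    computes its own attention scores alpha, and the per-head outputs b
    are concatenated at the end.
    """
    rng = np.random.default_rng(seed)
    n, d_model = X.shape
    outputs = []
    for head in range(2):
        # Per-head projection matrices W^{q,h}, W^{k,h}, W^{v,h}
        Wq = rng.standard_normal((d_model, d_head))
        Wk = rng.standard_normal((d_model, d_head))
        Wv = rng.standard_normal((d_model, d_head))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        # alpha: each q of this head attends only to the k's of the same head
        scores = Q @ K.T / np.sqrt(d_head)
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        alpha = e / e.sum(axis=-1, keepdims=True)  # row-wise softmax
        outputs.append(alpha @ V)                  # b for this head
    return np.concatenate(outputs, axis=-1)        # shape (n, 2 * d_head)

out = two_head_attention(np.ones((4, 8)), d_head=4)
print(out.shape)  # (4, 8)
```

With more heads, the same loop simply runs more times and concatenates more slices, which is how the "subdivide the information further" idea scales.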
2. Introducing positional information
The attention mechanism has a flaw: it carries no positional information. So we introduce a position matrix with a one-hot structure.
The weight matrix W is split into W^I and W^P, which multiply the input x and the one-hot position vector p respectively, yielding a^i and the positional embedding e^i.
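The split-weight trick above can be checked numerically. In this sketch (all sizes and weights are hypothetical), multiplying the concatenation [x; p] by the split matrix W = [W^I | W^P] gives exactly the same result as adding a positional embedding e^i = W^P p to a^i = W^I x:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_pos, d_out = 6, 10, 4            # hypothetical sizes

WI = rng.standard_normal((d_out, d_model))  # acts on the input x
WP = rng.standard_normal((d_out, n_pos))    # acts on the one-hot position p

x = rng.standard_normal(d_model)
i = 3                                       # this token's position
p = np.zeros(n_pos)
p[i] = 1.0                                  # one-hot position vector

# Multiplying the concatenated vector by the split weight matrix ...
W = np.concatenate([WI, WP], axis=1)
out_concat = W @ np.concatenate([x, p])

# ... equals adding a positional embedding e^i = WP @ p to a^i = WI @ x
out_sum = WI @ x + WP[:, i]

print(np.allclose(out_concat, out_sum))  # True
```

This is why "concatenate a one-hot position, then project" is equivalent to "project, then add a learned positional vector."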
3. A visual understanding of the Transformer architecture

Take machine translation as an example: the input is first encoded and then decoded to obtain the desired output. The Transformer is exactly such an encode-decode process.
The input x is combined with one-hot-encoded positional information and then enters a multi-head self-attention layer. The encoder's result then serves as input to the decoder: the decoder's input first passes through a masked multi-head attention layer, then through another self-attention layer, and finally through a series of operations to produce the final output.
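The decoder's masked attention mentioned above can be sketched with a causal mask: position i is only allowed to attend to positions ≤ i, so the decoder cannot peek at future tokens. This is a minimal NumPy sketch, using a large negative constant to block future positions before the softmax:

```python
import numpy as np

def masked_softmax(scores):
    """Row-wise softmax with a causal mask: token i can only attend to
    tokens at positions <= i, hiding future positions from the decoder."""
    n = scores.shape[0]
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(mask, -1e9, scores)             # block future positions
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

alpha = masked_softmax(np.zeros((4, 4)))
# With all-zero scores, row i spreads attention uniformly over positions 0..i
print(np.round(alpha, 2))
```

Each row still sums to 1, but the entries above the diagonal are (numerically) zero.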
During encoding, an Add & Norm layer is also applied. The Norm here is Layer Normalization; the difference from Batch Norm is that Layer Norm normalizes "horizontally" across the features of each sample, while Batch Norm normalizes "vertically" across the batch.
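The "horizontal vs vertical" distinction can be made concrete with a tiny example (a minimal sketch on a 2×3 matrix, omitting the learnable scale and shift parameters that real implementations add):

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.]])  # shape (batch=2, features=3)

# Layer Norm: normalize each sample across its own features ("horizontal")
ln = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Batch Norm: normalize each feature across the batch ("vertical")
bn = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)

print(ln)  # every row has mean 0
print(bn)  # every column has mean 0
```

The Transformer uses Layer Norm because each token's statistics are computed from that token alone, independent of batch size.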
4. Seeing the effect of the attention mechanism through visualization

As shown in the figure:
In this sentence, "it" is a pronoun, and here "it" refers to "animal", so "it" depends most strongly on "animal"; the connection between them is drawn darker.
5. Comparing the effects of single-head and multi-head attention

The green connections above show multi-head attention, and the red ones below show single-head attention. As the figure shows, multi-head attention attends to more of the information.