
Analysis of the Transformer architecture and the principles of its blocks

2022-07-03 20:41:00 SaltyFish_ Go

Multi-head attention

The Transformer architecture

Position-wise feed-forward network

Layer normalization


The Transformer uses an encoder-decoder framework built purely on the attention mechanism. Both the encoder and the decoder stack multiple Transformer blocks, and each block combines multi-head attention, a position-wise feed-forward network, and layer normalization (layer-norm). Batch-norm is not suitable for NLP because sentences differ in length, so a feature dimension is not comparable across the samples of a batch.

Multi-head attention

Multi-head attention runs several attention mechanisms in parallel and concatenates their outputs: the same key, value, and query are projected differently per head so that each head extracts different information, and the concatenated matrices are passed through a fully connected layer, whose output size determines the final dimension.
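
Below is a minimal multi-head attention sketch (not from the original post), assuming PyTorch; the class name MultiHeadAttention and every size in it are illustrative:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Sketch: project q/k/v, attend per head, concatenate, project."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection per input; each head reads a slice of the result.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # Final fully connected layer; its output size sets the output dimension.
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value):
        b, n_q, _ = query.shape
        n_k = key.shape[1]
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        q = self.w_q(query).view(b, n_q, self.num_heads, self.d_head).transpose(1, 2)
        k = self.w_k(key).view(b, n_k, self.num_heads, self.d_head).transpose(1, 2)
        v = self.w_v(value).view(b, n_k, self.num_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention inside each head.
        weights = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (weights @ v).transpose(1, 2).reshape(b, n_q, -1)  # concat the heads
        return self.w_o(out)
```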

The Transformer architecture

The first multi-head attention in the decoder, masked multi-head attention, is a self-attention structure. The second multi-head attention is not self-attention: its key and value inputs come from the encoder's output, while the query comes from the target sequence.
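
A minimal sketch of this wiring (not from the original post), assuming PyTorch's built-in nn.MultiheadAttention; the batch size, sequence lengths, and d_model below are illustrative:

```python
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

enc_out = torch.randn(2, 10, d_model)  # encoder output: (batch, src_len, d_model)
tgt = torch.randn(2, 7, d_model)       # target embeddings: (batch, tgt_len, d_model)

# First sublayer: masked self-attention. A causal mask (True = blocked)
# keeps each position from attending to later positions.
causal_mask = torch.triu(torch.ones(7, 7, dtype=torch.bool), diagonal=1)
x, _ = self_attn(tgt, tgt, tgt, attn_mask=causal_mask)

# Second sublayer: not self-attention. The query comes from the target side,
# while key and value come from the encoder output.
y, _ = cross_attn(x, enc_out, enc_out)
print(y.shape)  # torch.Size([2, 7, 512])
```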

Position-wise feed-forward network

The position-wise feed-forward network in the architecture can be seen as fully connected layers applied independently at every position, used to change the feature dimension.
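
A minimal sketch (not from the original post), assuming PyTorch: nn.Linear acts on the last dimension, so the same weights are applied at every position and only the feature dimension changes. The sizes 512 and 2048 follow the original Transformer paper:

```python
import torch
import torch.nn as nn

class PositionwiseFFN(nn.Module):
    """Two fully connected layers applied independently at each position."""
    def __init__(self, d_model: int = 512, d_hidden: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),  # widen the feature dimension
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),  # project back to d_model
        )

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return self.net(x)

ffn = PositionwiseFFN()
print(ffn(torch.randn(2, 7, 512)).shape)  # torch.Size([2, 7, 512])
```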


Layer normalization  

Layer-norm normalizes each sentence, i.e. each sample over its own feature dimension. This differs from images, where batch-norm normalizes each channel or feature across the batch (applied after fully connected or convolutional layers).
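
A minimal sketch of the contrast (not from the original post), assuming PyTorch; the shapes are illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 5, 512)  # (batch, seq_len, features): two sentences, 5 tokens each

# LayerNorm normalizes the 512 features of each token independently,
# so sentences of different lengths in a batch are no problem.
ln = nn.LayerNorm(512)
print(ln(x).shape)  # torch.Size([2, 5, 512])

# BatchNorm1d instead normalizes each feature across the batch (and positions),
# the convention after fully connected or convolutional layers on images;
# it expects input shaped (batch, features, seq).
bn = nn.BatchNorm1d(512)
print(bn(x.transpose(1, 2)).shape)  # torch.Size([2, 512, 5])
```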
