Transformer architecture analysis and the principles of its blocks

2022-07-03 20:41:00 【SaltyFish_ Go】

The Transformer uses an encoder-decoder framework built purely on the attention mechanism. The encoder and the decoder each contain many Transformer blocks. Each block uses multi-head attention, a position-wise feed-forward network, and layer normalization (layer norm). Batch normalization (batch norm) is not well suited to NLP: sentences are not the same length, so the dimensions and feature statistics differ from sample to sample.
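To illustrate why layer norm copes with variable-length sentences, here is a minimal sketch in plain Python. It normalizes each token's feature vector on its own, so the result for a token does not depend on how many other tokens the sentence has. The function names and the toy sentences are assumptions made for this example, and the learnable scale and shift that real layer norm implementations usually include are left out.

```python
import math

def layer_norm(features, eps=1e-5):
    """Normalize one token's feature vector to zero mean and (almost) unit variance."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / math.sqrt(var + eps) for f in features]

def layer_norm_sequence(tokens, eps=1e-5):
    """Apply layer_norm to every token independently, for any sequence length."""
    return [layer_norm(token, eps) for token in tokens]

# Two toy "sentences" of different length; each token has 4 features.
short_sentence = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 6.0, 6.0]]
long_sentence = short_sentence + [[0.0, 1.0, 0.0, 1.0]]

normed_short = layer_norm_sequence(short_sentence)
normed_long = layer_norm_sequence(long_sentence)

# The first token is normalized identically in both sentences,
# because no statistic is shared across tokens or across sentences.
print(normed_short[0] == normed_long[0])
```

Batch norm, by contrast, computes its statistics across the samples in a batch, which is where sentences of different lengths become awkward.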

Multi-head attention

Multi-head attention concatenates the outputs of several attention heads. Every head receives the same key, value, and query, but each head extracts different information from them. The concatenated result is then passed through a fully connected layer, so the dimension of the output is determined by that last fully connected layer.
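The idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`matmul`, `random_matrix`, `multi_head_attention`), the sizes, and the random weights standing in for learned parameters are all made up for the example, bias terms are omitted, and a real model would use a tensor library instead of nested lists.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(q_i, k_j)) / scale for k_j in k] for q_i in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)

def multi_head_attention(query, key, value, heads, w_out):
    """heads holds one (w_q, w_k, w_v) projection triple per head."""
    head_outputs = [
        attention(matmul(query, w_q), matmul(key, w_k), matmul(value, w_v))
        for w_q, w_k, w_v in heads
    ]
    # Concatenate the heads along the feature axis, one row per query position.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(query))]
    # The final fully connected layer fixes the output dimension.
    return matmul(concat, w_out)

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, num_heads, d_head, d_out = 3, 8, 2, 4, 6
x = random_matrix(seq_len, d_model, rng)
heads = [
    tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
    for _ in range(num_heads)
]
w_out = random_matrix(num_heads * d_head, d_out, rng)

output = multi_head_attention(x, x, x, heads, w_out)
print(len(output), len(output[0]))  # 3 6
```

The two heads together produce 8 features per position, yet the output has 6, because the shape of `w_out` alone decides the final dimension.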

The architecture of the Transformer

The first multi-head attention in the decoder is masked multi-head attention, and it is a self-attention structure. The second multi-head attention is not self-attention: its key and value inputs are the output of the encoder.

The query of this second attention comes from the target sequence.
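The difference between the two decoder attentions can be sketched as follows. This is a simplified single-head illustration in plain Python: the learned projections, residual connections, and layer norm of a real decoder block are omitted, and the function name, the `causal` flag, and the toy vectors are assumptions made for the example.

```python
import math

def softmax(row):
    """Numerically stable softmax; masked (-inf) scores get weight 0."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, causal=False):
    """Single-head scaled dot-product attention; returns (output, weights)."""
    scale = math.sqrt(len(q[0]))
    outputs, all_weights = [], []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / scale for k_j in k]
        if causal:
            # Position i may only attend to positions 0..i of the same sequence.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        outputs.append([
            sum(w * v_j[c] for w, v_j in zip(weights, v))
            for c in range(len(v[0]))
        ])
        all_weights.append(weights)
    return outputs, all_weights

# Toy data: 4 source positions from the encoder, 3 target positions in the decoder.
encoder_output = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
decoder_states = [[0.2, 0.1], [0.4, -0.3], [-0.1, 0.6]]

# First attention: masked self-attention, so query, key, and value
# all come from the target sequence.
self_out, self_weights = attention(
    decoder_states, decoder_states, decoder_states, causal=True
)

# Second attention: the query comes from the target side,
# while key and value are the encoder output.
cross_out, cross_weights = attention(self_out, encoder_output, encoder_output)

print(self_weights[0])  # [1.0, 0.0, 0.0]
print(len(cross_weights), len(cross_weights[0]))  # 3 4
```

The first target position can only see itself, which is why its self-attention weights are `[1.0, 0.0, 0.0]`. The cross-attention weights form a 3 by 4 grid: one row per target position and one column per source position.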