Transformer structure analysis and the principles of the blocks inside it
2022-07-03 20:41:00 【SaltyFish_Go】
The Transformer architecture
The Transformer uses an encoder-decoder framework built purely on the attention mechanism. Both the encoder and the decoder are stacks of Transformer blocks; each block uses multi-head attention together with a position-wise feed-forward network and layer normalization (LayerNorm). BatchNorm is not suitable for NLP because sentences in a batch have different lengths, so the per-feature statistics across the batch are not comparable.
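To make the block structure concrete, here is a minimal sketch of one encoder block in PyTorch (the framework, layer sizes, and post-LayerNorm layout are illustrative assumptions, not details taken from this post):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One Transformer encoder block: multi-head self-attention, a
    position-wise feed-forward network, and LayerNorm with residual
    connections around each sub-layer (post-LN layout)."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads,
                                               dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Self-attention sub-layer: queries, keys and values are all x.
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Position-wise feed-forward sub-layer.
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# x: (batch, seq_len, d_model)
x = torch.randn(2, 10, 512)
print(EncoderBlock()(x).shape)  # torch.Size([2, 10, 512])
```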
Multi-head attention

Multi-head attention runs several attention heads in parallel and concatenates their outputs. Each head receives the same key, value, and query inputs but applies different learned projections, so it extracts different information. The concatenated result is then passed through a final fully connected layer, whose output size determines the dimension of the multi-head attention output.
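A from-scratch sketch of this idea in PyTorch (all dimensions and names are illustrative): the same query/key/value go through per-head projections, the heads attend in parallel and are concatenated, and a final fully connected layer sets the output dimension.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection per role; each head gets its own slice of the result.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # Final fully connected layer on the concatenated heads; its output
        # size determines the module's output dimension.
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        b, len_q, _ = query.shape
        len_k = key.shape[1]
        # Project and split into heads: (batch, heads, seq, d_head).
        q = self.w_q(query).view(b, len_q, self.num_heads, self.d_head).transpose(1, 2)
        k = self.w_k(key).view(b, len_k, self.num_heads, self.d_head).transpose(1, 2)
        v = self.w_v(value).view(b, len_k, self.num_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        out = torch.softmax(scores, dim=-1) @ v
        # Concatenate the heads, then apply the final linear layer.
        out = out.transpose(1, 2).reshape(b, len_q, -1)
        return self.w_o(out)

mha = MultiHeadAttention()
x = torch.randn(2, 10, 512)
print(mha(x, x, x).shape)  # torch.Size([2, 10, 512]) — self-attention usage
```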
The Transformer architecture

In the decoder, the first masked multi-head attention is a self-attention structure. The second multi-head attention is not self-attention: its key and value inputs come from the encoder's output, while its query comes from the target sequence on the decoder side.
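A sketch of how these two decoder attention layers are wired, using torch.nn.MultiheadAttention (the shapes and the causal-mask construction are illustrative assumptions):

```python
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
masked_self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

tgt = torch.randn(2, 7, d_model)      # decoder (target sequence) states
memory = torch.randn(2, 10, d_model)  # encoder output

# 1) Masked multi-head self-attention: query = key = value = target sequence.
#    The causal mask prevents position i from attending to positions > i.
causal_mask = torch.triu(torch.ones(7, 7, dtype=torch.bool), diagonal=1)
self_out, _ = masked_self_attn(tgt, tgt, tgt, attn_mask=causal_mask)

# 2) Cross-attention (not self-attention): query comes from the decoder,
#    key and value come from the encoder output.
cross_out, _ = cross_attn(self_out, memory, memory)
print(cross_out.shape)  # torch.Size([2, 7, 512])
```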
Position-wise feed-forward network
In the architecture, the position-wise feed-forward network can be seen as fully connected layers applied to every position independently, used to transform the feature dimension.
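A minimal sketch with assumed sizes: two fully connected layers applied at every position independently, since nn.Linear operates on the last (feature) dimension.

```python
import torch
import torch.nn as nn

class PositionWiseFFN(nn.Module):
    """Two fully connected layers applied to each position independently;
    the last layer's size sets the output feature dimension."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):      # x: (batch, seq_len, d_model)
        return self.net(x)     # nn.Linear acts on the last dim, i.e. per position

x = torch.randn(2, 10, 512)
print(PositionWiseFFN()(x).shape)  # torch.Size([2, 10, 512])
```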
Layer normalization

LayerNorm normalizes the features of each sentence (each token) individually. This differs from images, where BatchNorm normalizes each channel or feature across the batch (applied after fully connected or convolutional layers).
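A small comparison sketch with illustrative shapes: LayerNorm normalizes each token's feature vector on its own, while BatchNorm1d normalizes each feature/channel across the whole batch, which is why it suits images and fixed-length features better than variable-length sentences.

```python
import torch
import torch.nn as nn

batch, seq_len, d_model = 4, 10, 512
x = torch.randn(batch, seq_len, d_model)

# LayerNorm: statistics over the last dim (each token's features),
# computed independently for every token in every sentence.
layer_norm = nn.LayerNorm(d_model)
print(layer_norm(x).shape)  # torch.Size([4, 10, 512])

# BatchNorm1d expects (batch, channels, length) and normalizes each channel
# over the whole batch — the usual choice after conv/fully connected layers.
batch_norm = nn.BatchNorm1d(d_model)
print(batch_norm(x.transpose(1, 2)).shape)  # torch.Size([4, 512, 10])
```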