Position encoding practice in transformer
2022-07-04 16:39:00 【Beginner Chris】
In recent years, the Transformer has excelled in both NLP and CV, because it supports parallel computation and handles long-range dependencies in long sequences.
[Figure: schematic diagram of the Transformer architecture]
Here we focus on one small part, the position encoding. Because the Transformer removes recurrence, it has no built-in notion of order, so the position of each element is encoded explicitly.
For the reasoning behind this formulation, refer to the original Transformer paper or to this article: https://zhuanlan.zhihu.com/p/338592312
The code is as follows:
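import math
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super(PositionalEncoding, self).__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
        # 1 / 10000^(2i / d_model) for each even index 2i, computed in log space
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        pe = pe.unsqueeze(0).transpose(0, 1)  # (max_len, 1, d_model)
        # A buffer is saved with the module and moved by .to(device), but is not trained
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x is batch-first: (batch, seq_len, d_model)
        # pe[:seq_len] is (seq_len, 1, d_model); squeeze(1) gives (seq_len, d_model),
        # which broadcasts over the batch dimension
        x = x + self.pe[:x.size(1), :].squeeze(1)
        # x = x + self.pe[:x.size(0), :]  # variant for sequence-first input (seq_len, batch, d_model)
        return x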
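For reference, the code above appears to follow the sinusoidal encoding from the original Transformer paper, where pos is the position in the sequence and i indexes pairs of embedding dimensions. The third line is the identity that the div_term expression relies on:

```latex
\begin{aligned}
PE_{(pos,\,2i)}   &= \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right) \\
PE_{(pos,\,2i+1)} &= \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right) \\
\exp\!\left(2i \cdot \frac{-\ln 10000}{d_{\text{model}}}\right) &= \frac{1}{10000^{2i/d_{\text{model}}}}
\end{aligned}
```

In the code, torch.arange(0, d_model, 2) supplies the values 2i, so position * div_term is the argument of the sine and cosine.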
To test it, we define two input tensors: one all zeros and one all ones.
d_model = 4
a = torch.zeros(2, 3, 4)  # (batch=2, seq_len=3, d_model=4), all zeros
pos = PositionalEncoding(d_model)
b = pos(a)
c = torch.ones(2, 3, 4)   # same shape, all ones
b1 = pos(c)
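The inputs a and c are the all-zeros and all-ones tensors of shape (2, 3, 4).
The outputs are b and b1. Since a is all zeros, each batch entry of b is exactly the position encoding for positions 0 to 2, and b1 is the same with 1 added to every element.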
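The values of b and b1 can also be reproduced by hand. The following is a minimal sketch in plain Python (no PyTorch), assuming an even d_model; the helper name sinusoidal_table and the variables b_expected and b1_expected are illustrative and not part of the original code.

```python
import math

def sinusoidal_table(seq_len, d_model):
    """Return seq_len rows of d_model sinusoidal position-encoding values.

    Hypothetical helper; assumes d_model is even.
    """
    table = []
    for pos in range(seq_len):
        row = []
        for k in range(0, d_model, 2):
            # Same frequency as div_term in the class above
            freq = math.exp(k * (-math.log(10000.0) / d_model))
            row.append(math.sin(pos * freq))  # even dimension k
            row.append(math.cos(pos * freq))  # odd dimension k + 1
        table.append(row)
    return table

pe = sinusoidal_table(seq_len=3, d_model=4)

# One batch entry of each output; every batch entry is identical
b_expected = [[0.0 + v for v in row] for row in pe]   # all-zeros input
b1_expected = [[1.0 + v for v in row] for row in pe]  # all-ones input

for row in b_expected:
    print([round(v, 4) for v in row])
```

With d_model = 4, the row for position p is [sin(p), cos(p), sin(0.01 p), cos(0.01 p)], so position 0 gives [0, 1, 0, 1] and position 1 gives approximately [0.8415, 0.5403, 0.0100, 0.99995].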
In both cases the output is the input plus a fixed offset. These offsets are the position encodings: they do not depend on the input values, only on the position and on d_model, which can be understood as the word-embedding size.
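To illustrate the dependence on d_model, the sketch below recomputes the div_term frequencies from the class above in plain Python for two embedding sizes. The helper name div_terms is hypothetical, and an even d_model is assumed.

```python
import math

def div_terms(d_model):
    """Frequencies multiplied by the position, one per (sin, cos) pair.

    Hypothetical helper mirroring the div_term expression in the class.
    """
    return [math.exp(k * (-math.log(10000.0) / d_model))
            for k in range(0, d_model, 2)]

for d_model in (4, 8):
    print(d_model, [round(f, 6) for f in div_terms(d_model)])
```

For d_model = 4 the frequencies are approximately 1 and 0.01; for d_model = 8 they are approximately 1, 0.1, 0.01 and 0.001. The same position therefore maps to a different vector when the embedding size changes, while the input values play no role.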