Position Encoding in the Transformer: A Practical Look
2022-07-04 14:54:00 【初学者chris】
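In recent years, the Transformer has excelled in both NLP and CV, because it can be computed in parallel and can handle dependencies across long sequences.
The architecture is shown in the diagram below:
[Figure: Transformer architecture diagram]
Here we focus on one small part, the position encoding. Because the Transformer drops recurrence, it has no built-in notion of element order, so a position encoding is added to each element to carry that information.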
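For reference, the sinusoidal encoding from the original Transformer paper is usually written as below, where pos is the position in the sequence and i indexes pairs of embedding dimensions. This is a restatement of the standard formula, not something specific to this post:

```latex
PE_{(pos,\,2i)}   = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
```

Even dimensions use the sine and odd dimensions use the cosine of the same angle.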
For the reasoning behind this implementation, see the original paper or this article: https://zhuanlan.zhihu.com/p/338592312
The code is as follows:
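import math

import torch


class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        # Precompute the encoding table once; this code assumes d_model is even.
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
        # 1 / 10000^(2i / d_model), computed in log space
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        pe = pe.unsqueeze(0).transpose(0, 1)  # (max_len, 1, d_model)
        # A buffer follows the module across devices but is not a trainable parameter.
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (batch, seq_len, d_model); the (seq_len, d_model) slice broadcasts over the batch.
        x = x + self.pe[:x.size(1), :].squeeze(1)
        # x = x + self.pe[:x.size(1), :]
        return x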
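The `div_term` line is the least obvious part of the class. The sketch below checks that the exp/log form used above gives the same values as the direct expression 1 / 10000^(2i / d_model). The value `d_model = 8` and the name `direct` are chosen only for illustration.

```python
import math

import torch

d_model = 8  # hypothetical small embedding size, for illustration only

# Form used in the class above: exp(2i * (-ln(10000) / d_model))
div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))

# Direct form from the formula: 1 / 10000^(2i / d_model)
direct = 1.0 / (10000.0 ** (torch.arange(0, d_model, 2).float() / d_model))

# Both forms should agree up to floating-point rounding.
same = torch.allclose(div_term, direct)
print(same)
```

There is one frequency per pair of dimensions, so `div_term` has d_model / 2 entries, and each is reused for the sine and cosine columns.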
To test it, we define two input tensors: one all zeros and one all ones.
d_model = 4
a = torch.zeros(2, 3, 4)  # (batch, seq_len, d_model)
pos = PositionalEncoding(d_model)
b = pos(a)
c = torch.ones(2, 3, 4)
b1 = pos(c)
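The input tensors are a (all zeros) and c (all ones):
[Figure: printed values of the input tensors]
The outputs b and b1 are shown below:
[Figure: printed values of b and b1]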
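As you can see, in both cases a fixed set of values is added on top of the input. Those values are the position encoding. They do not depend on the input values, only on the position and on d_model, which can be thought of as the embedding size of each word.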
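This can also be checked numerically. The sketch below rebuilds the encoding table with a standalone helper, `sinusoidal_table`. The helper is a hypothetical name introduced here so the snippet runs on its own, and it is meant to mirror the `pe` buffer of the class above. The sketch confirms that the offset added to the all-zeros and all-ones inputs is the same.

```python
import math

import torch


def sinusoidal_table(seq_len, d_model):
    """Hypothetical helper: builds a (seq_len, d_model) table like the `pe` buffer."""
    position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    table = torch.zeros(seq_len, d_model)
    table[:, 0::2] = torch.sin(position * div_term)
    table[:, 1::2] = torch.cos(position * div_term)
    return table


table = sinusoidal_table(seq_len=3, d_model=4)

a = torch.zeros(2, 3, 4)
c = torch.ones(2, 3, 4)
b = a + table   # the table broadcasts over the batch dimension
b1 = c + table

# The added offset is the same for both inputs and for every batch element.
offset_is_input_independent = torch.allclose(b - a, b1 - c, atol=1e-6)
print(table)
print(offset_is_input_independent)
```

With d_model = 4 the two frequencies are 1 and 0.01. Position 0 is therefore encoded as [0, 1, 0, 1], and position 1 as approximately [sin(1), cos(1), sin(0.01), cos(0.01)].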