Position encoding in the Transformer, in practice
2022-07-04 14:54:00 【初学者chris】
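In recent years, the Transformer has excelled in both NLP and CV because it can be computed in parallel and can handle long-range dependencies in sequences.
The architecture is shown in the diagram below:
[Figure: Transformer architecture diagram]
Here we focus on one small part, position encoding. Because the Transformer drops recurrence, it has no built-in notion of order, so each element is given a positional encoding to represent its position.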
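For reference, the sinusoidal encoding proposed in the original Transformer paper ("Attention Is All You Need") can be written as below, where pos is the position in the sequence and i indexes pairs of dimensions. The last line rewrites the frequency in exponential form, which is what the div_term expression in the code computes. The symbol \omega_i is introduced here only for illustration.

```latex
\begin{aligned}
PE_{(pos,\,2i)}   &= \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \\
PE_{(pos,\,2i+1)} &= \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \\
\omega_i &= \frac{1}{10000^{2i/d_{\text{model}}}}
          = \exp\!\left(-\frac{2i\,\ln 10000}{d_{\text{model}}}\right).
\end{aligned}
```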
For why the code is written this way, refer to the original paper or the following article: https://zhuanlan.zhihu.com/p/338592312
The code is as follows:
import math

import torch


class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        # Note: the slicing below requires an even d_model.
        super(PositionalEncoding, self).__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
        # 1 / 10000^(2i / d_model), computed in log space
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        pe = pe.unsqueeze(0).transpose(0, 1)  # (max_len, 1, d_model)
        # A buffer is saved with the module but is not a trainable parameter.
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x is batch-first: (batch, seq_len, d_model)
        x = x + self.pe[:x.size(1), :].squeeze(1)
        # x = x + self.pe[:x.size(1), :]
        return x
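To make the arithmetic inside the class easier to follow, here is a minimal torch-free sketch that builds the same table with plain loops. The function name sinusoidal_table is hypothetical and not part of the original code; it assumes an even d_model, just as the slicing in the class does.

```python
import math


def sinusoidal_table(max_len, d_model):
    """Return a max_len x d_model nested list of sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    table = []
    for pos in range(max_len):
        row = [0.0] * d_model
        # i takes the values 0, 2, 4, ... like torch.arange(0, d_model, 2)
        for i in range(0, d_model, 2):
            # Same quantity as div_term: exp(i * -ln(10000) / d_model),
            # which equals 10000 ** (-i / d_model)
            freq = math.exp(i * (-math.log(10000.0) / d_model))
            row[i] = math.sin(pos * freq)      # even dimension
            row[i + 1] = math.cos(pos * freq)  # odd dimension
        table.append(row)
    return table


table = sinusoidal_table(max_len=3, d_model=4)
for row in table:
    print([round(v, 4) for v in row])
```

Each row is the vector added to the token at that position. Position 0 always maps to alternating 0 and 1, because sin(0) = 0 and cos(0) = 1.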
To test it, we define two input tensors: one of all zeros and one of all ones.
d_model = 4
a = torch.zeros(2, 3, 4)  # (batch, seq_len, d_model)
pos = PositionalEncoding(d_model)
b = pos(a)
c = torch.ones(2, 3, 4)
b1 = pos(c)
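The inputs are a, a (2, 3, 4) tensor of zeros, and c, a (2, 3, 4) tensor of ones:
[Figure: printed input tensors a and c]
The outputs b and b1 are shown below:
[Figure: printed output tensors b and b1]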
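As the outputs show, a fixed value is added to the input in both cases. These fixed values are the position encodings: they do not depend on the input values, only on the position and on d_model, which can be understood as the embedding size of each word.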
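The two observations above can be checked numerically. The following is a small sketch in plain Python rather than torch, so it runs without the class. The helper names encoding_row and add_encoding are hypothetical, and the sketch assumes batch-first input and an even d_model.

```python
import math


def encoding_row(pos, d_model):
    """Sinusoidal encoding for one position (sketch; assumes even d_model)."""
    row = []
    for i in range(0, d_model, 2):
        freq = 10000.0 ** (-i / d_model)
        row.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return row


def add_encoding(x):
    """Add the encoding to a nested list of shape (batch, seq_len, d_model)."""
    return [
        [[v + e for v, e in zip(token, encoding_row(pos, len(token)))]
         for pos, token in enumerate(seq)]
        for seq in x
    ]


zeros = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
ones = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]
b = add_encoding(zeros)
b1 = add_encoding(ones)

# The offset added to each element is the same for both inputs.
same_offset = all(
    abs((b1[n][p][k] - 1.0) - b[n][p][k]) < 1e-12
    for n in range(2) for p in range(3) for k in range(4)
)
print(same_offset)

# Changing d_model changes the frequencies, and therefore the encoding.
print(encoding_row(1, 4)[2], encoding_row(1, 8)[2])
```

The first check confirms that the added values are independent of the input. The second shows the same position and dimension index producing a different value once d_model changes from 4 to 8.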