Position encoding practice in transformer
2022-07-04 16:39:00 【Beginner Chris】
In recent years, the Transformer has excelled in both NLP and CV because it supports parallel computation and handles long-range dependencies in sequences.
The schematic diagram is as follows: [figure]
Here we focus on one small part, the position encoding. Because the Transformer removes recurrence, it has no built-in notion of order, so the position of each element is encoded explicitly to reflect that attribute.
As for why the encoding is written this way, refer to the authors' original paper or to this article: https://zhuanlan.zhihu.com/p/338592312
The code is as follows:
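import math
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super(PositionalEncoding, self).__init__()
        # Precompute the encoding table once: (max_len, d_model)
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
        # 10000^(-2i / d_model) for each even dimension index 2i
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        pe = pe.unsqueeze(0).transpose(0, 1)  # (max_len, 1, d_model)
        # Buffer: stored with the module and moved by .to(device), but not trained
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x is batch-first: (batch, seq_len, d_model).
        # pe[:seq_len] is (seq_len, 1, d_model); squeeze(1) gives (seq_len, d_model),
        # which broadcasts over the batch dimension.
        x = x + self.pe[:x.size(1), :].squeeze(1)
        # Without squeeze(1) the slice does not line up with batch-first input:
        # x = x + self.pe[:x.size(1), :]
        return x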
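For reference, the formula that `div_term`, `sin`, and `cos` implement in the code above can be written as follows. This is the sinusoidal encoding from the original Transformer paper; the symbols $pos$ (position index) and $i$ (index of a sin/cos dimension pair) are notation assumed here, not names taken from the code.

```latex
\begin{aligned}
PE_{(pos,\,2i)}   &= \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right) \\
PE_{(pos,\,2i+1)} &= \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
\end{aligned}
```

In the code, `div_term` holds $10000^{-2i/d_{\text{model}}}$, computed as $\exp\!\left(2i \cdot \left(-\ln 10000 / d_{\text{model}}\right)\right)$, so `position * div_term` is the argument of the sine and cosine.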
To test it, we define two input tensors, one all zeros and one all ones.
d_model = 4
a = torch.zeros(2, 3, 4)  # (batch=2, seq_len=3, d_model=4), all zeros
pos = PositionalEncoding(d_model)
b = pos(a)
c = torch.ones(2, 3, 4)  # same shape, all ones
b1 = pos(c)
The input matrices are: [figure]
The outputs b and b1 are as follows: [figure]
In both cases the output is the input plus a fixed value. Those fixed values are the position encodings: they do not depend on the input, only on the position and on d_model, which can be understood as the word embedding size.
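To make those fixed values concrete, here is a minimal dependency-free sketch that recomputes the encoding table for the test shape (3 positions, d_model = 4) using only the standard library. The function name `sinusoidal_table` and the `base` parameter are illustrative assumptions, not part of the original code.

```python
import math

def sinusoidal_table(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model nested list of sinusoidal position encodings.

    Hypothetical helper for illustration; mirrors the div_term logic of the
    PositionalEncoding module without using torch.
    """
    table = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Same frequency as div_term: base^(-i / d_model) for even index i
            freq = math.exp(i * (-math.log(base) / d_model))
            table[pos][i] = math.sin(pos * freq)
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(pos * freq)
    return table

table = sinusoidal_table(seq_len=3, d_model=4)
for row in table:
    print([round(v, 4) for v in row])
# [0.0, 1.0, 0.0, 1.0]
# [0.8415, 0.5403, 0.01, 1.0]
# [0.9093, -0.4161, 0.02, 0.9998]
```

With the all-zeros input, each of the two batch entries of b should equal this table; with the all-ones input, b1 should equal the table plus 1 in every entry.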