Position encoding practice in transformer
2022-07-04 16:39:00 【Beginner Chris】
In recent years, the Transformer has excelled in both NLP and CV, because it allows parallel computation and handles long-range dependencies in sequences.
[Figure: schematic diagram of the Transformer]
Here we focus on one small part, the position encoding. Because the Transformer removes recurrence, it has no built-in notion of order, so the position of each element is encoded explicitly to reflect where it sits in the sequence.
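For reference, the sinusoidal encoding from the original Transformer paper, which the code in this post implements, can be written as below. Here pos is the position in the sequence, i indexes pairs of embedding dimensions, and d_model is the embedding size.

```latex
PE_{(pos,\,2i)}   = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)
```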
For why the encoding is written this way, refer to the original paper or to this article: https://zhuanlan.zhihu.com/p/338592312
The code is as follows:
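import math
import torch

class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        # Precompute the encoding table once: one row per position.
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
        # 1 / 10000^(2i / d_model), computed in log space
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        pe = pe.unsqueeze(0).transpose(0, 1)  # (max_len, 1, d_model)
        # A buffer is stored with the module but is not a trainable parameter.
        self.register_buffer('pe', pe)

    def forward(self, x):
        # Batch-first input: x is (batch, seq_len, d_model).
        # The slice is (seq_len, 1, d_model); squeeze(1) gives (seq_len, d_model),
        # which broadcasts over the batch dimension.
        x = x + self.pe[:x.size(1), :].squeeze(1)
        # For sequence-first input (seq_len, batch, d_model), use instead:
        # x = x + self.pe[:x.size(0), :]
        return x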
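The div_term line is the least obvious part of the class. The following is a minimal sketch using only the standard library (the two function names are made up for this illustration) showing that the exp/log form used in the code gives the same frequencies as the power form 1 / 10000^(2i / d_model) from the paper.

```python
import math

def div_term_exp_log(d_model):
    """Frequencies as computed in the class: exp(k * (-ln(10000) / d_model)) for k = 0, 2, 4, ..."""
    return [math.exp(k * (-math.log(10000.0) / d_model)) for k in range(0, d_model, 2)]

def div_term_power(d_model):
    """The same frequencies in the paper's notation: 1 / 10000^(k / d_model) for k = 0, 2, 4, ..."""
    return [1.0 / (10000.0 ** (k / d_model)) for k in range(0, d_model, 2)]

# Compare the two forms for the d_model used in this post.
for a, b in zip(div_term_exp_log(4), div_term_power(4)):
    print(a, b)
```

For d_model = 4 there are two frequencies, 1 and 0.01 (up to floating-point rounding), so the first sin/cos pair changes quickly with position and the second changes slowly.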
To test it, we define two input tensors, one all zeros and one all ones.
d_model = 4
pos = PositionalEncoding(d_model)
a = torch.zeros(2, 3, 4)  # (batch=2, seq_len=3, d_model=4), all zeros
b = pos(a)
c = torch.ones(2, 3, 4)   # same shape, all ones
b1 = pos(c)
The inputs a and c are the all-zeros and all-ones tensors of shape (2, 3, 4); the outputs are b and b1.
In both cases, a fixed value is added to the input. These fixed values are the position encodings: they do not depend on the input, only on the position and on d_model, which can be understood as the word embedding size.
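This observation can be cross-checked without PyTorch. The sketch below is a plain-Python re-derivation, not the class from this post: sinusoidal_table and add_encoding are hypothetical helper names, and d_model is assumed to be even. It builds the table for 3 positions and d_model = 4, adds it to an all-zeros and an all-ones batch, and compares the offsets.

```python
import math

def sinusoidal_table(seq_len, d_model):
    """Return a seq_len x d_model list of lists with the sinusoidal encoding (d_model assumed even)."""
    table = []
    for pos in range(seq_len):
        row = []
        for k in range(0, d_model, 2):
            angle = pos / (10000.0 ** (k / d_model))
            row.append(math.sin(angle))  # even dimension k
            row.append(math.cos(angle))  # odd dimension k + 1
        table.append(row)
    return table

def add_encoding(batch, table):
    """Add the same table to every sequence of a batch-first nested list (batch, seq_len, d_model)."""
    return [[[x + p for x, p in zip(token, pe_row)]
             for token, pe_row in zip(seq, table)]
            for seq in batch]

pe = sinusoidal_table(3, 4)
zeros = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
ones = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]
b = add_encoding(zeros, pe)
b1 = add_encoding(ones, pe)

# Print the table, one row per position, rounded for readability.
for row in pe:
    print([round(v, 4) for v in row])
```

With the all-zeros input, each sequence in b is the table itself, and b1 differs from b by exactly 1 in every entry, so the added offset is the same regardless of the input. Position 0 always encodes to sin(0) = 0 in the even dimensions and cos(0) = 1 in the odd ones.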