[Long time series forecasting] Autoformer code explanation [4]: the AutoCorrelation mechanism
2022-06-12 05:35:00 【Heart regulating and pill refining】
First, look at the figure from the paper:

Clearly, AutoCorrelation can be used as a drop-in replacement for self-attention: the inputs are q, k and v, and the output is a single tensor V.
Now straight to the code:
def forward(self, queries, keys, values, attn_mask):
    B, L, H, E = queries.shape
    _, S, _, D = values.shape
    if L > S:
        # keys/values are shorter than queries: zero-pad them along the length dimension
        zeros = torch.zeros_like(queries[:, :(L - S), :]).float()
        values = torch.cat([values, zeros], dim=1)
        keys = torch.cat([keys, zeros], dim=1)
    else:
        # otherwise keep only the first L time steps
        values = values[:, :L, :, :]
        keys = keys[:, :L, :, :]
    # period-based dependencies
    q_fft = torch.fft.rfft(queries.permute(0, 2, 3, 1).contiguous(), dim=-1)  # 32, 4, 128, 49
    k_fft = torch.fft.rfft(keys.permute(0, 2, 3, 1).contiguous(), dim=-1)  # 32, 4, 128, 49
    res = q_fft * torch.conj(k_fft)  # 32, 4, 128, 49
    corr = torch.fft.irfft(res, dim=-1)  # 32, 4, 128, 96

In general, queries, keys and values all have the same shape, for example [32, 96, 4, 128].
The exception is the second AutoCorrelation block of the decoder: there keys and values share one shape, for example [32, 96, 4, 128], while queries have shape [32, 48+192, 4, 128]. The if statement above zero-pads keys and values along the length dimension until they are as long as queries.
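The length alignment done by that if/else can be sketched on a plain Python list. This is only an illustration under simplifying assumptions: `align_to_query_length` is a hypothetical helper (it is not part of the Autoformer code), and it works on a single 1-D sequence instead of a [B, L, H, E] tensor.

```python
def align_to_query_length(seq, query_len):
    """Hypothetical helper: zero-pad or truncate a 1-D list to query_len items."""
    if query_len > len(seq):
        # L > S: append zeros at the end, like torch.cat([keys, zeros], dim=1)
        return seq + [0.0] * (query_len - len(seq))
    # L <= S: keep only the first L time steps, like keys[:, :L, :, :]
    return seq[:query_len]


keys = [1.0, 2.0, 3.0, 4.0]                    # S = 4 time steps
padded = align_to_query_length(keys, 6)        # L = 6 > S
truncated = align_to_query_length(keys, 3)     # L = 3 < S
print(padded)      # [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
print(truncated)   # [1.0, 2.0, 3.0]
```

Either way, keys and values end up with exactly as many time steps as queries, which is what the element-wise product in the frequency domain requires.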
The rest of this article assumes that queries, keys and values all have shape [32, 96, 4, 128].
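Before moving on, it may help to check what the frequency-domain product in forward computes. The sketch below is a dependency-free stand-in: it uses a naive O(n^2) DFT written with `cmath` in place of `torch.fft.rfft` / `torch.fft.irfft`, and the helper names (`dft`, `idft`, `corr_via_fft`, `corr_direct`) are assumptions made for this illustration only. It compares the inverse transform of Q * conj(K) with the circular correlation computed directly, sum over t of q[(t + tau) mod n] * k[t].

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (stand-in for torch.fft)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spec):
    """Naive inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def corr_via_fft(q, k):
    """Correlation at every delay via Q * conj(K), as in the forward method."""
    prod = [a * b.conjugate() for a, b in zip(dft(q), dft(k))]
    return [c.real for c in idft(prod)]


def corr_direct(q, k):
    """Circular correlation computed directly, one delay tau at a time."""
    n = len(q)
    return [sum(q[(t + tau) % n] * k[t] for t in range(n)) for tau in range(n)]


# A toy series with period 4: its autocorrelation peaks at delays 0 and 4.
q = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
fast = corr_via_fft(q, q)
slow = corr_direct(q, q)
print([round(v, 6) for v in fast])
print(slow)
```

Both lists agree up to floating-point error, so corr[..., tau] in the real code can be read as the similarity between queries and keys shifted by tau steps.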
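The FFT part needs little further comment: multiplying q_fft by the conjugate of k_fft and applying the inverse transform gives the correlation between queries and keys at every delay. The main point of interest is the time_delay_agg_training function.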
V = self.time_delay_agg_training(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)

def time_delay_agg_training(self, values, corr):  # both: 32, 4, 128, 96
    """
    SpeedUp version of Autocorrelation (a batch-normalization style design)
    This is for the training phase.
    """
    # Tip: think of the batch size as 1 and everything becomes clear at a glance
    head = values.shape[1]  # 4
    channel = values.shape[2]  # 128
    length = values.shape[3]  # 96
    # find top k
    top_k = int(self.factor * math.log(length))  # int(1 * ln(96)) = 4
    mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)  # 32, 96; each mean removes one dimension
    print(torch.topk(torch.mean(mean_value, dim=0), top_k, dim=-1))  # debug: the 4 largest values and their indices
    # average over the batch (shape: 96), then take the top_k; [1] selects the indices, e.g. [3, 4, 5, 2]
    index = torch.topk(torch.mean(mean_value, dim=0), top_k, dim=-1)[1]
    # for each sample, pick the correlations at those 4 delays out of the 96
    weights = torch.stack([mean_value[:, index[i]] for i in range(top_k)], dim=-1)  # 32, 4
    # update corr
    tmp_corr = torch.softmax(weights, dim=-1)  # 32, 4; the 4 values are normalized
    # aggregation
    tmp_values = values  # 32, 4, 128, 96
    delays_agg = torch.zeros_like(values).float()  # 32, 4, 128, 96, all zeros
    for i in range(top_k):
        # shift along the sequence dimension; the step is the delay index[i]
        pattern = torch.roll(tmp_values, -int(index[i]), -1)  # 32, 4, 128, 96
        # element-wise product of the shifted series and its weight (the R_{q,k} term);
        # unsqueeze + repeat expand tmp_corr[:, i] (one number per sample) to the shape of pattern
        delays_agg = delays_agg + pattern * \
            (tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
    return delays_agg  # 32, 4, 128, 96
In the illustration, assume top_k = 2 and a sequence length of 10.
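The same setting (top_k = 2, sequence length 10) can be traced by hand with a small sketch. It is a simplification and not the author's code: it handles one 1-D series (batch, head and channel all equal to 1, so the averaging steps drop out), `time_delay_agg_1d` and `roll_left` are hypothetical helpers, and the correlation scores are made-up numbers chosen so that delays 2 and 5 win with equal weight.

```python
import math


def roll_left(x, shift):
    """Like torch.roll(x, -shift, -1) on a 1-D list: out[i] = x[(i + shift) % n]."""
    shift %= len(x)
    return x[shift:] + x[:shift]


def softmax(scores):
    """Numerically stable softmax over a short list."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def time_delay_agg_1d(values, corr, factor=1):
    """Hypothetical 1-D version of the aggregation step."""
    length = len(values)
    top_k = int(factor * math.log(length))  # int(ln(10)) = 2
    # delays with the largest correlation scores
    delays = sorted(range(length), key=lambda i: corr[i], reverse=True)[:top_k]
    weights = softmax([corr[d] for d in delays])
    out = [0.0] * length
    for d, w in zip(delays, weights):
        shifted = roll_left(values, d)
        out = [o + w * s for o, s in zip(out, shifted)]
    return delays, weights, out


values = [float(i) for i in range(10)]   # 0.0, 1.0, ..., 9.0
corr = [0.0] * 10                        # made-up correlation scores
corr[2] = 3.0
corr[5] = 3.0
delays, weights, out = time_delay_agg_1d(values, corr)
print(delays)    # [2, 5]
print(weights)   # [0.5, 0.5]
print(out)
```

Each output position i is 0.5 * values[(i + 2) % 10] + 0.5 * values[(i + 5) % 10], so the first entry is 0.5 * 2 + 0.5 * 5 = 3.5. The real function does the same thing for every batch, head and channel at once, with delays chosen from correlations averaged over the batch.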
The model has been updated .
Further reading: see the references listed in the paper-reading post earlier in this series, where the paper's author explains why torch.roll is used. The purpose of keeping only the top_k delays is to reduce computational complexity.
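To get a feel for how few delays are kept, the top_k formula from the code can be evaluated for a few sequence lengths. This is a small sketch, assuming factor = 1 as in the example above; `top_k_for` is a hypothetical helper and the lengths are just sample values.

```python
import math


def top_k_for(length, factor=1):
    """Number of delays kept, same formula as in time_delay_agg_training."""
    return int(factor * math.log(length))


for length in (96, 192, 336, 720):
    print(length, top_k_for(length))
# 96 4
# 192 5
# 336 5
# 720 6
```

The number of rolled copies that have to be summed grows only logarithmically with the sequence length, instead of one copy per possible delay.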