PVT's spatial reduction attention (SRA)
2022-07-27 08:58:00 【hxxjxw】
SRA can be understood as merging every sr_ratio × sr_ratio patch of points into a single point; during attention, the queries Q are still computed from the full-resolution tokens, while the keys K and values V are computed from the aggregated points, so attention is computed between Q and this reduced set of K and V.
```python
import torch
from torch import nn


class Attention(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None,
                 attn_drop=0., proj_drop=0., sr_ratio=1):
        super().__init__()
        assert dim % num_heads == 0, f"dim {dim} should be divided by num_heads {num_heads}."

        self.dim = dim
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5

        self.q = nn.Linear(dim, dim, bias=qkv_bias)
        self.kv = nn.Linear(dim, dim * 2, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

        self.sr_ratio = sr_ratio
        # The spatial reduction is implemented as a strided convolution layer
        if sr_ratio > 1:
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        # Q comes from the full-resolution tokens: (B, num_heads, N, C // num_heads)
        q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)

        if self.sr_ratio > 1:
            x_ = x.permute(0, 2, 1).reshape(B, C, H, W)
            x_ = self.sr(x_).reshape(B, C, -1).permute(0, 2, 1)  # here x_.shape = (B, N/R^2, C)
            x_ = self.norm(x_)
            # K and V come from the spatially reduced tokens
            kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        else:
            kv = self.kv(x).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


x = torch.rand(4, 28672, 256)
attn = Attention(dim=256, sr_ratio=2)
output = attn(x, 224, 128)  # H * W = 224 * 128 = 28672 = N
```
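To make the spatial reduction step concrete, here is a minimal sketch (not part of the original post) that isolates only the `self.sr` convolution on the same shapes used above. The variable names mirror the forward pass but are otherwise illustrative; it simply prints how the key/value sequence length shrinks by `sr_ratio**2` while the query sequence keeps the full resolution.

```python
import torch
from torch import nn

# Same shapes as the example above: N = H * W = 28672 tokens of width C = 256
B, C, H, W = 4, 256, 224, 128
sr_ratio = 2

x = torch.rand(B, H * W, C)
sr = nn.Conv2d(C, C, kernel_size=sr_ratio, stride=sr_ratio)

x_ = x.permute(0, 2, 1).reshape(B, C, H, W)      # token sequence -> 2D feature map
x_ = sr(x_).reshape(B, C, -1).permute(0, 2, 1)   # (B, N / sr_ratio**2, C)

print(x.shape)   # torch.Size([4, 28672, 256])  queries keep full resolution
print(x_.shape)  # torch.Size([4, 7168, 256])   keys/values come from the reduced tokens
```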