【Transformer】SOFT: Softmax-free Transformer with Linear Complexity
2022-07-29 06:03:00 【Dull cat】

This paper was published at NeurIPS 2021.
Paper link: https://arxiv.org/pdf/2110.11945.pdf
Code link: https://github.com/fudan-zvg/SOFT
1. Background
Transformers based on self-attention achieve good results, but their computation and memory costs grow quadratically with the input resolution.
The authors argue that this limitation comes from softmax self-attention.
Conventional self-attention normalizes the inner product of the token features with softmax, and keeping this softmax operation makes any later linearization difficult.
The authors therefore propose a softmax-free transformer, SOFT. It removes softmax from self-attention and uses a Gaussian kernel function in place of the inner product, so that the self-attention matrix can be approximated by a low-rank matrix decomposition.

2. Method
2.1 Softmax-free self-attention formulation

Given an input X, attention first projects it into the queries Q, keys K, and values V.
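The projection step can be sketched in NumPy as follows. This is only an illustration: it assumes the usual linear projections Q = X W_q, K = X W_k, V = X W_v, and the names `W_q`, `W_k`, `W_v` and all sizes are hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): n tokens, input width d, projection width d_e.
n, d, d_e = 6, 8, 4
X = rng.standard_normal((n, d))

# Hypothetical projection matrices; learned in a real model, random here.
W_q = rng.standard_normal((d, d_e))
W_k = rng.standard_normal((d, d_e))
W_v = rng.standard_normal((d, d_e))

# Each token (row of X) is mapped to a query, a key and a value vector.
Q, K, V = X @ W_q, X @ W_k, X @ W_v
print(Q.shape, K.shape, V.shape)
```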

Self-attention then computes each output token as a weighted sum of the values, with the weights given by α.

α is the process that computes the self-attention map. It is composed of a nonlinear function β and a relation function γ.
In the general form, γ is the inner product of the token features and β is softmax.
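A minimal sketch of this general form, written so that the two components stay visible. The function names `gamma_dot` and `beta_softmax` are hypothetical, and the 1/sqrt(d_e) scaling is the common convention, assumed here rather than taken from the text above.

```python
import numpy as np

def gamma_dot(Q, K):
    """Relation function: scaled inner product between every query and key."""
    return Q @ K.T / np.sqrt(Q.shape[-1])

def beta_softmax(scores):
    """Nonlinear function: row-wise softmax, shifted for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_attention(Q, K, V):
    """alpha = beta(gamma(Q, K)); the output is the attention map times V."""
    A = beta_softmax(gamma_dot(Q, K))  # n x n map: quadratic in the token count
    return A @ V, A

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 4)) for _ in range(3))
Y, A = softmax_attention(Q, K, V)
print(A.shape, Y.shape)
```

The n x n matrix `A` is where the quadratic cost comes from, and the row-wise normalization inside `beta_softmax` is what prevents a simple factorization of `A`.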
To simplify the computation, the authors replace this with a Gaussian kernel and drop the softmax.
To keep the attention matrix symmetric, the authors also make Q and K share the same mapping function, which gives the self-attention matrix used in the paper.
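A hedged sketch of such a softmax-free attention map, assuming the form S[i, j] = exp(-||Q_i - K_j||^2 / (2 sqrt(d_e))). The exact scaling constant is an assumption here and should be checked against the paper; `gaussian_attention` is a hypothetical name.

```python
import numpy as np

def gaussian_attention(Q, K=None):
    """Gaussian-kernel attention map; K defaults to Q (shared projection)."""
    if K is None:
        K = Q
    d_e = Q.shape[-1]
    # Squared Euclidean distance between every query and every key.
    sq = (Q**2).sum(-1)[:, None] + (K**2).sum(-1)[None, :] - 2.0 * Q @ K.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negatives from rounding
    # Assumed bandwidth 2 * sqrt(d_e); no softmax normalization is applied.
    return np.exp(-sq / (2.0 * np.sqrt(d_e)))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
W = rng.standard_normal((8, 4))  # one shared projection, so Q equals K
Q = X @ W
S = gaussian_attention(Q)
print(np.allclose(S, S.T))
```

With a shared projection the map is symmetric with ones on its diagonal, which is the property the low-rank approximation in the next section relies on.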
2.2 Low-rank regularization via matrix decomposition
To reduce the amount of computation, the authors follow the Nystrom method [38] to obtain a low-rank matrix approximation, so the full self-attention matrix never has to be computed.
The regularized self-attention matrix Ŝ is as follows:
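The idea can be sketched with a plain Nystrom approximation, Ŝ ≈ P^T A^+ P, where A is the kernel among m landmark tokens and P is the kernel between the landmarks and all n tokens. Several things here are assumptions for illustration: landmarks are picked by strided indexing, the pseudo-inverse comes from `np.linalg.pinv`, and the kernel scaling matches the assumed form used earlier. The paper's own choices may differ.

```python
import numpy as np

def gaussian_kernel(Q, K):
    """Gaussian kernel between rows of Q and rows of K (assumed scaling)."""
    d_e = Q.shape[-1]
    sq = (Q**2).sum(-1)[:, None] + (K**2).sum(-1)[None, :] - 2.0 * Q @ K.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * np.sqrt(d_e)))

def nystrom_attention(Q, V, landmark_idx):
    """Approximate (S @ V) with S ~= P^T A^+ P, never forming the n x n S."""
    Q_m = Q[landmark_idx]            # m landmark tokens (hypothetical choice)
    A = gaussian_kernel(Q_m, Q_m)    # m x m
    P = gaussian_kernel(Q_m, Q)      # m x n
    A_pinv = np.linalg.pinv(A)       # Moore-Penrose pseudo-inverse of A
    # Multiply right to left so every intermediate is at most m x n.
    return P.T @ (A_pinv @ (P @ V))

rng = np.random.default_rng(0)
n, d_e, m = 64, 8, 16
Q = rng.standard_normal((n, d_e))
V = rng.standard_normal((n, d_e))
idx = np.arange(0, n, n // m)        # every 4th token as a landmark

Y_approx = nystrom_attention(Q, V, idx)
Y_full = gaussian_kernel(Q, Q) @ V   # quadratic reference, for comparison only
print(np.abs(Y_approx - Y_full).max())
```

For a fixed m, the cost of `nystrom_attention` grows linearly with n, since only m x m and m x n matrices are ever built. The approximation is exact on the landmark rows and becomes exact everywhere when every token is a landmark.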
3. Results
The SOFT variants are as follows:

