【Transformer】SOFT: Softmax-free Transformer with Linear Complexity
2022-07-29 05:21:00 【呆呆的猫】

This paper was published at NeurIPS 2021.
Paper: https://arxiv.org/pdf/2110.11945.pdf
Code: https://github.com/fudan-zvg/SOFT
1. Background
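Self-attention-based transformers achieve strong results, but their computation and memory both scale quadratically with the number of input tokens, which grows with the input resolution.
The authors attribute this limitation to the softmax that self-attention uses to produce probabilities.
Conventional self-attention normalizes the scaled inner products between token features with softmax, and keeping this softmax makes any subsequent linearization difficult.
The authors therefore propose a softmax-free transformer, SOFT, which removes softmax from self-attention and replaces the inner product with a Gaussian kernel, so that the self-attention matrix can be approximated by low-rank matrix decomposition.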

2. Method
2.1 Softmax-free self-attention formulation

Given an input X, attention first maps it through learned projections to obtain Q, K, and V.

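Self-attention then computes each output token as a weighted combination of the rows of V, with weights derived from Q and K.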
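For reference, the two steps above are commonly written as follows. The weight names $W_q$, $W_k$, $W_v$ and the sizes $n$, $d$, $d_e$ are notation assumed here for illustration, not taken from the post.

```latex
\begin{align}
Q &= X W_q, \quad K = X W_k, \quad V = X W_v,
  \qquad X \in \mathbb{R}^{n \times d},\; W_q, W_k, W_v \in \mathbb{R}^{d \times d_e} \\
y_i &= \sum_{j=1}^{n} \alpha(Q_i, K_j)\, V_j
\end{align}
```

Here $Q_i$ denotes the $i$-th row of $Q$, and $\alpha(Q_i, K_j)$ is the scalar attention weight between tokens $i$ and $j$.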

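$\alpha$ computes the self-attention map and is composed of a nonlinear function $\beta$ and a relation function $\gamma$.
In the usual form, $\gamma$ is the scaled inner product and $\beta$ is softmax.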
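A minimal sketch of the usual form, assuming $\gamma$ is the dot product scaled by $1/\sqrt{d_e}$ and $\beta$ is a row-wise softmax. It is written in plain Python for readability. The function names and the toy inputs are hypothetical and are not taken from the SOFT code base.

```python
import math


def softmax(row):
    """beta: turn a row of scores into weights that sum to 1 (numerically stable)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax_attention(Q, K, V):
    """Standard attention: softmax over scaled dot products, then a weighted sum of V."""
    d_e = len(Q[0])
    out = []
    for q in Q:
        # gamma: scaled inner product between this query and every key
        scores = [dot(q, k) / math.sqrt(d_e) for k in K]
        # beta: softmax turns the scores into attention weights
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


# Toy example (assumed values): 3 tokens with 2-dimensional features.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = Q
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = softmax_attention(Q, K, V)
```

The full n x n score matrix is formed implicitly by the two nested loops, which is where the quadratic cost comes from.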
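To simplify the computation, the authors replace this with a Gaussian kernel.
To keep the attention matrix symmetric, they use the same projection for Q and K, so the self-attention matrix in this paper depends only on Q.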
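A minimal sketch of the softmax-free attention matrix with K set equal to Q. It assumes the kernel has the form S[i][j] = exp(-||Q_i - Q_j||^2 / (2 * sqrt(d_e))). The exact scaling constant is an assumption and should be checked against the paper. The function name and the toy input are hypothetical.

```python
import math


def gaussian_attention_matrix(Q):
    """S[i][j] = exp(-||Q_i - Q_j||^2 / (2 * sqrt(d_e))), with K taken equal to Q.

    The 2 * sqrt(d_e) scale is an assumed choice for this sketch.
    """
    n = len(Q)
    d_e = len(Q[0])
    scale = 2.0 * math.sqrt(d_e)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(Q[i], Q[j]))
            S[i][j] = math.exp(-sq_dist / scale)
    return S


# Toy example (assumed values): 3 tokens with 2-dimensional features.
Q = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
S = gaussian_attention_matrix(Q)
```

Because the squared distance is symmetric in its arguments, S is symmetric, and every diagonal entry equals 1 without any row normalization.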
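2.2 Low-rank regularization via matrix decomposition
To reduce the computation, the authors follow the Nyström method [38] to build a low-rank approximation of the matrix, so the full self-attention matrix never has to be computed.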
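For reference, the generic Nyström approximation of a symmetric matrix can be written as below. The block names $A$, $B$, $C$ and the sizes $n$ and $m$ are notation assumed here, and the paper's own notation may differ.

```latex
S = \begin{bmatrix} A & B \\ B^{\top} & C \end{bmatrix},
\qquad
\hat S = \begin{bmatrix} A \\ B^{\top} \end{bmatrix} A^{\dagger} \begin{bmatrix} A & B \end{bmatrix},
\qquad
A \in \mathbb{R}^{m \times m},\; B \in \mathbb{R}^{m \times (n-m)},\; m \ll n
```

Only the $m$ sampled columns of $S$ are needed, and $A^{\dagger}$ is the Moore-Penrose pseudo-inverse of $A$. The block $C$ is never formed.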
The regularized self-attention matrix $\hat S$ is the resulting low-rank approximation of the full attention matrix.
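A minimal sketch of a Nyström-style approximation applied to the Gaussian-kernel attention matrix. It rests on several assumptions made only for illustration: the landmarks are a hand-picked subset of tokens, the small landmark block is inverted exactly with Gauss-Jordan elimination, and the kernel scale is 2 * sqrt(d_e). The paper's own choice of landmark tokens and its inverse computation may differ. All function names are hypothetical.

```python
import math


def gaussian_kernel(X, Y):
    """Kernel matrix between rows of X and rows of Y (assumed scale 2 * sqrt(d_e))."""
    scale = 2.0 * math.sqrt(len(X[0]))
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / scale)
             for y in Y] for x in X]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def invert(A):
    """Gauss-Jordan inverse with partial pivoting; adequate for a small m x m block."""
    m = len(A)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(m)]
           for i, row in enumerate(A)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("landmark block is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def nystrom_factors(Q, landmark_idx):
    """Return F (n x m kernel to the landmarks) and the inverse of A (m x m)."""
    landmarks = [Q[i] for i in landmark_idx]
    F = gaussian_kernel(Q, landmarks)
    A = gaussian_kernel(landmarks, landmarks)
    return F, invert(A)


def nystrom_apply(Q, V, landmark_idx):
    """Compute S_hat @ V as F @ (A^-1 @ (F^T @ V)) without forming the n x n matrix."""
    F, A_inv = nystrom_factors(Q, landmark_idx)
    return matmul(F, matmul(A_inv, matmul(transpose(F), V)))


# Toy example (assumed values): 4 tokens, 2 landmarks.
Q = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.5, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
Y_approx = nystrom_apply(Q, V, landmark_idx=[0, 2])
Y_exact = matmul(gaussian_kernel(Q, Q), V)
```

Multiplying from right to left keeps every intermediate at size n x m or smaller, so the cost grows linearly in the number of tokens n for a fixed number of landmarks m. Rows of the approximation that correspond to landmark tokens match the exact matrix, while the remaining rows are approximate.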
3. Results
The paper evaluates SOFT and several of its variants.

