Replacing Self-Attention with an MLP
2022-07-02 07:51:00 【MezereonXP】
Using an MLP Instead of Self-Attention
This is a work from Tsinghua University, “Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks”.
It replaces the self-attention mechanism with two linear layers and, in the end, improves speed while maintaining accuracy.
What is surprising about this work is that an MLP can stand in for the attention mechanism, which forces us to reconsider where attention's performance gains actually come from.
The Self-Attention Mechanism in the Transformer
First, recall the standard self-attention of the Transformer. We give its formal definition:
$$A = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right), \qquad F_{out} = AV$$
where $Q, K \in \mathbb{R}^{N\times d'}$ and $V \in \mathbb{R}^{N\times d}$.
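As a concrete illustration, here is a minimal PyTorch sketch of this formula (tensor shapes follow the notation above; this is only an illustrative sketch without multi-head attention or masking, not the paper's code):

```python
import torch
import torch.nn.functional as F

def self_attention(Q, K, V):
    """Standard self-attention: A = softmax(Q K^T / sqrt(d_k)), F_out = A V.

    Q, K: (N, d'), V: (N, d). Minimal single-head sketch.
    """
    d_k = Q.shape[-1]
    A = F.softmax(Q @ K.transpose(-2, -1) / d_k ** 0.5, dim=-1)  # (N, N)
    return A @ V                                                  # (N, d)
```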
Here, the paper also considers a simplified version, in which $Q$, $K$, and $V$ are all replaced by the input feature $F$. It is formalized as:
$$A = \text{softmax}(FF^T), \qquad F_{out} = AF$$
However, the computational complexity is $O(dN^2)$, which is a major drawback of the attention mechanism.
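The simplified variant can be sketched the same way (again only illustrative); building the $N \times N$ attention map is what makes the cost quadratic in $N$:

```python
import torch.nn.functional as F

def simplified_self_attention(feat):
    """Simplified self-attention where Q = K = V = F (the input feature).

    feat: (N, d). Forming the (N, N) map A costs O(d N^2).
    """
    A = F.softmax(feat @ feat.transpose(-2, -1), dim=-1)  # (N, N)
    return A @ feat                                        # (N, d)
```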
External Attention
Two matrices $M_k \in \mathbb{R}^{S\times d}$ and $M_v \in \mathbb{R}^{S\times d}$ are introduced to replace the original $K$ and $V$. Its formalization is:
$$A = \text{Norm}(FM_k^T), \qquad F_{out} = AM_v$$
This design reduces the complexity to $O(dSN)$, and the work finds that even when $S \ll N$, sufficient accuracy can still be maintained.
Here, the $\text{Norm}(\cdot)$ operation first applies a softmax over the columns and then normalizes the rows.
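Below is a minimal PyTorch sketch of external attention, assuming $M_k$ and $M_v$ are realized as bias-free linear layers and taking $S = 64$ as an illustrative memory size (not necessarily the paper's exact configuration):

```python
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    """External attention with two linear layers (sketch).

    M_k and M_v are learnable S x d memories shared across all inputs,
    implemented here as bias-free nn.Linear layers.
    """
    def __init__(self, d, S=64):
        super().__init__()
        self.mk = nn.Linear(d, S, bias=False)   # computes F M_k^T
        self.mv = nn.Linear(S, d, bias=False)   # computes A M_v

    def forward(self, feat):                    # feat: (B, N, d)
        attn = self.mk(feat)                    # (B, N, S)
        attn = torch.softmax(attn, dim=1)       # softmax over the columns (N)
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)  # row normalization
        return self.mv(attn)                    # (B, N, d)
```

Because $M_k$ and $M_v$ do not depend on the input length $N$, both matrix products scale linearly in $N$, which is where the $O(dSN)$ complexity comes from.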
Experimental Analysis
First, the paper replaces the attention mechanism in the Transformer with external attention and then evaluates it on a variety of tasks, including:
- Image classification
- Semantic segmentation
- Image generation
- Point cloud classification
- Point cloud segmentation
Only partial results are given here, to briefly illustrate the accuracy change after the replacement.
Image Classification
Semantic Segmentation
Image Generation
As the results show, there is essentially no loss of accuracy across the different tasks.