Johnson–Lindenstrauss Lemma(2)
2022-07-02 05:01:00 【FakeOccupational】
Multi-head attention:
In what follows, $d_k$ and $d_v$ are not distinguished; both are denoted by $d$. Computing $P$ requires pairing the token at every position of the sequence with every other token, so $P$ is an $n \times n$ matrix, which is where the $O(n^2)$ cost comes from.
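The computation of $P$ described above can be sketched in NumPy for a single head. This is a minimal illustration under stated assumptions: the helper names `softmax` and `standard_head`, the random inputs and the sizes are hypothetical choices, not code from this post or from the Linformer paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the row max keeps exp() from overflowing; the result is unchanged.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_head(Q, K, V, Wq, Wk, Wv):
    """One head of standard scaled dot-product attention (sketch).

    Q, K, V: (n, d_model) inputs; Wq, Wk, Wv: (d_model, d) projections.
    Returns the (n, n) context-mapping matrix P and the (n, d) head output.
    """
    d = Wq.shape[1]
    scores = (Q @ Wq) @ (K @ Wk).T / np.sqrt(d)   # (n, n): every token paired with every token
    P = softmax(scores, axis=-1)
    return P, P @ (V @ Wv)

rng = np.random.default_rng(0)
n, d_model, d = 128, 64, 16
X = rng.standard_normal((n, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d)) / np.sqrt(d_model) for _ in range(3))
P, head = standard_head(X, X, X, Wq, Wk, Wv)
print(P.shape, head.shape)  # (128, 128) (128, 16)
```

Both time and memory for `scores` and `P` grow with $n^2$, which is the cost the Linformer projection below is meant to avoid.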
Multi-head attention in Linformer
A comparison of the two kinds of attention:
Choosing $k \ll n$, the space and time complexity of the computation below is $O(nk)$.
Projection matrices: $E_i, F_i \in \mathbb{R}^{n \times k}$; $KW_i^K$ and $VW_i^V \in \mathbb{R}^{n \times d}$.
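A single Linformer-style head can be sketched as follows, assuming $E_i$ and $F_i$ of shape $(n, k)$ are applied along the sequence axis as $E_i^T (K W_i^K)$ and $F_i^T (V W_i^V)$, so the key and value layers shrink from $n \times d$ to $k \times d$. The function name `linformer_head`, the sizes, and the random Gaussian $E$, $F$ are hypothetical stand-ins for illustration; in a trained model these projections would normally be learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the row max keeps exp() from overflowing; the result is unchanged.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_head(Q, K, V, Wq, Wk, Wv, E, F):
    """One Linformer-style head (sketch).

    Q, K, V: (n, d_model); Wq, Wk, Wv: (d_model, d); E, F: (n, k).
    E and F act along the sequence axis, turning the (n, d) key and
    value layers into (k, d) ones.
    """
    d = Wq.shape[1]
    K_proj = E.T @ (K @ Wk)                        # (k, d)
    V_proj = F.T @ (V @ Wv)                        # (k, d)
    scores = (Q @ Wq) @ K_proj.T / np.sqrt(d)      # (n, k) instead of (n, n)
    P_bar = softmax(scores, axis=-1)
    return P_bar, P_bar @ V_proj                   # (n, k), (n, d)

rng = np.random.default_rng(0)
n, d_model, d, k = 1024, 64, 16, 32
X = rng.standard_normal((n, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d)) / np.sqrt(d_model) for _ in range(3))
# Hypothetical stand-ins: random projections scaled by 1/sqrt(k), not trained weights.
E = rng.standard_normal((n, k)) / np.sqrt(k)
F = rng.standard_normal((n, k)) / np.sqrt(k)
P_bar, head = linformer_head(X, X, X, Wq, Wk, Wv, E, F)
print(P_bar.shape, head.shape)  # (1024, 32) (1024, 16)
```

Here the score matrix holds $n \cdot k = 32{,}768$ entries instead of $n^2 = 1{,}048{,}576$, while the head output keeps the same $(n, d)$ shape as in standard attention.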
Rationale
Johnson–Lindenstrauss lemma: a theorem on reducing dimensionality to a low rank that is only logarithmic in the number of points.
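The lemma says that $m$ points can be mapped linearly into $k = O(\log m / \epsilon^2)$ dimensions while every pairwise squared distance stays within a factor $1 \pm \epsilon$, with high probability. Below is a minimal empirical sketch, assuming a Gaussian random projection with $N(0, 1/k)$ entries and the Dasgupta–Gupta form of the bound, $k \ge 4 \ln m / (\epsilon^2/2 - \epsilon^3/3)$; the helper names and sizes are hypothetical, and this is not the construction used in the Linformer proof.

```python
import numpy as np

def jl_dim(m, eps):
    """Target dimension from the Dasgupta–Gupta form of the JL lemma."""
    return int(np.ceil(4 * np.log(m) / (eps**2 / 2 - eps**3 / 3)))

def jl_project(X, k, rng):
    # Gaussian matrix with N(0, 1/k) entries: squared norms are preserved in expectation.
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

def pairwise_sq_dists(X):
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    sq = np.einsum("ij,ij->i", X, X)
    return sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)

rng = np.random.default_rng(0)
m, D, eps = 50, 5_000, 0.5
k = jl_dim(m, eps)                      # 188 for these values
X = rng.standard_normal((m, D))
Y = jl_project(X, k, rng)

iu = np.triu_indices(m, 1)              # each unordered pair (i < j) once
ratio = pairwise_sq_dists(Y)[iu] / pairwise_sq_dists(X)[iu]
within = np.mean((ratio > 1 - eps) & (ratio < 1 + eps))
print(k, within)
```

The target dimension depends on the number of points and on $\epsilon$, not on the original dimension $D$, which is why the same idea can shrink the sequence axis from $n$ to $k$.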
Theorem 1: self-attention is low-rank.
$P$ is low-rank: approximating it with a truncated SVD shows experimentally that most of the information in $P$ can be recovered from a small number of its largest singular values.
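The truncated-SVD check can be sketched as below. This is a toy illustration under assumptions: $Q$ and $K$ are random Gaussian matrices rather than activations of a trained model, so the spectrum here need not match the one reported in the paper; it only shows how the "energy kept" by the top $r$ singular values relates to the reconstruction error of the best rank-$r$ approximation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 256, 32
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
P = softmax(Q @ K.T / np.sqrt(d))            # (n, n) context-mapping matrix

U, s, Vt = np.linalg.svd(P)                  # singular values s are sorted descending
energy = np.cumsum(s**2) / np.sum(s**2)      # fraction of ||P||_F^2 kept by the top r values

results = {}
for r in (8, 32, 128, n):
    P_r = (U[:, :r] * s[:r]) @ Vt[:r]        # best rank-r approximation (Eckart–Young)
    rel_err = np.linalg.norm(P - P_r) / np.linalg.norm(P)
    results[r] = (energy[r - 1], rel_err)
    print(f"r={r:4d}  energy kept={energy[r - 1]:.3f}  relative error={rel_err:.3f}")
```

By the Eckart–Young theorem the squared relative Frobenius error equals one minus the kept energy, so a spectrum concentrated in a few singular values means a low-rank matrix already reproduces $P$ well.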
Theorem 2: when $k = O(d/\epsilon^2)$, self-attention can be linearly approximated with error $\epsilon$.
Paper: Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
Large Transformer models have achieved remarkable success in many natural language processing applications. However, for long sequences, training and deploying these models can be prohibitively expensive, because the standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to sequence length. In this paper, we show that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$ in both time and space. The resulting linear Transformer, the Linformer, performs on par with standard Transformer models while being much more memory- and time-efficient.
The problem of reduced rank after projection ("Low-Rank Bottleneck in Multi-head Attention Models")
Because standard self-attention applies the elementwise exponential $e^{QK^T}$ inside the softmax, it can raise the rank of $P$ above that of $QK^T$; after projection, the matrix may no longer keep this high rank, and with it the extra information it carries.
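A toy numerical check of this point, under assumptions: random Gaussian $Q$ and $K$ with an arbitrarily chosen small head size `d_head`, and a random projection `E` with hypothetical size `k`. The rank of $QK^T$ is capped by the head size, the softmax (elementwise exponential plus row normalization) can push the rank well above it, and the projected $n \times k$ matrix can never exceed rank $k$.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d_head, k = 64, 4, 8
Q = rng.standard_normal((n, d_head))
K = rng.standard_normal((n, d_head))

S = Q @ K.T / np.sqrt(d_head)                    # (n, n) scores, rank <= d_head
rank_S = np.linalg.matrix_rank(S)
rank_P = np.linalg.matrix_rank(softmax(S))       # elementwise exp can lift the rank

E = rng.standard_normal((n, k)) / np.sqrt(k)     # hypothetical projection along the sequence
P_bar = softmax(Q @ (E.T @ K).T / np.sqrt(d_head))   # (n, k), so rank <= k
rank_P_bar = np.linalg.matrix_rank(P_bar)

print(rank_S, rank_P, rank_P_bar)
```

Row normalization only rescales rows by positive factors, so it does not change the rank; the increase comes entirely from the exponential.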