Johnson–Lindenstrauss Lemma(2)
2022-07-02 05:01:00 【FakeOccupational】
The multi-head attention mechanism:
In the rest of this post we do not distinguish between $d_k$ and $d_v$; both are written as $d$. Computing the $P$ part requires combining the tokens at every pair of positions in the sequence, so $P$ is an $n \times n$ matrix.
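To make the pairwise cost of $P$ concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. The function names, the toy sizes, and the $1/\sqrt{d}$ scaling are illustrative assumptions, not taken from the post.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # Q, K, V: (n, d). P holds one score per pair of positions: (n, n).
    d = Q.shape[-1]
    P = softmax(Q @ K.T / np.sqrt(d))
    return P @ V, P

rng = np.random.default_rng(0)
n, d = 8, 4  # toy sizes, chosen only for illustration
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out, P = standard_attention(Q, K, V)
print(P.shape, out.shape)
```

Because `P` has one entry per pair of positions, both its memory footprint and the work needed to fill it grow as $n^2$.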
Multi-head attention in Linformer:
A comparison of the two kinds of attention:
Choosing $k \ll n$, the space and time complexity of the following computation is $O(nk)$.
Projection matrices: $E_i, F_i \in R^{n \times k}$, with $KW_i^K$ and $VW_i^V \in R^{n \times d}$.
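The projection step can be sketched in NumPy as follows. This is a single-head illustration under stated assumptions: `K` and `V` stand for the already projected $KW_i^K$ and $VW_i^V$, the matrices `E` and `F` are random Gaussian rather than learned, and they are applied transposed so that the $n \times k$ shapes given above line up. None of these choices come from the post.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    # Q, K, V: (n, d); E, F: (n, k) with k << n.
    d = Q.shape[-1]
    K_proj = E.T @ K                              # (k, d): compress along the sequence axis
    V_proj = F.T @ V                              # (k, d)
    P_bar = softmax(Q @ K_proj.T / np.sqrt(d))    # (n, k) instead of (n, n)
    return P_bar @ V_proj, P_bar

rng = np.random.default_rng(0)
n, d, k = 64, 16, 8  # toy sizes, chosen only for illustration
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E = rng.standard_normal((n, k)) / np.sqrt(k)  # assumption: random, not learned
F = rng.standard_normal((n, k)) / np.sqrt(k)
out, P_bar = linformer_attention(Q, K, V, E, F)
print(P_bar.shape, out.shape)
```

The attention matrix now has $n \times k$ entries, which is where the $O(nk)$ cost comes from.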
Rationale
Johnson–Lindenstrauss Lemma: from logarithmic dimension reduction to the low-rank theorems.
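As a rough illustration of the lemma, the sketch below projects a few high-dimensional points with a scaled Gaussian random matrix and compares pairwise distances before and after. The sizes, the Gaussian construction, and the helper names are assumptions made for this example.

```python
import numpy as np

def random_projection(X, k, rng):
    # Map rows of X from D dimensions to k dimensions with a Gaussian matrix.
    # The 1/sqrt(k) scaling keeps squared norms unchanged in expectation.
    D = X.shape[1]
    R = rng.standard_normal((D, k)) / np.sqrt(k)
    return X @ R

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
m, D, k = 20, 2000, 512  # toy sizes, chosen only for illustration
X = rng.standard_normal((m, D))
Y = random_projection(X, k, rng)

# Ratio of projected to original distance for every distinct pair of points.
iu = np.triu_indices(m, k=1)
ratios = pairwise_distances(Y)[iu] / pairwise_distances(X)[iu]
print(ratios.min(), ratios.max())
```

Ratios close to 1 mean the projection roughly preserves distances; a smaller `k` widens the spread.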
Theorem 1: self-attention is low rank.
Since $P$ is low rank, it can be approximated with a truncated SVD. Experiments show that most of the information in the matrix $P$ can be recovered from a small number of its largest singular values.
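A truncated SVD of an attention matrix can be sketched as below. The inputs are random, so the spectrum here is only a stand-in for what trained models show; the helper name `truncated_svd` and all sizes are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def truncated_svd(P, r):
    # Keep only the r largest singular values and their singular vectors.
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    P_r = (U[:, :r] * s[:r]) @ Vt[:r]
    return P_r, s

rng = np.random.default_rng(0)
n, d, r = 64, 8, 8  # toy sizes, chosen only for illustration
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
P = softmax(Q @ K.T / np.sqrt(d))

P_r, s = truncated_svd(P, r)
err = np.linalg.norm(P - P_r)             # Frobenius error of the rank-r approximation
energy = np.cumsum(s**2) / np.sum(s**2)   # cumulative share of squared singular values
print(err, energy[r - 1])
```

`energy[r - 1]` is the fraction of the squared Frobenius norm captured by the top `r` singular values, and `err` equals the root of the sum of the discarded squared singular values.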
Theorem 2: when $k = O(d/\epsilon^2)$, self-attention can be approximated linearly to within $\epsilon$ error.
Paper: Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
Large transformer models have achieved remarkable success in many natural language processing applications. However, for long sequences, training and deploying these models can be prohibitively costly, because the standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to sequence length. In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$. The resulting linear Transformer performs on par with standard Transformer models while being much more memory- and time-efficient.
The problem of rank reduction after projection: see "Low-Rank Bottleneck in Multi-head Attention Models".
Because standard self-attention applies softmax, the term $e^{QK^T}$ can raise the rank. After projection this high rank may not be preserved, so less information may be retained.
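The rank-raising effect of the exponential can be checked numerically. In this sketch, with random inputs and sizes chosen as assumptions for illustration, $QK^T$ has rank at most $d$, while its elementwise exponential generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 32, 4  # toy sizes, chosen only for illustration
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

scores = Q @ K.T                # product of (n, d) and (d, n): rank at most d
exp_scores = np.exp(scores)     # elementwise exponential, as used inside softmax

rank_scores = np.linalg.matrix_rank(scores)
rank_exp = np.linalg.matrix_rank(exp_scores)
print(rank_scores, rank_exp)
```

Any projection down to $k$ columns caps the rank of the resulting attention matrix at $k$, which is the concern raised above.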