Johnson–Lindenstrauss Lemma(2)
2022-07-02 05:01:00 【FakeOccupational】
Multi-head attention mechanism:

In what follows, $d_k$ and $d_v$ are not distinguished; both are written as $d$. Computing the $P$ part requires combining the tokens at every position in the sequence pairwise.
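The pairwise combination of tokens can be sketched as follows. This is a minimal NumPy illustration, assuming the usual scaled dot-product form $P = \mathrm{softmax}(QK^T/\sqrt{d})$; the function names, the random inputs, and the sizes `n` and `d` are placeholders for illustration and do not come from the post.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    """Single-head scaled dot-product attention (assumed form)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # (n, n): one score per pair of positions
    P = softmax(scores, axis=-1)   # each row is a distribution over positions
    return P @ V, P

# Hypothetical sizes: n tokens, each of dimension d.
rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

out, P = standard_attention(Q, K, V)
print(P.shape, out.shape)
```

Because `P` holds one entry per pair of positions, its size grows quadratically with the sequence length.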
Multi-head attention in Linformer
A comparison of the two kinds of attention:

Choose $k \ll n$; the space and time complexity of the computation below is then $O(nk)$.
Projection matrices: $E_i, F_i \in R^{n \times k}$; $KW_i^K$ and $VW_i^V \in R^{n \times d}$.
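A minimal single-head sketch of the projected attention, under stated assumptions: `E` and `F` have shape $(n, k)$ as given above, so they are applied transposed to shrink the sequence axis from $n$ to $k$. They are drawn at random here only for illustration (in a trained model they would be learned), and all names and sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Attention with keys and values projected along the sequence axis.

    Q, K, V: (n, d); E, F: (n, k) with k much smaller than n.
    """
    d = Q.shape[-1]
    K_proj = E.T @ K                                     # (k, d)
    V_proj = F.T @ V                                     # (k, d)
    P_bar = softmax(Q @ K_proj.T / np.sqrt(d), axis=-1)  # (n, k), not (n, n)
    return P_bar @ V_proj, P_bar

# Hypothetical sizes chosen so that k << n.
rng = np.random.default_rng(0)
n, d, k = 512, 16, 32
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E = rng.standard_normal((n, k)) / np.sqrt(k)  # assumption: random, not learned
F = rng.standard_normal((n, k)) / np.sqrt(k)

out, P_bar = linformer_attention(Q, K, V, E, F)
print("attention entries:", n * n, "(standard) vs", n * k, "(projected)")
```

The attention matrix now has $n \times k$ entries instead of $n \times n$, which is where the $O(nk)$ cost comes from.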
Principle
Johnson–Lindenstrauss Lemma: from logarithmic dimension reduction to the low-rank theorem.
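For reference, the lemma in its commonly quoted form (the standard textbook statement, not taken from this post; $m$ denotes the number of points and $D$ the original dimension, chosen here to avoid clashing with the $n$ and $d$ used for attention): for any $0 < \epsilon < 1$ and any set $X$ of $m$ points in $R^D$, there is a linear map $f: R^D \to R^k$ with $k = O(\epsilon^{-2} \log m)$ such that, for all $u, v \in X$,

$$
(1-\epsilon)\,\|u-v\|^2 \;\le\; \|f(u)-f(v)\|^2 \;\le\; (1+\epsilon)\,\|u-v\|^2 .
$$

The target dimension $k$ depends only logarithmically on the number of points, which is the "logarithmic dimension reduction" referred to above.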
Theorem 1: self-attention is low rank.
Since $P$ is low rank, it can be approximated with an SVD; experiments show that most of the information in the matrix $P$ can be recovered from a small number of its largest singular values.
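How such a spectrum check might be carried out is sketched below. This is an illustration only: `Q` and `K` are random placeholders, so the printed numbers say nothing about trained models; the sketch shows the procedure (SVD, normalized cumulative singular values, rank-$r$ reconstruction error), not the paper's results.

```python
import numpy as np

# Hypothetical sizes and random inputs, for illustration only.
rng = np.random.default_rng(0)
n, d = 64, 8
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

# Attention matrix P = softmax(Q K^T / sqrt(d)), computed row-wise.
scores = Q @ K.T / np.sqrt(d)
scores -= scores.max(axis=1, keepdims=True)
P = np.exp(scores)
P /= P.sum(axis=1, keepdims=True)

# Singular values come back sorted from largest to smallest.
U, s, Vt = np.linalg.svd(P)
cumulative = np.cumsum(s) / s.sum()  # normalized cumulative singular values

def truncate(r):
    """Best rank-r approximation of P, keeping the r largest singular values."""
    return (U[:, :r] * s[:r]) @ Vt[:r]

for r in (4, 16, 64):
    rel_err = np.linalg.norm(P - truncate(r)) / np.linalg.norm(P)
    print(f"rank {r:2d}: cumulative={cumulative[r - 1]:.3f}, rel. error={rel_err:.3e}")
```

If the cumulative curve approaches 1 after only a few singular values, a rank-$r$ truncation loses little of $P$.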
Theorem 2: when $k$ is $O(d/\epsilon^2)$, a linear approximation with error $\epsilon$ is possible.
Paper: Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
Large transformer models have achieved remarkable success in many natural language processing applications. For long sequences, however, training and deploying these models can be prohibitively expensive, because the standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to sequence length. In this paper, we show that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism that reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$. The resulting linear Transformer performs on par with standard Transformer models while being more memory- and time-efficient.
The problem of rank reduction after projection: "Low-Rank Bottleneck in Multi-head Attention Models"
The exponential $e^{QK^T}$ inside the softmax of standard self-attention can raise the rank, whereas after projection the high rank, and the additional information it carries, may no longer be preserved.
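The rank-raising effect of the element-wise exponential can be checked numerically. The sketch below uses random placeholder matrices and hypothetical sizes: $QK^T$ has rank at most $d$, while $e^{QK^T}$ (taken element-wise, as in the softmax numerator) generally has a higher numerical rank.

```python
import numpy as np

# Hypothetical sizes and random inputs, for illustration only.
rng = np.random.default_rng(0)
n, d = 32, 4
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

scores = Q @ K.T                                  # (n, n), rank at most d
rank_scores = np.linalg.matrix_rank(scores)
rank_exp = np.linalg.matrix_rank(np.exp(scores))  # element-wise exponential

print("rank of QK^T:      ", rank_scores)
print("rank of exp(QK^T): ", rank_exp)
```

A projection that compresses the sequence axis to $k$ columns caps the rank of the resulting attention matrix at $k$, which is the concern raised here.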