Johnson–Lindenstrauss Lemma(2)
2022-07-02 05:01:00 【FakeOccupational】
Multi-head attention mechanism:

In what follows, $d_k$ and $d_v$ are not distinguished; both are written $d$. Computing $P$ requires combining the tokens at every pair of positions in the sequence.
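The pairwise computation can be sketched as follows. This is a minimal NumPy illustration, assuming $P$ is the usual scaled dot-product attention matrix $\mathrm{softmax}(QK^T/\sqrt{d})$ (the notation of the Linformer paper; the text above does not spell it out). The function names and the toy sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Single-head scaled dot-product attention (assumed form of P)."""
    d = Q.shape[-1]
    # Every query position is compared with every key position: (n, n).
    P = softmax(Q @ K.T / np.sqrt(d))
    return P @ V, P

rng = np.random.default_rng(0)
n, d = 6, 4                      # toy sequence length and head dimension
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out, P = attention(Q, K, V)
print(P.shape, out.shape)        # P is n x n, so cost grows as n^2
```

Because $P$ has one entry per pair of positions, both its storage and its computation scale quadratically in the sequence length $n$.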
Multi-head attention in Linformer:
A comparison of the two kinds of attention:

Choose $k \ll n$; the space and time complexity of the computation below is then $O(nk)$.
Projection matrices: $E_i, F_i \in \mathbb{R}^{n \times k}$, with $KW_i^K$ and $VW_i^V \in \mathbb{R}^{n \times d}$.
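A rough sketch of the projected attention with these shapes is given below. It assumes a single head, treats `Q`, `K`, `V` as the already-projected $QW_i^Q$, $KW_i^K$, $VW_i^V$, and draws `E` and `F` as random Gaussian matrices purely for illustration (in Linformer they are learned parameters). The function name `linformer_attention` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Attention with keys and values compressed along the sequence axis."""
    d = Q.shape[-1]
    K_proj = E.T @ K                              # (k, d): n keys -> k keys
    V_proj = F.T @ V                              # (k, d): n values -> k values
    P_bar = softmax(Q @ K_proj.T / np.sqrt(d))    # (n, k) instead of (n, n)
    return P_bar @ V_proj, P_bar

rng = np.random.default_rng(0)
n, d, k = 64, 8, 16                               # k << n
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
# Random projections are an assumption made for this sketch only.
E = rng.standard_normal((n, k)) / np.sqrt(k)
F = rng.standard_normal((n, k)) / np.sqrt(k)
out, P_bar = linformer_attention(Q, K, V, E, F)
print(P_bar.shape, out.shape)
```

The attention matrix now has $n \times k$ entries rather than $n \times n$, which is where the $O(nk)$ cost comes from.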
Theory
Johnson–Lindenstrauss Lemma: from logarithmic dimension reduction to the low-rank theorems.
Theorem 1: self-attention is low rank.
Since $P$ is low rank, it can be approximated with a truncated SVD. Experiments show that most of the information in the matrix $P$ can be recovered from a small number of its largest singular values.
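The truncated-SVD approximation can be tried out on a toy attention matrix, as in the sketch below. It assumes $P = \mathrm{softmax}(QK^T/\sqrt{d})$ built from random inputs, so the spectrum may decay far less sharply than in a trained model; the helper `truncated_svd` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def truncated_svd(P, r):
    """Best rank-r approximation of P in Frobenius norm, plus all singular values."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    P_r = (U[:, :r] * s[:r]) @ Vt[:r]
    return P_r, s

rng = np.random.default_rng(0)
n, d = 64, 8
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
P = softmax(Q @ K.T / np.sqrt(d))

# Relative reconstruction error for a few ranks.
errors = []
for r in (4, 16, 64):
    P_r, s = truncated_svd(P, r)
    errors.append(np.linalg.norm(P - P_r) / np.linalg.norm(P))
    print(f"rank {r:2d}: relative error {errors[-1]:.3e}")

# Share of the squared singular values captured by the top 16.
energy_top16 = (s[:16] ** 2).sum() / (s ** 2).sum()
print(f"energy in top 16 singular values: {energy_top16:.3f}")
```

The error of the rank-$r$ truncation equals the norm of the discarded singular values, so a fast-decaying spectrum means a small $r$ already recovers most of $P$.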
Theorem 2: when $k = O(d/\epsilon^2)$, self-attention can be approximated linearly to within error $\epsilon$.
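The $1/\epsilon^2$ dependence of $k$ is the signature of the Johnson–Lindenstrauss lemma: a random linear map into $k$ dimensions roughly preserves pairwise distances. The sketch below illustrates only that general property, not the theorem's proof. The Gaussian projection `R`, the point set `X`, and the sizes are all assumptions chosen for the demonstration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all distinct pairs of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(X), k=1)
    return D[i, j]

rng = np.random.default_rng(0)
n_points, d, k = 50, 1000, 256

X = rng.standard_normal((n_points, d))
# Entries ~ N(0, 1/k), so squared lengths are preserved in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R

# Ratio of projected to original distance for every pair of points.
ratios = pairwise_distances(Y) / pairwise_distances(X)
print(f"distance ratio range: [{ratios.min():.3f}, {ratios.max():.3f}]")
```

Increasing `k` tightens the range of ratios around 1, which is the trade-off between projection size and approximation error that the theorem quantifies.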
Paper: Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
Large transformer models have achieved remarkable success in many natural language processing applications. However, for long sequences, training and deploying these models can be prohibitively costly, because the standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to sequence length. In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism that reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$. The resulting linear Transformer performs on par with standard Transformer models while being much more memory- and time-efficient.
The rank-reduction problem after projection: see "Low-Rank Bottleneck in Multi-head Attention Models".
Standard self-attention applies softmax, and the elementwise exponential $e^{QK^T}$ can raise the rank. After projection, however, the high rank may no longer be preserved, so less information may be retained.
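A small numerical check of this effect is sketched below, with toy sizes and a random projection `E` standing in for Linformer's learned one (both are assumptions). Row normalization in softmax only rescales rows by positive constants and so leaves the rank unchanged, which is why the sketch looks at the elementwise exponential alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 6, 2, 3
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

# Q K^T is a product of (n, d) and (d, n) matrices: rank at most d.
scores = Q @ K.T
rank_scores = np.linalg.matrix_rank(scores, tol=1e-8)

# The elementwise exponential is nonlinear and can raise the rank.
rank_exp = np.linalg.matrix_rank(np.exp(scores), tol=1e-8)

# After projecting the keys down to k rows, the matrix is (n, k),
# so its rank can never exceed k.
E = rng.standard_normal((n, k)) / np.sqrt(k)
scores_proj = Q @ (E.T @ K).T
rank_proj = np.linalg.matrix_rank(np.exp(scores_proj), tol=1e-8)

print(rank_scores, rank_exp, rank_proj)
```

The projected matrix is capped at rank $k$ by its shape alone, whereas the unprojected $n \times n$ matrix has no such cap.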