Johnson–Lindenstrauss Lemma(2)
2022-07-02 05:01:00 【FakeOccupational】
Multi-head attention:

In the rest of this post, $d_k$ and $d_v$ are not distinguished; both are written as $d$. Computing $P$ requires combining the tokens at every position of the sequence pairwise, which gives an $n \times n$ matrix.
Multi-head attention in Linformer:
A comparison of the two kinds of attention:
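The pairwise computation of $P$ can be sketched as follows. This is a minimal single-head illustration with NumPy, not the post's or the paper's code; the function names `softmax` and `attention` and the toy sizes `n = 8`, `d = 4` are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Single-head scaled dot-product attention; returns (output, P)."""
    d = Q.shape[-1]
    # P has one entry per pair of positions, hence shape (n, n).
    P = softmax(Q @ K.T / np.sqrt(d))
    return P @ V, P

# Toy inputs (hypothetical sizes, random values).
rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out, P = attention(Q, K, V)
print(P.shape, out.shape)  # (8, 8) (8, 4)
```

Because `P` stores a score for every pair of tokens, both memory and time grow as O(n^2) with the sequence length.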

Choose $k \ll n$; the space and time complexity of the computation below is then $O(nk)$.
Projection matrices: $E_i, F_i \in R^{n \times k}$; $KW_i^K$ and $VW_i^V \in R^{n \times d}$.
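A hedged sketch of the projected attention, assuming a single head and random (untrained) projections. `E` and `F` are given the shape $n \times k$ stated above and are applied as `E.T @ K` and `F.T @ V`, so the projected keys and values have shape $k \times d$. The name `linformer_attention` and the sizes are illustrative assumptions; in Linformer the projections are learned.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Attention with keys and values projected from n rows down to k rows."""
    d = Q.shape[-1]
    K_proj = E.T @ K                                # (k, d)
    V_proj = F.T @ V                                # (k, d)
    P_bar = softmax(Q @ K_proj.T / np.sqrt(d))      # (n, k) instead of (n, n)
    return P_bar @ V_proj, P_bar

# Toy inputs (hypothetical sizes; E and F are random here, learned in practice).
rng = np.random.default_rng(0)
n, d, k = 64, 8, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E = rng.standard_normal((n, k)) / np.sqrt(n)
F = rng.standard_normal((n, k)) / np.sqrt(n)
out, P_bar = linformer_attention(Q, K, V, E, F)
print(P_bar.shape, out.shape)  # (64, 16) (64, 8)
```

The attention matrix now has $n \times k$ entries, so for fixed $k$ the cost grows linearly in $n$.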
Theory
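Johnson–Lindenstrauss Lemma: a set of points can be projected into a space whose dimension is logarithmic in the number of points while approximately preserving pairwise distances. The low-rank theorems below build on it.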
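The lemma can be illustrated numerically with a Gaussian random projection. This is only a sketch under assumed toy sizes (`m`, `dim`, `target_dim` are hypothetical); it checks how far pairwise distances move after projection rather than proving anything.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
m, dim, target_dim = 20, 2000, 1000      # hypothetical sizes
X = rng.standard_normal((m, dim))

# Random projection, scaled so squared norms are preserved in expectation.
R = rng.standard_normal((dim, target_dim)) / np.sqrt(target_dim)
Y = X @ R

# Compare distances before and after, over distinct pairs only.
iu = np.triu_indices(m, k=1)
ratios = pairwise_distances(Y)[iu] / pairwise_distances(X)[iu]
print("distance ratio range:", ratios.min(), ratios.max())
```

Ratios close to 1 mean the projection distorted the distances only slightly; the lemma bounds this distortion by $1 \pm \epsilon$ once the target dimension is on the order of $\log(m)/\epsilon^2$.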
Theorem 1: self-attention is low rank.
Since $P$ is low rank, experiments with a truncated SVD approximation show that most of the information in the matrix $P$ can be recovered from a small number of its largest singular values.
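A minimal way to run this kind of check is to measure the relative error of the best rank-$r$ approximation of $P$. The helper `truncated_svd_error` and the random inputs are assumptions for illustration; on random data the spectrum need not be as skewed as what is reported for trained models.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def truncated_svd_error(P, r):
    """Relative Frobenius error when keeping only the r largest singular values."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    P_r = (U[:, :r] * s[:r]) @ Vt[:r]
    return np.linalg.norm(P - P_r) / np.linalg.norm(P)

# Toy attention matrix from random Q and K (hypothetical sizes).
rng = np.random.default_rng(0)
n, d = 64, 8
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
P = softmax(Q @ K.T / np.sqrt(d))

ranks = (1, 4, 16, 64)
errors = [truncated_svd_error(P, r) for r in ranks]
for r, err in zip(ranks, errors):
    print(f"rank {r:2d}: relative error {err:.3e}")
```

The error can only shrink as more singular values are kept, and it reaches zero (up to rounding) at full rank; how quickly it drops shows how much of $P$ lives in its top singular directions.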
Theorem 2: when $k$ is $O(d/\epsilon^2)$, self-attention can be approximated linearly to within error $\epsilon$.
Paper: Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
Large transformer models have achieved remarkable success in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, because the standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to the sequence length. In this paper, we show that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism that reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$. The resulting linear Transformer matches standard Transformer models in performance while being more memory- and time-efficient.
The problem of rank reduction after projection is discussed in "Low-Rank Bottleneck in Multi-head Attention Models".
Standard self-attention applies softmax, and the elementwise exponential $e^{QK^T}$ inside it can raise the rank of the score matrix. After projection the high rank may not be preserved, so less information is retained.
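The rank argument can be checked on a toy example. The sketch below assumes random `Q`, `K` and a random projection `E` (all hypothetical, untrained) and uses an explicit tolerance for the numerical rank; it compares the rank of $QK^T$, of its elementwise exponential, and of the exponential after projecting the keys to $k$ rows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 32, 4, 8                       # hypothetical sizes with d < k < n
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
E = rng.standard_normal((n, k)) / np.sqrt(n)

scores = Q @ K.T                         # (n, n), rank at most d
exp_scores = np.exp(scores)              # elementwise exp, as inside softmax
exp_proj = np.exp(Q @ (E.T @ K).T)       # (n, k), rank at most k

tol = 1e-8                               # absolute tolerance for numerical rank
rank_scores = np.linalg.matrix_rank(scores, tol=tol)
rank_exp = np.linalg.matrix_rank(exp_scores, tol=tol)
rank_proj = np.linalg.matrix_rank(exp_proj, tol=tol)
print(rank_scores, rank_exp, rank_proj)
```

The product $QK^T$ cannot exceed rank $d$, while the exponentiated matrix is typically of much higher rank; the projected variant has only $k$ columns, so its rank is capped at $k$ regardless of the nonlinearity.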