Self attention learning notes
2022-07-28 06:06:00 【Alan and fish】
1. Why introduce self-attention

In natural language processing, an RNN (here meaning an LSTM) is a common choice for processing input and output sequences. An LSTM can capture long-range dependencies because each step depends on the text before it, but for the same reason it cannot be computed in parallel, which makes it very slow.
So many researchers use a CNN in place of the RNN. A CNN can be computed in parallel, but it has to stack many layers before it can see the whole sequence, and that depth indirectly makes it inefficient too.
The self-attention mechanism was introduced to solve both problems:
- 1. Each node can see its dependence on every other node
- 2. The computation can be carried out in parallel
As shown on the right, b1 can depend on a1, a2, a3 and a4, and the same holds for b2.
2. How self-attention works
2.1 The general principle

- 1. Compute a
Each input x1, x2, x3, x4 is multiplied by a matrix W to obtain a1, a2, a3, a4.
- 2. Compute q, k, v
Each a is multiplied by three different matrices to obtain three vectors q, k and v.
The role and formula of each are as follows:
q: query (used to match against the others), $q_i = W_q a_i$
k: key (used to be matched against), $k_i = W_k a_i$
v: value (the information to be extracted), $v_i = W_v a_i$
- 3. Compute $\alpha$

Each query q attends to every key k, which is simply the dot product of $q_1$ with each $k_i$, divided by $\sqrt{d}$:
$\alpha_{1,i} = q_1 \cdot k_i / \sqrt{d}$, where d is the dimension of q and k.
- 4. Compute $\widehat{\alpha}$
All the $\alpha_{1,i}$ are passed through a softmax, which exponentiates each score and divides by the sum of the exponentials, so that the $\widehat{\alpha}_{1,i}$ form a probability distribution: $\widehat{\alpha}_{1,i} = \exp(\alpha_{1,i}) / \sum_j \exp(\alpha_{1,j})$
- 5. Compute b
Each $v_i$ is multiplied by its weight $\widehat{\alpha}_{1,i}$ and the results are summed to give $b_1$, the final output for the first position: $b_1 = \sum_i \widehat{\alpha}_{1,i} v_i$
This whole process is the self-attention mechanism: it computes the dependency between each node and every other node.
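The five steps above can be sketched for a single output position, one vector at a time. This is only a minimal illustration under assumptions: the function names `softmax` and `attend_one`, the random weights, and the dimensions (4 inputs of size 3, projected to size 2) are hypothetical and not taken from the original notes, and step 1 (x to a) is skipped by starting directly from the a vectors.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_one(j, a, W_q, W_k, W_v):
    """Return (b_j, alpha_hat_j) for position j, following steps 2-5.

    a is a list of input vectors a_i, each of shape (d_in,).
    W_q and W_k have shape (d, d_in); W_v has shape (d_v, d_in).
    """
    d = W_q.shape[0]                          # dimension of q and k
    q_j = W_q @ a[j]                          # step 2: query of position j
    keys = [W_k @ a_i for a_i in a]           # step 2: one key per input
    values = [W_v @ a_i for a_i in a]         # step 2: one value per input
    # step 3: scaled dot product of q_j with every key
    alpha = np.array([q_j @ k_i / np.sqrt(d) for k_i in keys])
    alpha_hat = softmax(alpha)                # step 4: weights sum to 1
    # step 5: weighted sum of the value vectors
    b_j = sum(w * v_i for w, v_i in zip(alpha_hat, values))
    return b_j, alpha_hat

# Hypothetical toy data: 4 inputs of dimension 3, projected to dimension 2.
rng = np.random.default_rng(0)
a = [rng.normal(size=3) for _ in range(4)]
W_q, W_k, W_v = (rng.normal(size=(2, 3)) for _ in range(3))
b_1, alpha_hat_1 = attend_one(0, a, W_q, W_k, W_v)
```

Calling `attend_one` once per position gives every b, but each call repeats the key and value projections, which is the redundancy the matrix formulation removes.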
2.2 Mathematical calculation
- Matrix computation of q, k, v

Because each $q_i$ is the product of $W_q$ with one $a_i$, all the a vectors can be stacked as the columns of a single matrix $I = [a_1\ a_2\ a_3\ a_4]$. Then $Q = W_q I$ is a single matrix multiplication, which is what makes parallel computation possible.
K and V are computed the same way: $K = W_k I$, $V = W_v I$.
- Compute $\alpha$

Each $\alpha_{1,i}$ is the product of $q_1$ with one $k_i$ (ignoring $\sqrt{d}$), so all the k vectors can likewise be treated as one matrix, and every score comes from a single product of the K matrix and the Q matrix: $A = K^T Q$.
- Compute $\widehat{\alpha}$

Apply the softmax function to the $\alpha$ values computed above to obtain $\widehat{\alpha}$: $\widehat{A} = \mathrm{softmax}(A)$, taken over each column.
- Compute b

Multiply the V matrix by $\widehat{A}$. Each output column is the weighted sum of the v vectors: $O = V \widehat{A}$, where column j of O is $b_j$.
The whole process is abstracted as shown in the figure below :
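As a rough numerical companion to the matrix steps in 2.2, the sketch below computes all outputs at once with NumPy. It assumes the inputs are stored as the columns of one matrix, which matches q = Wq·a applied to column vectors. The names `self_attention`, `I`, `A_hat`, `O`, the `scale` flag and the toy dimensions are assumptions chosen for illustration, not part of the original notes.

```python
import numpy as np

def self_attention(I, W_q, W_k, W_v, scale=True):
    """Self-attention with the input vectors stored as the columns of I.

    I has shape (d_in, n); W_q and W_k have shape (d, d_in);
    W_v has shape (d_v, d_in).
    Returns O with shape (d_v, n) and A_hat with shape (n, n).
    """
    Q, K, V = W_q @ I, W_k @ I, W_v @ I        # all q, k, v in three products
    A = K.T @ Q                                 # A[i, j] = k_i . q_j
    if scale:
        A = A / np.sqrt(Q.shape[0])             # the sqrt(d) factor from 2.1
    A = A - A.max(axis=0, keepdims=True)        # stabilise exp; softmax is unchanged
    A_hat = np.exp(A)
    A_hat /= A_hat.sum(axis=0, keepdims=True)   # softmax down each column
    O = V @ A_hat                               # column j of O is b_j
    return O, A_hat

# Hypothetical toy data: 4 inputs of dimension 3, projected to dimension 2.
rng = np.random.default_rng(0)
I = rng.normal(size=(3, 4))
W_q, W_k, W_v = (rng.normal(size=(2, 3)) for _ in range(3))
O, A_hat = self_attention(I, W_q, W_k, W_v)
```

Every column of `A_hat` sums to 1, and column j of `O` equals the b that a per-vector computation would give for position j. Passing `scale=False` reproduces the simplification in 2.2, where the square root of d is ignored.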