How to understand Query, Key and Value in Transformer
2022-06-28 01:33:00 【coast_ s】
-------------------------------------
Reprinted; original author: yafee123
-------------------------------------
Transformer comes from "Attention is all you need", a 2017 paper from Google Brain that to this day continues to drive yet another wave of research in both NLP and CV.
One of the most critical contributions in Transformer is self-attention, which uses the relationships among the input samples to build an attention model.
Self-attention introduces three very important elements: Query, Key and Value.
Assume X ∈ R^{n×d} is the feature matrix of an input sample sequence, where n is the number of input samples (the sequence length) and d is the dimension of a single sample.
Query, Key & Value are defined as follows:
Query: Q = X W^Q, where W^Q ∈ R^{d×d_q}. This matrix can be regarded as a spatial-transformation matrix (the same applies to the matrices below).
Key: K = X W^K, where W^K ∈ R^{d×d_k}.
Value: V = X W^V, where W^V ∈ R^{d×d_v}.
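To make the shapes concrete, here is a minimal NumPy sketch of the three projections. The sizes n, d, d_q, d_k, d_v and the random matrices are illustrative placeholders, not values from the paper or the original post.

```python
import numpy as np

n, d = 4, 8          # sequence length and per-sample feature dimension (made-up values)
d_q = d_k = 6        # query/key dimension; d_q must equal d_k for the dot product later
d_v = 5              # value dimension

X = np.random.randn(n, d)       # each ROW is one input sample

W_Q = np.random.randn(d, d_q)   # learned spatial-transformation matrices
W_K = np.random.randn(d, d_k)
W_V = np.random.randn(d, d_v)

Q = X @ W_Q   # (n, d_q) -- one query per row/sample
K = X @ W_K   # (n, d_k) -- one key per row/sample
V = X @ W_V   # (n, d_v) -- one value per row/sample
```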
Many people are confused the first time they see these three concepts: what do they have to do with self-attention, and why were they given these names?
[Note: it is important to keep in mind that each row of X, Q, K and V represents one input sample. This differs from the convention in which each column of the sample matrix is a sample, and it matters for understanding everything that follows.]
This post briefly explains the reasoning behind the three names.
To understand what these three concepts mean, first ask: what does self-attention ultimately want to produce?
The answer: given the current input sample x_i ∈ R^{1×d} (we break the input apart row by row purely to aid understanding), produce an output that is a weighted sum of all samples in the sequence. Because this output is assumed to see the information of every input sample, it can then choose its own focus of attention through different weights.
If you accept this answer, the rest is easy to explain.
The concepts of query, key and value actually come from recommendation systems. The basic principle is: given a query, compute the relevance between the query and the keys, and then find the most suitable values according to that relevance. Take movie recommendation as an example: the query is a person's movie preferences (e.g. interests, age, gender), the keys are movie attributes (genre, era, etc.), and the values are the movies to be recommended. In this example, the attributes of query, key and value each live in a different space, yet there are latent relationships among them; that is, through some transformation, the attributes of all three can be brought into similar spaces.
In the mechanics of self-attention, the current input sample x_i ∈ R^{1×d} is turned into a query by a spatial transformation: q_i = x_i W^Q, with q_i ∈ R^{1×d_q}. By analogy with retrieving items in a recommendation system, we have to rely on the relevance between the query and the keys to retrieve the values we need. So why is a key needed at all?
Because, following the recommendation-system process, we want the relevance between the query and the keys, and the simplest way to get it is the dot product, which yields a relation vector between the current sample and all the others. In self-attention, the corresponding operation is a_i = q_i K^T, with a_i ∈ R^{1×n}, so each element of a_i can be regarded as the relevance between the current sample x_i and one of the other samples in the sequence.
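Continuing the sketch above (same hypothetical variables), the relation vector for one sample is simply its query row dotted against every key row:

```python
i = 0                # index of the current sample x_i
q_i = Q[i]           # (d_q,) query for sample i
a_i = q_i @ K.T      # (n,)   a_i[j] = dot-product relevance between sample i and sample j
```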
Once the relations between samples are obtained, the rest is natural: simply normalize a_i and multiply it by the V matrix to obtain the final weighted output of self-attention: o_i = softmax(a_i) V, with o_i ∈ R^{1×d_v} (in the original paper, the normalization is a softmax over a_i scaled by 1/√d_k).
Every row of V is one (transformed) sample of the sequence.
Each dimension of the output o_i is therefore a weighted sum of the corresponding dimension across all input-sequence samples, and the weights are given by the relation vector softmax(a_i). (You can draw out this matrix multiplication yourself to see it.)
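A minimal sketch of this last step, again reusing the variables above. The original post only says "normalize"; here I assume the softmax with 1/sqrt(d_k) scaling used in the original paper:

```python
def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

w_i = softmax(a_i / np.sqrt(d_k))   # (n,) attention weights over the sequence, sum to 1
o_i = w_i @ V                       # (d_v,) weighted sum of the rows of V

# Full matrix form: every row of O is the weighted output for one input sample.
A = (Q @ K.T) / np.sqrt(d_k)                  # (n, n) relation matrix
O = np.apply_along_axis(softmax, 1, A) @ V    # (n, d_v)
```

Stacking o_i for every i gives exactly the matrix form softmax(QK^T / √d_k) V computed in the last two lines.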
From this, we can conclude:
1. Self-attention borrows the three concepts of query, key and value from recommendation systems, and it uses a process similar to a recommendation system. However, self-attention does not look up a single value for a query; instead, it produces a weighted sum of the values based on the current query. That is the task of self-attention: to find a better weighted output for the current input, an output that contains information from the entire visible input sequence, with the attention controlled by the weights.
2. In self-attention, both key and value are transformations of the input sequence itself. Perhaps that is another meaning of the "self" in self-attention: the input acts as both key and value. This is actually quite reasonable, because in a recommendation system, although the key and value attributes originally live in different feature spaces, they are strongly related, so after suitable spatial transformations they can be unified into one feature space. This is also one of the reasons self-attention multiplies the input by the W matrices.
---
The above content is continuously being revised and improved; discussion and exchange are welcome.
---
References:
Attention is all you need: https://arxiv.org/pdf/1706.03762.pdf
Transformers in Vision: A Survey: https://arxiv.org/abs/2101.01169 [Note: in this paper, the dimensions of W^Q, W^K and W^V are defined incorrectly; do not be misled.]
A Survey on Visual Transformer: https://arxiv.org/abs/2012.12556