Principle of attention mechanism
2022-07-07 03:22:00 【Master Ma】
In recent years, attention mechanisms have driven important breakthroughs in image processing, natural language processing, and other fields, and have proven effective at improving model performance. The attention mechanism also mirrors the perception mechanisms of the human brain and eye. Taking computer vision as the main example, this article covers the principles, applications, and model development of attention mechanisms.
1 Attention Mechanisms and Saliency Maps
1.1 What Is an Attention Mechanism
An attention mechanism is a mechanism for focusing on local information, such as a particular region of an image. As the task changes, the region of attention tends to change as well.
Looking at the picture above as a whole, you only see a crowd of heads; unless you examine them one by one, you would not realize they are all distinguished scientists.
In fact, everything in the picture other than the faces is useless for this task. The attention mechanism's job is to find the most useful information; the simplest scenario you can imagine is detecting faces in a photo.
1.2 Attention-Based Salient Object Detection
A task closely tied to the attention mechanism is salient object detection. Its input is an image and its output is a probability map: the higher the probability at a location, the more likely that location belongs to an important object in the image, i.e., a place where the human eye focuses. A typical saliency map looks like this:
The right image is the saliency map of the left image. The probability is highest at the head, and the legs and tail also have high probability; this is the genuinely useful information in the image.
Salient object detection requires datasets, which are collected by tracking where multiple subjects' eyes attend over a period of time. The typical steps are:
(1) Have the subject observe the image.
(2) Record the eye's fixation positions with an eye tracker.
(3) Use Gaussian filtering to combine the fixation positions of all subjects.
(4) Normalize the result to probabilities in the range 0 to 1.
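Steps (3) and (4) can be sketched in a few lines of NumPy. This is only a minimal illustration, not the pipeline of any particular dataset; the fixation coordinates and the sigma value below are made up for the example.

```python
import numpy as np

def fixation_to_saliency(fixations, height, width, sigma=2.0):
    """Turn recorded gaze positions into a saliency map normalized to [0, 1]."""
    m = np.zeros((height, width))
    for r, c in fixations:              # accumulate fixations from all subjects
        m[r, c] += 1.0
    # separable Gaussian blur; radius ~ 3 sigma covers most of the kernel mass
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    for axis in (0, 1):
        m = np.apply_along_axis(np.convolve, axis, m, kernel, mode="same")
    return m / m.max()                  # normalize to [0, 1]

saliency = fixation_to_saliency([(8, 8), (8, 9), (20, 20)], 32, 32)
```

Each fixation becomes a small Gaussian blob, and overlapping fixations from many subjects reinforce each other into high-probability regions.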
This yields images like the following: the second row shows the eye-tracking results, and the third row shows the salient-object probability maps.
Everything above concerns the spatial attention mechanism, i.e., attending to different spatial locations. CNN architectures also have different feature channels, and a similar principle applies to them, as discussed next.
2 Attention Model Architectures
The essence of the attention mechanism is to locate the information of interest and suppress useless information; the result is usually expressed as a probability map or a probability feature vector. In principle, attention models fall mainly into three kinds: spatial attention models, channel attention models, and mixed spatial-channel attention models. We do not distinguish between soft and hard attention here.
2.1 Spatial Attention Models (spatial attention)
Not all regions of an image contribute equally to a task; only task-relevant regions need attention, such as the main subject in a classification task. A spatial attention model finds the regions of the image that are most important for the network to process.
Here we introduce two representative models. The first is the STN network (Spatial Transformer Network [1]) proposed by Google DeepMind. It learns a deformation of the input to perform preprocessing suited to the task, making it a space-based attention model. The network structure is as follows:
The Localization Net generates the affine transformation coefficients. The input is a C×H×W image and the output is a set of spatial transformation coefficients whose size depends on the type of transformation to be learned; for an affine transformation it is a 6-dimensional vector.
The effect of such a network is shown below:
That is, it locates the target and then applies operations such as rotation so that the input sample becomes easier to learn from. This is a one-step adjustment scheme; there are, of course, also many iterative adjustment schemes.
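The grid-generator step of an STN can be sketched as follows: given the 6 affine coefficients predicted by the Localization Net, build the grid of source coordinates from which to sample the input. This is a minimal NumPy sketch of the grid generation only (no localization network and no bilinear sampler), and the identity parameters are just an example stand-in for a network prediction.

```python
import numpy as np

def affine_grid(theta, height, width):
    """Map output pixel coords (normalized to [-1, 1]) to input sampling coords."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(height * width)])  # 3 x HW
    src = theta @ coords                 # 2 x HW source sampling locations
    return src.reshape(2, height, width)

# identity transform: the 6 affine parameters a localization net would predict
theta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
grid = affine_grid(theta, 4, 4)
```

With non-identity coefficients, the same grid encodes rotation, scaling, and translation, which is how the STN "zooms in" on the target before the main network sees it.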
Whereas Spatial Transformer Networks complete target localization and affine adjustment in a single step, Dynamic Capacity Networks [2] use two subnetworks: a low-performance subnetwork (coarse model) and a high-performance subnetwork (fine model). The low-performance subnetwork (fc in the figure below) processes the whole image and locates the regions of interest; the high-performance subnetwork (ff in the figure below) then refines those regions. Used together, they achieve lower computational cost and higher accuracy.
Because in most cases the region of interest is only a small part of the image, the essence of spatial attention is to locate the target and then transform it or compute weights for it.
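The coarse/fine division of labor can be sketched schematically: score patches with a cheap model, then run the expensive model only on the top-k patches. The `coarse` and `fine` callables below are toy stand-ins for the example, not the actual subnetworks of [2].

```python
import numpy as np

def coarse_fine_forward(image, coarse, fine, k=2, patch=8):
    """Run the cheap model everywhere, the expensive model only on top-k patches."""
    h, w = image.shape
    scores = coarse(image)                      # per-patch saliency, (h//patch, w//patch)
    top = np.argsort(scores.ravel())[-k:]       # indices of the k most salient patches
    feats = []
    for i in top:
        r, c = divmod(int(i), w // patch)
        crop = image[r*patch:(r+1)*patch, c*patch:(c+1)*patch]
        feats.append(fine(crop))                # refine only the salient crops
    return np.stack(feats)

# toy stand-ins: mean brightness per patch as "coarse", flatten as "fine"
coarse = lambda img: img.reshape(4, 8, 4, 8).mean(axis=(1, 3))
fine = lambda crop: crop.ravel()
out = coarse_fine_forward(np.arange(1024.0).reshape(32, 32), coarse, fine)
```

The saving comes from the fine model touching only k patches instead of the whole image, which is why the combination can be both cheaper and more accurate than running the fine model everywhere.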
2.2 Channel Attention Mechanisms
For a CNN taking a 2D image as input, one dimension is the image's spatial extent (height and width) and the other is the channel dimension, so channel-based attention is also a very common mechanism.
SENet (Squeeze-and-Excitation Net) [3] won the 2017 ImageNet classification competition. It is essentially a channel-based attention model: it models the importance of each feature channel, then enhances or suppresses different channels depending on the task. The schematic is as follows.
After a normal convolution, a bypass branch is split off. First comes the Squeeze operation (Fsq(·) in the figure), which compresses the spatial dimensions so that each 2D feature map becomes a single real number. This is equivalent to a pooling operation with a global receptive field, and the number of feature channels stays unchanged.
Then comes the Excitation operation (Fex(·) in the figure), which generates a weight for each feature channel through parameters w; w is learned to explicitly model the correlations between feature channels. In the paper, this is implemented with a 2-layer bottleneck structure (reduce the dimension, then restore it) of fully connected layers followed by a sigmoid function.
Once the weight of each feature channel is obtained, it is applied to the corresponding original channel, so the network can learn the importance of different channels for the specific task.
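The squeeze-and-excitation computation for a single feature map can be sketched in NumPy as follows. The bottleneck weights here are random stand-ins for the learned fully connected layers, and the reduction ratio is chosen arbitrarily for the example.

```python
import numpy as np

def se_block(x, w_reduce, w_expand):
    """x: (C, H, W) feature map; w_reduce: (C/r, C); w_expand: (C, C/r)."""
    z = x.mean(axis=(1, 2))                     # Squeeze: global average pooling, (C,)
    s = np.maximum(w_reduce @ z, 0.0)           # bottleneck FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w_expand @ s)))   # restore dimension + sigmoid
    return x * s[:, None, None]                 # Excitation: reweight each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
y = se_block(x, rng.standard_normal((2, 8)), rng.standard_normal((8, 2)))
```

Because the weights pass through a sigmoid, each channel is scaled by a factor in (0, 1): important channels are preserved and unimportant ones are attenuated.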
Applying this mechanism to several benchmark models yields significant performance improvements at only a small additional computational cost. As a general design idea it can be used in any existing network, which gives it strong practical value. Later, SKNet [4] combined this channel-weighting idea with the multi-branch structure of Inception and further improved performance.
The essence of the channel attention mechanism is to model the importance of each feature channel and weight the features according to the input and the task; it is simple and effective.
The Dynamic Capacity Network described above applies attention along the spatial dimension, while SENet applies it along the channel dimension; naturally, spatial and channel attention can also be used together.
CBAM (Convolutional Block Attention Module) [5] is a representative network of this kind. Its structure is as follows:
Along the channel direction, attention models the importance of features. The structure is as follows:
Both max pooling and average pooling are applied, the two pooled descriptors are passed through a shared MLP, the outputs are summed, and a sigmoid function produces the channel attention result.
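The channel-attention branch just described can be sketched as follows. The shared-MLP weights are random stand-ins for learned parameters.

```python
import numpy as np

def cbam_channel_attention(x, w1, w2):
    """x: (C, H, W); a shared 2-layer MLP scores both pooled descriptors."""
    avg = x.mean(axis=(1, 2))                   # average-pooled descriptor, (C,)
    mx = x.max(axis=(1, 2))                     # max-pooled descriptor, (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    weights = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))   # sum, then sigmoid
    return x * weights[:, None, None]           # reweight each channel

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 4, 4))
y = cbam_channel_attention(x, rng.standard_normal((2, 8)),
                           rng.standard_normal((8, 2)))
```

Compared with SENet, the difference is the extra max-pooled descriptor, which captures a complementary statistic of each channel.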
Along the spatial direction, attention models the importance of spatial locations. The structure is as follows:
First the channel dimension is reduced: max pooling and average pooling are applied along the channel axis, the two results are concatenated into a feature map, and a convolutional layer then learns the spatial attention from it.
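A sketch of this spatial branch follows. Note one simplification: real CBAM applies a 7×7 convolution to the 2-channel pooled map, which is replaced here by a 1×1 weighted sum to keep the example short; the weights are arbitrary stand-ins.

```python
import numpy as np

def cbam_spatial_attention(x, w):
    """x: (C, H, W); w: (2,) stands in for CBAM's learned 7x7 conv kernel."""
    avg = x.mean(axis=0)                        # average pool along channels, (H, W)
    mx = x.max(axis=0)                          # max pool along channels, (H, W)
    stacked = np.stack([avg, mx])               # concatenated 2-channel map
    # CBAM learns a 7x7 conv here; a 1x1 weighted sum keeps the sketch short
    att = 1.0 / (1.0 + np.exp(-(w[:, None, None] * stacked).sum(axis=0)))
    return x * att[None, :, :]                  # reweight each spatial location

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 5, 5))
y = cbam_spatial_attention(x, np.array([0.5, 0.5]))
```

The output is a single (H, W) attention map broadcast across all channels, the spatial counterpart of the per-channel weights above.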
These two mechanisms learn the importance of channels and of spatial locations respectively, and they can easily be embedded into any known framework.
Beyond these, there is much other research on attention mechanisms, such as residual attention, multi-scale attention, and recursive attention.
3 Typical Application Scenarios of the Attention Mechanism
In principle, the attention mechanism can improve model performance on any computer vision task, but two kinds of scenarios benefit especially.
3.1 Fine-Grained Classification
The real difficulty in fine-grained classification is locating the local regions that are genuinely useful for the task, such as the bird's head in the figure above. The attention mechanism is well suited to this in principle, and works [1][6] that adopt it show clear improvements over baseline models.
3.2 Salient Object Detection / Thumbnail Generation / Automatic Composition
This brings us back to where we started: the essence of attention is locating important or salient regions, so it is naturally useful for object detection.
The figure above shows several salient object detection results. For images with a salient target, the probability map concentrates tightly on the target subject. Adding an attention module to the network can further improve models for this kind of task.
References
[1] Jaderberg M, Simonyan K, Zisserman A. Spatial transformer networks[C]//Advances in neural information processing systems. 2015: 2017-2025.
[2] Almahairi A, Ballas N, Cooijmans T, et al. Dynamic capacity networks[C]//International Conference on Machine Learning. 2016: 2549-2558.
[3] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141.
[4] Li X, Wang W, Hu X, et al. Selective kernel networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 510-519.
[5] Woo S, Park J, Lee J Y, et al. Cbam: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 3-19.
[6] Fu J, Zheng H, Mei T. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4438-4446.
Source: https://blog.csdn.net/qq_42722197/article/details/123039018