Principle of attention mechanism
2022-07-07 03:22:00 【Master Ma】
In recent years, attention mechanisms have achieved important breakthroughs in images, natural language processing, and other fields, and have been shown to improve model performance. The attention mechanism is also consistent with the perception mechanism of the human brain and the human eye. This article takes computer vision as its main example and discusses the principle of the attention mechanism, its applications, and the development of attention models.
1 Attention mechanism and saliency maps
1.1 What is the attention mechanism
The so-called attention mechanism is a mechanism for focusing on local information, such as a particular region of an image. As the task changes, the region being attended to tends to change as well.
Looking at the picture above, if you only take it in as a whole, you see nothing but a crowd of heads; only when you look closer, one by one, do you realize they are all brilliant scientists.
In fact, for this task all the information in the picture other than the faces is useless and serves no purpose. The attention mechanism is about finding the most useful information; the simplest scenario you can imagine is detecting faces in a photo.
1.2 Salient object detection based on attention
A task that goes hand in hand with the attention mechanism is salient object detection. Its input is an image and its output is a probability map: the higher the probability, the more likely that location belongs to an important object in the image, i.e., a spot the human eye would focus on. A typical saliency map looks like this:
The image on the right is the saliency map of the image on the left. The probability is highest at the head, and the legs and tail also have fairly high probability; these are the truly useful pieces of information in the image.
Salient object detection requires datasets, which are collected by tracking where the eyes of multiple subjects attend over a period of time. The typical steps are as follows:
(1) Have the subject observe the image.
(2) Use an eye tracker to record the positions the eyes attend to.
(3) Apply Gaussian filtering to merge the attention positions of all subjects.
(4) Normalize the result to probabilities in the range 0~1.
This yields images like the ones below: the second row shows the eye-tracking results, and the third row shows the salient-object probability maps.
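To make steps (3) and (4) concrete, here is a minimal sketch of how recorded fixation points might be merged into a 0~1 saliency map with Gaussian filtering. The image size, fixation coordinates, and the sigma value are all illustrative assumptions, not values from any particular dataset.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_map(fixations, height, width, sigma=25.0):
    """Merge eye-tracker fixation points into a saliency probability map.

    fixations: iterable of (row, col) gaze positions pooled over all subjects.
    sigma: standard deviation of the Gaussian, in pixels (a free parameter).
    """
    heat = np.zeros((height, width), dtype=np.float32)
    for r, c in fixations:
        if 0 <= r < height and 0 <= c < width:
            heat[int(r), int(c)] += 1.0           # accumulate raw fixations
    heat = gaussian_filter(heat, sigma=sigma)     # step (3): smooth into a density
    if heat.max() > 0:
        heat /= heat.max()                        # step (4): normalize to 0~1
    return heat

# Three subjects fixating near the same region of a 480x640 image
saliency = fixation_map([(200, 320), (205, 330), (198, 315)], 480, 640)
```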
Everything above concerns the spatial attention mechanism, i.e., attending to different spatial locations. In CNN architectures there are also different feature channels, and a similar principle applies to them; we discuss this next.
2 Attention model architectures
The essence of the attention mechanism is to locate the information of interest and suppress useless information; the result is usually expressed as a probability map or a probability feature vector. In terms of principle, attention models fall mainly into three kinds: spatial attention models, channel attention models, and mixed spatial-channel attention models; here we do not distinguish between soft and hard attention.
2.1 Spatial attention model (spatial attention)
Not all regions of an image contribute equally to the task; only the task-relevant regions need attention, such as the main subject in a classification task. A spatial attention model finds the most important regions in the image for the network to process.
Here we introduce two representative models. The first is the STN network (Spatial Transformer Network [1]) proposed by Google DeepMind. It learns a deformation of the input in order to perform a preprocessing operation suited to the task; it is a space-based attention model. The network structure is as follows:
Here the localization net is used to generate the transformation coefficients. The input is an image of dimensions C×H×W, and the output is a vector of spatial transformation coefficients whose size depends on the type of transformation to be learned; for an affine transformation it is a 6-dimensional vector.
The effect of such a network is shown in the figure below:
That is, it locates the position of the target and then performs operations such as rotation, making the input sample easier to learn from. This is a one-step adjustment scheme; of course, there are also many iterative adjustment schemes.
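As a rough illustration of this idea (not the exact architecture of [1]), a space-based attention module in the STN style can be sketched in PyTorch as follows; the localization-net layer sizes here are arbitrary choices made only for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Minimal STN-style module: a localization net regresses 6 affine
    coefficients, which are used to warp the input before the main network."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(10, 6),   # 6-dimensional affine transformation coefficients
        )
        # Start from the identity transform so training is stable at first
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                  # N x 2 x 3 affine matrices
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)  # warped input

warped = SpatialTransformer(1)(torch.randn(4, 1, 28, 28))   # same shape as input
```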
Whereas the Spatial Transformer Network completes target localization and affine transformation adjustment in one step, Dynamic Capacity Networks [2] use two sub-networks: a low-performance sub-network (coarse model) and a high-performance sub-network (fine model). The low-performance sub-network (fc in the figure below) processes the whole image and locates the regions of interest, while the high-performance sub-network (ff in the figure below) refines those regions of interest. Used together, they achieve lower computational cost together with higher accuracy, as the sketch below illustrates.
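The following toy sketch conveys only the coarse/fine division of labor under simplifying assumptions: a cheap scorer ranks patches, and the expensive network runs only on the top-scoring ones. Note that the actual Dynamic Capacity Network of [2] selects regions using gradient-based saliency of the coarse model's output, which is not reproduced here; every layer size below is an illustrative choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseFineSketch(nn.Module):
    """Toy coarse/fine pipeline: score all patches cheaply, then spend
    the expensive fine model only on the top-k patches."""
    def __init__(self, patch=8, top_k=4):
        super().__init__()
        self.patch, self.top_k = patch, top_k
        self.coarse = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # cheap scorer
        self.fine = nn.Sequential(                               # expensive extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        n, p = x.shape[0], self.patch
        cols = x.shape[3] // p
        scores = F.avg_pool2d(self.coarse(x), p).flatten(1)  # one score per patch
        idx = scores.topk(self.top_k, dim=1).indices         # most salient patches
        feats = []
        for b in range(n):
            parts = []
            for i in idx[b].tolist():
                r, c = (i // cols) * p, (i % cols) * p
                parts.append(self.fine(x[b:b + 1, :, r:r + p, c:c + p]))
            feats.append(torch.cat(parts, dim=1))
        return torch.cat(feats, dim=0)                       # N x (top_k * 32)

out = CoarseFineSketch()(torch.randn(2, 3, 32, 32))
```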
Since in most cases the region of interest is only a small part of the image, the essence of spatial attention is to locate the target and then either transform it or obtain weights for it.
2.2 Channel attention mechanism
For a CNN whose input is a 2-dimensional image, one dimension is the spatial scale of the image, i.e., its height and width, while the other dimension is the channel. Channel-based attention is therefore also a very common mechanism.
SENet (Squeeze-and-Excitation Net) [3] is the champion network of the 2017 ImageNet classification competition. It is essentially a channel-based attention model: it models the importance of each feature channel, then enhances or suppresses different channels for different tasks. The schematic diagram is as follows.
After a normal convolution operation, a bypass branch splits off. First comes the Squeeze operation (Fsq(·) in the figure), which compresses the spatial dimensions: each two-dimensional feature map becomes a single real number. This is equivalent to a pooling operation with a global receptive field, and the number of feature channels stays unchanged.
Then comes the Excitation operation (Fex(·) in the figure), which generates a weight for each feature channel through parameters w; w is learned so as to explicitly model the correlations between feature channels. In the paper this is implemented with a 2-layer bottleneck structure (first reducing and then restoring the dimensionality) of fully connected layers followed by a sigmoid function.
After the weight of each feature channel is obtained, it is applied to the corresponding original feature channel, so that for a specific task the network can learn the importance of the different channels.
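A minimal PyTorch sketch of an SE-style block following the description above is given here; the bottleneck ratio reduction=16 is one common choice, and the details are illustrative rather than the exact SENet code.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global average pooling per channel (squeeze),
    a bottleneck MLP with sigmoid (excitation), then channel-wise rescaling."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: N x C, one number per channel
        w = self.fc(w).view(n, c, 1, 1)  # excitation: per-channel weights in (0, 1)
        return x * w                     # reweight the original feature channels

out = SEBlock(64)(torch.randn(2, 64, 32, 32))  # same shape, channels reweighted
```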
Applied to several benchmark models, the mechanism brings a fairly significant performance improvement at a small additional computational cost. As a general design idea, it can be used in any existing network, which gives it strong practical value. Later, SKNet [4] combined this idea of channel weighting with the multi-branch network structure of Inception and improved performance further.
The essence of the channel attention mechanism is to model the importance of each feature channel, so that features can be weighted according to the input for different tasks; it is simple and effective.
2.3 Mixed spatial and channel attention
The aforementioned Dynamic Capacity Network applies attention along the spatial dimension, and SENet applies it along the channel dimension; naturally, spatial attention and channel attention can also be used at the same time.
CBAM (Convolutional Block Attention Module) [5] is one of the representative networks; its structure is as follows:
Attention in the channel direction models the importance of the feature channels; its structure is as follows:
Max pooling and mean pooling are applied simultaneously, the pooled descriptors are passed through a shared MLP to obtain the transformed results, the two results are then merged by addition, and finally a sigmoid function yields the channel attention result.
Attention in the spatial direction models the importance of spatial locations; its structure is as follows:
The channel dimension itself is first reduced: channel-wise max pooling and mean pooling results are obtained separately and concatenated into one feature map, from which a convolution layer then learns the spatial attention map.
These two mechanisms learn the importance of channels and of spatial positions respectively, and they can easily be embedded into any known framework, as the sketch below shows.
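To show how the two directions combine, here is a simplified PyTorch sketch of a CBAM-style block; the reduction ratio and the 7×7 kernel are common defaults used for illustration and may differ from the exact settings in [5].

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel direction: max-pooled and mean-pooled descriptors share one
    MLP; the two outputs are summed and a sigmoid gives channel weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # mean-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # max-pooled descriptor
        return torch.sigmoid(avg + mx).view(n, c, 1, 1)

class SpatialAttention(nn.Module):
    """Spatial direction: channel-wise max and mean maps are concatenated
    into a 2-channel map and fused by a single convolution."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention first, then spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.ca, self.sa = ChannelAttention(channels), SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)     # reweight channels
        return x * self.sa(x)  # reweight spatial positions

out = CBAM(64)(torch.randn(2, 64, 32, 32))  # same shape as the input
```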
Besides these, there is much other research related to attention mechanisms, such as residual attention mechanisms, multi-scale attention mechanisms, and recursive attention mechanisms.
3 Typical application scenarios of the attention mechanism
In principle, the attention mechanism can improve model performance on all computer vision tasks, but two types of scenarios benefit in particular.
3.1 Fine-grained classification
The real difficulty in fine-grained classification tasks is locating the local regions that are truly useful for the task, such as the head of the bird in the diagram above. The attention mechanism is in principle very well suited to this: papers [1] and [6] use attention mechanisms, and the improvement to the models is obvious.
3.2 Salient object detection / Thumbnail generation / Automatic composition
Here we come back to where we started. Indeed, the essence of attention is the localization of important/salient regions, so it is very useful in the field of salient object detection.
The figure above shows the results of several salient object detection methods. For images with salient objects, the probability map is highly concentrated on the object itself; adding an attention mechanism module to the network can further improve models for this kind of task.
References
[1] Jaderberg M, Simonyan K, Zisserman A. Spatial transformer networks[C]//Advances in neural information processing systems. 2015: 2017-2025.
[2] Almahairi A, Ballas N, Cooijmans T, et al. Dynamic capacity networks[C]//International Conference on Machine Learning. 2016: 2549-2558.
[3] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141.
[4] Li X, Wang W, Hu X, et al. Selective kernel networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
[5] Woo S, Park J, Lee J Y, et al. Cbam: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 3-19.
[6] Fu J, Zheng H, Mei T. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4438-4446.
Source: https://blog.csdn.net/qq_42722197/article/details/123039018