PAN: an in-depth look at the attention mechanism in CV
2022-07-03 18:37:00 【Strawberry sauce toast】
Attention mechanisms in CV, a summary (Part 3): PAN
PAN: Pyramid Attention Network
Paper: 《Pyramid Attention Network for Semantic Segmentation》
1. Abstract
PAN network structure:
Pyramid Attention Network (PAN) combines an attention mechanism with a spatial pyramid to extract dense features more precisely and improve the accuracy of semantic segmentation. The problems it addresses (its motivation) and its main contributions are as follows:
1.1 Proposed module: Feature Pyramid Attention (FPA)
- Motivation
The existence of objects at multiple scales cause difficulty in classification of categories.
Objects of the same category can appear at very different sizes, which makes classification harder. Existing methods use dilated convolution (ASPP) or pyramid pooling (PSPNet) to improve classification accuracy, but they suffer from gridding artifacts and from loss of pixel-level location information, respectively.
- FPA module
FPA module performs spatial pyramid attention structure on high-level output and combine global pooling to learn a better feature representation. (That is, the module combines a spatial pyramid with global pooling and is applied to the output of the deep layers of the network.)
1.2 Proposed module: Global Attention Upsample (GAU)
- Motivation
high-level features are skilled in making category classification, while weak in restructuring original resolution binary prediction.
Deep layers extract more accurate category information but lose location information, which makes it harder to predict where objects are.
- GAU module
GAU module acts on each decoder layer to provide global context as a guidance of low-level features to select category localization details. (Deep layers have a larger receptive field and carry more accurate category information, while shallow layers carry more accurate location information; combining the two improves segmentation.)
2. Module details
2.1 Feature Pyramid Attention

① Extract local features (pixel-wise)
The FPA module applies convolution kernels of three sizes, $3\times 3$, $5\times 5$, and $7\times 7$, to further extract the features contained in the feature map, then fuses the feature maps of the three scales step by step by addition;
② Extract global features (channel-wise)
A global pooling branch is added to extract channel-wise features. (The idea is similar to the SE module; the difference is that only one convolution layer is used here, whereas SE uses two layers.)
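The two branches described above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the paper's or the referenced repository's exact implementation: the class name `FPASketch`, the helper `conv_bn_relu`, the stride-2 downsampling between scales, the bilinear upsampling, and the way the local and global branches are combined (plain addition) are all assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(in_ch, out_ch, kernel_size, stride=1):
    """Hypothetical helper: convolution + BatchNorm + ReLU with 'same'-style padding."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class FPASketch(nn.Module):
    """Illustrative FPA-style block: a multi-scale local branch plus a global pooling branch."""

    def __init__(self, channels):
        super().__init__()
        # Local (pixel-wise) branch: each step halves the resolution (assumed stride 2).
        self.down7 = conv_bn_relu(channels, channels, 7, stride=2)
        self.down5 = conv_bn_relu(channels, channels, 5, stride=2)
        self.down3 = conv_bn_relu(channels, channels, 3, stride=2)
        # One extra convolution per scale before fusion.
        self.conv7 = conv_bn_relu(channels, channels, 7)
        self.conv5 = conv_bn_relu(channels, channels, 5)
        self.conv3 = conv_bn_relu(channels, channels, 3)
        # Global (channel-wise) branch: a single 1x1 convolution after global pooling.
        self.global_conv = nn.Conv2d(channels, channels, kernel_size=1)

    @staticmethod
    def _resize(x, size):
        return F.interpolate(x, size=size, mode="bilinear", align_corners=False)

    def forward(self, x):
        d7 = self.down7(x)    # 1/2 resolution
        d5 = self.down5(d7)   # 1/4 resolution
        d3 = self.down3(d5)   # 1/8 resolution

        # Fuse the three scales step by step, coarsest first, by addition.
        fused = self.conv3(d3)
        fused = self.conv5(d5) + self._resize(fused, d5.shape[2:])
        fused = self.conv7(d7) + self._resize(fused, d7.shape[2:])
        local = self._resize(fused, x.shape[2:])

        # Global context: N x C x 1 x 1, broadcast over all spatial positions.
        global_ctx = self.global_conv(F.adaptive_avg_pool2d(x, 1))
        return local + global_ctx
```

The output has the same shape as the input, so under these assumptions the block can be placed on top of the deepest backbone feature map. For the exact layer layout, consult the paper and the reference repository.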
2.2 Global Attention Upsample
Main idea:
The main character of decoder module is to repair category pixel localization. Furthermore, high-level features with abundant category information can be used to weight low-level information to select precise resolution details.
The purpose of "decoding" is to recover the pixels that belong to each category, i.e. to achieve pixel-level classification. The output of the deep layers contains richer category information, while the output of the shallow layers contains more precise location information. We can therefore use the category features extracted by the deep layers to guide the shallow layers toward more accurate pixel-level classification.
Implementation details:
- Apply a $3\times 3$ convolution to the low-level feature map to reduce its number of channels;
- Apply global average pooling to the high-level features to extract global information, followed by a $1\times 1$ convolution with a BN layer and a ReLU activation; the result is used to weight the low-level features;
- The upsampled high-level output is added pixel by pixel to the weighted low-level feature map.
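The three steps above can be sketched as a small PyTorch module. This is a hedged illustration only: the class name `GAUSketch`, the choice to reduce the low-level channels to exactly the high-level channel count (so that the multiplication and addition line up), and the bilinear upsampling are assumptions for this example, not details confirmed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GAUSketch(nn.Module):
    """Illustrative GAU-style block: high-level global context weights low-level features."""

    def __init__(self, low_channels, high_channels):
        super().__init__()
        # Step 1: 3x3 convolution that reduces the low-level channel count.
        # Assumption: the target is the high-level channel count.
        self.low_conv = nn.Conv2d(low_channels, high_channels,
                                  kernel_size=3, padding=1)
        # Step 2: 1x1 convolution + BN + ReLU applied to the globally pooled
        # high-level features. Note: BatchNorm on a 1x1 map needs a batch size
        # greater than 1 in training mode.
        self.context = nn.Sequential(
            nn.Conv2d(high_channels, high_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(high_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, low, high):
        # Channel weights of shape N x C x 1 x 1 from global average pooling.
        weights = self.context(F.adaptive_avg_pool2d(high, 1))
        # Weight the channel-reduced low-level features (broadcast over H and W).
        weighted_low = self.low_conv(low) * weights
        # Step 3: upsample the high-level features and add them pixel by pixel.
        high_up = F.interpolate(high, size=low.shape[2:],
                                mode="bilinear", align_corners=False)
        return high_up + weighted_low
```

For example, with a low-level map of shape `(2, 16, 32, 32)` and a high-level map of shape `(2, 8, 16, 16)`, `GAUSketch(16, 8)` returns a tensor at the low-level resolution with the high-level channel count, which can then feed the next decoder stage.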
3. PyTorch code implementation
Reference: https://github.com/JaveyWang/Pyramid-Attention-Networks-pytorch
